[Bugs] [Bug 1657743] Very high memory usage (25GB) on Gluster FUSE mountpoint

bugzilla at redhat.com bugzilla at redhat.com
Thu Jul 4 11:10:02 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1657743



--- Comment #12 from ryan at magenta.tv ---
Hi Nithya,

Yes, there do seem to be a fair number of errors on the mount:
[2019-07-01 09:23:38.178838] C
[rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 2-mcv01-client-1: server
172.30.30.2:49157 has not responded in the last 5 seconds, disconnecting.
[2019-07-01 09:23:38.178931] C
[rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 2-mcv01-client-2: server
172.30.30.3:49155 has not responded in the last 5 seconds, disconnecting.
[2019-07-01 09:23:38.179780] E [rpc-clnt.c:348:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f878a0ff2eb] (-->
/lib64/libgfrpc.so.0(+0xcd6e)[0x7f8789ec8d6e] (-->
/lib64/libgfrpc.so.0(+0xce8e)[0x7f8789ec8e8e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8d)[0x7f8789eca4dd] (-->
/lib64/libgfrpc.so.0(+0xf048)[0x7f8789ecb048] ))))) 2-mcv01-client-1: forced
unwinding frame type(GlusterFS 4.x v1) op(OPENDIR(20)) called at 2019-07-01
09:23:32.334731 (xid=0x267b3d0f)
[2019-07-01 09:23:38.179809] E [MSGID: 114031]
[client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote
operation failed. Path: /data/shard (b58b070a-0915-4950-ac7d-06ba95d2f098)
[Transport endpoint is not connected]
[2019-07-01 09:23:38.179853] E [rpc-clnt.c:348:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f878a0ff2eb] (-->
/lib64/libgfrpc.so.0(+0xcd6e)[0x7f8789ec8d6e] (-->
/lib64/libgfrpc.so.0(+0xce8e)[0x7f8789ec8e8e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8d)[0x7f8789eca4dd] (-->
/lib64/libgfrpc.so.0(+0xf048)[0x7f8789ecb048] ))))) 2-mcv01-client-2: forced
unwinding frame type(GlusterFS 4.x v1) op(OPENDIR(20)) called at 2019-07-01
09:23:32.334756 (xid=0x26fbbfee)
[2019-07-01 09:23:38.179886] E [MSGID: 114031]
[client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-2: remote
operation failed. Path: /data/shard (b58b070a-0915-4950-ac7d-06ba95d2f098)
[Transport endpoint is not connected]
[2019-07-01 09:23:38.179970] E [rpc-clnt.c:348:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f878a0ff2eb] (-->
/lib64/libgfrpc.so.0(+0xcd6e)[0x7f8789ec8d6e] (-->
/lib64/libgfrpc.so.0(+0xce8e)[0x7f8789ec8e8e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8d)[0x7f8789eca4dd] (-->
/lib64/libgfrpc.so.0(+0xf048)[0x7f8789ecb048] ))))) 2-mcv01-client-1: forced
unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-07-01 09:23:32.334740
(xid=0x267b3d10)
[2019-07-01 09:23:38.180114] E [rpc-clnt.c:348:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f878a0ff2eb] (-->
/lib64/libgfrpc.so.0(+0xcd6e)[0x7f8789ec8d6e] (-->
/lib64/libgfrpc.so.0(+0xce8e)[0x7f8789ec8e8e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8d)[0x7f8789eca4dd] (-->
/lib64/libgfrpc.so.0(+0xf048)[0x7f8789ecb048] ))))) 2-mcv01-client-2: forced
unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-07-01 09:23:32.334764
(xid=0x26fbbfef)
[2019-06-27 19:16:41.981791] E [MSGID: 114031]
[client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote
operation failed. Path: /data/shard/CLIENT WORK/HSBC/190130_HSBA-S061_Customer
Films/Media/Media/30012019_Recordings (3d6f3e6a-16ee-452e-8d88-7051262c7f96)
[Transport endpoint is not connected]
[2019-06-27 19:16:41.992288] E [MSGID: 114031]
[client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote
operation failed. Path: /data/shard/CLIENT WORK/HSBC/180703_HSBA-A456_John
Flint Year in the Life_SM/Media/6. 1Q18 Results Call
(9afd325f-952d-41ce-b3eb-f648e1b39ecf) [Transport endpoint is not connected]
[2019-06-27 19:16:42.004034] E [MSGID: 114031]
[client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote
operation failed. Path: /data/shard/CLIENT WORK/HSBC/190313_HSB1-a487_The Edge
GLC Films_jp/Media/Raw FIles/5279P HSBC GLC Films Day01 FS7
Card01/.Spotlight-V100/Store-V2/0B66EF32-49B7-476E-A0F4-6481613ECFA4/journals.live
(b020e503-c4bc-4821-b75b-1fb37a2e3b0b) [Transport endpoint is not connected]
[2019-06-27 19:16:42.015580] E [MSGID: 114031]
[client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote
operation failed. Path: /data/shard/CLIENT
WORK/HSBC/180813_HSBA-S019_HSBC_NOW_GV Archive_SP/Media/OPOS GVs/GV
Rushes/VANCOUVER/EXPORT (c8083b1d-d4e2-4a81-8fb4-dd76c92902e7) [Transport
endpoint is not connected]
[2019-06-27 19:16:42.025179] E [MSGID: 114031]
[client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote
operation failed. Path: /data/shard/CLIENT WORK/HSBC/180703_HSBA-A456_John
Flint Year in the Life_SM/Media/5. AGM 2018 (Jack Morton)/Main Cam
(662050a4-505f-438c-8375-e7dfced3ff47) [Transport endpoint is not connected]
[2019-06-27 19:16:42.034689] E [MSGID: 114031]
[client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote
operation failed. Path: /data/shard/CLIENT
WORK/HSBC/180813_HSBA-S019_HSBC_NOW_GV Archive_SP/Media/OPOS GVs/GV
Rushes/SHANGHAI/DCIM/100EOS5D (eb6d15e3-ae6f-4204-88b5-edfbea106d35) [Transport
endpoint is not connected]
[2019-06-27 19:16:42.280538] E [MSGID: 114031]
[client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote
operation failed. Path: /data/shard/CLIENT WORK/HSBC/180703_HSBA-A456_John
Flint Year in the Life_SM/Media/18_Extra clips - 071218/YITL L42 ALL
SHOTS/JPEGS (cb59fca5-70d0-4f1a-9bf2-ec2f7c2dfc7d) [Transport endpoint is not
connected]
[2019-06-27 19:16:42.502718] E [MSGID: 114031]
[client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote
operation failed. Path: /data/shard/CLIENT
WORK/HSBC/180813_HSBA-S019_HSBC_NOW_GV Archive_SP/Media/OPOS GVs/GV
Rushes/SINGAPORE/DCIM/100EOS5D (471c59c9-8b18-4c71-99d3-607a469eaebd)
[Transport endpoint is not connected]
[2019-06-27 19:16:42.626356] E [MSGID: 114031]
[client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote
operation failed. Path: /data/shard/CLIENT
WORK/HSBC/180813_HSBA-S019_HSBC_NOW_GV Archive_SP/Media/OPOS GVs/GV
Rushes/SYDNEY/HSBC_SYDENY_OCT_2015/5D (060b7bd7-1f18-48bd-97f6-8b6b7f61b3af)
[Transport endpoint is not connected]
[2019-06-27 19:16:42.648426] E [MSGID: 114031]
[client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote
operation failed. Path: /data/shard/CLIENT
WORK/HSBC/180813_HSBA-S019_HSBC_NOW_GV Archive_SP/Media/OPOS GVs/GV Rushes/NEW
YORK/HSBC_Broll/Card 4/CONTENTS (9044fb07-3895-4886-894f-2269a9d1db56)
[Transport endpoint is not connected]
[2019-06-27 19:16:42.661578] E [MSGID: 114031]
[client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote
operation failed. Path: /data/shard/CLIENT WORK/HSBC/180703_HSBA-A456_John
Flint Year in the Life_SM/Media/18_Extra clips - 071218/YITL L42 ALL
SHOTS/DCIM/100EOS5D (7a0f5383-cb22-4a52-aa0a-19f004c29ee7) [Transport endpoint
is not connected]

We haven't had any issues reported with the cluster, and when looking into the
transport endpoint errors, I can't see any disconnects. I've grepped through the
brick logs to look for events around the time of some of the remote operation
failures, and the only thing I've found is:
mnt-ctdb-data.log-20190630:[2019-06-27 19:16:32.265374] I
[addr.c:55:compare_addr_and_update] 0-/mnt/ctdb/data: allowed = "*", received
addr = "172.30.30.1"
mnt-ctdb-data.log-20190630:[2019-06-27 19:16:32.265405] I [login.c:111:gf_auth]
0-auth/login: allowed user names: 2af5fd84-b825-4df9-af5d-4c565930aec9
mnt-ctdb-data.log-20190630:[2019-06-27 19:16:32.265417] I [MSGID: 115029]
[server-handshake.c:495:server_setvolume] 0-ctdbv01-server: accepted client
from
CTX_ID:2b9c50e2-60d2-4973-832f-2e8eea597c83-GRAPH_ID:0-PID:176234-HOST:mc-ldn-mcn01-PC_NAME:ctdbv01-client-4-RECON_NO:-0
(version: 4.1.8)
mnt-ctdb-data.log-20190630:[2019-06-27 19:16:32.287664] I [MSGID: 115036]
[server.c:483:server_rpc_notify] 0-ctdbv01-server: disconnecting connection
from
CTX_ID:2b9c50e2-60d2-4973-832f-2e8eea597c83-GRAPH_ID:0-PID:176234-HOST:mc-ldn-mcn01-PC_NAME:ctdbv01-client-4-RECON_NO:-0
mnt-ctdb-data.log-20190630:[2019-06-27 19:16:32.287768] I [MSGID: 101055]
[client_t.c:444:gf_client_unref] 0-ctdbv01-server: Shutting down connection
CTX_ID:2b9c50e2-60d2-4973-832f-2e8eea597c83-GRAPH_ID:0-PID:176234-HOST:mc-ldn-mcn01-PC_NAME:ctdbv01-client-4-RECON_NO:-0
mnt-h1a-data.log-20190630:[2019-06-27 19:16:31.741300] E
[rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message
(XID: 0x24c920bf, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport
(tcp.mcv01-server)
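
For anyone repeating this check, the correlation between a failure in the mount
log and nearby brick-log events could be sketched roughly as below. This is only
an illustration under assumptions: the helper names parse_ts and lines_near and
the 15-second window are hypothetical choices, not part of Gluster, and the
script assumes each log entry sits on a single line, as it does in the log
files themselves (the excerpts in this mail are wrapped).

```python
import re
from datetime import datetime, timedelta

# Gluster log entries carry a bracketed timestamp with microseconds. When the
# lines come from grep across several files, a "filename:" prefix precedes it,
# so search for the timestamp anywhere in the line.
TS_RE = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")


def parse_ts(line):
    """Return the first bracketed timestamp in a log line, or None."""
    match = TS_RE.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")


def lines_near(lines, when, window_seconds=15):
    """Yield log lines stamped within window_seconds of `when`.

    The default window is an arbitrary value picked for this sketch.
    """
    window = timedelta(seconds=window_seconds)
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and abs(ts - when) <= window:
            yield line


if __name__ == "__main__":
    # Time of the first "remote operation failed" entry from 2019-06-27.
    failure = datetime(2019, 6, 27, 19, 16, 41, 981791)
    brick_lines = [
        "mnt-h1a-data.log-20190630:[2019-06-27 19:16:31.741300] E "
        "[rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: "
        "failed to submit message",
        "mnt-ctdb-data.log-20190630:[2019-06-27 19:16:32.265374] I "
        "[addr.c:55:compare_addr_and_update] 0-/mnt/ctdb/data: allowed",
    ]
    for hit in lines_near(brick_lines, failure):
        print(hit)
```

Widening or narrowing window_seconds changes how much brick-log context is
pulled in around each failure.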

What version of Gluster are these code improvements available in? We're testing
6.3 at the moment.

Many thanks for your help,
Ryan


