[Gluster-users] Gluster errors create zombie processes [LOGS ATTACHED]

Vijay Bellur vbellur at redhat.com
Sun Mar 8 00:29:08 UTC 2015


On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:
> Hi guys,
>
> We have a Rails app that uses Gluster as its distributed file system.
> The Gluster servers are hosted independently as part of a deal with
> another company. We have no control over them; we connect to them
> using the Gluster native client.
>
> We tried to resolve this issue with help from the admins of the company
> that hosts our Gluster servers, but they say it is a client issue, and
> we have run out of ideas about how that is possible, since we are not
> doing anything special here.
>
> Information about the independent gluster servers:
> - Version: 3.6.0.42.1
> - They are using Red Hat
> - They are an enterprise, so they are always using older versions
>
> Our servers:
> System version: Ubuntu 14.04
> Our gluster client version: 3.6.2
>
> The exact problem is that it often happens (a couple of times a week)
> that errors in gluster cause processes to become zombies. It happens
> with our application server (unicorn), nginx, and our crawling script,
> which runs as a daemon.
>
> Our fstab file:
>
> 10.10.11.17:/drslk-prod     /mnt/storage  glusterfs  defaults,_netdev,nobootwait,fetch-attempts=10  0 0
> 10.10.11.17:/drslk-backup   /mnt/backup   glusterfs  defaults,_netdev,nobootwait,fetch-attempts=10  0 0
>
> Logs from gluster:
>
> [2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind] (-->
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98]
> ))))) 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3)
> op(LOOKUP(27)) called at 2015-02-18 12:36:12.361489 (xid=0x5d475da)
> [2015-02-18 12:36:12.375765] W
> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
> remote operation failed: Transport endpoint is not connected. Path:
> /system/posts/00/00/71/77/59.jpg (2ad81c2b-a141-478d-9dd4-253345edbceb)
> [2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind] (-->
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98]
> ))))) 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3)
> op(LOOKUP(27)) called at 2015-02-18 12:36:12.361858 (xid=0x5d475db)
> [2015-02-18 12:36:12.376355] W
> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
> remote operation failed: Transport endpoint is not connected. Path:
> /system/posts/00/00/08 (f5c33a99-719e-4ea2-ad1f-33b893af103d)
> [2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request]
> 0-drslk-prod-client-10: not connected (priv->connected = 0)
> [2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit]
> 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc
> Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
> (drslk-prod-client-10)
> [2015-02-18 12:36:12.376814] W
> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
> remote operation failed: Transport endpoint is not connected. Path:
> (null) (00000000-0000-0000-0000-000000000000)
> [2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify]
> 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client
> process will keep trying to connect to glusterd until brick's port is
> available
> [2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit]
> 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd
> Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
> (drslk-prod-client-10)
> [2015-02-18 12:36:12.376906] W
> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
> remote operation failed: Transport endpoint is not connected. Path:
> (null) (00000000-0000-0000-0000-000000000000)
> [2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish]
> 0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed
> (Connection refused)
> [2015-02-18 12:36:12.379296] W
> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
> remote operation failed: Transport endpoint is not connected. Path:
> (null) (00000000-0000-0000-0000-000000000000)
> [2015-02-18 12:36:12.379700] W
> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
> remote operation failed: Transport endpoint is not connected. Path:
> (null) (00000000-0000-0000-0000-000000000000)
> [2015-02-18 13:10:52.759736] E
> [client-handshake.c:1496:client_query_portmap_cbk]
> 0-drslk-prod-client-10: failed to get the port number for remote
> subvolume. Please run 'gluster volume status' on server to see if brick
> process is running.
> [2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify]
> 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client
> process will keep trying to connect to glusterd until brick's port is
> available
> [2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig]
> 0-drslk-prod-client-10: changing port to 49349 (from 0)
> [2015-02-18 13:11:02.898097] I
> [client-handshake.c:1413:select_server_supported_programs]
> 0-drslk-prod-client-10: Using Program GlusterFS 3.3, Num (1298437),
> Version (330)
> [2015-02-18 13:11:02.898446] I
> [client-handshake.c:1200:client_setvolume_cbk] 0-drslk-prod-client-10:
> Connected to drslk-prod-client-10, attached to remote volume
> '/GLUSTERFS/drslk-prod'.
> [2015-02-18 13:11:02.898460] I
> [client-handshake.c:1210:client_setvolume_cbk] 0-drslk-prod-client-10:
> Server and Client lk-version numbers are not same, reopening the fds
>
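
As a rough illustration, the records above can be summarized by pairing
each "disconnected from" record with the next "Connected to" record, which
gives the window during which the client could not reach this brick. The
sketch below is in Python and is only an assumption-laden example: the
names parse_record and outage_windows are made up here, and it assumes
every log record has first been rejoined onto a single line (the mail
client wrapped them above). The three sample records are copied from the
quoted log.

```python
import re
from datetime import datetime

# Start of a client log record: "[timestamp] LEVEL [file:line:function] message"
RECORD = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<msg>.*)"
)


def parse_record(line):
    """Return (timestamp, level, source, message), or None if the line is not a record."""
    m = RECORD.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return ts, m.group("level"), m.group("src"), m.group("msg")


def outage_windows(lines):
    """Pair the first 'disconnected from' record with the next 'Connected to' record."""
    windows = []
    down_since = None
    for line in lines:
        rec = parse_record(line)
        if rec is None:
            continue
        ts, _level, _src, msg = rec
        if "disconnected from" in msg and down_since is None:
            down_since = ts  # start of an outage; repeated disconnects are ignored
        elif "Connected to" in msg and down_since is not None:
            windows.append((down_since, ts))
            down_since = None
    return windows


# Records copied from the log above, rejoined onto single lines.
SAMPLE = [
    "[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] "
    "0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client "
    "process will keep trying to connect to glusterd until brick's port is available",
    "[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify] "
    "0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client "
    "process will keep trying to connect to glusterd until brick's port is available",
    "[2015-02-18 13:11:02.898446] I [client-handshake.c:1200:client_setvolume_cbk] "
    "0-drslk-prod-client-10: Connected to drslk-prod-client-10, attached to "
    "remote volume '/GLUSTERFS/drslk-prod'.",
]

for start, end in outage_windows(SAMPLE):
    print(f"brick unreachable from {start} to {end} ({end - start})")
```

Run against the full client log, this kind of summary could be compared
with the times at which the application processes hung.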

Can you provide the gluster volume configuration details?

It does look like frame-timeout for the volume has been set to 60. Is 
there any specific reason? Normally altering the frame-timeout is not 
recommended.

-Vijay


