[Gluster-users] Gluster errors create zombie processes [LOGS ATTACHED]
Vijay Bellur
vbellur at redhat.com
Sun Mar 8 17:17:06 UTC 2015
On 03/08/2015 09:36 AM, Przemysław Mroczek wrote:
> I don't have the volfiles; they are not on our machines. As I said previously,
> we have no control over the gluster servers.
>
> I saw a graph in the logs that looks similar to a volume file. I will
> paste it here, but we don't really have any influence over it. We are just
> using the client to connect to gluster servers that we do not control.
>
I would recommend not altering the default frame-timeout.
>
> Btw, do you think that different versions of gluster client and gluster
> server could be an issue here?
>
It potentially can be. What versions are you running on the servers and
the client?
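
For readers following along, here is a minimal sketch of how the two version strings quoted further down in this thread (client 3.6.2, servers 3.6.0.42.1) could be compared. The helper names `parse_version` and `same_release_series` are hypothetical, and the "glusterfs 3.6.2" string is only an assumed shape for what `glusterfs --version` prints. Sharing a major.minor series does not by itself prove the two sides are compatible.

```python
import re


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def same_release_series(client, server):
    """True when both versions share the same major.minor pair."""
    return parse_version(client)[:2] == parse_version(server)[:2]


# Versions as reported in this thread.
client_version = "glusterfs 3.6.2"  # assumed shape of `glusterfs --version` output
server_version = "3.6.0.42.1"       # as reported by the hosting company

print(parse_version(client_version))
print(parse_version(server_version))
print(same_release_series(client_version, server_version))
```

Both sides here fall in the 3.6 series, so any mismatch would be at the patch level rather than across major releases.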
-Vijay
> 2015-03-08 1:29 GMT+01:00 Vijay Bellur <vbellur at redhat.com>:
>
> On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:
>
> Hi guys,
>
> We have a Rails app that uses gluster as our distributed file
> system. The gluster servers are hosted independently as part of a deal
> with another company. We have no control over them; we connect to
> them using the gluster native client.
>
> We tried to resolve this issue with help from the admins of the company
> that hosts our gluster servers, but they say it is a client
> issue, and we have run out of ideas as to how that is possible, since we
> are not doing anything special here.
>
> Information about the independent gluster servers:
> - Version: 3.6.0.42.1
> - They are running Red Hat
> - They are an enterprise, so they are always on older versions
>
> Our servers:
> System version: Ubuntu 14.04
> Our gluster client version: 3.6.2
>
> The exact problem is that it often happens (a couple of times a week) that
> errors in gluster cause processes to become zombies. It happens with our
> application server (unicorn), nginx, and our crawling script, which is run
> as a daemon.
>
> Our fstab file:
>
> 10.10.11.17:/drslk-prod /mnt/storage glusterfs defaults,_netdev,nobootwait,fetch-attempts=10 0 0
> 10.10.11.17:/drslk-backup /mnt/backup glusterfs defaults,_netdev,nobootwait,fetch-attempts=10 0 0
>
> Logs from gluster:
>
> [2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind]
> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] )))))
> 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3)
> op(LOOKUP(27)) called at 2015-02-18 12:36:12.361489 (xid=0x5d475da)
> [2015-02-18 12:36:12.375765] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
> 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not
> connected. Path: /system/posts/00/00/71/77/59.jpg
> (2ad81c2b-a141-478d-9dd4-253345edbceb)
> [2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind]
> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] )))))
> 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3)
> op(LOOKUP(27)) called at 2015-02-18 12:36:12.361858 (xid=0x5d475db)
> [2015-02-18 12:36:12.376355] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
> 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not
> connected. Path: /system/posts/00/00/08 (f5c33a99-719e-4ea2-ad1f-33b893af103d)
> [2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request]
> 0-drslk-prod-client-10: not connected (priv->connected = 0)
> [2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit]
> 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc
> Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
> (drslk-prod-client-10)
> [2015-02-18 12:36:12.376814] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
> 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not
> connected. Path: (null) (00000000-0000-0000-0000-000000000000)
> [2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify]
> 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client
> process will keep trying to connect to glusterd until brick's port is
> available
> [2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit]
> 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd
> Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
> (drslk-prod-client-10)
> [2015-02-18 12:36:12.376906] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
> 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not
> connected. Path: (null) (00000000-0000-0000-0000-000000000000)
> [2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish]
> 0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed
> (Connection refused)
>
> [2015-02-18 12:36:12.379296] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
> 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not
> connected. Path: (null) (00000000-0000-0000-0000-000000000000)
> [2015-02-18 12:36:12.379700] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
> 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not
> connected. Path: (null) (00000000-0000-0000-0000-000000000000)
> [2015-02-18 13:10:52.759736] E [client-handshake.c:1496:client_query_portmap_cbk]
> 0-drslk-prod-client-10: failed to get the port number for remote
> subvolume. Please run 'gluster volume status' on server to see if brick
> process is running.
> [2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify]
> 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client
> process will keep trying to connect to glusterd until brick's port is
> available
> [2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig]
> 0-drslk-prod-client-10: changing port to 49349 (from 0)
> [2015-02-18 13:11:02.898097] I [client-handshake.c:1413:select_server_supported_programs]
> 0-drslk-prod-client-10: Using Program GlusterFS 3.3, Num (1298437),
> Version (330)
> [2015-02-18 13:11:02.898446] I [client-handshake.c:1200:client_setvolume_cbk]
> 0-drslk-prod-client-10: Connected to drslk-prod-client-10, attached to
> remote volume '/GLUSTERFS/drslk-prod'.
> [2015-02-18 13:11:02.898460] I [client-handshake.c:1210:client_setvolume_cbk]
> 0-drslk-prod-client-10: Server and Client lk-version numbers are not same,
> reopening the fds
>
>
> Can you provide the gluster volume configuration details?
>
> It does look like frame-timeout for the volume has been set to 60.
> Is there any specific reason? Normally altering the frame-timeout is
> not recommended.
>
> -Vijay
>
>
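
The quoted client log shows a disconnect from one brick followed, some time later, by a successful reconnect. As a rough sketch (not part of the original exchange), the gap between the two could be measured with a small script like the one below. The function name `outages`, the regular expression, and the assumption that each log entry sits on a single line are all illustrative; the four sample entries are taken from the log above with the mail wrapping removed.

```python
import re
from datetime import datetime

# Assumed single-line layout: [timestamp] LEVEL [file:line:function] xlator: message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6})\] "
    r"(?P<level>[A-Z]) \[(?P<origin>[^\]]+)\] (?P<xlator>\S+): (?P<msg>.*)$"
)


def outages(lines):
    """Pair the first 'disconnected from' with the next 'Connected to' per xlator."""
    down_since = {}
    result = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a complete log entry
        when = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        xlator, msg = m["xlator"], m["msg"]
        if msg.startswith("disconnected from"):
            # keep the earliest disconnect of the current outage
            down_since.setdefault(xlator, when)
        elif msg.startswith("Connected to") and xlator in down_since:
            start = down_since.pop(xlator)
            result.append((xlator, start, when, when - start))
    return result


SAMPLE = [
    "[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] "
    "0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client "
    "process will keep trying to connect to glusterd until brick's port is available",
    "[2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish] "
    "0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed (Connection refused)",
    "[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify] "
    "0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client "
    "process will keep trying to connect to glusterd until brick's port is available",
    "[2015-02-18 13:11:02.898446] I [client-handshake.c:1200:client_setvolume_cbk] "
    "0-drslk-prod-client-10: Connected to drslk-prod-client-10, attached to "
    "remote volume '/GLUSTERFS/drslk-prod'.",
]

for xlator, start, end, duration in outages(SAMPLE):
    print(f"{xlator}: down from {start} to {end} ({duration})")
```

On the entries quoted in this thread, the gap for 0-drslk-prod-client-10 comes out at just under 35 minutes. The same script could be pointed at a full client log to see how often and for how long individual bricks were unreachable.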