[Gluster-users] Gluster errors create zombie processes [LOGS ATTACHED]

Vijay Bellur vbellur at redhat.com
Sun Mar 8 17:17:06 UTC 2015


On 03/08/2015 09:36 AM, Przemysław Mroczek wrote:
> I don't have the volfiles; they are not on our machines. As I said
> previously, we have no control over the gluster servers.
>
> I saw a graph in the logs that looks similar to a volume file. I will
> paste it here, but we really have no control over it. We are just
> using the client to connect to gluster servers that we do not control.
>

I would recommend not altering the default value of frame-timeout.

>
> Btw, do you think that different versions of gluster client and gluster
> server could be an issue here?
>

It can potentially be. What versions are you using on the servers and 
the client?

-Vijay

> 2015-03-08 1:29 GMT+01:00 Vijay Bellur <vbellur at redhat.com>:
>
>     On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:
>
>         Hi guys,
>
>         We have a Rails app that uses gluster as its distributed file
>         system. The gluster servers are hosted independently as part of a
>         deal with another company; we have no control over them, and we
>         connect to them using the gluster native client.
>
>         We tried to resolve this issue with help from the admins of the
>         company that hosts our gluster servers, but they say it is a client
>         issue, and we have run out of ideas about how that is possible,
>         since we are not doing anything special here.
>
>         Information about the independent gluster servers:
>         - Version: 3.6.0.42.1
>         - They are using Red Hat
>         - They are an enterprise, so they are always using older versions
>
>         Our servers:
>         System version: Ubuntu 14.04
>         Our gluster client version: 3.6.2
>
>         The exact problem is that it often happens (a couple of times a
>         week) that errors in gluster cause processes to become zombies. It
>         happens with our application server (unicorn), nginx, and our
>         crawling script that runs as a daemon.
>
>         Our fstab file:
>
>         10.10.11.17:/drslk-prod    /mnt/storage  glusterfs  defaults,_netdev,nobootwait,fetch-attempts=10  0 0
>         10.10.11.17:/drslk-backup  /mnt/backup   glusterfs  defaults,_netdev,nobootwait,fetch-attempts=10  0 0
>
>         Logs from gluster:
>
>         [2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind]
>         (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
>         (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
>         (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
>         (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
>         (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] )))))
>         0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3)
>         op(LOOKUP(27)) called at 2015-02-18 12:36:12.361489 (xid=0x5d475da)
>         [2015-02-18 12:36:12.375765] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
>         0-drslk-prod-client-10: remote operation failed: Transport endpoint
>         is not connected. Path: /system/posts/00/00/71/77/59.jpg
>         (2ad81c2b-a141-478d-9dd4-253345edbceb)
>         [2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind]
>         (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
>         (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
>         (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
>         (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
>         (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] )))))
>         0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3)
>         op(LOOKUP(27)) called at 2015-02-18 12:36:12.361858 (xid=0x5d475db)
>         [2015-02-18 12:36:12.376355] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
>         0-drslk-prod-client-10: remote operation failed: Transport endpoint
>         is not connected. Path: /system/posts/00/00/08
>         (f5c33a99-719e-4ea2-ad1f-33b893af103d)
>         [2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request]
>         0-drslk-prod-client-10: not connected (priv->connected = 0)
>         [2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit]
>         0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc
>         Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
>         (drslk-prod-client-10)
>         [2015-02-18 12:36:12.376814] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
>         0-drslk-prod-client-10: remote operation failed: Transport endpoint
>         is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
>         [2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify]
>         0-drslk-prod-client-10: disconnected from drslk-prod-client-10.
>         Client process will keep trying to connect to glusterd until brick's
>         port is available
>         [2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit]
>         0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd
>         Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
>         (drslk-prod-client-10)
>         [2015-02-18 12:36:12.376906] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
>         0-drslk-prod-client-10: remote operation failed: Transport endpoint
>         is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
>         [2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish]
>         0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed
>         (Connection refused)
>
>         [2015-02-18 12:36:12.379296] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
>         0-drslk-prod-client-10: remote operation failed: Transport endpoint
>         is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
>         [2015-02-18 12:36:12.379700] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
>         0-drslk-prod-client-10: remote operation failed: Transport endpoint
>         is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
>         [2015-02-18 13:10:52.759736] E [client-handshake.c:1496:client_query_portmap_cbk]
>         0-drslk-prod-client-10: failed to get the port number for remote
>         subvolume. Please run 'gluster volume status' on server to see if
>         brick process is running.
>         [2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify]
>         0-drslk-prod-client-10: disconnected from drslk-prod-client-10.
>         Client process will keep trying to connect to glusterd until brick's
>         port is available
>         [2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig]
>         0-drslk-prod-client-10: changing port to 49349 (from 0)
>         [2015-02-18 13:11:02.898097] I [client-handshake.c:1413:select_server_supported_programs]
>         0-drslk-prod-client-10: Using Program GlusterFS 3.3, Num (1298437),
>         Version (330)
>         [2015-02-18 13:11:02.898446] I [client-handshake.c:1200:client_setvolume_cbk]
>         0-drslk-prod-client-10: Connected to drslk-prod-client-10, attached
>         to remote volume '/GLUSTERFS/drslk-prod'.
>         [2015-02-18 13:11:02.898460] I [client-handshake.c:1210:client_setvolume_cbk]
>         0-drslk-prod-client-10: Server and Client lk-version numbers are not
>         same, reopening the fds
>
>
>     Can you provide the gluster volume configuration details?
>
>     It does look like frame-timeout for the volume has been set to 60.
>     Is there any specific reason? Normally altering the frame-timeout is
>     not recommended.
>
>     -Vijay
>
>
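For anyone triaging a similar client log, it can help to measure how long the
client went without the brick. The Python sketch below is only an illustration,
not part of the original exchange: the function name outage_windows and the
regular expression are made up for this example, and the sample lines are
copied from the log quoted above with the mail wrapping undone. It pairs the
first "disconnected from" message with the next "Connected to" message.

```python
import re
from datetime import datetime

# Matches the "[timestamp] LEVEL [file:line:function] message" shape of the
# client log entries quoted in this thread (an assumption about the format).
ENTRY = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z]) \[[^\]]+\] (.*)"
)

def outage_windows(lines):
    """Return (start, end, seconds) for each disconnect-to-reconnect gap."""
    windows, down_since = [], None
    for line in lines:
        m = ENTRY.match(line.strip())
        if not m:
            continue  # skip wrapped continuation lines and anything unparsable
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        msg = m.group(3)
        if "disconnected from" in msg and down_since is None:
            down_since = ts  # remember only the first disconnect of a gap
        elif "Connected to" in msg and down_since is not None:
            windows.append((down_since, ts, (ts - down_since).total_seconds()))
            down_since = None
    return windows

# Three entries taken from the quoted log, each rejoined onto one line.
sample = [
    "[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] "
    "0-drslk-prod-client-10: disconnected from drslk-prod-client-10.",
    "[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify] "
    "0-drslk-prod-client-10: disconnected from drslk-prod-client-10.",
    "[2015-02-18 13:11:02.898446] I [client-handshake.c:1200:client_setvolume_cbk] "
    "0-drslk-prod-client-10: Connected to drslk-prod-client-10, "
    "attached to remote volume '/GLUSTERFS/drslk-prod'.",
]

for start, end, secs in outage_windows(sample):
    print(f"{start} -> {end}: {secs:.0f} s without the brick")
```

On the entries above this reports a single gap of roughly 35 minutes, from the
disconnect at 12:36:12 to the reconnect at 13:11:02. That is only what the
client saw; it says nothing about why the brick was unreachable.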


