[Gluster-users] Gluster errors create zombie processes [LOGS ATTACHED]

Przemysław Mroczek przemek at durszlak.pl
Tue Mar 10 09:03:28 UTC 2015


The versions were:
gluster client: 3.6.2
gluster server: 3.6.0
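
Since the symptom is processes that cannot be cleaned up, it may help to record their exact state the next time it happens: a true zombie shows state Z, while a process blocked on I/O (which is what a hung FUSE mount usually produces) shows state D. Below is a minimal sketch, assuming Linux with /proc mounted; the function names `parse_stat` and `stuck_processes` are illustrative only.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' in the line.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def stuck_processes(proc_root="/proc", states=("Z", "D")):
    """List (pid, comm, state) for processes whose state is in `states`."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style /proc; nothing to scan
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited, or its stat line was unreadable
        if state in states:
            found.append((pid, comm, state))
    return found


if __name__ == "__main__":
    for pid, comm, state in stuck_processes():
        print(pid, state, comm)
```

Running it from cron while the mount is misbehaving would show whether unicorn and nginx are in Z or in D, which points at different causes.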

2015-03-08 18:17 GMT+01:00 Vijay Bellur <vbellur at redhat.com>:

> On 03/08/2015 09:36 AM, Przemysław Mroczek wrote:
>
>> I don't have the volfiles; they are not on our machines. As I said
>> previously, we have no control over the gluster servers.
>>
>> I saw a graph in the logs that looks similar to a volume file. I will
>> paste it here, but we really have no control over it. We are just
>> using the client to connect to gluster servers that we do not control.
>>
>>
> I would recommend not altering the default frame timeout.
>
>
>> Btw, do you think that different versions of gluster client and gluster
>> server could be an issue here?
>>
>>
> It can potentially be. What versions are you using on the servers and the
> client?
>
> -Vijay
>
>> 2015-03-08 1:29 GMT+01:00 Vijay Bellur <vbellur at redhat.com>:
>>
>>
>>     On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:
>>
>>         Hi guys,
>>
>>         We have a Rails app that uses gluster as its distributed file
>>         system. The gluster servers are hosted independently as part of a
>>         deal with another company; we have no control over them and
>>         connect to them using the gluster native client.
>>
>>         We tried to resolve this issue with help from the admins of the
>>         company that hosts our gluster servers, but they say it is a
>>         client issue, and we have run out of ideas about how that is
>>         possible, since we are not doing anything special here.
>>
>>         Information about the independent gluster servers:
>>         - Version: 3.6.0.42.1
>>         - They are using Red Hat
>>         - They are an enterprise, so they always use older versions
>>
>>         Our servers:
>>         System version: Ubuntu 14.04
>>         Our gluster client version: 3.6.2
>>
>>         The exact problem is that often (a couple of times a week)
>>         errors in gluster cause processes to become zombies. It happens
>>         with our application server (unicorn), nginx, and our crawling
>>         script, which runs as a daemon.
>>
>>         Our fstab file:
>>
>>         10.10.11.17:/drslk-prod    /mnt/storage  glusterfs  defaults,_netdev,nobootwait,fetch-attempts=10  0 0
>>         10.10.11.17:/drslk-backup  /mnt/backup   glusterfs  defaults,_netdev,nobootwait,fetch-attempts=10  0 0
>>
>>         Logs from gluster:
>>
>>         [2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] ))))) 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18 12:36:12.361489 (xid=0x5d475da)
>>         [2015-02-18 12:36:12.375765] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: /system/posts/00/00/71/77/59.jpg (2ad81c2b-a141-478d-9dd4-253345edbceb)
>>         [2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] ))))) 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18 12:36:12.361858 (xid=0x5d475db)
>>         [2015-02-18 12:36:12.376355] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: /system/posts/00/00/08 (f5c33a99-719e-4ea2-ad1f-33b893af103d)
>>         [2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request] 0-drslk-prod-client-10: not connected (priv->connected = 0)
>>         [2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (drslk-prod-client-10)
>>         [2015-02-18 12:36:12.376814] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
>>         [2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
>>         [2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (drslk-prod-client-10)
>>         [2015-02-18 12:36:12.376906] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
>>         [2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish] 0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed (Connection refused)
>>         [2015-02-18 12:36:12.379296] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
>>         [2015-02-18 12:36:12.379700] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
>>         [2015-02-18 13:10:52.759736] E [client-handshake.c:1496:client_query_portmap_cbk] 0-drslk-prod-client-10: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
>>         [2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
>>         [2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-drslk-prod-client-10: changing port to 49349 (from 0)
>>         [2015-02-18 13:11:02.898097] I [client-handshake.c:1413:select_server_supported_programs] 0-drslk-prod-client-10: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>         [2015-02-18 13:11:02.898446] I [client-handshake.c:1200:client_setvolume_cbk] 0-drslk-prod-client-10: Connected to drslk-prod-client-10, attached to remote volume '/GLUSTERFS/drslk-prod'.
>>         [2015-02-18 13:11:02.898460] I [client-handshake.c:1210:client_setvolume_cbk] 0-drslk-prod-client-10: Server and Client lk-version numbers are not same, reopening the fds
>>
>>
>>     Can you provide the gluster volume configuration details?
>>
>>     It does look like frame-timeout for the volume has been set to 60.
>>     Is there any specific reason? Normally altering the frame-timeout is
>>     not recommended.
>>
>>     -Vijay
>>
>>
>>
>
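
For anyone triaging a client log like the one quoted above, here is a minimal sketch that counts entries per level and function and lists the disconnect and reconnect times. It assumes the log file on disk has one entry per line in the `[timestamp] LEVEL [file.c:line:function] message` layout seen in the excerpt; the names `LINE_RE` and `summarize` are made up for illustration and are not part of gluster.

```python
import re
import sys
from collections import Counter

# Assumed layout: [timestamp] LEVEL [file.c:line:function] message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<msg>.*)$"
)


def summarize(lines):
    """Return (counts, events) for an iterable of log lines.

    counts maps (level, function) to the number of entries seen.
    events is a list of (timestamp, "down" | "up") in file order.
    """
    counts = Counter()
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # wrapped continuation or unrelated text
        func = m.group("src").rsplit(":", 1)[-1]
        counts[(m.group("level"), func)] += 1
        msg = m.group("msg")
        if "disconnected from" in msg:
            events.append((m.group("ts"), "down"))
        elif "Connected to" in msg:
            events.append((m.group("ts"), "up"))
    return counts, events


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        counts, events = summarize(fh)
    for (level, func), n in counts.most_common():
        print(level, func, n)
    for ts, kind in events:
        print(ts, kind)
```

Lining up the "down" and "up" timestamps with the times the application processes got stuck should show whether the two actually coincide.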