[Gluster-users] Gluster errors create zombie processes [LOGS ATTACHED]
Przemysław Mroczek
przemek at durszlak.pl
Tue Mar 10 09:03:28 UTC 2015
The versions were:
gluster client: 3.6.2
gluster server: 3.6.0
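
For anyone triaging the client log quoted further down, here is a minimal Python sketch (not part of the original report) that parses the header of each log entry and counts entries per severity and function. The regular expression is an assumption based only on the lines visible in this thread, with the mail-wrapping artifacts removed; `parse_line` and `tally` are hypothetical helper names, not part of any Gluster tooling.

```python
import re
from collections import Counter

# Assumed header layout, inferred from the quoted client log only:
#   [timestamp] LEVEL [file.c:line:function] message
LOG_HEADER = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return the header fields of one log entry as a dict, or None."""
    match = LOG_HEADER.match(line.strip())
    return match.groupdict() if match else None


def tally(lines):
    """Count entries per (level, function), skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[(rec["level"], rec["func"])] += 1
    return counts


# Three entries copied from the quoted log, each rejoined onto one line.
SAMPLE = [
    "[2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish] "
    "0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed "
    "(Connection refused)",
    "[2015-02-18 12:36:12.379296] W "
    "[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: "
    "remote operation failed: Transport endpoint is not connected. "
    "Path: (null) (00000000-0000-0000-0000-000000000000)",
    "[2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig] "
    "0-drslk-prod-client-10: changing port to 49349 (from 0)",
]

if __name__ == "__main__":
    for (level, func), n in sorted(tally(SAMPLE).items()):
        print(level, func, n)
```

Grouping by function makes it easier to see whether a burst of lookup failures lines up with a disconnect from one brick, which is what the excerpt in this thread appears to show for `drslk-prod-client-10`.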
2015-03-08 18:17 GMT+01:00 Vijay Bellur <vbellur at redhat.com>:
> On 03/08/2015 09:36 AM, Przemysław Mroczek wrote:
>
>> I don't have the volfiles; they are not on our machines. As I said
>> previously, we have no control over the Gluster servers.
>>
>> I saw a graph in the logs that looks similar to a volume file. I will
>> paste it here, but we don't really have any control over that. We are
>> just using the client to connect to Gluster servers that we do not
>> control.
>>
>>
> I would recommend not altering the default frame timeout.
>
>
>> Btw, do you think that different versions of gluster client and gluster
>> server could be an issue here?
>>
>>
> It can potentially be. What versions are you using on the servers and the
> client?
>
> -Vijay
>
>> 2015-03-08 1:29 GMT+01:00 Vijay Bellur <vbellur at redhat.com>:
>>
>>
>> On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:
>>
>> Hi guys,
>>
>> We have a Rails app that uses Gluster as our distributed file
>> system. The Gluster servers are hosted independently as part of a
>> deal with another company; we have no control over them, and we
>> connect to them using the Gluster native client.
>>
>> We tried to resolve this issue with help from the admins of the
>> company hosting our Gluster servers, but they say it's a client
>> issue, and we have run out of ideas about how that's possible,
>> since we are not doing anything special here.
>>
>> Information about the independent Gluster servers:
>> - Version: 3.6.0.42.1
>> - They are using Red Hat
>> - They are an enterprise, so they are always using older versions
>>
>> Our servers:
>> System version: Ubuntu 14.04
>> Our gluster client version: 3.6.2
>>
>> The exact problem is that often (a couple of times a week) errors
>> in Gluster cause processes to become zombies. It happens with our
>> application server (unicorn), nginx, and our crawling script, which
>> runs as a daemon.
>>
>> Our fstab file:
>>
>> 10.10.11.17:/drslk-prod /mnt/storage glusterfs defaults,_netdev,nobootwait,fetch-attempts=10 0 0
>> 10.10.11.17:/drslk-backup /mnt/backup glusterfs defaults,_netdev,nobootwait,fetch-attempts=10 0 0
>>
>> Logs from gluster:
>>
>> [2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind]
>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] )))))
>> 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3)
>> op(LOOKUP(27)) called at 2015-02-18 12:36:12.361489 (xid=0x5d475da)
>> [2015-02-18 12:36:12.375765] W
>> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
>> remote operation failed: Transport endpoint is not connected. Path:
>> /system/posts/00/00/71/77/59.jpg (2ad81c2b-a141-478d-9dd4-253345edbceb)
>> [2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind]
>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] )))))
>> 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3)
>> op(LOOKUP(27)) called at 2015-02-18 12:36:12.361858 (xid=0x5d475db)
>> [2015-02-18 12:36:12.376355] W
>> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
>> remote operation failed: Transport endpoint is not connected. Path:
>> /system/posts/00/00/08 (f5c33a99-719e-4ea2-ad1f-33b893af103d)
>> [2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request]
>> 0-drslk-prod-client-10: not connected (priv->connected = 0)
>> [2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit]
>> 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc
>> Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
>> (drslk-prod-client-10)
>> [2015-02-18 12:36:12.376814] W
>> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
>> remote operation failed: Transport endpoint is not connected. Path:
>> (null) (00000000-0000-0000-0000-000000000000)
>> [2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify]
>> 0-drslk-prod-client-10: disconnected from drslk-prod-client-10.
>> Client process will keep trying to connect to glusterd until brick's
>> port is available
>> [2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit]
>> 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd
>> Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
>> (drslk-prod-client-10)
>> [2015-02-18 12:36:12.376906] W
>> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
>> remote operation failed: Transport endpoint is not connected. Path:
>> (null) (00000000-0000-0000-0000-000000000000)
>> [2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish]
>> 0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed
>> (Connection refused)
>>
>> [2015-02-18 12:36:12.379296] W
>> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
>> remote operation failed: Transport endpoint is not connected. Path:
>> (null) (00000000-0000-0000-0000-000000000000)
>> [2015-02-18 12:36:12.379700] W
>> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
>> remote operation failed: Transport endpoint is not connected. Path:
>> (null) (00000000-0000-0000-0000-000000000000)
>> [2015-02-18 13:10:52.759736] E
>> [client-handshake.c:1496:client_query_portmap_cbk]
>> 0-drslk-prod-client-10: failed to get the port number for remote
>> subvolume. Please run 'gluster volume status' on server to see if
>> brick process is running.
>> [2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify]
>> 0-drslk-prod-client-10: disconnected from drslk-prod-client-10.
>> Client process will keep trying to connect to glusterd until brick's
>> port is available
>> [2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig]
>> 0-drslk-prod-client-10: changing port to 49349 (from 0)
>> [2015-02-18 13:11:02.898097] I
>> [client-handshake.c:1413:select_server_supported_programs]
>> 0-drslk-prod-client-10: Using Program GlusterFS 3.3, Num (1298437),
>> Version (330)
>> [2015-02-18 13:11:02.898446] I
>> [client-handshake.c:1200:client_setvolume_cbk]
>> 0-drslk-prod-client-10: Connected to drslk-prod-client-10, attached
>> to remote volume '/GLUSTERFS/drslk-prod'.
>> [2015-02-18 13:11:02.898460] I
>> [client-handshake.c:1210:client_setvolume_cbk]
>> 0-drslk-prod-client-10: Server and Client lk-version numbers are not
>> same, reopening the fds
>>
>>
>> Can you provide the gluster volume configuration details?
>>
>> It does look like frame-timeout for the volume has been set to 60.
>> Is there any specific reason? Normally altering the frame-timeout is
>> not recommended.
>>
>> -Vijay
>>
>>
>>
>