[Gluster-users] Gluster errors create zombie processes [LOGS ATTACHED]

Przemysław Mroczek przemek at durszlak.pl
Sat Mar 7 23:20:38 UTC 2015


Hi guys,

We have a Rails app that uses Gluster as its distributed file system.
The Gluster servers are hosted independently by another company as part of
a deal. We have no control over them; we connect to them using the Gluster
native client.

We tried to resolve this issue with help from the admins of the company
that hosts our Gluster servers, but they say it is a client-side issue.
We have run out of ideas about how that is possible, since we are not doing
anything special here.

Information about the independent Gluster servers:
- Version: 3.6.0.42.1
- They are running Red Hat
- They are an enterprise, so they always use older versions

Our servers:
System version: Ubuntu 14.04
Our gluster client version: 3.6.2

The exact problem is that often (a couple of times a week) errors
in Gluster cause processes to become zombies. It happens with our
application server (unicorn), nginx, and our crawling script, which runs as
a daemon.
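
To pin down what "zombie" means here, it may help to record the kernel's
process state for the affected processes. On Linux, the third field of
/proc/<pid>/stat is "Z" for a true zombie (exited, not yet reaped by its
parent) and "D" for uninterruptible sleep, which is the state a process
blocked on I/O is usually shown in. The sketch below is only a diagnostic
aid under that assumption; the function names are made up for this
example and are not part of any Gluster tooling.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left].strip())
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def find_processes_in_states(states=("Z", "D"), proc_root="/proc"):
    """List (pid, comm, state) for processes whose state is in `states`."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or its stat file was unreadable
        if state in states:
            found.append((pid, comm, state))
    return found


if __name__ == "__main__":
    for pid, comm, state in find_processes_in_states():
        print(pid, comm, state)
```

Running this from cron while the problem is occurring would show whether
unicorn, nginx and the crawler are in state Z or state D.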

Our fstab file:

10.10.11.17:/drslk-prod     /mnt/storage          glusterfs     defaults,_netdev,nobootwait,fetch-attempts=10 0 0
10.10.11.17:/drslk-backup     /mnt/backup          glusterfs     defaults,_netdev,nobootwait,fetch-attempts=10 0 0

Logs from gluster:

[2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98]
))))) 0-drslk-prod-client-10: forced
unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18
12:36:12.361489 (xid=0x5d475da)
[2015-02-18 12:36:12.375765] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path:
/system/posts/00/00/71/77/59.jpg (2ad81c2b-a141-478d-9dd4-253345edbceb)
[2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98]
))))) 0-drslk-prod-client-10: forced
unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18
12:36:12.361858 (xid=0x5d475db)
[2015-02-18 12:36:12.376355] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path:
/system/posts/00/00/08 (f5c33a99-719e-4ea2-ad1f-33b893af103d)
[2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request]
0-drslk-prod-client-10: not connected (priv->connected = 0)
[2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit]
0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc
Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
(drslk-prod-client-10)
[2015-02-18 12:36:12.376814] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path: (null)
(00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify]
0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client
process will keep trying to connect to glusterd until brick's port is
available
[2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit]
0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd
Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
(drslk-prod-client-10)
[2015-02-18 12:36:12.376906] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path: (null)
(00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish]
0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed (Connection
refused)
[2015-02-18 12:36:12.379296] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path: (null)
(00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.379700] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path: (null)
(00000000-0000-0000-0000-000000000000)
[2015-02-18 13:10:52.759736] E
[client-handshake.c:1496:client_query_portmap_cbk] 0-drslk-prod-client-10:
failed to get the port number for remote subvolume. Please run 'gluster
volume status' on server to see if brick process is running.
[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify]
0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client
process will keep trying to connect to glusterd until brick's port is
available
[2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig]
0-drslk-prod-client-10: changing port to 49349 (from 0)
[2015-02-18 13:11:02.898097] I
[client-handshake.c:1413:select_server_supported_programs]
0-drslk-prod-client-10: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2015-02-18 13:11:02.898446] I
[client-handshake.c:1200:client_setvolume_cbk] 0-drslk-prod-client-10:
Connected to drslk-prod-client-10, attached to remote volume
'/GLUSTERFS/drslk-prod'.
[2015-02-18 13:11:02.898460] I
[client-handshake.c:1210:client_setvolume_cbk] 0-drslk-prod-client-10:
Server and Client lk-version numbers are not same, reopening the fds
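
To make it easier to line up the process hangs with the disconnects, the
"disconnected from" and "Connected to" entries can be pulled out of the
client log and paired per brick client. The following is a rough sketch,
assuming the raw log file has one entry per line (the mail client wrapped
the excerpt above) and that entries follow the layout shown here. The
names connection_events and outages are made up for this example.

```python
import re
from datetime import datetime

# [timestamp] LEVEL [file:line:function] translator-name: message
ENTRY = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) \[(?P<src>[^\]]+)\] (?P<xlator>\S+): (?P<msg>.*)$"
)


def connection_events(lines):
    """Return (timestamp, translator, kind) for connect/disconnect entries."""
    events = []
    for line in lines:
        m = ENTRY.match(line.strip())
        if not m:
            continue  # continuation line or unknown layout
        msg = m.group("msg")
        if msg.startswith("disconnected from"):
            kind = "disconnected"
        elif msg.startswith("Connected to"):
            kind = "connected"
        else:
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        events.append((ts, m.group("xlator"), kind))
    return events


def outages(events):
    """Pair each translator's first disconnect with its next connect."""
    down_since = {}
    result = []
    for ts, xlator, kind in sorted(events):
        if kind == "disconnected":
            down_since.setdefault(xlator, ts)  # keep the earliest disconnect
        elif xlator in down_since:
            result.append((xlator, down_since.pop(xlator), ts))
    return result


# Three entries from the excerpt above, joined back onto single lines.
SAMPLE = [
    "[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] "
    "0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client "
    "process will keep trying to connect to glusterd until brick's port is "
    "available",
    "[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify] "
    "0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client "
    "process will keep trying to connect to glusterd until brick's port is "
    "available",
    "[2015-02-18 13:11:02.898446] I "
    "[client-handshake.c:1200:client_setvolume_cbk] 0-drslk-prod-client-10: "
    "Connected to drslk-prod-client-10, attached to remote volume "
    "'/GLUSTERFS/drslk-prod'.",
]

if __name__ == "__main__":
    for xlator, down, up in outages(connection_events(SAMPLE)):
        print(xlator, "down for", (up - down).total_seconds(), "seconds")
```

On the three sample entries this pairs the 12:36:12 disconnect with the
13:11:02 reconnect, so 0-drslk-prod-client-10 was unreachable for roughly
35 minutes in this excerpt.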


Additional logs in attachments.

Has anyone encountered a similar issue with Gluster? Do you have any ideas
how to solve the problem?

Best regards,
Przemek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mnt-storage-241.log
Type: text/x-log
Size: 12006 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150308/57e762a7/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mnt-storage-242-old.log
Type: text/x-log
Size: 24202 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150308/57e762a7/attachment-0001.bin>

