[Gluster-devel] Server hanged and dropped out the connections of all clients

Ioannis Aslanidis iaslanidis at flumotion.com
Tue Feb 10 11:26:42 UTC 2009


Hello,

I had 1 server and 20 client machines mounting a glusterfs partition.
After several weeks of working correctly, the server stopped responding
to all clients. Any attempt to access the mounted directory, even a
simple `ls`, blocks the calling application.

Restarting the server made all clients reconnect automatically, which
makes me suspect a failure on the server side.

The thing is that the server logs report almost nothing; the last
entries before my restart are from 2009-02-06:

2009-02-06 12:17:38 E [server-protocol.c:184:generic_reply] server:
transport_writev failed
2009-02-06 12:22:23 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (192.168.128.105:1023)
----- my restart at 2009-02-10 12:01 -----
2009-02-10 12:01:42 W [glusterfs.c:417:glusterfs_cleanup_and_exit]
glusterfs: shutting down server
2009-02-10 12:01:47 E [server-protocol.c:5190:mop_getspec] server:
Unable to open /etc/glusterfs/glusterfs-client.vol.192.168.128.101 (No
such file or directory)
2009-02-10 12:01:47 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (192.168.128.101:1023)
2009-02-10 12:05:19 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (192.168.128.103:1023)
2009-02-10 12:05:19 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (192.168.128.202:1023)
2009-02-10 12:05:30 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (192.168.128.108:1023)


There is nothing at all between 2009-02-06 and my restart; however, in
the client logs I did find something:

2009-02-10 12:00:26 C [client-protocol.c:211:call_bail] filedata:
bailing transport
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465570: (34) / => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(35) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465577: (34) / => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465578: (34) /cust => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465579: (34) / => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465581: (34) / => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465570: (34) / => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465577: (34) / => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465578: (34) /cust => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465579: (34) / => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465581: (34) / => -1 (107)
2009-02-10 12:05:30 C [client-protocol.c:211:call_bail] filedata:
bailing transport
2009-02-10 12:05:30 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60938
2009-02-10 12:05:30 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:05:30 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465585: (34) /cust => -1 (107)
2009-02-10 12:05:30 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed


Another client log:

2009-02-10 11:59:14 C [client-protocol.c:211:call_bail] filedata:
bailing transport
2009-02-10 11:59:14 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0xb66005a0
2009-02-10 11:59:14 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 11:59:14 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
101046: (34) / => -1 (107)
2009-02-10 11:59:14 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 11:59:14 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0xb66005a0
2009-02-10 11:59:14 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 11:59:14 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
101050: (34) / => -1 (107)
2009-02-10 11:59:14 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0xb66009e0
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
101046: (34) / => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0xb66009e0
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
101050: (34) / => -1 (107)
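
To compare excerpts like the ones above across many clients, it can help
to count entries per timestamp and function. The following is only a
minimal sketch: the regular expression is inferred from the log lines
quoted in this mail, not from any glusterfs documentation, and the names
ENTRY, summarize and SAMPLE are made up for illustration.

```python
import re
from collections import Counter

# Header of one log entry as it appears in the excerpts above, e.g.
# 2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
# (pattern inferred from this mail only; other versions may differ)
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ([A-Z]) "
    r"\[([^:\]]+):(\d+):([^\]]+)\]"
)


def summarize(lines):
    """Count log entries per (timestamp, level, function).

    Wrapped continuation lines do not start with a timestamp,
    so they fail to match and are skipped.
    """
    counts = Counter()
    for line in lines:
        m = ENTRY.match(line)
        if m:
            timestamp, level, _file, _lineno, func = m.groups()
            counts[(timestamp, level, func)] += 1
    return counts


# A few lines copied from the first client log above.
SAMPLE = """\
2009-02-10 12:00:26 C [client-protocol.c:211:call_bail] filedata:
bailing transport
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(35) reply=@0x8a60960
""".splitlines()

if __name__ == "__main__":
    for (ts, level, func), n in sorted(summarize(SAMPLE).items()):
        print(ts, level, func, n)
```

Running the same function over the full log of each client should show
whether the call_bail entries cluster at the same times on every machine.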


Server version: glusterfs 1.3.8pre6 built on Apr 23 2008 04:34:21
Client version: glusterfs 1.3.8pre6 built on Apr 23 2008 04:31:19

Another possibly relevant detail: each client held about 50
simultaneous connections to the server, for a total of about 1000
connections.
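
One thing that may be worth ruling out is the per-process limit on open
file descriptors, since 20 clients x 50 connections is close to 1024, a
soft limit often seen as the default on Linux. This is only a guess;
nothing in the logs confirms it, and I do not know how this glusterfs
version behaves when it reaches the limit. The sketch below just does the
arithmetic; the function names are made up for illustration, and it
reports the limit of the Python process itself, not of the glusterfs
server process.

```python
def fd_headroom(soft_limit, clients, conns_per_client):
    """Descriptors left once every client connection is open.

    Ignores descriptors used for log files, volume files and so on,
    so the real headroom would be smaller.
    """
    return soft_limit - clients * conns_per_client


def current_soft_limit():
    """Soft RLIMIT_NOFILE of this process, or None where unavailable."""
    try:
        import resource  # Unix only
    except ImportError:
        return None
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    # 20 clients and about 50 connections each, as described above.
    # 1024 is an assumed limit, not a value measured on my server.
    print("headroom with an assumed limit of 1024:",
          fd_headroom(1024, 20, 50))
    print("soft limit of this process:", current_soft_limit())
```

On the server itself, the limit that matters is the one in effect for
the shell or init script that starts glusterfsd.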

Has anyone experienced anything similar before? Is there any fix for this?

If you require any additional information, please do not hesitate to ask
for it.

Regards,

Ioannis