[Gluster-devel] NFS reexport works, still stat-prefetch issues, -s problem

Brent A Nelson brent at phys.ufl.edu
Fri May 11 02:01:18 UTC 2007


On Thu, 10 May 2007, Brent A Nelson wrote:

> [May 10 18:14:18] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=115)
> [May 10 18:14:18] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:share4-1: connection to server disconnected
> [May 10 18:14:18] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
> [May 10 18:14:18] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=9)
> [May 10 18:14:18] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:share4-0: connection to server disconnected
> [May 10 18:14:18] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
> [May 10 18:14:18] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
> [May 10 18:14:19] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
> [May 10 18:14:19] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=115)
> [May 10 18:14:19] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:share4-0: connection to server disconnected
> [May 10 18:14:19] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
>
> I've seen the "0 bytes r/w instead of 113" message plenty of times in
> the past (with older GlusterFS versions), where it was apparently
> harmless.  It looks like the code now treats it as a disconnection and
> tries to reconnect, but for some reason, even when the reconnect
> succeeds, the operation still returns an I/O error.  I wonder if this
> relates to an issue I mentioned previously with real disconnects (a
> node dies or glusterfsd is restarted), where the first access after
> the failure (at least for ls or df) returns an error but the next
> attempt succeeds.  That suggests a problem in the reconnection logic,
> plus some sort of glitch masquerading as a disconnect in the first
> place...  This is probably the real problem triggering the read-ahead
> crash (i.e., the read-ahead crash would not occur in my test case if
> it weren't for this issue).
>
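
For reference, errno 115 is EINPROGRESS and errno 9 is EBADF on Linux,
so the socket appears to be in a bad state by the time full_rw runs.
As a rough sketch of what I assume full_rw does (this is not the
actual common-utils.c code, just my reading of the log messages), it
loops until exactly the requested number of bytes has moved and treats
a 0-byte return as the peer hanging up:

  #include <errno.h>
  #include <unistd.h>

  /* Sketch of a full_rw-style helper; the name and details are my
   * assumptions, not the real libglusterfs code.  Pass read() (or a
   * suitably cast write()) as op. */
  static int
  full_rw_sketch (int fd, char *buf, size_t size,
                  ssize_t (*op) (int, void *, size_t))
  {
    size_t done = 0;
    while (done < size) {
      ssize_t ret = op (fd, buf + done, size - done);
      if (ret == 0)
        return -1;        /* peer closed: reported as a disconnect */
      if (ret == -1) {
        if (errno == EINTR)
          continue;       /* interrupted; just retry */
        return -1;        /* hard error, e.g. the EINPROGRESS and
                             EBADF seen in the log above */
      }
      done += (size_t) ret;
    }
    return 0;
  }

If a transient condition on a non-blocking socket ever falls through
that error branch, it would get reported as "0 bytes r/w" and escalate
into a teardown, which would fit the glitch-masquerading-as-a-disconnect
theory above.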

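On the reconnect side, the first-access-fails-then-works pattern is
what you would expect if requests that were pending on the dead socket
are failed back to the caller (the call_bail messages?) rather than
resent once the new connection is up.  Purely as an illustration (not
GlusterFS code; reconnect_fn is a made-up stand-in for whatever
re-establishes the transport), a retry-once wrapper would look like:

  #include <errno.h>
  #include <unistd.h>

  /* Hypothetical retry-once wrapper, illustrative only.  reconnect_fn
   * is assumed to return a freshly connected fd, or -1 on failure. */
  static ssize_t
  send_with_retry (int *fd, const char *buf, size_t len,
                   int (*reconnect_fn) (void))
  {
    ssize_t ret = write (*fd, buf, len);
    if (ret == -1 &&
        (errno == EBADF || errno == EPIPE || errno == ENOTCONN)) {
      int newfd = reconnect_fn ();
      if (newfd == -1)
        return -1;
      *fd = newfd;
      ret = write (*fd, buf, len);  /* succeeds if the new transport
                                       is healthy */
    }
    return ret;
  }

Without something along those lines, the first ls or df after a
failure would see an I/O error and only the retry would succeed,
exactly as described.
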
Well, it looks like I can reproduce this behavior (though, so far, not
the memory leak) on a much simpler setup, with no NFS involved.  I was
copying my test area (which contains several 10GB files) to a really
simple GlusterFS volume (one share, no afr, no unify, glusterfsd on the
same machine) when I hit the disconnect issue, after a few files had
copied successfully.  This looked like an issue in protocol/client
and/or protocol/server, but I thought it would be a good idea to narrow
things down a bit first...
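
In case anyone wants to reproduce this, "really simple" means spec
files along these lines (approximate, from memory; paths and hosts are
placeholders):

  # server spec for glusterfsd: a single posix brick, nothing else
  volume brick
    type storage/posix
    option directory /export/test
  end-volume

  volume server
    type protocol/server
    option transport-type tcp/server
    option auth.ip.brick.allow *
    subvolumes brick
  end-volume

  # client spec for glusterfs: protocol/client only -- no afr, no
  # unify, no performance translators
  volume client
    type protocol/client
    option transport-type tcp/client
    option remote-host 127.0.0.1
    option remote-subvolume brick
  end-volume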

Thanks,

Brent
