[Gluster-devel] odd connection issues under high write load

Brent A Nelson brent at phys.ufl.edu
Fri Jun 22 03:00:24 UTC 2007


I believe you'll find that this works in the tla repository (2.4; 2.5 is 
significantly different code), which has a few patches beyond pre4.

On Thu, 21 Jun 2007, Daniel wrote:

> 1.30-pre4
> afr across 2 servers
>
> servers are io-streams, no write back no read forward
> TCP on a Gigabit network
>
> We set up a stress-test script in PHP to exercise the client and ran about 36 
> instances of it. Occasionally we get a "transport end point not connected" 
> error, which kills all of the instances (intentionally: they halt on error, 
> but it means the mount went stale). Without any intervention, gluster picks 
> up again and seems to operate fine when we re-run the scripts.
>
> we're pushing roughly 300 writes a second in the test
>
> the only debug info in the log is the following:
>
> [Jun 21 19:33:29] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [Jun 21 19:33:29] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [Jun 21 19:33:29] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=104)
> [Jun 21 19:33:29] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:mortar1: 
> connection to server disconnected
> [Jun 21 19:33:29] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=104)
> [Jun 21 19:33:29] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:mortar2: 
> connection to server disconnected
> [Jun 21 19:33:29] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
> protocol/client:transport_submit failed
> [Jun 21 19:33:29] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
> protocol/client:transport_submit failed
> [Jun 21 19:33:29] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [Jun 21 19:33:29] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:mortar2: 
> connection to server disconnected
> [Jun 21 19:33:29] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
> protocol/client:transport_submit failed
> [Jun 21 19:33:29] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [Jun 21 19:33:29] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=115)
> [Jun 21 19:33:29] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:mortar1: 
> connection to server disconnected
> [Jun 21 19:33:29] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
> protocol/client:transport_submit failed
>
> I'm going to set up the debug xlator tomorrow if no one has anything off the 
> top of their head about what might be wrong
>
> we haven't tested heavy read load yet, just writes
> we have managed to cause it multiple times, but haven't pinned down a cause 
> as the debug logging all spits out basically the same material
>
> the client also has fairly high CPU usage during the test, roughly 90% of the 
> core it's on
>
>
>
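
For anyone reading the quoted log later: the errno values in the full_rw 
lines can be decoded by hand. The sketch below is not part of GlusterFS; it 
assumes the client was running on Linux, where 104 is ECONNRESET and 115 is 
EINPROGRESS (other systems number these differently). The helper name 
explain_errnos and the small lookup table are made up for illustration.

```python
import re

# Linux errno values seen in the quoted log. This table is an assumption
# about the platform: other operating systems use different numbers.
LINUX_ERRNO = {
    104: ("ECONNRESET", "Connection reset by peer"),
    115: ("EINPROGRESS", "Operation now in progress"),
}

# Matches the "(errno=NNN)" suffix that full_rw() puts in its log message.
ERRNO_RE = re.compile(r"\(errno=(\d+)\)")


def explain_errnos(log_text):
    """Return (number, name, description) for each distinct errno, in first-seen order."""
    seen = []
    for match in ERRNO_RE.finditer(log_text):
        number = int(match.group(1))
        if all(number != known for known, _, _ in seen):
            name, text = LINUX_ERRNO.get(number, ("UNKNOWN", "not in this table"))
            seen.append((number, name, text))
    return seen


sample = (
    "libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=104)\n"
    "libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=115)\n"
)
for number, name, text in explain_errnos(sample):
    print(f"errno={number}: {name} ({text})")
```

ECONNRESET means the peer reset the TCP connection; the log by itself does 
not say why that happened.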

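The write test described in the quoted message can be approximated with a 
short script. This is only a rough sketch, not Daniel's PHP script: it uses 
threads instead of separate processes, and the file names, the 512-byte 
payload, and the function names writer and run_stress are all invented for 
the example. Point the directory at a GlusterFS mount to exercise the client.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def writer(directory, worker_id, writes, payload=b"x" * 512):
    """Write `writes` small files; any OSError propagates and stops this worker."""
    for i in range(writes):
        path = os.path.join(directory, f"w{worker_id}-{i}.dat")
        with open(path, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push the data through to the filesystem
    return writes


def run_stress(directory, workers=36, writes_per_worker=100):
    """Run the writers concurrently and return the total number of completed writes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(writer, directory, w, writes_per_worker)
            for w in range(workers)
        ]
        # result() re-raises the first worker error (for example ENOTCONN,
        # "Transport endpoint is not connected"), so the run halts on error.
        return sum(future.result() for future in futures)


if __name__ == "__main__":
    # Demo against a scratch directory; replace with the mount point to test.
    with tempfile.TemporaryDirectory() as scratch:
        print(run_stress(scratch, workers=4, writes_per_worker=5))
```

The default of 36 workers mirrors the instance count mentioned in the report; 
the script does not throttle to any particular write rate.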



