[Gluster-devel] odd connection issues under high write load (now read load)

Anand Avati avati at zresearch.com
Sat Jun 23 00:48:55 UTC 2007


Daniel,
Is it possible for you to get a gdb backtrace of the core dump?
Thanks!
avati
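
For reference, something like this should do it (the binary and core paths
below are examples; adjust them to your install):

  $ gdb /usr/local/sbin/glusterfs /path/to/core
  (gdb) bt full
  (gdb) thread apply all bt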

2007/6/23, Daniel <daniel at datinggold.com>:
>
> Same setup, rebuilt with tla 2.4 patch 181 and installed on the 2 servers
> and 1 client.
>
> Performance was much better, but now we fail hard instead of soft: the
> mount flat-out dies and the debug log throws ugly backtrace messages.
> Writes ran great (no failures, no stale mounts), but the read stress
> test causes a hard mount death that crashes the glusterfs client.
>
>
> --this block is repeated--
> [Jun 22 19:23:50] [CRITICAL/client-protocol.c:218/call_bail()]
> client/protocol:bailing transport
> [Jun 22 19:23:50] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing
> poll/read/write to break on blocked socket (if any)
> --about 400-500 times--
>
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:215/gf_print_trace()]
> debug-backtrace:Got signal (11), printing backtrace
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/local/lib/libglusterfs.so.0(gf_print_trace+0x26) [0x6bce1a]
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/lib/tls/libc.so.6 [0x2668c8]
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/local/lib/glusterfs/1.3.0-pre3/xlator/cluster/afr.so [0x69d4b3]
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/local/lib/glusterfs/1.3.0-pre3/xlator/performance/stat-prefetch.so [0x120999]
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/local/lib/libglusterfs.so.0 [0x6bb039]
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/local/lib/libglusterfs.so.0 [0x6bb039]
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/local/lib/glusterfs/1.3.0-pre3/xlator/protocol/client.so [0x118a1c]
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/local/lib/glusterfs/1.3.0-pre3/xlator/protocol/client.so [0x11adfb]
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/local/lib/libglusterfs.so.0(transport_notify+0x13) [0x6bdc5f]
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/local/lib/libglusterfs.so.0(sys_epoll_iteration+0xcf) [0x6be2cb]
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/usr/local/lib/libglusterfs.so.0(poll_iteration+0x1b) [0x6bddf7]
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:[glusterfs] [0x804a317]
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:/lib/tls/libc.so.6(__libc_start_main+0xd3) [0x253e23]
> [Jun 22 19:23:50] [CRITICAL/common-utils.c:217/gf_print_trace()]
> debug-backtrace:[glusterfs] [0x8049dfd]
>
> Brent A Nelson wrote:
> > I believe you'll find that this works in the tla repository (2.4; 2.5
> > is significantly different code), which has a few patches beyond pre4.
> >
> > On Thu, 21 Jun 2007, Daniel wrote:
> >
> >> 1.3.0-pre4
> >> afr across 2 servers
> >>
> >> Servers run io-threads; no write-behind, no read-ahead.
> >> TCP on a gigabit network.
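> >>
> >> For reference, the spec files look roughly like this (hostnames, the
> >> export path, and the thread count are placeholders; options trimmed):
> >>
> >>   # server.vol (same on both servers)
> >>   volume brick
> >>     type storage/posix
> >>     option directory /data/export
> >>   end-volume
> >>
> >>   volume iothreads
> >>     type performance/io-threads
> >>     option thread-count 4
> >>     subvolumes brick
> >>   end-volume
> >>
> >>   volume server
> >>     type protocol/server
> >>     option transport-type tcp/server
> >>     option auth.ip.iothreads.allow *
> >>     subvolumes iothreads
> >>   end-volume
> >>
> >>   # client.vol
> >>   volume remote1
> >>     type protocol/client
> >>     option transport-type tcp/client
> >>     option remote-host server1
> >>     option remote-subvolume iothreads
> >>   end-volume
> >>
> >>   volume remote2
> >>     type protocol/client
> >>     option transport-type tcp/client
> >>     option remote-host server2
> >>     option remote-subvolume iothreads
> >>   end-volume
> >>
> >>   volume afr
> >>     type cluster/afr
> >>     subvolumes remote1 remote2
> >>   end-volume
> >>
> >>   volume statprefetch
> >>     type performance/stat-prefetch
> >>     subvolumes afr
> >>   end-volume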
> >>
> >> We set up a PHP stress-test script to exercise the client, running
> >> about 36 instances of it in parallel (sketched below). Occasionally we
> >> get a "transport endpoint not connected" error that kills all of the
> >> instances (intentionally, since they halt on error, but it means the
> >> mount went stale). Without any intervention gluster picks up again and
> >> seems to operate fine when we re-run the scripts.
> >>
> >> We're pushing roughly 300 writes a second in the test.
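> >>
> >> Each instance is essentially a tight write loop that halts on the first
> >> error, roughly like this (the mount path and write size are placeholders):
> >>
> >>   <?php
> >>   // Crude write-stress worker; ~36 copies run in parallel.
> >>   $dir = '/mnt/glusterfs/stress';   // client mount point (placeholder)
> >>   @mkdir($dir, 0777, true);
> >>   $id = getmypid();
> >>   for ($i = 0; ; $i++) {
> >>       $file = "$dir/w-$id-$i.dat";
> >>       // write an 8 KB file, then delete it
> >>       if (file_put_contents($file, str_repeat('x', 8192)) === false) {
> >>           // a stale mount surfaces here as "transport endpoint not connected"
> >>           fwrite(STDERR, "write failed: $file\n");
> >>           exit(1);                  // halt on error, by design
> >>       }
> >>       unlink($file);
> >>   }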
> >>
> >> The only debug info in the log is the following:
> >>
> >> [Jun 21 19:33:29] [CRITICAL/client-protocol.c:218/call_bail()]
> >> client/protocol:bailing transport
> >> --clipped--
> >> [Jun 21 19:33:29] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> >> protocol/client:transport_submit failed
> >>
> >> I'm going to set up the debug xlator tomorrow if no one has anything
> >> off the top of their head about what might be wrong.
> >>
> >> We haven't tested heavy read load yet, just writes. We have managed
> >> to trigger the failure multiple times but haven't pinned down a cause,
> >> as the debug logging spits out basically the same material each time.
> >>
> >> The client also shows fairly high CPU usage during the test, roughly
> >> 90% of the core it's on.
> >>
> >>
> >>
> >
>
>
>



-- 
Anand V. Avati


