[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

Ravishankar N ravishankar at redhat.com
Fri Apr 17 05:15:40 UTC 2020


On 17/04/20 10:35 am, Amar Tumballi wrote:
> This thread has been one of the largest efforts to stabilize the
> systems in recent times.
>
> Thanks for your patience and for the number of retries you did, Erik!
Thanks indeed! Once https://review.gluster.org/#/c/glusterfs/+/24316/ 
gets merged on master, I will back port it to the release branches.
>
> We surely need to get to the bottom of the glitch you found with the 7.4
> version, since we expect more stability with every higher version!

True, maybe we should start a separate thread...

Regards,
Ravi
> Regards,
> Amar
>
> On Fri, Apr 17, 2020 at 2:46 AM Erik Jacobson <erik.jacobson at hpe.com 
> <mailto:erik.jacobson at hpe.com>> wrote:
>
>     I have some news.
>
>     After many, many trials (reboots of gluster servers, reboots of nodes)
>     in conditions that should have reproduced the issue several times, I
>     think we're stable.
>
>     It appears this glusterfs NFS daemon hang only happens with glusterfs
>     7.4 and not 7.2.
>
>     So....
>     1) Your split-brain patch
>     2) performance.parallel-readdir off
>     3) glusterfs72
>
>     I declare it stable. I can't make it fail: no split-brain, hang, or
>     seg fault with one leader down.
>
>     I'm working on putting this into a SW update.
>
>     We are going to test whether performance.parallel-readdir off impacts
>     booting at scale, but we don't have a system to try it on at this time.
>
>     THANK YOU!
>
>     I may have access to the 57-node test system if there is something
>     you'd like me to try regarding why glusterfs 7.4 is unstable in this
>     situation. Just let me know.
>
>     Erik
>
>     On Thu, Apr 16, 2020 at 12:03:33PM -0500, Erik Jacobson wrote:
>     > So in my test runs since making that change, we have a different odd
>     > behavior now. As you recall, this is with your patch -- still not
>     > split-brain -- and now with performance.parallel-readdir off
>     >
>     > The NFS server grinds to a halt after a few test runs. It does not
>     > core dump.
>     >
>     > All that shows up in the log is:
>     >
>     > "pending frames:" with nothing after it and no date stamp.
>     >
>     > I will start looking for interesting break points I guess.
>     >
>     >
>     > The glusterfs for nfs is still alive:
>     >
>     > root     30541     1 42 09:57 ?        00:51:06 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/run/gluster/nfs/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/9ddb5561058ff543.socket
>     >
>     >
>     >
>     > [root at leader3 ~]# strace -f  -p 30541
>     > strace: Process 30541 attached with 40 threads
>     > [pid 30580] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>     > [... 29 more threads (pids 30579-30551 except 30566, plus 30549) blocked in the same futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL) wait ...]
>     > [... pids 30566, 30548, 30544 and 30542 blocked in futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL) waits ...]
>     > [pid 30550] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
>     > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=243775} <unfinished ...>
>     > [pid 30546] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
>     > [pid 30545] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
>     > [pid 30543] rt_sigtimedwait([HUP INT USR1 USR2 TERM], <unfinished ...>
>     > [pid 30541] futex(0x7f890c3a39d0, FUTEX_WAIT, 30548, NULL <unfinished ...>
>     > [pid 30547] <... select resumed> )      = 0 (Timeout)
>     > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
>     > [... the same one-second select/timeout loop repeated 12 more times ...]
>     > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}^Cstrace: Process 30541 detached
>     > strace: Process 30542 detached
>     > [... strace: processes 30543 through 30580 detached ...]
>     >
>     >
>     >
>     >
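[Editor's note: the strace capture above shows nearly every thread parked in a futex() wait, which points at a lock-up rather than a crash. As a quick sketch, the blocked syscalls can be tallied from a saved capture; the filename strace.log is hypothetical.]

```shell
# Tally which syscalls the hung daemon's threads are blocked in.
# Assumes the strace capture above was saved as strace.log (hypothetical name).
awk '/^\[pid [0-9]+\]/ {
        split($3, call, "(");      # syscall name precedes the first "("
        counts[call[1]]++
     }
     END { for (s in counts) print counts[s], s }' strace.log | sort -rn
```

[On a capture like the one above this shows the bulk of threads in futex; attaching gdb (for example `gdb -p 30541 -batch -ex 'thread apply all bt'`) would then give the user-space stacks behind those waits.]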
>     > > On 16/04/20 8:04 pm, Erik Jacobson wrote:
>     > > > Quick update just on how this got set.
>     > > >
>     > > > gluster volume set cm_shared performance.parallel-readdir on
>     > > >
>     > > > is something we turned on, thinking it might make our NFS
>     > > > services faster, not knowing it could be harmful.
>     > > >
>     > > > Below is a diff of the nfs volume file ON vs OFF. So I will
>     > > > simply turn this OFF and do a test run.
>     > > Yes, that should do it. I am not sure if performance.parallel-readdir
>     > > was intentionally made to have an effect on gnfs volfiles. Usually,
>     > > for other performance xlators, `gluster volume set` only changes the
>     > > fuse volfile.
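[Editor's note: for anyone following along, the change discussed above can be applied and checked from the CLI. The volume name cm_shared comes from the thread; the volfile path shown is the usual glusterd location and may differ on your install.]

```shell
# Turn off the option the thread identified as problematic for gnfs:
gluster volume set cm_shared performance.parallel-readdir off

# Inspect the regenerated gnfs volfile; with the option off, the per-brick
# readdir-ahead xlators should no longer appear.
# (Path is the typical glusterd location; adjust for your install.)
grep -c readdir-ahead /var/lib/glusterd/nfs/nfs-server.vol
```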
>
>
>     ________
>
>
>
>     Community Meeting Calendar:
>
>     Schedule -
>     Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>     Bridge: https://bluejeans.com/441850968
>
>     Gluster-users mailing list
>     Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>     https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> -- 
> -- 
> https://kadalu.io
> Container Storage made easy!
>

