[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

Erik Jacobson erik.jacobson at hpe.com
Fri Apr 17 13:08:27 UTC 2020


Amar, Ravi -

>     This thread has been one of the largest efforts to stabilize the
>     systems in recent times.

Well, thanks to you guys too. It would have been easy to stop replying
when things got hard. I understand best-effort community support and
appreciate you sticking with us.
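For anyone landing on this thread from the archives, the combination that
held up for us (spelled out in the quoted messages below) comes down to a
couple of gluster commands. This is a hedged recap, not a copy-paste
recipe: the volume name cm_shared is specific to our setup, and the
split-brain fix (review 24316) has to be applied on top.

```shell
# Assumption: volume name "cm_shared" is site-specific, and the
# split-brain patch from review 24316 is already applied.

# Disable parallel-readdir, which affected the gnfs volfile in our case:
gluster volume set cm_shared performance.parallel-readdir off

# Confirm the option is now off:
gluster volume get cm_shared performance.parallel-readdir
```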

The test system I had is disappearing on Monday. However, a larger test
system will be less booked after a release finalizes. So I have a test
platform through early next week, and will again have something in a
couple of weeks. I also may have a window at 1k nodes at a customer site
during a maintenance window... And we should have a couple of big ones
going through the factory in the coming weeks in the 1k size. At 1k
nodes, we have 3 gluster servers.

THANKS AGAIN. Wow what a relief.

Let me get these changes checked in so I can get them to some customers,
and then look at starting a new thread on the thread hangs.
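Since the strace output quoted below mostly shows every thread parked in
futex() and the log stops at a bare "pending frames:" line, here are a
couple of generic next steps for that follow-up thread. These are
assumptions on my part, not something we ran here; the PID 30541 is from
the ps output quoted below.

```shell
# Assumption: 30541 is the hung glusterfs nfs process (from ps below).

# Dump backtraces for all threads without killing the process:
gdb -p 30541 -batch -ex 'thread apply all bt'

# glusterfs processes also write a statedump to /var/run/gluster on
# SIGUSR1, recording held locks and pending call frames:
kill -USR1 30541
```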

Erik


> 
>     Thanks for patience and number of retries you did, Erik!
> 
> Thanks indeed! Once https://review.gluster.org/#/c/glusterfs/+/24316/ gets
> merged on master, I will back port it to the release branches.
> 
> 
>     We surely need to get to the glitch you found with the 7.4 version, as with
>     every higher version, we expect more stability!
> 
> True, maybe we should start a separate thread...
> 
> Regards,
> Ravi
> 
>     Regards,
>     Amar
> 
>     On Fri, Apr 17, 2020 at 2:46 AM Erik Jacobson <erik.jacobson at hpe.com>
>     wrote:
> 
>         I have some news.
> 
>         After many, many trials, reboots of gluster servers, and reboots
>         of nodes, in runs that should have reproduced the issue several
>         times, I think we're stable.
> 
>         It appears this glusterfs nfs daemon hang only happens with
>         glusterfs 7.4 and not 7.2.
> 
>         So, with:
>         1) your split-brain patch,
>         2) performance.parallel-readdir off, and
>         3) glusterfs 7.2,
> 
>         I declare it stable. I can't make it fail with one leader down:
>         no split-brain, hang, or seg fault.
> 
>         I'm working on putting this into a SW update.
> 
>         We are going to test whether performance.parallel-readdir off
>         impacts booting at scale, but we don't have a system to try it
>         on at this time.
> 
>         THANK YOU!
> 
>         I may have access to the 57-node test system if there is
>         something you'd like me to try with regard to why glusterfs 7.4
>         is unstable in this situation. Just let me know.
> 
>         Erik
> 
>         On Thu, Apr 16, 2020 at 12:03:33PM -0500, Erik Jacobson wrote:
>         > So in my test runs since making that change, we have a different odd
>         > behavior now. As you recall, this is with your patch -- still not
>         > split-brain -- and now with performance.parallel-readdir off
>         >
>         > The NFS server grinds to a halt after a few test runs. It does
>         > not core dump.
>         >
>         > All that shows up in the log is:
>         >
>         > "pending frames:" with nothing after it and no date stamp.
>         >
>         > I will start looking for interesting breakpoints, I guess.
>         >
>         >
>         > The glusterfs for nfs is still alive:
>         >
>         > root     30541     1 42 09:57 ?        00:51:06 /usr/sbin/glusterfs
>         -s localhost --volfile-id gluster/nfs -p /var/run/gluster/nfs/nfs.pid
>         -l /var/log/glusterfs/nfs.log -S /var/run/gluster/
>         9ddb5561058ff543.socket
>         >
>         >
>         >
>         > [root at leader3 ~]# strace -f  -p 30541
>         > strace: Process 30541 attached with 40 threads
>         > [pid 30580] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30579] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30578] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30577] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30576] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30575] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30574] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30573] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30572] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30571] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30570] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30569] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30568] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30567] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30566] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30565] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30564] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30563] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30562] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30561] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30560] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30559] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30558] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30557] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30556] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30555] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30554] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30553] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30552] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30551] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30550] restart_syscall(<... resuming interrupted restart_syscall
>         ...> <unfinished ...>
>         > [pid 30549] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30548] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=243775}
>         <unfinished ...>
>         > [pid 30546] restart_syscall(<... resuming interrupted restart_syscall
>         ...> <unfinished ...>
>         > [pid 30545] restart_syscall(<... resuming interrupted restart_syscall
>         ...> <unfinished ...>
>         > [pid 30544] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30543] rt_sigtimedwait([HUP INT USR1 USR2 TERM],  <unfinished
>         ...>
>         > [pid 30542] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL
>         <unfinished ...>
>         > [pid 30541] futex(0x7f890c3a39d0, FUTEX_WAIT, 30548, NULL <unfinished
>         ...>
>         > [pid 30547] <... select resumed> )      = 0 (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0
>         (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0
>         (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0
>         (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0
>         (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0
>         (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0
>         (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0
>         (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0
>         (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0
>         (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0
>         (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0
>         (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0
>         (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0
>         (Timeout)
>         > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}^
>         Cstrace: Process 30541 detached
>         > strace: Process 30542 detached
>         > strace: Process 30543 detached
>         > strace: Process 30544 detached
>         > strace: Process 30545 detached
>         > strace: Process 30546 detached
>         > strace: Process 30547 detached
>         >  <detached ...>
>         > strace: Process 30548 detached
>         > strace: Process 30549 detached
>         > strace: Process 30550 detached
>         > strace: Process 30551 detached
>         > strace: Process 30552 detached
>         > strace: Process 30553 detached
>         > strace: Process 30554 detached
>         > strace: Process 30555 detached
>         > strace: Process 30556 detached
>         > strace: Process 30557 detached
>         > strace: Process 30558 detached
>         > strace: Process 30559 detached
>         > strace: Process 30560 detached
>         > strace: Process 30561 detached
>         > strace: Process 30562 detached
>         > strace: Process 30563 detached
>         > strace: Process 30564 detached
>         > strace: Process 30565 detached
>         > strace: Process 30566 detached
>         > strace: Process 30567 detached
>         > strace: Process 30568 detached
>         > strace: Process 30569 detached
>         > strace: Process 30570 detached
>         > strace: Process 30571 detached
>         > strace: Process 30572 detached
>         > strace: Process 30573 detached
>         > strace: Process 30574 detached
>         > strace: Process 30575 detached
>         > strace: Process 30576 detached
>         > strace: Process 30577 detached
>         > strace: Process 30578 detached
>         > strace: Process 30579 detached
>         > strace: Process 30580 detached
>         >
>         >
>         >
>         >
>         > > On 16/04/20 8:04 pm, Erik Jacobson wrote:
>         > > > Quick update just on how this got set.
>         > > >
>         > > > gluster volume set cm_shared performance.parallel-readdir on
>         > > >
>         > > > is something we turned on, thinking it might make our NFS
>         > > > services faster, not knowing it could have a negative
>         > > > effect.
>         > > >
>         > > > Below is a diff of the nfs volume file ON vs OFF. So I will
>         simply turn
>         > > > this OFF and do a test run.
>         > > Yes, that should do it. I am not sure if
>         performance.parallel-readdir was
>         > > intentionally made to have an effect on gnfs volfiles. Usually, for
>         other
>         > > performance xlators, `gluster volume set` only changes the fuse
>         volfile.
> 
> 
>         ________
> 
> 
> 
>         Community Meeting Calendar:
> 
>         Schedule -
>         Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>         Bridge: https://bluejeans.com/441850968
> 
>         Gluster-users mailing list
>         Gluster-users at gluster.org
>         https://lists.gluster.org/mailman/listinfo/gluster-users
> 
> 
> 
>     --
>     https://kadalu.io
>     Container Storage made easy!
> 
> 


Erik Jacobson
Software Engineer

erik.jacobson at hpe.com
+1 612 851 0550 Office

Eagan, MN
hpe.com

