[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

Erik Jacobson erik.jacobson at hpe.com
Thu Apr 16 21:16:23 UTC 2020


I have some news.

After many, many trials, with reboots of gluster servers and reboots of
nodes, under conditions that should have reproduced the issue several
times over, I think we're stable.

It appears this glusterfs nfs daemon hang only happens in glusterfs74
and not 72.

So....
1) Your split-brain patch
2) performance.parallel-readdir off
3) glusterfs72

I declare it stable. I can't make it fail: no split-brain, hang, or seg
fault with one leader down.

I'm working on putting this into a SW update.

We are going to test if performance.parallel-readdir off impacts booting
at scale but we don't have a system to try it on at this time.

THANK YOU!

I may have access to the 57 node test system if there is something you'd
like me to try with regards to why glusterfs74 is unstable in this
situation. Just let me know.

Erik

On Thu, Apr 16, 2020 at 12:03:33PM -0500, Erik Jacobson wrote:
> So in my test runs since making that change, we have a different odd
> behavior now. As you recall, this is with your patch -- still not
> split-brain -- and now with performance.parallel-readdir off
> 
> The NFS server grinds to a halt after a few test runs. It does not core
> dump.
> 
> All that shows up in the log is:
> 
> "pending frames:" with nothing after it and no date stamp.
> 
> I will start looking for interesting break points I guess.
> 
> 
> The glusterfs for nfs is still alive:
> 
> root     30541     1 42 09:57 ?        00:51:06 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/run/gluster/nfs/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/9ddb5561058ff543.socket
> 
> 
> 
> [root@leader3 ~]# strace -f -p 30541
> strace: Process 30541 attached with 40 threads
> [pid 30580] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30579] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30578] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30577] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30576] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30575] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30574] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30573] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30572] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30571] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30570] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30569] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30568] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30567] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30566] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30565] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30564] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30563] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30562] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30561] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30560] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30559] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30558] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30557] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30556] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30555] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30554] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30553] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30552] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30551] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30550] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
> [pid 30549] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30548] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=243775} <unfinished ...>
> [pid 30546] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
> [pid 30545] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
> [pid 30544] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30543] rt_sigtimedwait([HUP INT USR1 USR2 TERM],  <unfinished ...>
> [pid 30542] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30541] futex(0x7f890c3a39d0, FUTEX_WAIT, 30548, NULL <unfinished ...>
> [pid 30547] <... select resumed> )      = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}^Cstrace: Process 30541 detached
> strace: Process 30542 detached
> strace: Process 30543 detached
> strace: Process 30544 detached
> strace: Process 30545 detached
> strace: Process 30546 detached
> strace: Process 30547 detached
>  <detached ...>
> strace: Process 30548 detached
> strace: Process 30549 detached
> strace: Process 30550 detached
> strace: Process 30551 detached
> strace: Process 30552 detached
> strace: Process 30553 detached
> strace: Process 30554 detached
> strace: Process 30555 detached
> strace: Process 30556 detached
> strace: Process 30557 detached
> strace: Process 30558 detached
> strace: Process 30559 detached
> strace: Process 30560 detached
> strace: Process 30561 detached
> strace: Process 30562 detached
> strace: Process 30563 detached
> strace: Process 30564 detached
> strace: Process 30565 detached
> strace: Process 30566 detached
> strace: Process 30567 detached
> strace: Process 30568 detached
> strace: Process 30569 detached
> strace: Process 30570 detached
> strace: Process 30571 detached
> strace: Process 30572 detached
> strace: Process 30573 detached
> strace: Process 30574 detached
> strace: Process 30575 detached
> strace: Process 30576 detached
> strace: Process 30577 detached
> strace: Process 30578 detached
> strace: Process 30579 detached
> strace: Process 30580 detached
> 
> 
> 
> 
> > On 16/04/20 8:04 pm, Erik Jacobson wrote:
> > > Quick update just on how this got set.
> > > 
> > > gluster volume set cm_shared performance.parallel-readdir on
> > > 
> > > That is something we turned on ourselves, thinking it might make our NFS
> > > services faster, not knowing it could have a negative effect.
> > > 
> > > Below is a diff of the nfs volume file ON vs OFF. So I will simply turn
> > > this OFF and do a test run.
> > Yes, that should do it. I am not sure if performance.parallel-readdir was
> > intentionally made to have an effect on gnfs volfiles. Usually, for other
> > performance xlators, `gluster volume set` only changes the fuse volfile.
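
A note on reading the strace listing quoted earlier in this thread: counting
by hand, 30 of the 40 threads are parked in futex() on 0x7f8904035f60 and 4
more on 0x7f88b8000020. That would be consistent with many threads blocked
on one lock, though strace alone cannot say which thread, if any, holds it.
Below is a minimal sketch for doing that grouping automatically. The
function name summarize_futex_waits and the SAMPLE text are illustrative
only; nothing here is part of gluster or strace.

```python
import re
from collections import Counter

# Matches strace lines such as:
#   [pid 30580] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
# search() scans the whole line, so leading "> " quote markers are harmless.
FUTEX_WAIT_RE = re.compile(
    r"\[pid\s+(\d+)\]\s+futex\((0x[0-9a-fA-F]+),\s*FUTEX_WAIT"
)


def summarize_futex_waits(strace_text):
    """Count distinct thread IDs seen waiting on each futex address."""
    waiters = {}
    for line in strace_text.splitlines():
        match = FUTEX_WAIT_RE.search(line)
        if match:
            tid, addr = match.groups()
            waiters.setdefault(addr, set()).add(tid)
    return Counter({addr: len(tids) for addr, tids in waiters.items()})


# A few lines copied from the listing, used only to exercise the function.
SAMPLE = """\
> [pid 30580] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30579] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30566] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> [pid 30541] futex(0x7f890c3a39d0, FUTEX_WAIT, 30548, NULL <unfinished ...>
"""

if __name__ == "__main__":
    for addr, count in summarize_futex_waits(SAMPLE).most_common():
        print(f"{addr}  {count} thread(s)")
```

To use it on a real capture, save the strace output to a file and pass the
file's contents to summarize_futex_waits() in place of SAMPLE. A backtrace
of the busiest address's waiters (for example from gdb) would still be
needed to see what they are actually blocked on.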



