[Gluster-Maintainers] [Gluster-devel] Master branch lock down: RCA for tests (remove-brick-testcases.t) (./tests/basic/tier/tier-heald.t)

Mon Aug 13 19:32:28 UTC 2018

On 08/13/2018 05:40 AM, Ravishankar N wrote:
> 
> 
> On 08/13/2018 06:12 AM, Shyam Ranganathan wrote:
>> To keep the focus going and fix the remaining tests that were failing
>> sporadically, I request each test/component owner to:
>>
>> - respond to this mail, changing the subject (testname.t) to the name of
>> the test they are responding to (adding more than one if they share
>> the same RCA)
>> - include the current RCA and its status
>>
>> The list of tests and current owners, as per the spreadsheet that we
>> were tracking, is:
>>
>> TBD
>>
>> ./tests/bugs/glusterd/remove-brick-testcases.t        TBD
> In this case, the .t passed, but the self-heal daemon (which, by the
> way, plays no role in this test because the .t does no I/O or heals)
> crashed with the following backtrace:
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007ff8c6bc0b4f in _IO_cleanup () from ./lib64/libc.so.6
> [Current thread is 1 (LWP 17530)]
> (gdb)
> (gdb) bt
> #0  0x00007ff8c6bc0b4f in _IO_cleanup () from ./lib64/libc.so.6
> #1  0x00007ff8c6b7cb8b in __run_exit_handlers () from ./lib64/libc.so.6
> #2  0x00007ff8c6b7cc27 in exit () from ./lib64/libc.so.6
> #3  0x000000000040b14d in cleanup_and_exit (signum=15) at glusterfsd.c:1570
> #4  0x000000000040de71 in glusterfs_sigwaiter (arg=0x7ffd5f270d20) at glusterfsd.c:2332
> #5  0x00007ff8c757ce25 in start_thread () from ./lib64/libpthread.so.0
> #6  0x00007ff8c6c41bad in clone () from ./lib64/libc.so.6

Here is a slightly better stack, with libc symbols as well:

Program terminated with signal 11, Segmentation fault.
#0  0x00007ff8c6bc0b4f in _IO_unbuffer_write () at genops.c:965
965		    if (fp->_lock == NULL || _IO_lock_trylock (*fp->_lock) == 0)
(gdb) bt
#0  0x00007ff8c6bc0b4f in _IO_unbuffer_write () at genops.c:965
#1  _IO_cleanup () at genops.c:1025
#2  0x00007ff8c6b7cb8b in __run_exit_handlers (status=15, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true) at exit.c:90
#3  0x00007ff8c6b7cc27 in __GI_exit (status=<optimized out>) at exit.c:99
#4  0x000000000040b14d in cleanup_and_exit (signum=15) at glusterfsd.c:1570
#5  0x000000000040de71 in glusterfs_sigwaiter (arg=0x7ffd5f270d20) at glusterfsd.c:2332
#6  0x00007ff8c757ce25 in start_thread (arg=0x7ff8bf318700) at pthread_create.c:308
#7  0x00007ff8c6c41bad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
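
For anyone reading the trace, here is a minimal sketch of the call path it shows: a dedicated thread collects SIGTERM (signal 15) with sigwait() and then calls exit() on that same thread, so the exit handlers and the stdio cleanup in frames #0 to #2 run there while any other threads in the process are still alive. This is only an illustration of the pattern, not the glusterfsd source, and it is not a claim about the root cause. The names waiter_ctx, sigwaiter, start_sigwaiter and exit_on_signal are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>

/* Hypothetical names; an illustration, not the glusterfsd source. */
struct waiter_ctx {
    sigset_t set;            /* signals the waiter thread consumes */
    int received;            /* signal number returned by sigwait() */
    void (*on_signal)(int);  /* optional action, e.g. exit_on_signal */
};

/* Thread body: block in sigwait() until one of the signals arrives. */
static void *sigwaiter(void *arg)
{
    struct waiter_ctx *ctx = arg;
    int sig = 0;

    if (sigwait(&ctx->set, &sig) == 0) {
        ctx->received = sig;
        if (ctx->on_signal != NULL)
            ctx->on_signal(sig);
    }
    return NULL;
}

/*
 * Block SIGTERM in the calling thread (threads created afterwards inherit
 * the mask) and start the waiter. Returns 0 on success or an error number.
 */
int start_sigwaiter(struct waiter_ctx *ctx, pthread_t *tid,
                    void (*on_signal)(int))
{
    int ret;

    sigemptyset(&ctx->set);
    sigaddset(&ctx->set, SIGTERM);
    ctx->received = 0;
    ctx->on_signal = on_signal;

    ret = pthread_sigmask(SIG_BLOCK, &ctx->set, NULL);
    if (ret != 0)
        return ret;
    return pthread_create(tid, NULL, sigwaiter, ctx);
}

/*
 * Same shape as frames #4 to #2 above: exit() is called on the waiter
 * thread, so exit handlers and the stdio cleanup run on that thread.
 */
void exit_on_signal(int signum)
{
    exit(signum);
}
```

A daemon following this pattern would pass exit_on_signal as the callback. Passing NULL instead only records the signal in ctx.received, which is handy for trying the pattern without terminating the process.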

> 
> I am not able to find the reason for the crash. Any pointers are
> appreciated. The regression run and core can be found at
> https://build.gluster.org/job/line-coverage/432/consoleFull .

FWIW, ./tests/basic/tier/tier-heald.t also dumped core here.
Run:
https://build.gluster.org/job/regression-on-demand-multiplex/237/consoleFull

Stack of the thread that hit the segmentation fault:
Program terminated with signal 11, Segmentation fault.
#0  0x00007fe4c60d3b4f in _IO_unbuffer_write () at genops.c:965
965		    if (fp->_lock == NULL || _IO_lock_trylock (*fp->_lock) == 0)
(gdb) bt
#0  0x00007fe4c60d3b4f in _IO_unbuffer_write () at genops.c:965
#1  _IO_cleanup () at genops.c:1025
#2  0x00007fe4c608fb8b in __run_exit_handlers (status=15, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true) at exit.c:90
#3  0x00007fe4c608fc27 in __GI_exit (status=<optimized out>) at exit.c:99
#4  0x0000000000408bf5 in cleanup_and_exit (signum=15) at /home/jenkins/root/workspace/regression-on-demand-multiplex/glusterfsd/src/glusterfsd.c:1570
#5  0x000000000040a7af in glusterfs_sigwaiter (arg=0x7ffebf1439a0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/glusterfsd/src/glusterfsd.c:2332
#6  0x00007fe4c6a8fe25 in start_thread (arg=0x7fe4be82b700) at pthread_create.c:308
#7  0x00007fe4c6154bad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

I do not have any further clues at present. I was staring at this for
some time last week and need to dig deeper. As Ravi points out, any
help in resolving this is appreciated.

> 
> Thanks,
> Ravi
>