[Bugs] [Bug 1209461] BVT: glusterd crashed and dumped during upgrade (on rhel7.1 server)

Wed Apr 8 09:55:59 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1209461

Kaushal <kaushal at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|unspecified                 |low

--- Comment #4 from Kaushal <kaushal at redhat.com> ---
The crash is caused by a race between process exit and a socket event.
```
(gdb) thr a a bt

Thread 7 (Thread 0x7f5ca95b3700 (LWP 13303)):
#0  0x00007f5cb7823705 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f5cad5af353 in hooks_worker (args=<optimized out>) at
glusterd-hooks.c:501
#2  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f5caf040700 (LWP 13151)):
#0  0x00007f5cab879859 in __do_global_dtors_aux () from /lib64/libselinux.so.1
#1  0x00007f5cb8744b5a in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#2  0x00007f5cb70a8e49 in __run_exit_handlers () from /lib64/libc.so.6
#3  0x00007f5cb70a8e95 in exit () from /lib64/libc.so.6
#4  0x00007f5cb896253a in cleanup_and_exit (signum=<optimized out>) at
glusterfsd.c:1242
#5  0x00007f5cb8962625 in glusterfs_sigwaiter (arg=<optimized out>) at
glusterfsd.c:1983
#6  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f5caf841700 (LWP 13150)):
#0  0x00007f5cb782699d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f5cb84c7224 in gf_timer_proc (ctx=0x7f5cba015010) at timer.c:191
#2  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f5cae83f700 (LWP 13152)):
#0  0x00007f5cb7823ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f5cb84e9028 in syncenv_task (proc=proc@entry=0x7f5cba042a20) at
syncop.c:591
#2  0x00007f5cb84e9c90 in syncenv_processor (thdata=0x7f5cba042a20) at
syncop.c:683
#3  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f5cae03e700 (LWP 13153)):
#0  0x00007f5cb7823ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f5cb84e9028 in syncenv_task (proc=proc@entry=0x7f5cba042de0) at
syncop.c:591
#2  0x00007f5cb84e9c90 in syncenv_processor (thdata=0x7f5cba042de0) at
syncop.c:683
#3  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f5cb8946740 (LWP 13149)):
#0  0x00007f5cb7820f27 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f5cb8504f25 in event_dispatch_epoll (event_pool=0x7f5cba033c50) at
event-epoll.c:759
#2  0x00007f5cb895f61a in main (argc=4, argv=0x7fff8e7dc2b8) at
glusterfsd.c:2313

Thread 1 (Thread 0x7f5ca8db2700 (LWP 13304)):
#0  0x00007f5cacc66c3b in rcu_bp_register () from /lib64/liburcu-bp.so.1
#1  0x00007f5cacc66f7e in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#2  0x00007f5cad52d5d6 in __glusterd_peer_rpc_notify
(rpc=rpc@entry=0x7f5cba0a17a0, mydata=mydata@entry=0x7f5cba0964a0,
event=event@entry=RPC_CLNT_CONNECT, data=data@entry=0x0) at
glusterd-handler.c:4681
#3  0x00007f5cad5250ec in glusterd_big_locked_notify (rpc=0x7f5cba0a17a0,
mydata=0x7f5cba0964a0, event=RPC_CLNT_CONNECT, data=0x0,
notify_fn=0x7f5cad52d580 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:71
#4  0x00007f5cb827e610 in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7f5cba0a17d0, event=<optimized out>, data=<optimized out>) at
rpc-clnt.c:926
#5  0x00007f5cb827a4c3 in rpc_transport_notify (this=this@entry=0x7f5cba0a4930,
event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x7f5cba0a4930) at
rpc-transport.c:543
#6  0x00007f5caad75a27 in socket_connect_finish
(this=this@entry=0x7f5cba0a4930) at socket.c:2366
#7  0x00007f5caad7af7f in socket_event_handler (fd=fd@entry=11,
idx=idx@entry=2, data=0x7f5cba0a4930, poll_in=0, poll_out=4, poll_err=0) at
socket.c:2396
#8  0x00007f5cb8504c1a in event_dispatch_epoll_handler (event=0x7f5ca8db1ec0,
event_pool=0x7f5cba033c50) at event-epoll.c:572
#9  event_dispatch_epoll_worker (data=0x7f5cba0425b0) at event-epoll.c:674
#10 0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f5cb71661ad in clone () from /lib64/libc.so.6
(gdb)
```

As the backtrace shows, thread 6 is in the middle of exiting the process. It has
already run exit handlers, which clean up process-wide resources, including
those of liburcu. By the time thread 1 calls rcu_bp_register(), the liburcu
resources have been cleaned up. rcu_bp_register() then tries to access
resources that no longer exist, which leads to the segmentation fault.
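
The ordering described above can be reproduced in isolation. The following is
a minimal sketch, not glusterd or liburcu code: resource_alive, teardown(),
worker() and worker_runs_during_exit() are hypothetical names, and the bounded
wait in teardown() exists only to make the demo repeatable. It assumes a POSIX
system with pthreads. A child process calls exit() while a second thread is
still running, and the parent checks whether that thread ran after the exit
handler had already torn the "resource" down.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for library state that is torn down at exit. */
static atomic_int resource_alive = 1;
static atomic_int worker_reported = 0;
static int report_fd = -1; /* write end of the pipe back to the parent */

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Exit handler: plays the role of the library's cleanup code. */
static void teardown(void)
{
    atomic_store(&resource_alive, 0);
    /* Bounded wait (about 2 s), only so the demo is repeatable. */
    for (int i = 0; i < 2000 && !atomic_load(&worker_reported); i++)
        sleep_ms(1);
}

/* Worker: plays the role of the event thread that keeps running. */
static void *worker(void *arg)
{
    (void)arg;
    while (atomic_load(&resource_alive))
        sleep_ms(1);
    /* Reached only while exit() is in progress on the other thread. */
    char seen = 1;
    if (write(report_fd, &seen, 1) != 1)
        seen = 0;
    atomic_store(&worker_reported, 1);
    return NULL;
}

/* Returns 1 if the worker ran after teardown started, 0 if not, -1 on error. */
int worker_runs_during_exit(void)
{
    int fds[2];
    fflush(NULL); /* avoid flushing buffered output twice after fork() */
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        pthread_t tid;
        close(fds[0]);
        report_fd = fds[1];
        if (atexit(teardown) != 0 ||
            pthread_create(&tid, NULL, worker, NULL) != 0)
            _exit(2);
        exit(0); /* runs teardown() while the worker is still alive */
    }
    close(fds[1]);
    char seen = 0;
    ssize_t n = read(fds[0], &seen, 1);
    close(fds[0]);
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (n == 1 && seen == 1) ? 1 : 0;
}
```

In this sketch the worker only reads a flag, so nothing crashes. In the real
process the equivalent step is thread 1 calling into liburcu after its cleanup
has run, which is where the segmentation fault comes from.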

Races like this are hard to fix. Because this race and the resulting crash
happen only when the process is about to stop anyway, there is no serious
impact on functionality apart from the core file and the log message. I'm
therefore setting a lower priority for this bug.
