[Bugs] [Bug 1230195] New: BVT: glusterd crashed and dumped during upgrade (on rhel7.1 server)

bugzilla at redhat.com bugzilla at redhat.com
Wed Jun 10 12:22:23 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1230195

            Bug ID: 1230195
           Summary: BVT: glusterd crashed and dumped during upgrade (on
                    rhel7.1 server)
           Product: Red Hat Gluster Storage
           Version: 3.1
         Component: glusterfs-server
          Keywords: Triaged
          Severity: medium
          Priority: low
          Assignee: rhs-bugs at redhat.com
          Reporter: anekkunt at redhat.com
        QA Contact: storage-qa-internal at redhat.com
                CC: akhakhar at redhat.com, amukherj at redhat.com,
                    anekkunt at redhat.com, bugs at gluster.org,
                    gluster-bugs at redhat.com, kripper at gmail.com,
                    nlevinki at redhat.com, sasundar at redhat.com,
                    vbellur at redhat.com
        Depends On: 1209461
            Blocks: 1230026
    Target Release: ---



+++ This bug was initially created as a clone of Bug #1209461 +++

Description of problem:
glusterd has crashed and dumped core.
Core was generated by `glusterd --xlator-option *.upgrade=on -N'
backtrace of the core:

(gdb) bt
#0  0x00007f5cacc66c3b in rcu_bp_register () from /lib64/liburcu-bp.so.1
#1  0x00007f5cacc66f7e in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#2  0x00007f5cad52d5d6 in __glusterd_peer_rpc_notify
(rpc=rpc@entry=0x7f5cba0a17a0, mydata=mydata@entry=0x7f5cba0964a0,
event=event@entry=RPC_CLNT_CONNECT, data=data@entry=0x0) at
glusterd-handler.c:4681
#3  0x00007f5cad5250ec in glusterd_big_locked_notify (rpc=0x7f5cba0a17a0,
mydata=0x7f5cba0964a0, event=RPC_CLNT_CONNECT, data=0x0,
notify_fn=0x7f5cad52d580 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:71
#4  0x00007f5cb827e610 in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7f5cba0a17d0, event=<optimized out>, data=<optimized out>) at
rpc-clnt.c:926
#5  0x00007f5cb827a4c3 in rpc_transport_notify (this=this@entry=0x7f5cba0a4930,
event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x7f5cba0a4930) at
rpc-transport.c:543
#6  0x00007f5caad75a27 in socket_connect_finish
(this=this@entry=0x7f5cba0a4930) at socket.c:2366
#7  0x00007f5caad7af7f in socket_event_handler (fd=fd@entry=11,
idx=idx@entry=2, data=0x7f5cba0a4930, poll_in=0, poll_out=4, poll_err=0) at
socket.c:2396
#8  0x00007f5cb8504c1a in event_dispatch_epoll_handler (event=0x7f5ca8db1ec0,
event_pool=0x7f5cba033c50) at event-epoll.c:572
#9  event_dispatch_epoll_worker (data=0x7f5cba0425b0) at event-epoll.c:674
#10 0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f5cb71661ad in clone () from /lib64/libc.so.6


Version-Release number of selected component (if applicable):
Upstream glusterfs3.7 on Rhel7.1 server

How reproducible:


Steps to Reproduce:
1. Not a manual process
2. Running BVT on rhel7.1 server with upstream glusterfs3.7 packages
3. Received a core from one of the servers when the fssanity tests failed after
a watchdog timeout.


Actual results:


Expected results: Server not to crash.


Additional info:

--- Additional comment from Apeksha on 2015-04-07 08:32:24 EDT ---

sosreports and logs can be found in the following location;

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1209461/

--- Additional comment from SATHEESARAN on 2015-04-07 11:29:17 EDT ---

(In reply to Apeksha from comment #1)
> sosreports and logs can be found in the following location;
> 
> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1209461/

Apeksha,

The sosreports cannot be downloaded because of missing permissions.
Could you set suitable permissions to make them accessible?

--- Additional comment from Apeksha on 2015-04-08 01:23:22 EDT ---

Changed the permissions. You will now be able to access the reports.

--- Additional comment from Kaushal on 2015-04-08 05:55:59 EDT ---

The crash is caused by a race between process exit and a socket event.
```
(gdb) thr a a bt

Thread 7 (Thread 0x7f5ca95b3700 (LWP 13303)):
#0  0x00007f5cb7823705 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f5cad5af353 in hooks_worker (args=<optimized out>) at
glusterd-hooks.c:501
#2  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f5caf040700 (LWP 13151)):
#0  0x00007f5cab879859 in __do_global_dtors_aux () from /lib64/libselinux.so.1
#1  0x00007f5cb8744b5a in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#2  0x00007f5cb70a8e49 in __run_exit_handlers () from /lib64/libc.so.6
#3  0x00007f5cb70a8e95 in exit () from /lib64/libc.so.6
#4  0x00007f5cb896253a in cleanup_and_exit (signum=<optimized out>) at
glusterfsd.c:1242
#5  0x00007f5cb8962625 in glusterfs_sigwaiter (arg=<optimized out>) at
glusterfsd.c:1983
#6  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f5caf841700 (LWP 13150)):
#0  0x00007f5cb782699d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f5cb84c7224 in gf_timer_proc (ctx=0x7f5cba015010) at timer.c:191
#2  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f5cae83f700 (LWP 13152)):
#0  0x00007f5cb7823ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f5cb84e9028 in syncenv_task (proc=proc@entry=0x7f5cba042a20) at
syncop.c:591
#2  0x00007f5cb84e9c90 in syncenv_processor (thdata=0x7f5cba042a20) at
syncop.c:683
#3  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f5cae03e700 (LWP 13153)):
#0  0x00007f5cb7823ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f5cb84e9028 in syncenv_task (proc=proc@entry=0x7f5cba042de0) at
syncop.c:591
#2  0x00007f5cb84e9c90 in syncenv_processor (thdata=0x7f5cba042de0) at
syncop.c:683
#3  0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5cb71661ad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f5cb8946740 (LWP 13149)):
#0  0x00007f5cb7820f27 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f5cb8504f25 in event_dispatch_epoll (event_pool=0x7f5cba033c50) at
event-epoll.c:759
#2  0x00007f5cb895f61a in main (argc=4, argv=0x7fff8e7dc2b8) at
glusterfsd.c:2313

Thread 1 (Thread 0x7f5ca8db2700 (LWP 13304)):
#0  0x00007f5cacc66c3b in rcu_bp_register () from /lib64/liburcu-bp.so.1
#1  0x00007f5cacc66f7e in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#2  0x00007f5cad52d5d6 in __glusterd_peer_rpc_notify
(rpc=rpc@entry=0x7f5cba0a17a0, mydata=mydata@entry=0x7f5cba0964a0,
event=event@entry=RPC_CLNT_CONNECT, data=data@entry=0x0) at
glusterd-handler.c:4681
#3  0x00007f5cad5250ec in glusterd_big_locked_notify (rpc=0x7f5cba0a17a0,
mydata=0x7f5cba0964a0, event=RPC_CLNT_CONNECT, data=0x0,
notify_fn=0x7f5cad52d580 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:71
#4  0x00007f5cb827e610 in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7f5cba0a17d0, event=<optimized out>, data=<optimized out>) at
rpc-clnt.c:926
#5  0x00007f5cb827a4c3 in rpc_transport_notify (this=this@entry=0x7f5cba0a4930,
event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x7f5cba0a4930) at
rpc-transport.c:543
#6  0x00007f5caad75a27 in socket_connect_finish
(this=this@entry=0x7f5cba0a4930) at socket.c:2366
#7  0x00007f5caad7af7f in socket_event_handler (fd=fd@entry=11,
idx=idx@entry=2, data=0x7f5cba0a4930, poll_in=0, poll_out=4, poll_err=0) at
socket.c:2396
#8  0x00007f5cb8504c1a in event_dispatch_epoll_handler (event=0x7f5ca8db1ec0,
event_pool=0x7f5cba033c50) at event-epoll.c:572
#9  event_dispatch_epoll_worker (data=0x7f5cba0425b0) at event-epoll.c:674
#10 0x00007f5cb781fdf5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f5cb71661ad in clone () from /lib64/libc.so.6
(gdb)
```

As the backtraces show, thread 6 is in the middle of exiting the process. It
has already run the exit handlers, which clean up resources, including those
of liburcu. By the time thread 1 calls rcu_bp_register(), the liburcu resources
have been cleaned up. rcu_bp_register() tries to access these resources that no
longer exist, which leads to the segmentation fault.

Races like this are hard to fix. Because this race and the resulting crash
occur when the process has almost stopped, they have no serious impact on
functionality apart from the core file and the log message. I'm setting a lower
priority for this bug.

--- Additional comment from Kaushal on 2015-04-08 06:23:43 EDT ---

The fix should be simple enough. GlusterD's fini() doesn't stop the TCP socket
listener. If it did, the above situation wouldn't arise, as the listener would
have been stopped before the liburcu exit handler was called.
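
The ordering argued for above can be sketched roughly as follows. This is only
an illustration, not glusterd code and not the patch under review: struct
registry, struct listener, listener_start(), listener_stop() and
shutdown_in_order() are hypothetical names, and the "registry" merely stands in
for the liburcu state that the exit handlers release. The point is that the
event thread is stopped and joined before anything it depends on is torn down.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for state released by a library exit handler
 * (the liburcu resources in this report). */
struct registry {
    atomic_bool alive;
};

/* Hypothetical stand-in for the socket listener and its event thread. */
struct listener {
    pthread_t thread;
    atomic_bool running;
    struct registry *reg;
    atomic_int late_events; /* events seen after the registry was torn down */
};

static void *event_loop(void *arg)
{
    struct listener *l = arg;

    while (atomic_load(&l->running)) {
        /* A real notify callback would enter an RCU read-side section
         * here; this sketch only records whether the registry is gone. */
        if (!atomic_load(&l->reg->alive))
            atomic_fetch_add(&l->late_events, 1);
        sched_yield();
    }
    return NULL;
}

static int listener_start(struct listener *l, struct registry *reg)
{
    l->reg = reg;
    atomic_init(&l->late_events, 0);
    atomic_init(&l->running, true);
    return pthread_create(&l->thread, NULL, event_loop, l);
}

/* Ask the event thread to stop and wait until it has really exited. */
static void listener_stop(struct listener *l)
{
    atomic_store(&l->running, false);
    pthread_join(l->thread, NULL);
}

/* Returns how many events ran against a dead registry, or -1 on error. */
int shutdown_in_order(void)
{
    struct registry reg;
    struct listener l;

    atomic_init(&reg.alive, true);
    if (listener_start(&l, &reg) != 0)
        return -1;

    listener_stop(&l);               /* 1. quiesce the event source     */
    atomic_store(&reg.alive, false); /* 2. only then release the state  */

    return atomic_load(&l.late_events);
}
```

Calling shutdown_in_order() from a small main() (built with -pthread) should
report no late events, since the join guarantees the event thread is gone
before the teardown step. Swapping steps 1 and 2 reintroduces the window
described in the previous comment.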

--- Additional comment from Anand Avati on 2015-04-10 07:42:37 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: This patch stops tcp/ip
listeners during  glusterd exit.) posted (#1) for review on master by Anand
Nekkunti (anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-04-13 05:03:52 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#2) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-04-16 05:18:47 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#3) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-04-24 07:14:46 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit.) posted (#4) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-12 05:31:12 EDT ---

REVIEW: http://review.gluster.org/10758 (ligglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#1) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-12 05:44:28 EDT ---

REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#2) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-12 14:39:11 EDT ---

REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#3) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-13 07:01:22 EDT ---

REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#4) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-14 15:00:23 EDT ---

REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#5) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-14 15:23:41 EDT ---

REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#6) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-20 05:05:47 EDT ---

REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#7) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-21 23:55:47 EDT ---

REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#9) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-22 02:14:01 EDT ---

REVIEW: http://review.gluster.org/10758 (libglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#10) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-22 09:49:15 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#6) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-22 10:24:10 EDT ---

REVIEW: http://review.gluster.org/10894 (bglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#2) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-22 12:46:01 EDT ---

REVIEW: http://review.gluster.org/10894 (bglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#3) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-23 07:10:11 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#7) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-23 07:17:57 EDT ---

REVIEW: http://review.gluster.org/10894 (bglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#4) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-23 07:18:00 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit.) posted (#8) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-23 07:42:48 EDT ---

REVIEW: http://review.gluster.org/10894 (libglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#5) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-23 07:42:51 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#9) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-23 13:00:22 EDT ---

REVIEW: http://review.gluster.org/10894 (libglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#6) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-24 05:39:58 EDT ---

REVIEW: http://review.gluster.org/10894 (libglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#7) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-05-24 05:40:00 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#10) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Christopher Pereira on 2015-05-25 17:33:03 EDT ---



--- Additional comment from Anand Avati on 2015-05-28 00:46:50 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#11) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Anand Avati on 2015-05-28 01:21:30 EDT ---

REVIEW: http://review.gluster.org/10894 (glusterfsd: Enabling the fini()  in
cleanup_and_exit()) posted (#8) for review on master by Kaushal M
(kaushal at redhat.com)

--- Additional comment from Anand Avati on 2015-05-28 01:22:01 EDT ---

REVIEW: http://review.gluster.org/10894 (glusterfsd: Enabling the fini()  in
cleanup_and_exit()) posted (#9) for review on master by Kaushal M
(kaushal at redhat.com)

--- Additional comment from Anand Avati on 2015-05-28 21:13:38 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#12) for review on master by Krishnan Parthasarathi
(kparthas at redhat.com)

--- Additional comment from Anand Avati on 2015-05-30 00:25:05 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#13) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-06-01 02:15:28 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#14) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-06-01 04:37:28 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#15) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-06-01 23:35:10 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#16) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-06-03 04:37:42 EDT ---

REVIEW: http://review.gluster.org/10894 (libglusterfs: Enabling the fini()  in
cleanup_and_exit()) posted (#10) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-06-03 04:37:45 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#17) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-06-03 04:43:02 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#18) for review on master by Anand Nekkunti
(anekkunt at redhat.com)

--- Additional comment from Anand Avati on 2015-06-03 04:56:27 EDT ---

REVIEW: http://review.gluster.org/10197 (glusterd: Stop tcp/ip listeners during
 glusterd exit) posted (#19) for review on master by Anand Nekkunti
(anekkunt at redhat.com)


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1209461
[Bug 1209461] BVT: glusterd crashed and dumped during upgrade (on rhel7.1
server)
https://bugzilla.redhat.com/show_bug.cgi?id=1230026
[Bug 1230026] BVT: glusterd crashed and dumped during upgrade (on rhel7.1
server)