[Gluster-devel] Query specific to getting crash

Niels de Vos ndevos at redhat.com
Mon Oct 9 10:46:35 UTC 2017


On Mon, Oct 09, 2017 at 02:07:23PM +0530, Mohit Agrawal wrote:
> +
> 
> On Mon, Oct 9, 2017 at 11:33 AM, Mohit Agrawal <moagrawa at redhat.com> wrote:
> 
> >
> > On Mon, Oct 9, 2017 at 11:16 AM, Mohit Agrawal <moagrawa at redhat.com>
> > wrote:
> >
> >> Hi All,
> >>
> >>
> >> For this patch (https://review.gluster.org/#/c/18436/) I am getting
> >> a crash in nfs (only once) for the
> >> test case (./tests/basic/mount-nfs-auth.t), although I tried to execute
> >> the same test case in a loop on a CentOS
> >> machine and have not been able to reproduce the crash.
> >>
> >> After analysing the crash, it seems the cache entry is invalidated in
> >> thread 10 while thread 1 is still trying
> >> to access it.
> >>
> >> >>>>>>>>>>>>>>>>>>>.
> >>
> >> (gdb) thread 1
> >> [Switching to thread 1 (Thread 0x7fe852cfe700 (LWP 19073))]#0
> >>  0x00007fe859665c85 in auth_cache_lookup (
> >>     cache=0x7fe854027db0, fh=0x7fe84466684c, host_addr=0x7fe844565e40
> >> "23.253.175.80",
> >>     timestamp=0x7fe852cfb1e0, can_write=0x7fe852cfb1dc)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/auth-cache.c:295
> >> 295                *can_write = lookup_res->item->opts->rw;
> >> (gdb) bt
> >> #0  0x00007fe859665c85 in auth_cache_lookup (cache=0x7fe854027db0,
> >> fh=0x7fe84466684c,
> >>     host_addr=0x7fe844565e40 "23.253.175.80", timestamp=0x7fe852cfb1e0,
> >> can_write=0x7fe852cfb1dc)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/auth-cache.c:295
> >> #1  0x00007fe859665ebc in is_nfs_fh_cached (cache=0x7fe854027db0,
> >> fh=0x7fe84466684c,
> >>     host_addr=0x7fe844565e40 "23.253.175.80")
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/auth-cache.c:390
> >> #2  0x00007fe85962b82c in mnt3_check_cached_fh (ms=0x7fe854023d60,
> >> fh=0x7fe84466684c,
> >>     host_addr=0x7fe844565e40 "23.253.175.80", is_write_op=_gf_false)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/mount3.c:1954
> >> #3  0x00007fe85962ba92 in _mnt3_authenticate_req (ms=0x7fe854023d60,
> >> req=0x7fe844679148,
> >>     fh=0x7fe84466684c, path=0x0, authorized_export=0x0,
> >> authorized_host=0x0, is_write_op=_gf_false)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/mount3.c:2011
> >> #4  0x00007fe85962bf65 in mnt3_authenticate_request (ms=0x7fe854023d60,
> >> req=0x7fe844679148,
> >>     fh=0x7fe84466684c, volname=0x0, path=0x0, authorized_path=0x0,
> >> authorized_host=0x0,
> >>     is_write_op=_gf_false)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/mount3.c:2130
> >> #5  0x00007fe859652370 in nfs3_fh_auth_nfsop (cs=0x7fe8446663c8,
> >> is_write_op=_gf_false)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/nfs3-helpers.c:3981
> >> #6  0x00007fe85963631a in nfs3_lookup_resume (carg=0x7fe8446663c8)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/nfs3.c:1559
> >> #7  0x00007fe859651b98 in nfs3_fh_resolve_entry_hard (cs=0x7fe8446663c8)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/nfs3-helpers.c:3791
> >> #8  0x00007fe859651e35 in nfs3_fh_resolve_entry (cs=0x7fe8446663c8)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/nfs3-helpers.c:3844
> >> #9  0x00007fe859651e94 in nfs3_fh_resolve_resume (cs=0x7fe8446663c8)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/nfs3-helpers.c:3862
> >> #10 0x00007fe8596520ad in nfs3_fh_resolve_root (cs=0x7fe8446663c8)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/nfs3-helpers.c:3915
> >> #11 0x00007fe85965245f in nfs3_fh_resolve_and_resume (cs=0x7fe8446663c8,
> >> fh=0x7fe852cfc980,
> >>     entry=0x7fe852cfc9c0 "test-bg-write", resum_fn=0x7fe85963621d
> >> <nfs3_lookup_resume>)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/nfs3-helpers.c:4011
> >> #12 0x00007fe859636dcf in nfs3_lookup (req=0x7fe844679148,
> >> fh=0x7fe852cfc980, fhlen=52,
> >>     name=0x7fe852cfc9c0 "test-bg-write")
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/nfs3.c:1620
> >> #13 0x00007fe85963703f in nfs3svc_lookup (req=0x7fe844679148)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/nfs3.c:1666
> >> #14 0x00007fe86765f585 in rpcsvc_handle_rpc_call (svc=0x7fe854022a00,
> >> trans=0x7fe8545c1fa0,
> >>     msg=0x7fe844334610)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/rpc/rpc-lib/src/rpcsvc.c:711
> >> #15 0x00007fe86765f8f8 in rpcsvc_notify (trans=0x7fe8545c1fa0,
> >> mydata=0x7fe854022a00,
> >>     event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fe844334610)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/rpc/rpc-lib/src/rpcsvc.c:805
> >> #16 0x00007fe867665458 in rpc_transport_notify (this=0x7fe8545c1fa0,
> >> event=RPC_TRANSPORT_MSG_RECEIVED,
> >>     data=0x7fe844334610)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/rpc/rpc-lib/src/rpc-transport.c:538
> >> #17 0x00007fe85c44561e in socket_event_poll_in (this=0x7fe8545c1fa0,
> >> notify_handled=_gf_true)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/rpc/rpc-transport/socket/src/socket.c:2319
> >> #18 0x00007fe85c445cb1 in socket_event_handler (fd=12, idx=8, gen=103,
> >> data=0x7fe8545c1fa0, poll_in=1,
> >>     poll_out=0, poll_err=0)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/rpc/rpc-transport/socket/src/socket.c:2475
> >> #19 0x00007fe867917fd7 in event_dispatch_epoll_handler
> >> (event_pool=0x7030d0, event=0x7fe852cfde70)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/libglusterfs/src/event-epoll.c:583
> >> #20 0x00007fe8679182d9 in event_dispatch_epoll_worker
> >> (data=0x7fe85403d060)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/libglusterfs/src/event-epoll.c:659
> >> #21 0x00007fe866b7baa1 in start_thread () from /lib64/libpthread.so.0
> >> #22 0x00007fe8664e3bcd in clone () from /lib64/libc.so.6
> >>
> >> (gdb) thread 10
> >> [Switching to thread 10 (Thread 0x7fe858ed2700 (LWP 19051))]#0
> >>  0x00007fe866b82334 in __lll_lock_wait ()
> >>    from /lib64/libpthread.so.0
> >> (gdb) bt
> >> #0  0x00007fe866b82334 in __lll_lock_wait () from /lib64/libpthread.so.0
> >> #1  0x00007fe866b7d5d8 in _L_lock_854 () from /lib64/libpthread.so.0
> >> #2  0x00007fe866b7d4a7 in pthread_mutex_lock () from
> >> /lib64/libpthread.so.0
> >> #3  0x00007fe8678a9844 in _gf_msg (
> >>     domain=0x7fe85966a448 "ot/workspace/my_glusterfs_bui
> >> ld/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c",
> >> file=0x7fe85966a3f8 "/lib/glusterd/nfs/exports", function=0x7fe85966b5e0
> >> "init", line=3878,
> >>     level=GF_LOG_INFO, errnum=0, trace=0, msgid=112151,
> >> fmt=0x7fe85966b3b4 "")
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/libglusterfs/src/logging.c:2081
> >> #4  0x00007fe859630287 in _mnt3_auth_param_refresh_thread
> >> (argv=0x7fe854023d60)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/mount3.c:3877
> >> #5  0x00007fe866b7baa1 in start_thread () from /lib64/libpthread.so.0
> >> #6  0x00007fe8664e3bcd in clone () from /lib64/libc.so.6
> >> (gdb) p mstate
> >> No symbol "mstate" in current context.
> >> (gdb) f 4
> >> #4  0x00007fe859630287 in _mnt3_auth_param_refresh_thread
> >> (argv=0x7fe854023d60)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.
> >> 0dev/xlators/nfs/server/src/mount3.c:3877
> >> 3877
> >>
> >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>.
> >>
> >> Based on a first-level analysis I don't think it is related to my
> >> patch. Please check, and respond if you have seen this earlier as well.
> >>
> >> The core file can be accessed from this link:
> >> https://build.gluster.org/job/centos6-regression/6759/console

This indeed looks like something that is not related to your change.
These kinds of races should have been fixed by protecting the
auth_cache->dict with a lock, and by making all auth_cache_entries
reference counted. Going through the code does not show any obvious
places where auth_cache->cache_dict is accessed without holding
auth_cache->lock.

The patches listed here seemed to be an improvement for a while; it is
unclear to me why these kinds of problems would surface again.
  https://review.gluster.org/#/q/topic:bug-1226717

Because you hit this crash, there still seems to be a race condition
somewhere in the auth-cache part of Gluster/NFS. If this happens more
regularly, we should investigate the cause a little further.
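Since the crash only showed up once, re-running the test in a tight loop
until it fails is the usual way to flush out such a race. A minimal sketch
(the helper name run_until_failure is made up; in a real regression setup
you would pass something like "prove ./tests/basic/mount-nfs-auth.t" as the
command):

```shell
# Re-run a command until it fails, to flush out a rare race condition.
run_until_failure() {
    test_cmd=$1
    max=$2
    i=1
    while [ "$i" -le "$max" ]; do
        if ! $test_cmd; then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "passed $max iterations"
    return 0
}
```

For example, `run_until_failure "prove ./tests/basic/mount-nfs-auth.t" 50`
would run the reported test up to 50 times and stop at the first failure,
leaving the core file from that run for inspection.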

More recently Facebook merged a patch in their 3.8 branch that also adds
locking to the auth_cache structure. However, this change was not based
on the patches linked above. Maybe Shreyas or Jeff (+CC) have seen the
backtrace of this segfault before?
  https://review.gluster.org/18247

Thanks,
Niels

