[Gluster-devel] Query specific to getting crash

Mohit Agrawal moagrawa at redhat.com
Tue Oct 10 04:09:44 UTC 2017


Hi Niels,

   Thanks for your response. I will file a bug and attach the same
backtrace to it.
   I don't have a reproducer; I hit the crash only once.
   Please let us know if anyone objects to merging this patch.

Thanks
Mohit Agrawal

On Mon, Oct 9, 2017 at 4:16 PM, Niels de Vos <ndevos at redhat.com> wrote:

> On Mon, Oct 09, 2017 at 02:07:23PM +0530, Mohit Agrawal wrote:
> > +
> >
> > On Mon, Oct 9, 2017 at 11:33 AM, Mohit Agrawal <moagrawa at redhat.com>
> wrote:
> >
> > >
> > > On Mon, Oct 9, 2017 at 11:16 AM, Mohit Agrawal <moagrawa at redhat.com>
> > > wrote:
> > >
> > >> Hi All,
> > >>
> > >>
> > >> Regarding this patch (https://review.gluster.org/#/c/18436/), I am
> > >> getting a crash in nfs (only once) for the test case
> > >> ./tests/basic/mount-nfs-auth.t. I tried executing the same test case
> > >> in a loop on a CentOS machine, but I could not reproduce the crash.
> > >>
> > >> After analysing the crash, it appears the cache entry is invalidated
> > >> in thread 10 while thread 1 is still trying to access it.
> > >>
> > >> >>>>>>>>>>>>>>>>>>>.
> > >>
> > >> (gdb) thread 1
> > >> [Switching to thread 1 (Thread 0x7fe852cfe700 (LWP 19073))]
> > >> #0  0x00007fe859665c85 in auth_cache_lookup (cache=0x7fe854027db0, fh=0x7fe84466684c,
> > >>     host_addr=0x7fe844565e40 "23.253.175.80", timestamp=0x7fe852cfb1e0, can_write=0x7fe852cfb1dc)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/auth-cache.c:295
> > >> 295                *can_write = lookup_res->item->opts->rw;
> > >> (gdb) bt
> > >> #0  0x00007fe859665c85 in auth_cache_lookup (cache=0x7fe854027db0, fh=0x7fe84466684c,
> > >>     host_addr=0x7fe844565e40 "23.253.175.80", timestamp=0x7fe852cfb1e0, can_write=0x7fe852cfb1dc)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/auth-cache.c:295
> > >> #1  0x00007fe859665ebc in is_nfs_fh_cached (cache=0x7fe854027db0, fh=0x7fe84466684c,
> > >>     host_addr=0x7fe844565e40 "23.253.175.80")
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/auth-cache.c:390
> > >> #2  0x00007fe85962b82c in mnt3_check_cached_fh (ms=0x7fe854023d60, fh=0x7fe84466684c,
> > >>     host_addr=0x7fe844565e40 "23.253.175.80", is_write_op=_gf_false)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:1954
> > >> #3  0x00007fe85962ba92 in _mnt3_authenticate_req (ms=0x7fe854023d60, req=0x7fe844679148,
> > >>     fh=0x7fe84466684c, path=0x0, authorized_export=0x0, authorized_host=0x0, is_write_op=_gf_false)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:2011
> > >> #4  0x00007fe85962bf65 in mnt3_authenticate_request (ms=0x7fe854023d60, req=0x7fe844679148,
> > >>     fh=0x7fe84466684c, volname=0x0, path=0x0, authorized_path=0x0, authorized_host=0x0,
> > >>     is_write_op=_gf_false)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:2130
> > >> #5  0x00007fe859652370 in nfs3_fh_auth_nfsop (cs=0x7fe8446663c8, is_write_op=_gf_false)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3981
> > >> #6  0x00007fe85963631a in nfs3_lookup_resume (carg=0x7fe8446663c8)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3.c:1559
> > >> #7  0x00007fe859651b98 in nfs3_fh_resolve_entry_hard (cs=0x7fe8446663c8)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3791
> > >> #8  0x00007fe859651e35 in nfs3_fh_resolve_entry (cs=0x7fe8446663c8)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3844
> > >> #9  0x00007fe859651e94 in nfs3_fh_resolve_resume (cs=0x7fe8446663c8)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3862
> > >> #10 0x00007fe8596520ad in nfs3_fh_resolve_root (cs=0x7fe8446663c8)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3915
> > >> #11 0x00007fe85965245f in nfs3_fh_resolve_and_resume (cs=0x7fe8446663c8, fh=0x7fe852cfc980,
> > >>     entry=0x7fe852cfc9c0 "test-bg-write", resum_fn=0x7fe85963621d <nfs3_lookup_resume>)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:4011
> > >> #12 0x00007fe859636dcf in nfs3_lookup (req=0x7fe844679148, fh=0x7fe852cfc980, fhlen=52,
> > >>     name=0x7fe852cfc9c0 "test-bg-write")
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3.c:1620
> > >> #13 0x00007fe85963703f in nfs3svc_lookup (req=0x7fe844679148)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3.c:1666
> > >> #14 0x00007fe86765f585 in rpcsvc_handle_rpc_call (svc=0x7fe854022a00, trans=0x7fe8545c1fa0,
> > >>     msg=0x7fe844334610)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-lib/src/rpcsvc.c:711
> > >> #15 0x00007fe86765f8f8 in rpcsvc_notify (trans=0x7fe8545c1fa0, mydata=0x7fe854022a00,
> > >>     event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fe844334610)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-lib/src/rpcsvc.c:805
> > >> #16 0x00007fe867665458 in rpc_transport_notify (this=0x7fe8545c1fa0, event=RPC_TRANSPORT_MSG_RECEIVED,
> > >>     data=0x7fe844334610)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-lib/src/rpc-transport.c:538
> > >> #17 0x00007fe85c44561e in socket_event_poll_in (this=0x7fe8545c1fa0, notify_handled=_gf_true)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-transport/socket/src/socket.c:2319
> > >> #18 0x00007fe85c445cb1 in socket_event_handler (fd=12, idx=8, gen=103, data=0x7fe8545c1fa0, poll_in=1,
> > >>     poll_out=0, poll_err=0)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-transport/socket/src/socket.c:2475
> > >> #19 0x00007fe867917fd7 in event_dispatch_epoll_handler (event_pool=0x7030d0, event=0x7fe852cfde70)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/libglusterfs/src/event-epoll.c:583
> > >> #20 0x00007fe8679182d9 in event_dispatch_epoll_worker (data=0x7fe85403d060)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/libglusterfs/src/event-epoll.c:659
> > >> #21 0x00007fe866b7baa1 in start_thread () from /lib64/libpthread.so.0
> > >> #22 0x00007fe8664e3bcd in clone () from /lib64/libc.so.6
> > >>
> > >> (gdb) thread 10
> > >> [Switching to thread 10 (Thread 0x7fe858ed2700 (LWP 19051))]
> > >> #0  0x00007fe866b82334 in __lll_lock_wait () from /lib64/libpthread.so.0
> > >> (gdb) bt
> > >> #0  0x00007fe866b82334 in __lll_lock_wait () from /lib64/libpthread.so.0
> > >> #1  0x00007fe866b7d5d8 in _L_lock_854 () from /lib64/libpthread.so.0
> > >> #2  0x00007fe866b7d4a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> > >> #3  0x00007fe8678a9844 in _gf_msg (
> > >>     domain=0x7fe85966a448 "ot/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c",
> > >>     file=0x7fe85966a3f8 "/lib/glusterd/nfs/exports", function=0x7fe85966b5e0 "init", line=3878,
> > >>     level=GF_LOG_INFO, errnum=0, trace=0, msgid=112151, fmt=0x7fe85966b3b4 "")
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/libglusterfs/src/logging.c:2081
> > >> #4  0x00007fe859630287 in _mnt3_auth_param_refresh_thread (argv=0x7fe854023d60)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:3877
> > >> #5  0x00007fe866b7baa1 in start_thread () from /lib64/libpthread.so.0
> > >> #6  0x00007fe8664e3bcd in clone () from /lib64/libc.so.6
> > >> (gdb) p mstate
> > >> No symbol "mstate" in current context.
> > >> (gdb) f 4
> > >> #4  0x00007fe859630287 in _mnt3_auth_param_refresh_thread (argv=0x7fe854023d60)
> > >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:3877
> > >> 3877
> > >> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>.
> > >>
> > >> From a first-level analysis I don't think it is related to my patch;
> > >> please check and respond if you have seen the same crash before.
> > >>
> > >> The core is available at this link:
> > >> https://build.gluster.org/job/centos6-regression/6759/console
>
> This indeed looks like something that is not related to your change.
> These kinds of races should have been fixed by protecting the
> auth_cache->dict with a lock and making all auth_cache_entries
> reference counted. Going through the code does not show any obvious
> places where auth_cache->cache_dict is accessed without holding
> auth_cache->lock.
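
For illustration, the scheme Niels describes boils down to roughly the
following pattern. This is only a sketch with made-up names (struct cache,
cache_dict_find, entry_unref, ...); it is not the actual Gluster/NFS code:

    #include <pthread.h>
    #include <stdlib.h>

    struct cache_entry {
            int refcount;   /* protected by cache->lock */
            int rw;         /* stands in for lookup_res->item->opts->rw */
    };

    struct cache {
            pthread_mutex_t lock;
            /* dict_t *cache_dict; in the real code */
    };

    /* placeholder for the dict_get() on cache->cache_dict */
    static struct cache_entry *
    cache_dict_find(struct cache *c, const char *key)
    {
            (void)c; (void)key;
            return NULL;    /* real code looks the entry up in the dict */
    }

    static void
    entry_unref(struct cache *c, struct cache_entry *e)
    {
            int do_free;

            pthread_mutex_lock(&c->lock);
            do_free = (--e->refcount == 0);
            pthread_mutex_unlock(&c->lock);

            if (do_free)
                    free(e);
    }

    static int
    lookup_rw(struct cache *c, const char *key, int *can_write)
    {
            struct cache_entry *e;

            pthread_mutex_lock(&c->lock);
            e = cache_dict_find(c, key);
            if (e)
                    e->refcount++;  /* keep the entry alive past the unlock */
            pthread_mutex_unlock(&c->lock);

            if (!e)
                    return -1;

            *can_write = e->rw;     /* safe: we hold our own reference */
            entry_unref(c, e);
            return 0;
    }

With that in place, the refresh thread invalidating a dict entry only drops
its reference; the entry is freed when the last reader calls entry_unref(),
so a dereference like the one at auth-cache.c:295 cannot hit freed memory.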
>
> The patches listed here seemed to be an improvement for a while; it is
> unclear to me why these kinds of problems would surface again.
>   https://review.gluster.org/#/q/topic:bug-1226717
>
> Because you hit this crash, there seems to be a race condition somewhere
> in the auth-cache part of Gluster/NFS. If it happens more regularly, we
> should investigate the cause further.
>
> More recently Facebook merged a patch into their 3.8 branch that also
> adds locking to the auth_cache structure. However, this change was not
> based on the patches linked above. Maybe Shreyas or Jeff (+CC) have seen
> this segfault backtrace before?
>   https://review.gluster.org/18247
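
(Purely for contrast, and without having verified what 18247 actually does:
the coarser alternative to per-entry refcounting is to serialize every
access, including the purge from the refresh thread, on the same mutex.
Reusing the hypothetical struct cache from the sketch above:

    static void
    cache_purge(struct cache *c)
    {
            pthread_mutex_lock(&c->lock);
            /* swap or drop cache->cache_dict here: any concurrent
             * lookup either finished before this point or sees the
             * new dict afterwards -- nothing is freed under a reader */
            pthread_mutex_unlock(&c->lock);
    }

Either approach would close the window between the two threads shown in the
backtraces above.)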
>
> Thanks,
> Niels
>

