Hi Niels,

Thanks for your response. I will file a bug and include the same backtrace in it.
I don't have a reproducer; I hit the crash only once.
Please let us know if anyone has an objection to merging this patch.

Thanks
Mohit Agrawal

On Mon, Oct 9, 2017 at 4:16 PM, Niels de Vos <ndevos@redhat.com> wrote:

On Mon, Oct 09, 2017 at 02:07:23PM +0530, Mohit Agrawal wrote:
>
> On Mon, Oct 9, 2017 at 11:33 AM, Mohit Agrawal <moagrawa@redhat.com> wrote:
>
> > On Mon, Oct 9, 2017 at 11:16 AM, Mohit Agrawal <moagrawa@redhat.com> wrote:
> >
> >> Hi All,
> >>
> >> For this patch (https://review.gluster.org/#/c/18436/) I am getting a
> >> crash in nfs (only once) for the test case ./tests/basic/mount-nfs-auth.t,
> >> although I tried to execute the same test case in a loop on a CentOS
> >> machine and have not hit the crash again.
> >>
> >> After analysing the crash, it seems the cache entry is invalidated in
> >> thread 10 while thread 1 is still trying to access it, roughly the
> >> pattern sketched below.
> >>
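A minimal, hypothetical sketch of that suspected race (the struct and
function names are illustrative stand-ins, not the actual Gluster code):
if the pointer obtained from the cache lookup escapes the lock, a
concurrent invalidation can free the entry before it is dereferenced.

    #include <pthread.h>
    #include <stdlib.h>

    struct auth_entry {
        int rw;                       /* stands in for item->opts->rw */
    };

    struct auth_cache {
        pthread_mutex_t lock;
        struct auth_entry *entry;     /* stands in for the cache dict */
    };

    /* Thread 1 (the nfs3_lookup -> auth_cache_lookup path): */
    int
    lookup_rw(struct auth_cache *cache)
    {
        struct auth_entry *e;

        pthread_mutex_lock(&cache->lock);
        e = cache->entry;             /* raw pointer escapes the lock */
        pthread_mutex_unlock(&cache->lock);

        return e->rw;                 /* use-after-free if the entry was
                                         invalidated in the meantime */
    }

    /* Thread 10 (the export refresh/invalidation path): */
    void
    invalidate(struct auth_cache *cache)
    {
        pthread_mutex_lock(&cache->lock);
        free(cache->entry);           /* entry is gone ... */
        cache->entry = NULL;          /* ... but thread 1 may still hold
                                         the old pointer */
        pthread_mutex_unlock(&cache->lock);
    }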
> >>
> >> (gdb) thread 1
> >> [Switching to thread 1 (Thread 0x7fe852cfe700 (LWP 19073))]#0
> >> 0x00007fe859665c85 in auth_cache_lookup (cache=0x7fe854027db0,
> >>     fh=0x7fe84466684c, host_addr=0x7fe844565e40 "23.253.175.80",
> >>     timestamp=0x7fe852cfb1e0, can_write=0x7fe852cfb1dc)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/auth-cache.c:295
> >> 295          *can_write = lookup_res->item->opts->rw;
> >> (gdb) bt
> >> #0  0x00007fe859665c85 in auth_cache_lookup (cache=0x7fe854027db0, fh=0x7fe84466684c,
> >>     host_addr=0x7fe844565e40 "23.253.175.80", timestamp=0x7fe852cfb1e0, can_write=0x7fe852cfb1dc)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/auth-cache.c:295
> >> #1  0x00007fe859665ebc in is_nfs_fh_cached (cache=0x7fe854027db0, fh=0x7fe84466684c,
> >>     host_addr=0x7fe844565e40 "23.253.175.80")
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/auth-cache.c:390
> >> #2  0x00007fe85962b82c in mnt3_check_cached_fh (ms=0x7fe854023d60, fh=0x7fe84466684c,
> >>     host_addr=0x7fe844565e40 "23.253.175.80", is_write_op=_gf_false)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:1954
> >> #3  0x00007fe85962ba92 in _mnt3_authenticate_req (ms=0x7fe854023d60, req=0x7fe844679148,
> >>     fh=0x7fe84466684c, path=0x0, authorized_export=0x0, authorized_host=0x0, is_write_op=_gf_false)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:2011
> >> #4  0x00007fe85962bf65 in mnt3_authenticate_request (ms=0x7fe854023d60, req=0x7fe844679148,
> >>     fh=0x7fe84466684c, volname=0x0, path=0x0, authorized_path=0x0, authorized_host=0x0,
> >>     is_write_op=_gf_false)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:2130
> >> #5  0x00007fe859652370 in nfs3_fh_auth_nfsop (cs=0x7fe8446663c8, is_write_op=_gf_false)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3981
> >> #6  0x00007fe85963631a in nfs3_lookup_resume (carg=0x7fe8446663c8)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3.c:1559
> >> #7  0x00007fe859651b98 in nfs3_fh_resolve_entry_hard (cs=0x7fe8446663c8)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3791
> >> #8  0x00007fe859651e35 in nfs3_fh_resolve_entry (cs=0x7fe8446663c8)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3844
> >> #9  0x00007fe859651e94 in nfs3_fh_resolve_resume (cs=0x7fe8446663c8)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3862
> >> #10 0x00007fe8596520ad in nfs3_fh_resolve_root (cs=0x7fe8446663c8)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:3915
> >> #11 0x00007fe85965245f in nfs3_fh_resolve_and_resume (cs=0x7fe8446663c8, fh=0x7fe852cfc980,
> >>     entry=0x7fe852cfc9c0 "test-bg-write", resum_fn=0x7fe85963621d <nfs3_lookup_resume>)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3-helpers.c:4011
> >> #12 0x00007fe859636dcf in nfs3_lookup (req=0x7fe844679148, fh=0x7fe852cfc980, fhlen=52,
> >>     name=0x7fe852cfc9c0 "test-bg-write")
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3.c:1620
> >> #13 0x00007fe85963703f in nfs3svc_lookup (req=0x7fe844679148)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/nfs3.c:1666
> >> #14 0x00007fe86765f585 in rpcsvc_handle_rpc_call (svc=0x7fe854022a00, trans=0x7fe8545c1fa0,
> >>     msg=0x7fe844334610)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-lib/src/rpcsvc.c:711
> >> #15 0x00007fe86765f8f8 in rpcsvc_notify (trans=0x7fe8545c1fa0, mydata=0x7fe854022a00,
> >>     event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fe844334610)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-lib/src/rpcsvc.c:805
> >> #16 0x00007fe867665458 in rpc_transport_notify (this=0x7fe8545c1fa0,
> >>     event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fe844334610)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-lib/src/rpc-transport.c:538
> >> #17 0x00007fe85c44561e in socket_event_poll_in (this=0x7fe8545c1fa0, notify_handled=_gf_true)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-transport/socket/src/socket.c:2319
> >> #18 0x00007fe85c445cb1 in socket_event_handler (fd=12, idx=8, gen=103, data=0x7fe8545c1fa0,
> >>     poll_in=1, poll_out=0, poll_err=0)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/rpc/rpc-transport/socket/src/socket.c:2475
> >> #19 0x00007fe867917fd7 in event_dispatch_epoll_handler (event_pool=0x7030d0, event=0x7fe852cfde70)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/libglusterfs/src/event-epoll.c:583
> >> #20 0x00007fe8679182d9 in event_dispatch_epoll_worker (data=0x7fe85403d060)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/libglusterfs/src/event-epoll.c:659
> >> #21 0x00007fe866b7baa1 in start_thread () from /lib64/libpthread.so.0
> >> #22 0x00007fe8664e3bcd in clone () from /lib64/libc.so.6
> >>
> >> (gdb) thread 10
> >> [Switching to thread 10 (Thread 0x7fe858ed2700 (LWP 19051))]#0
> >> 0x00007fe866b82334 in __lll_lock_wait () from /lib64/libpthread.so.0
> >> (gdb) bt
> >> #0  0x00007fe866b82334 in __lll_lock_wait () from /lib64/libpthread.so.0
> >> #1  0x00007fe866b7d5d8 in _L_lock_854 () from /lib64/libpthread.so.0
> >> #2  0x00007fe866b7d4a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> >> #3  0x00007fe8678a9844 in _gf_msg (
> >>     domain=0x7fe85966a448 "ot/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c",
> >>     file=0x7fe85966a3f8 "/lib/glusterd/nfs/exports", function=0x7fe85966b5e0 "init", line=3878,
> >>     level=GF_LOG_INFO, errnum=0, trace=0, msgid=112151, fmt=0x7fe85966b3b4 "")
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/libglusterfs/src/logging.c:2081
> >> #4  0x00007fe859630287 in _mnt3_auth_param_refresh_thread (argv=0x7fe854023d60)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:3877
> >> #5  0x00007fe866b7baa1 in start_thread () from /lib64/libpthread.so.0
> >> #6  0x00007fe8664e3bcd in clone () from /lib64/libc.so.6
> >> (gdb) p mstate
> >> No symbol "mstate" in current context.
> >> (gdb) f 4
> >> #4  0x00007fe859630287 in _mnt3_auth_param_refresh_thread (argv=0x7fe854023d60)
> >>     at /home/jenkins/root/workspace/my_glusterfs_build/glusterfs-4.0dev/xlators/nfs/server/src/mount3.c:3877
> >> 3877
> >>
> >> From a first-level analysis I don't think this is related to my patch;
> >> please check, and respond if you have seen this crash earlier as well.
> >>
> >> You can access the core from this link:
> >> https://build.gluster.org/job/centos6-regression/6759/console

This indeed looks like something that is not related to your change.
These kinds of races should have been fixed by protecting the
auth_cache->dict with a lock and by making all auth_cache_entries
reference counted (see the sketch below). Going through the code does
not show any obvious place where auth_cache->cache_dict is accessed
without holding auth_cache->lock.

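A minimal sketch of that reference-counting pattern (simplified and
hypothetical; the real code uses Gluster's dict and its own refcount
helpers, so these names are illustrative only): an entry is pinned under
the lock before its pointer escapes, and it is only freed once the last
reference is dropped, so an invalidation cannot pull it out from under a
concurrent lookup.

    #include <pthread.h>
    #include <stdlib.h>

    struct auth_entry {
        int refs;                     /* protected by cache->lock */
        int rw;
    };

    struct auth_cache {
        pthread_mutex_t lock;
        struct auth_entry *entry;     /* the cache holds one reference */
    };

    /* Take a reference under the lock before the pointer escapes. */
    struct auth_entry *
    entry_ref(struct auth_cache *cache)
    {
        struct auth_entry *e;

        pthread_mutex_lock(&cache->lock);
        e = cache->entry;
        if (e)
            e->refs++;                /* pin the entry */
        pthread_mutex_unlock(&cache->lock);
        return e;
    }

    /* Drop a reference; whoever drops the last one frees the entry. */
    void
    entry_unref(struct auth_cache *cache, struct auth_entry *e)
    {
        int last;

        pthread_mutex_lock(&cache->lock);
        last = (--e->refs == 0);
        pthread_mutex_unlock(&cache->lock);
        if (last)
            free(e);                  /* safe: no one else holds a ref */
    }

With this pattern the refresh thread swaps the entry out of the cache and
calls entry_unref() instead of freeing it directly, so a lookup that is
still using the entry keeps it alive.
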
The patches listed here seemed to be an improvement for a while; it is
unclear to me why this kind of problem would surface again:
https://review.gluster.org/#/q/topic:bug-1226717

Because you had this crash, there seems to be a race condition somewhere
in the auth-cache part of Gluster/NFS. If this happens more regularly,
we should investigate the cause a little further.

More recently, Facebook merged a patch in their 3.8 branch that also adds
locking to the auth_cache structure. However, this change was not based
on the patches linked above. Maybe Shreyas or Jeff (+CC) have seen this
segfault's backtrace before?
https://review.gluster.org/18247

Thanks,
Niels