<div dir="ltr"><div>Hi Niels,</div><div><br></div><div>   Thanks for your response. I will file a bug and add the same backtrace to it as well.</div><div>   I don&#39;t have a reproducer; I hit the crash only once.</div><div>   Please let us know if anyone has an objection to merging this patch.</div><div><br></div><div>Thanks</div><div>Mohit Agrawal</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 9, 2017 at 4:16 PM, Niels de Vos <span dir="ltr">&lt;<a href="mailto:ndevos@redhat.com" target="_blank">ndevos@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Mon, Oct 09, 2017 at 02:07:23PM +0530, Mohit Agrawal wrote:<br>
&gt; +<br>
&gt;<br>
&gt; On Mon, Oct 9, 2017 at 11:33 AM, Mohit Agrawal &lt;<a href="mailto:moagrawa@redhat.com">moagrawa@redhat.com</a>&gt; wrote:<br>
&gt;<br>
&gt; &gt;<br>
&gt; &gt; On Mon, Oct 9, 2017 at 11:16 AM, Mohit Agrawal &lt;<a href="mailto:moagrawa@redhat.com">moagrawa@redhat.com</a>&gt;<br>
&gt; &gt; wrote:<br>
&gt; &gt;<br>
&gt; &gt;&gt; Hi All,<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; With this patch (<a href="https://review.gluster.org/#/c/18436/" rel="noreferrer" target="_blank">https://review.gluster.<wbr>org/#/c/18436/</a>) I got<br>
&gt; &gt;&gt; a crash in NFS (only once) in the<br>
&gt; &gt;&gt; test case (./tests/basic/mount-nfs-auth.<wbr>t). I ran<br>
&gt; &gt;&gt; the same test case in a loop on a CentOS<br>
&gt; &gt;&gt; machine but could not reproduce the crash.<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; After analysing the crash, it seems the cache entry is invalidated in thread 10<br>
&gt; &gt;&gt; while thread 1 is still trying to access<br>
&gt; &gt;&gt; it.<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;.<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; (gdb) thread 1<br>
&gt; &gt;&gt; [Switching to thread 1 (Thread 0x7fe852cfe700 (LWP 19073))]#0<br>
&gt; &gt;&gt;  0x00007fe859665c85 in auth_cache_lookup (<br>
&gt; &gt;&gt;     cache=0x7fe854027db0, fh=0x7fe84466684c, host_addr=0x7fe844565e40<br>
&gt; &gt;&gt; &quot;23.253.175.80&quot;,<br>
&gt; &gt;&gt;     timestamp=0x7fe852cfb1e0, can_write=0x7fe852cfb1dc)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>auth-cache.c:295<br>
&gt; &gt;&gt; 295                *can_write = lookup_res-&gt;item-&gt;opts-&gt;rw;<br>
&gt; &gt;&gt; (gdb) bt<br>
&gt; &gt;&gt; #0  0x00007fe859665c85 in auth_cache_lookup (cache=0x7fe854027db0,<br>
&gt; &gt;&gt; fh=0x7fe84466684c,<br>
&gt; &gt;&gt;     host_addr=0x7fe844565e40 &quot;23.253.175.80&quot;, timestamp=0x7fe852cfb1e0,<br>
&gt; &gt;&gt; can_write=0x7fe852cfb1dc)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>auth-cache.c:295<br>
&gt; &gt;&gt; #1  0x00007fe859665ebc in is_nfs_fh_cached (cache=0x7fe854027db0,<br>
&gt; &gt;&gt; fh=0x7fe84466684c,<br>
&gt; &gt;&gt;     host_addr=0x7fe844565e40 &quot;23.253.175.80&quot;)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>auth-cache.c:390<br>
&gt; &gt;&gt; #2  0x00007fe85962b82c in mnt3_check_cached_fh (ms=0x7fe854023d60,<br>
&gt; &gt;&gt; fh=0x7fe84466684c,<br>
&gt; &gt;&gt;     host_addr=0x7fe844565e40 &quot;23.253.175.80&quot;, is_write_op=_gf_false)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>mount3.c:1954<br>
&gt; &gt;&gt; #3  0x00007fe85962ba92 in _mnt3_authenticate_req (ms=0x7fe854023d60,<br>
&gt; &gt;&gt; req=0x7fe844679148,<br>
&gt; &gt;&gt;     fh=0x7fe84466684c, path=0x0, authorized_export=0x0,<br>
&gt; &gt;&gt; authorized_host=0x0, is_write_op=_gf_false)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>mount3.c:2011<br>
&gt; &gt;&gt; #4  0x00007fe85962bf65 in mnt3_authenticate_request (ms=0x7fe854023d60,<br>
&gt; &gt;&gt; req=0x7fe844679148,<br>
&gt; &gt;&gt;     fh=0x7fe84466684c, volname=0x0, path=0x0, authorized_path=0x0,<br>
&gt; &gt;&gt; authorized_host=0x0,<br>
&gt; &gt;&gt;     is_write_op=_gf_false)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>mount3.c:2130<br>
&gt; &gt;&gt; #5  0x00007fe859652370 in nfs3_fh_auth_nfsop (cs=0x7fe8446663c8,<br>
&gt; &gt;&gt; is_write_op=_gf_false)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>nfs3-helpers.c:3981<br>
&gt; &gt;&gt; #6  0x00007fe85963631a in nfs3_lookup_resume (carg=0x7fe8446663c8)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>nfs3.c:1559<br>
&gt; &gt;&gt; #7  0x00007fe859651b98 in nfs3_fh_resolve_entry_hard (cs=0x7fe8446663c8)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>nfs3-helpers.c:3791<br>
&gt; &gt;&gt; #8  0x00007fe859651e35 in nfs3_fh_resolve_entry (cs=0x7fe8446663c8)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>nfs3-helpers.c:3844<br>
&gt; &gt;&gt; #9  0x00007fe859651e94 in nfs3_fh_resolve_resume (cs=0x7fe8446663c8)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>nfs3-helpers.c:3862<br>
&gt; &gt;&gt; #10 0x00007fe8596520ad in nfs3_fh_resolve_root (cs=0x7fe8446663c8)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>nfs3-helpers.c:3915<br>
&gt; &gt;&gt; #11 0x00007fe85965245f in nfs3_fh_resolve_and_resume (cs=0x7fe8446663c8,<br>
&gt; &gt;&gt; fh=0x7fe852cfc980,<br>
&gt; &gt;&gt;     entry=0x7fe852cfc9c0 &quot;test-bg-write&quot;, resum_fn=0x7fe85963621d<br>
&gt; &gt;&gt; &lt;nfs3_lookup_resume&gt;)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>nfs3-helpers.c:4011<br>
&gt; &gt;&gt; #12 0x00007fe859636dcf in nfs3_lookup (req=0x7fe844679148,<br>
&gt; &gt;&gt; fh=0x7fe852cfc980, fhlen=52,<br>
&gt; &gt;&gt;     name=0x7fe852cfc9c0 &quot;test-bg-write&quot;)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>nfs3.c:1620<br>
&gt; &gt;&gt; #13 0x00007fe85963703f in nfs3svc_lookup (req=0x7fe844679148)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>nfs3.c:1666<br>
&gt; &gt;&gt; #14 0x00007fe86765f585 in rpcsvc_handle_rpc_call (svc=0x7fe854022a00,<br>
&gt; &gt;&gt; trans=0x7fe8545c1fa0,<br>
&gt; &gt;&gt;     msg=0x7fe844334610)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/rpc/rpc-lib/src/rpcsvc.c:<wbr>711<br>
&gt; &gt;&gt; #15 0x00007fe86765f8f8 in rpcsvc_notify (trans=0x7fe8545c1fa0,<br>
&gt; &gt;&gt; mydata=0x7fe854022a00,<br>
&gt; &gt;&gt;     event=RPC_TRANSPORT_MSG_<wbr>RECEIVED, data=0x7fe844334610)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/rpc/rpc-lib/src/rpcsvc.c:<wbr>805<br>
&gt; &gt;&gt; #16 0x00007fe867665458 in rpc_transport_notify (this=0x7fe8545c1fa0,<br>
&gt; &gt;&gt; event=RPC_TRANSPORT_MSG_<wbr>RECEIVED,<br>
&gt; &gt;&gt;     data=0x7fe844334610)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/rpc/rpc-lib/src/rpc-<wbr>transport.c:538<br>
&gt; &gt;&gt; #17 0x00007fe85c44561e in socket_event_poll_in (this=0x7fe8545c1fa0,<br>
&gt; &gt;&gt; notify_handled=_gf_true)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/rpc/rpc-transport/socket/<wbr>src/socket.c:2319<br>
&gt; &gt;&gt; #18 0x00007fe85c445cb1 in socket_event_handler (fd=12, idx=8, gen=103,<br>
&gt; &gt;&gt; data=0x7fe8545c1fa0, poll_in=1,<br>
&gt; &gt;&gt;     poll_out=0, poll_err=0)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/rpc/rpc-transport/socket/<wbr>src/socket.c:2475<br>
&gt; &gt;&gt; #19 0x00007fe867917fd7 in event_dispatch_epoll_handler<br>
&gt; &gt;&gt; (event_pool=0x7030d0, event=0x7fe852cfde70)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/libglusterfs/src/event-<wbr>epoll.c:583<br>
&gt; &gt;&gt; #20 0x00007fe8679182d9 in event_dispatch_epoll_worker<br>
&gt; &gt;&gt; (data=0x7fe85403d060)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/libglusterfs/src/event-<wbr>epoll.c:659<br>
&gt; &gt;&gt; #21 0x00007fe866b7baa1 in start_thread () from /lib64/libpthread.so.0<br>
&gt; &gt;&gt; #22 0x00007fe8664e3bcd in clone () from /lib64/libc.so.6<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; (gdb) thread 10<br>
&gt; &gt;&gt; [Switching to thread 10 (Thread 0x7fe858ed2700 (LWP 19051))]#0<br>
&gt; &gt;&gt;  0x00007fe866b82334 in __lll_lock_wait ()<br>
&gt; &gt;&gt;    from /lib64/libpthread.so.0<br>
&gt; &gt;&gt; (gdb) bt<br>
&gt; &gt;&gt; #0  0x00007fe866b82334 in __lll_lock_wait () from /lib64/libpthread.so.0<br>
&gt; &gt;&gt; #1  0x00007fe866b7d5d8 in _L_lock_854 () from /lib64/libpthread.so.0<br>
&gt; &gt;&gt; #2  0x00007fe866b7d4a7 in pthread_mutex_lock () from<br>
&gt; &gt;&gt; /lib64/libpthread.so.0<br>
&gt; &gt;&gt; #3  0x00007fe8678a9844 in _gf_msg (<br>
&gt; &gt;&gt;     domain=0x7fe85966a448 &quot;ot/workspace/my_glusterfs_bui<br>
&gt; &gt;&gt; ld/glusterfs-4.0dev/xlators/<wbr>nfs/server/src/mount3.c&quot;,<br>
&gt; &gt;&gt; file=0x7fe85966a3f8 &quot;/lib/glusterd/nfs/exports&quot;, function=0x7fe85966b5e0<br>
&gt; &gt;&gt; &quot;init&quot;, line=3878,<br>
&gt; &gt;&gt;     level=GF_LOG_INFO, errnum=0, trace=0, msgid=112151,<br>
&gt; &gt;&gt; fmt=0x7fe85966b3b4 &quot;&quot;)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/libglusterfs/src/logging.<wbr>c:2081<br>
&gt; &gt;&gt; #4  0x00007fe859630287 in _mnt3_auth_param_refresh_<wbr>thread<br>
&gt; &gt;&gt; (argv=0x7fe854023d60)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>mount3.c:3877<br>
&gt; &gt;&gt; #5  0x00007fe866b7baa1 in start_thread () from /lib64/libpthread.so.0<br>
&gt; &gt;&gt; #6  0x00007fe8664e3bcd in clone () from /lib64/libc.so.6<br>
&gt; &gt;&gt; (gdb) p mstate<br>
&gt; &gt;&gt; No symbol &quot;mstate&quot; in current context.<br>
&gt; &gt;&gt; (gdb) f 4<br>
&gt; &gt;&gt; #4  0x00007fe859630287 in _mnt3_auth_param_refresh_<wbr>thread<br>
&gt; &gt;&gt; (argv=0x7fe854023d60)<br>
&gt; &gt;&gt;     at /home/jenkins/root/workspace/<wbr>my_glusterfs_build/glusterfs-<wbr>4.<br>
&gt; &gt;&gt; 0dev/xlators/nfs/server/src/<wbr>mount3.c:3877<br>
&gt; &gt;&gt; 3877<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; &gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<wbr>&gt;&gt;.<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; From a first-level analysis I don&#39;t think this is related to my patch. Please<br>
&gt; &gt;&gt; check and let me know if you have seen this crash before.<br>
&gt; &gt;&gt;<br>
&gt; &gt;&gt; The core is accessible from this link:<br>
&gt; &gt;&gt; <a href="https://build.gluster.org/job/centos6-regression/6759/console" rel="noreferrer" target="_blank">https://build.gluster.org/job/<wbr>centos6-regression/6759/<wbr>console</a><br>
<br>
</div></div>This indeed looks like something that is not related to your change.<br>
These kinds of races should have been fixed by protecting the<br>
auth_cache-&gt;dict with a lock, and making all auth_cache_entries<br>
reference counted. Going through the code does not show any obvious problems<br>
where auth_cache-&gt;cache_dict is not protected with auth_cache-&gt;lock.<br>
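<br>
As a rough illustration of that locking pattern (not the actual Gluster/NFS<br>
code, which is C), here is a minimal Python sketch. The class and method<br>
names are hypothetical, and Python&#39;s garbage collector stands in for the<br>
reference counting that the C entries need. The point it tries to show is<br>
that lookup copies the fields it needs while the lock is still held.<br>
<pre>
```python
import threading


class AuthCache:
    """Toy stand-in for the NFS auth cache; every name here is hypothetical."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # key: (file handle, host address)

    def insert(self, fh, host_addr, timestamp, can_write):
        with self._lock:
            self._entries[(fh, host_addr)] = {
                "timestamp": timestamp,
                "can_write": can_write,
            }

    def purge(self):
        # Models a refresh thread dropping every cached entry.
        with self._lock:
            for entry in self._entries.values():
                entry.clear()  # the entry is unusable from here on
            self._entries.clear()

    def lookup(self, fh, host_addr):
        # Copy the needed fields out while the lock is still held, so a
        # concurrent purge() cannot invalidate the entry in between.
        with self._lock:
            entry = self._entries.get((fh, host_addr))
            if entry is None:
                return None
            return entry["timestamp"], entry["can_write"]
```
</pre>
With this shape, a lookup that races with a purge returns either a complete<br>
(timestamp, can_write) pair or None. If lookup instead handed out the entry<br>
itself and the caller read its fields after the lock was released, a purge in<br>
between could leave the caller reading a torn-down entry.<br>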
<br>
The patches listed here seem to have been an improvement for a while; it<br>
is unclear to me why these kinds of problems would surface again.<br>
  <a href="https://review.gluster.org/#/q/topic:bug-1226717" rel="noreferrer" target="_blank">https://review.gluster.org/#/<wbr>q/topic:bug-1226717</a><br>
<br>
Because you had this crash, there seems to be a race condition somewhere<br>
in the auth-cache part of Gluster/NFS. If this happens more regularly,<br>
we should investigate a little more for the cause.<br>
<br>
More recently Facebook merged a patch in their 3.8 branch that also adds<br>
locking to the auth_cache structure. However, this change was not based<br>
on the patches linked above. Maybe Shreyas or Jeff (+CC) have seen the<br>
backtrace of the segfault before?<br>
  <a href="https://review.gluster.org/18247" rel="noreferrer" target="_blank">https://review.gluster.org/<wbr>18247</a><br>
<br>
Thanks,<br>
Niels<br>
</blockquote></div><br></div>