<div dir="ltr"><div dir="ltr">On Thu, Jan 17, 2019 at 5:29 AM Atin Mukherjee <<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Jan 15, 2019 at 2:13 PM Atin Mukherjee <<a href="mailto:atin.mukherjee83@gmail.com" target="_blank">atin.mukherjee83@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div dir="auto">Interesting. I’ll do a deep dive into it sometime this week.</div></div><div><br><div class="gmail_quote"><div dir="ltr">On Tue, 15 Jan 2019 at 14:05, Xavi Hernandez <<a href="mailto:jahernan@redhat.com" target="_blank">jahernan@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Mon, Jan 14, 2019 at 11:08 AM Ashish Pandey <<a href="mailto:aspandey@redhat.com" target="_blank">aspandey@redhat.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div style="font-family:"times new roman","new york",times,serif;font-size:12pt;color:rgb(0,0,0)"><div><br></div><div>I downloaded the logs of regression runs 1077 and 1073 and tried to investigate them.<br></div><div>In both regressions, ec/bug-1236065.t is hanging on TEST 70, which is trying to get the online brick count.<br></div><div><br></div><div>I can see in the mount, brick and glusterd logs that it has not moved forward after this test.<br></div><div>glusterd.log - <br></div><div><br></div><div>[2019-01-06 16:27:51.346408]:++++++++++ 
G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 70 5 online_brick_count ++++++++++<br>[2019-01-06 16:27:51.645014] I [MSGID: 106499] [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy<br>[2019-01-06 16:27:51.646664] I [dict.c:2745:dict_get_str_boolean] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x4a6c3) [0x7f4c37fe06c3] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x43b3a) [0x7f4c37fd9b3a] -->/build/install/lib/libglusterfs.so.0(dict_get_str_boolean+0x170) [0x7f4c433d83fb] ) 0-dict: key nfs.disable, integer type asked, has string type [Invalid argument]<br>[2019-01-06 16:27:51.647177] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick0.rdma_port, string type asked, has integer type [Invalid argument]<br>[2019-01-06 16:27:51.647227] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick1.rdma_port, string type asked, has integer type [Invalid argument]<br>[2019-01-06 16:27:51.647292] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick2.rdma_port, string type asked, has integer type [Invalid argument]<br>[2019-01-06 16:27:51.647333] I [dict.c:2361:dict_get_strn] 
(-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick3.rdma_port, string type asked, has integer type [Invalid argument]<br>[2019-01-06 16:27:51.647371] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick4.rdma_port, string type asked, has integer type [Invalid argument]<br>[2019-01-06 16:27:51.647409] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick5.rdma_port, string type asked, has integer type [Invalid argument]<br>[2019-01-06 16:27:51.647447] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick6.rdma_port, string type asked, has integer type [Invalid argument]<br>[2019-01-06 16:27:51.649335] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler<br>[2019-01-06 16:27:51.932871] I [MSGID: 106499] [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy<br><br></div><div>It is just taking a lot of time to get the status at this point.<br> It looks like there could be some issue with the connection or 
the handling of volume status when some bricks are down.</div></div></div></blockquote><div><br></div>The 'online_brick_count' check uses 'gluster volume status' to get some information, and it does that several times (currently 7). Looking at cmd_history.log, I see that after the 'online_brick_count' at line 70, only one 'gluster volume status' has completed. Apparently the second 'gluster volume status' is hung.</div><div class="gmail_quote"><br></div><div class="gmail_quote">In cli.log I see that the second 'gluster volume status' seems to have started, but not finished:</div><div class="gmail_quote"><br></div>Normal run:<div class="gmail_quote"><blockquote style="margin:0px 0px 0px 40px;border:medium none;padding:0px"><div class="gmail_quote"><span style="font-family:monospace"><span style="color:rgb(0,0,0)">[2019-01-08 16:36:43.628821] I [cli.c:834:main] 0-cli: Started running gluster with version 6dev
</span><br>[2019-01-08 16:36:43.808182] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
<br>[2019-01-08 16:36:43.808287] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
<br>[2019-01-08 16:36:43.808432] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
<br>[2019-01-08 16:36:43.816534] I [dict.c:1947:dict_get_uint32] (-->gluster(cli_cmd_process+0x1e4) [0x40db50] -->gluster(cli_cmd_volume_status_cbk+0x90) [0x415bec] -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) [0x7fefe569456<br>9] ) 0-dict: key cmd, unsigned integer type asked, has integer type [Invalid argument]
<br>[2019-01-08 16:36:43.816716] I [dict.c:1947:dict_get_uint32] (-->gluster(cli_cmd_volume_status_cbk+0x1cb) [0x415d27] -->gluster(gf_cli_status_volume_all+0xc8) [0x42fa94] -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) [0x7f<br>efe5694569] ) 0-dict: key cmd, unsigned integer type asked, has integer type [Invalid argument]
<br>[2019-01-08 16:36:43.824437] I [input.c:31:cli_batch] 0-: Exiting with: 0<br></span></div></blockquote></div><div class="gmail_quote"><br></div>Bad run:<blockquote style="margin:0px 0px 0px 40px;border:medium none;padding:0px"><div class="gmail_quote"><span style="font-family:monospace"><span style="color:rgb(0,0,0)">[2019-01-08 16:36:43.940361] I [cli.c:834:main] 0-cli: Started running gluster with version 6dev
</span></span></div><div class="gmail_quote"><span style="font-family:monospace">[2019-01-08 16:36:44.147364] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
</span></div><div class="gmail_quote"><span style="font-family:monospace">[2019-01-08 16:36:44.147477] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
</span></div><div class="gmail_quote"><span style="font-family:monospace">[2019-01-08 16:36:44.147583] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler</span></div></blockquote><font face="monospace"><div><font face="monospace"><br></font></div></font>In glusterd.log it seems as if it hasn't received any status request. It looks like the cli has not even connected to glusterd.</div></blockquote></div></div></blockquote><div><br></div><div>Downloaded the logs for the recent failure from <a href="https://build.gluster.org/job/regression-test-with-multiplex/1092/" target="_blank">https://build.gluster.org/job/regression-test-with-multiplex/1092/</a> and based on log scanning, this is what I see:</div><div><br></div><div>1. The test executes without any issues till line no. 74, i.e. "TEST $CLI volume start $V0 force", and cli.log along with cmd_history.log confirm the same:</div><div><br></div><div>cli.log</div><div>====<br></div><div>[2019-01-16 16:28:46.871877]:++++++++++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 73 gluster --mode=script --wignore volume start patchy force ++++++++++<br>[2019-01-16 16:28:46.980780] I [cli.c:834:main] 0-cli: Started running gluster with version 6dev<br>[2019-01-16 16:28:47.185996] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0<br>[2019-01-16 16:28:47.186113] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1<br>[2019-01-16 16:28:47.186234] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler<br>[2019-01-16 16:28:49.223376] I [cli-rpc-ops.c:1448:gf_cli_start_volume_cbk] 0-cli: Received resp to start volume <=== successfully processed the callback<br>[2019-01-16 16:28:49.223668] I [input.c:31:cli_batch] 0-: Exiting with: 0 <br></div><div><br></div><div>cmd_history.log</div><div>============<br></div><div>[2019-01-16 
16:28:49.220491] : volume start patchy force : SUCCESS</div><div><br></div><div>However, in both the cli and cmd_history log files these are the last log entries I see, which indicates that the test script is completely paused. I see no possibility of the cli receiving this command and dropping it completely, as otherwise we should have at least seen the "Started running gluster with version 6dev" and "Exiting with" log entries.<br></div><div><br></div><div>I managed to reproduce this once locally on my system, and when I then ran commands from another prompt, volume status and all other basic gluster commands went through. I also inspected the processes and don't see any sign of them being hung. <br></div><div><br></div><div>So the mystery continues, and we need to see why the test script is not moving forward at all.</div></div></div></div></div></div></div></blockquote><div><br></div><div>An additional thing that could be interesting: in all cases where I've seen this test hang, the next test shows an error during cleanup:</div><div><br></div></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div class="gmail_quote"><pre style="overflow-wrap: break-word;">Aborting.<br><br>/mnt/nfs/1 could not be deleted, here are the left over items<br>drwxr-xr-x. 2 root root 6 Jan 16 16:41 /d/backends<br>drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/0<br>drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/1<br>drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/2<br>drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/3<br>drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/nfs/0<br>drwxr-xr-x. 
2 root root 4096 Jan 16 16:41 /mnt/nfs/1<br><br>Please correct the problem and try again.<font color="#000000"><span style="white-space:pre-wrap"><br></span></font></pre></div></blockquote>This is a bit weird, since this only happens after having removed all these directories with an 'rm -rf', and this command doesn't exit on the first error, so at least some of these directories should have been removed, even if the mount process is hung (all nfs mounts and fuse mounts 1, 2 and 3 are not used by the test). The only explanation I have is that the cleanup function is being executed twice concurrently (probably from two different scripts). The first cleanup is blocked (or is taking a lot of time) removing one of the directories. Meanwhile, the other cleanup has completed and recreated the directories, so when the first one finally finishes, it finds all the directories still there and writes the above messages. This would also mean that something is not properly killed between tests. Not sure if that's possible.<div><br></div><div>This could match your findings, since some commands executed by the second script could "unblock" whatever is blocked in the first one, causing it to progress and show the final error.</div><div><div><br></div><div>Could this explain something?</div><div><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div style="font-family:"times new roman","new york",times,serif;font-size:12pt;color:rgb(0,0,0)"><div><br></div><div>---<br></div><div>Ashish<br></div><div><br></div><div><br></div><div><br></div><hr id="gmail-m_898926418131331947gmail-m_-3043214236657532182m_-1843389488769026027gmail-m_-1444823287014886692zwchr"><div style="color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><b>From: </b>"Mohit Agrawal" <<a href="mailto:moagrawa@redhat.com" target="_blank">moagrawa@redhat.com</a>><br><b>To: </b>"Shyam Ranganathan" <<a href="mailto:srangana@redhat.com" target="_blank">srangana@redhat.com</a>><br><b>Cc: </b>"Gluster Devel" <<a href="mailto:gluster-devel@gluster.org" target="_blank">gluster-devel@gluster.org</a>><br><b>Sent: </b>Saturday, January 12, 2019 6:46:20 PM<br><b>Subject: </b>Re: [Gluster-devel] Regression health for release-5.next and release-6<br><div><br></div><div dir="ltr"><div dir="ltr"><div>The previous logs are related to the client, not the bricks; below are the brick logs:</div><div><br></div><div>[2019-01-12 12:25:25.893485]:++++++++++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 68 rm -f 0.o 10.o 11.o 12.o 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++++++++++</div><div>The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.ec.size' would not be sent on wire in the future [Invalid argument]" repeated 199 times between [2019-01-12 12:25:25.283989] and [2019-01-12 12:25:25.899532]</div><div>[2019-01-12 12:25:25.903375] E [MSGID: 113001] [posix-inode-fd-ops.c:4617:_posix_handle_xattr_keyvalue_pair] 8-patchy-posix: fgetxattr failed on gfid=d91f6331-d394-479d-ab51-6bcf674ac3e0 while doing xattrop: Key:trusted.ec.dirty (Bad file descriptor) [Bad file descriptor]</div><div>[2019-01-12 12:25:25.903468] E [MSGID: 115073] [server-rpc-fops_v2.c:1805:server4_fxattrop_cbk] 
0-patchy-server: 1486: FXATTROP 2 (d91f6331-d394-479d-ab51-6bcf674ac3e0), client: CTX_ID:b785c2b0-3453-4a03-b129-19e6ceeb5346-GRAPH_ID:0-PID:24147-HOST:softserve-moagrawa-test.1-PC_NAME:patchy-client-1-RECON_NO:-1, error-xlator: patchy-posix [Bad file descriptor]</div><div><br></div><div><br></div><div>Thanks,</div><div>Mohit Agrawal</div></div></div><br><div class="gmail_quote"><div dir="ltr">On Sat, Jan 12, 2019 at 6:29 PM Mohit Agrawal <<a href="mailto:moagrawa@redhat.com" target="_blank">moagrawa@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="auto"><div dir="auto"><br></div><div dir="auto">Specific to "add-brick-and-validate-replicated-volume-options.t", I have posted a patch: <a href="https://review.gluster.org/22015" target="_blank">https://review.gluster.org/22015</a>.</div><div dir="auto">For test case "ec/bug-1236065.t", I think the issue needs to be checked by the ec team.</div><div dir="auto"><br></div><div dir="auto">On the brick side, it is showing the logs below: </div><div dir="auto"><br></div><div dir="auto">>>>>>>>>>>>>>>>>></div><div dir="auto"><br></div><div dir="auto">on wire in the future [Invalid argument]</div><div dir="auto">The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.ec.dirty' would not be sent on wire in the future [Invalid argument]" repeated 3 times between [2019-01-12 12:25:25.902828] and [2019-01-12 12:25:25.902992]</div><div dir="auto">[2019-01-12 12:25:25.903553] W [MSGID: 114031] [client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-patchy-client-1: remote operation failed [Bad file descriptor]</div><div dir="auto">[2019-01-12 12:25:25.903998] W [MSGID: 122040] [ec-common.c:1181:ec_prepare_update_cbk] 0-patchy-disperse-0: Failed to get size and version : FOP : 'FXATTROP' failed on gfid d91f6331-d394-479d-ab51-6bcf674ac3e0 [Input/output 
error]</div><div dir="auto">[2019-01-12 12:25:25.904059] W [fuse-bridge.c:1907:fuse_unlink_cbk] 0-glusterfs-fuse: 3259: UNLINK() /test/0.o => -1 (Input/output error)</div><div dir="auto"><br></div><div dir="auto">>>>>>>>>>>>>>>>>>>></div><div dir="auto"><br></div><div dir="auto">The test case is timing out because the "volume heal $V0 full" command is stuck; it looks like shd is getting stuck in getxattr.</div><div dir="auto"><br></div><div dir="auto">>>>>>>>>>>>>>>.</div><div dir="auto"><br></div><div dir="auto">Thread 8 (Thread 0x7f83777fe700 (LWP 25552)):</div><div dir="auto">#0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0</div><div dir="auto">#1 0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f83777fdbb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680</div><div dir="auto">#2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030880, child=<optimized out>, loc=0x7f83777fdbb0, full=<optimized out>) at ec-heald.c:161</div><div dir="auto">#3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80094b0, entry=<optimized out>, parent=0x7f83777fdde0, data=0x7f83a8030880) at ec-heald.c:294</div><div dir="auto">#4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80094b0, loc=loc@entry=0x7f83777fdde0, pid=pid@entry=-6, data=data@entry=0x7f83a8030880, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125</div><div dir="auto">#5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030880, inode=<optimized out>) at ec-heald.c:311</div><div dir="auto">#6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030880) at ec-heald.c:372</div><div dir="auto">#7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0</div><div dir="auto">#8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6</div><div dir="auto">Thread 7 (Thread 0x7f8376ffd700 (LWP 
25553)):</div><div dir="auto">#0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0</div><div dir="auto">#1 0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f8376ffcbb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680</div><div dir="auto">#2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80308f0, child=<optimized out>, loc=0x7f8376ffcbb0, full=<optimized out>) at ec-heald.c:161</div><div dir="auto">#3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a800d110, entry=<optimized out>, parent=0x7f8376ffcde0, data=0x7f83a80308f0) at ec-heald.c:294</div><div dir="auto">#4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a800d110, loc=loc@entry=0x7f8376ffcde0, pid=pid@entry=-6, data=data@entry=0x7f83a80308f0, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125</div><div dir="auto">#5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a80308f0, inode=<optimized out>) at ec-heald.c:311</div><div dir="auto">#6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80308f0) at ec-heald.c:372</div><div dir="auto">#7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0</div><div dir="auto">#8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6</div><div dir="auto">Thread 6 (Thread 0x7f83767fc700 (LWP 25554)):</div><div dir="auto">#0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0</div><div dir="auto">#1 0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f83767fbbb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680</div><div dir="auto">#2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030960, child=<optimized out>, loc=0x7f83767fbbb0, full=<optimized out>) at ec-heald.c:161</div><div 
dir="auto">#3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8010af0, entry=<optimized out>, parent=0x7f83767fbde0, data=0x7f83a8030960) at ec-heald.c:294</div><div dir="auto">#4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8010af0, loc=loc@entry=0x7f83767fbde0, pid=pid@entry=-6, data=data@entry=0x7f83a8030960, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125</div><div dir="auto">#5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030960, inode=<optimized out>) at ec-heald.c:311</div><div dir="auto">#6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030960) at ec-heald.c:372</div><div dir="auto">#7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0</div><div dir="auto">#8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6</div><div dir="auto">Thread 5 (Thread 0x7f8375ffb700 (LWP 25555)):</div><div dir="auto">#0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0</div><div dir="auto">#1 0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f8375ffabb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680</div><div dir="auto">#2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80309d0, child=<optimized out>, loc=0x7f8375ffabb0, full=<optimized out>) at ec-heald.c:161</div><div dir="auto">#3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80144d0, entry=<optimized out>, parent=0x7f8375ffade0, data=0x7f83a80309d0) at ec-heald.c:294</div><div dir="auto">#4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80144d0, loc=loc@entry=0x7f8375ffade0, pid=pid@entry=-6, data=data@entry=0x7f83a80309d0, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125</div><div dir="auto">#5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a80309d0, inode=<optimized out>) at ec-heald.c:311</div><div dir="auto">#6 
0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80309d0) at ec-heald.c:372</div><div dir="auto">#7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0</div><div dir="auto">#8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6</div><div dir="auto">Thread 4 (Thread 0x7f83757fa700 (LWP 25556)):</div><div dir="auto">#0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0</div><div dir="auto">#1 0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f83757f9bb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680</div><div dir="auto">#2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030a40, child=<optimized out>, loc=0x7f83757f9bb0, full=<optimized out>) at ec-heald.c:161</div><div dir="auto">#3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8017eb0, entry=<optimized out>, parent=0x7f83757f9de0, data=0x7f83a8030a40) at ec-heald.c:294</div><div dir="auto">#4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8017eb0, loc=loc@entry=0x7f83757f9de0, pid=pid@entry=-6, data=data@entry=0x7f83a8030a40, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125</div><div dir="auto">#5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030a40, inode=<optimized out>) at ec-heald.c:311</div><div dir="auto">#6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030a40) at ec-heald.c:372</div><div dir="auto">#7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0</div><div dir="auto">#8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6</div><div dir="auto">Thread 3 (Thread 0x7f8374ff9700 (LWP 25557)):</div><div dir="auto">#0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0</div><div dir="auto">#1 0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f8374ff8bb0, 
dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680</div><div dir="auto">#2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030ab0, child=<optimized out>, loc=0x7f8374ff8bb0, full=<optimized out>) at ec-heald.c:161</div><div dir="auto">#3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801b890, entry=<optimized out>, parent=0x7f8374ff8de0, data=0x7f83a8030ab0) at ec-heald.c:294</div><div dir="auto">#4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801b890, loc=loc@entry=0x7f8374ff8de0, pid=pid@entry=-6, data=data@entry=0x7f83a8030ab0, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125</div><div dir="auto">#5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030ab0, inode=<optimized out>) at ec-heald.c:311</div><div dir="auto">#6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030ab0) at ec-heald.c:372</div><div dir="auto">#7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0</div><div dir="auto">#8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6</div><div dir="auto">Thread 2 (Thread 0x7f8367fff700 (LWP 25558)):</div><div dir="auto">#0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0</div><div dir="auto">#1 0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f8367ffebb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680</div><div dir="auto">#2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030b20, child=<optimized out>, loc=0x7f8367ffebb0, full=<optimized out>) at ec-heald.c:161</div><div dir="auto">#3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801f270, entry=<optimized out>, parent=0x7f8367ffede0, data=0x7f83a8030b20) at ec-heald.c:294</div><div dir="auto">#4 0x00007f83bc930ac2 in syncop_ftw 
(subvol=0x7f83a801f270, loc=loc@entry=0x7f8367ffede0, pid=pid@entry=-6, data=data@entry=0x7f83a8030b20, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125</div><div dir="auto">#5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030b20, inode=<optimized out>) at ec-heald.c:311</div><div dir="auto">#6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030b20) at ec-heald.c:372</div><div dir="auto">#7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0</div><div dir="auto">#8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6</div><div dir="auto">Thread 1 (Thread 0x7f83bcdd1780 (LWP 25383)):</div><div dir="auto">#0 0x00007f83bb70af57 in pthread_join () from /usr/lib64/libpthread.so.0</div><div dir="auto">#1 0x00007f83bc92eff8 in event_dispatch_epoll (event_pool=0x55af0a6dd560) at event-epoll.c:846</div><div dir="auto">#2 0x000055af0a4116b8 in main (argc=15, argv=0x7fff75610898) at glusterfsd.c:2848</div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto">>>>>>>>>>>>>>>>>>>>>>>>>>>.</div><div dir="auto"><br></div><div>Thanks,</div><div>Mohit Agrawal</div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Fri 11 Jan, 2019, 21:20 Shyam Ranganathan <<a href="mailto:srangana@redhat.com" target="_blank">srangana@redhat.com</a> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">We can check health on master post the patch as stated by Mohit below.<br>
<br>
Release-5 is causing some concerns as we need to tag the release<br>
yesterday, but we have the following 2 tests failing or coredumping<br>
pretty regularly, need attention on these.<br>
<br>
ec/bug-1236065.t<br>
glusterd/add-brick-and-validate-replicated-volume-options.t<br>
<br>
Shyam<br>
On 1/10/19 6:20 AM, Mohit Agrawal wrote:<br>
> I think we should consider regression-builds after merged the patch<br>
> (<a href="https://review.gluster.org/#/c/glusterfs/+/21990/" rel="noreferrer noreferrer" target="_blank">https://review.gluster.org/#/c/glusterfs/+/21990/</a>) <br>
> as we know this patch introduced some delay.<br>
> <br>
> Thanks,<br>
> Mohit Agrawal<br>
> <br>
> On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee <<a href="mailto:amukherj@redhat.com" rel="noreferrer" target="_blank">amukherj@redhat.com</a><br>
> <mailto:<a href="mailto:amukherj@redhat.com" rel="noreferrer" target="_blank">amukherj@redhat.com</a>>> wrote:<br>
> <br>
> Mohit, Sanju - request you to investigate the failures related to<br>
> glusterd and brick-mux and report back to the list.<br>
> <br>
> On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan<br>
> <<a href="mailto:srangana@redhat.com" rel="noreferrer" target="_blank">srangana@redhat.com</a> <mailto:<a href="mailto:srangana@redhat.com" rel="noreferrer" target="_blank">srangana@redhat.com</a>>> wrote:<br>
> <br>
> Hi,<br>
> <br>
> As part of branching preparation next week for release-6, please<br>
> find<br>
> test failures and respective test links here [1].<br>
> <br>
> The top tests that are failing/dumping-core are as below and<br>
> need attention,<br>
> - ec/bug-1236065.t<br>
> - glusterd/add-brick-and-validate-replicated-volume-options.t<br>
> - readdir-ahead/bug-1390050.t<br>
> - glusterd/brick-mux-validation.t<br>
> - bug-1432542-mpx-restart-crash.t<br>
> <br>
> Others of interest,<br>
> - replicate/bug-1341650.t<br>
> <br>
> Please file a bug if needed against the test case and report the<br>
> same<br>
> here, in case a problem is already addressed, then do send back the<br>
> patch details that addresses this issue as a response to this mail.<br>
> <br>
> Thanks,<br>
> Shyam<br>
> <br>
> [1] Regression failures:<br>
> <a href="https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view" rel="noreferrer noreferrer" target="_blank">https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view</a><br>
> _______________________________________________<br>
> Gluster-devel mailing list<br>
> <a href="mailto:Gluster-devel@gluster.org" rel="noreferrer" target="_blank">Gluster-devel@gluster.org</a> <mailto:<a href="mailto:Gluster-devel@gluster.org" rel="noreferrer" target="_blank">Gluster-devel@gluster.org</a>><br>
> <a href="https://lists.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-devel</a><br>
> <br>
> <br>
</blockquote></div>
</blockquote></div>
<br>_______________________________________________<br>Gluster-devel mailing list<br><a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a><br><a href="https://lists.gluster.org/mailman/listinfo/gluster-devel" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-devel</a></div><div><br></div></div></div>_______________________________________________<br>
Gluster-devel mailing list<br>
<a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a><br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-devel</a></blockquote></div></div></div>
_______________________________________________<br>
Gluster-devel mailing list<br>
<a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a><br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-devel</a></blockquote></div></div>-- <br><div dir="ltr" class="gmail-m_898926418131331947gmail-m_-3043214236657532182gmail_signature">--Atin</div>
_______________________________________________<br>
Gluster-devel mailing list<br>
<a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a><br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-devel</a></blockquote></div></div></div></div></div></div>
</blockquote></div></div></div></div>