<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On 31 July 2018 at 22:11, Atin Mukherjee <span dir="ltr">&lt;<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I just went through the nightly regression report of brick mux runs and here&#39;s what I can summarize.<br><br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>Fails only with brick-mux<br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>tests/bugs/core/bug-1432542-<wbr>mpx-restart-crash.t - Times out even after 400 secs. Refer <a href="https://fstat.gluster.org/failure/209?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all" target="_blank">https://fstat.gluster.org/<wbr>failure/209?state=2&amp;start_<wbr>date=2018-06-30&amp;end_date=2018-<wbr>07-31&amp;branch=all</a>, specifically the latest report <a href="https://build.gluster.org/job/regression-test-burn-in/4051/consoleText" target="_blank">https://build.gluster.org/job/<wbr>regression-test-burn-in/4051/<wbr>consoleText</a> . Wasn&#39;t timing out as frequently as it was till 12 July. But since 27 July, it has timed out twice. Beginning to believe commit 9400b6f2c8aa219a493961e0ab9770<wbr>b7f12e80d2 has added the delay and now 400 secs isn&#39;t sufficient enough (Mohit?)<br></div></blockquote><div><br></div><div>One of the failed regression-test-burn in was an actual failure,not a timeout. </div><div><a href="https://build.gluster.org/job/regression-test-burn-in/4049">https://build.gluster.org/job/regression-test-burn-in/4049</a><br></div><div><br></div><div>The brick disconnects from glusterd:<br></div><div><br></div><div><div><font face="monospace, monospace" size="1">[2018-07-27 16:28:42.882668] I [MSGID: 106005] [glusterd-handler.c:6129:__glusterd_brick_rpc_notify] 0-management: Brick builder103.cloud.gluster.org:/d/backends/vol01/brick0 has disconnected from glusterd.</font></div><div><font face="monospace, monospace" size="1">[2018-07-27 16:28:42.891031] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0<b>-pmap: removing brick /d/backends/vol01/brick0 on port 49152</b></font></div><div><font face="monospace, monospace" size="1">[2018-07-27 16:28:42.892379] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick (null) on port 49152</font></div><div><font face="monospace, monospace" size="1">[2018-07-27 16:29:02.636027]:++++++++++ G_LOG:./tests/bugs/core/bug-1432542-mpx-restart-crash.t: TEST: 56 _GFS --attribute-timeout=0 --entry-timeout=0 -s <a href="http://builder103.cloud.gluster.org">builder103.cloud.gluster.org</a> --volfile-id=patchy-vol20 /mnt/glusterfs/vol20 ++++++++++</font></div></div><div><br></div><div><br></div><div>So the client cannot connect to the bricks after this as it never gets the port info from glusterd. 
From mnt-glusterfs-vol20.log:

[2018-07-27 16:29:02.769947] I [MSGID: 114020] [client.c:2329:notify] 0-patchy-vol20-client-1: parent translators are ready, attempting connect on transport
[2018-07-27 16:29:02.770677] E [MSGID: 114058] [client-handshake.c:1518:client_query_portmap_cbk] 0-patchy-vol20-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-07-27 16:29:02.770767] I [MSGID: 114018] [client.c:2255:client_rpc_notify] 0-patchy-vol20-client-0: disconnected from patchy-vol20-client-0. Client process will keep trying to connect to glusterd until brick's port is available

From the brick logs:

[2018-07-27 16:28:34.729241] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 2b65c380-392e-459f-b722-c130aac29377
[2018-07-27 16:28:34.945474] I [MSGID: 115029] [server-handshake.c:786:server_setvolume] 0-patchy-vol01-server: accepted client from CTX_ID:72dcd65e-2125-4a79-8331-48c0fe9abce7-GRAPH_ID:0-PID:8483-HOST:builder103.cloud.gluster.org-PC_NAME:patchy-vol06-client-2-RECON_NO:-0 (version: 4.2dev)
[2018-07-27 16:28:35.946588] I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 0-dict: key 'glusterfs.xattrop_index_gfid' is would not be sent on wire in future [Invalid argument]    <--- Last brick log entry. It looks like the brick went down at this point.
[2018-07-27 16:29:02.636027]:++++++++++ G_LOG:./tests/bugs/core/bug-1432542-mpx-restart-crash.t: TEST: 56 _GFS --attribute-timeout=0 --entry-timeout=0 -s builder103.cloud.gluster.org --volfile-id=patchy-vol20 /mnt/glusterfs/vol20 ++++++++++
[2018-07-27 16:29:12.021827]:++++++++++ G_LOG:./tests/bugs/core/bug-1432542-mpx-restart-crash.t: TEST: 83 dd if=/dev/zero of=/mnt/glusterfs/vol20/a_file bs=4k count=1 ++++++++++
[2018-07-27 16:29:12.039248]:++++++++++ G_LOG:./tests/bugs/core/bug-1432542-mpx-restart-crash.t: TEST: 87 killall -9 glusterd ++++++++++
[2018-07-27 16:29:17.073995]:++++++++++ G_LOG:./tests/bugs/core/bug-1432542-mpx-restart-crash.t: TEST: 89 killall -9 glusterfsd ++++++++++
[2018-07-27 16:29:22.096385]:++++++++++ G_LOG:./tests/bugs/core/bug-1432542-mpx-restart-crash.t: TEST: 95 glusterd ++++++++++
[2018-07-27 16:29:24.481555] I [MSGID: 100030] [glusterfsd.c:2728:main] 0-/build/install/sbin/glusterfsd: Started running /build/install/sbin/glusterfsd version 4.2dev (args: /build/install/sbin/glusterfsd -s builder103.cloud.gluster.org --volfile-id patchy-vol01.builder103.cloud.gluster.org.d-backends-vol01-brick0 -p /var/run/gluster/vols/patchy-vol01/builder103.cloud.gluster.org-d-backends-vol01-brick0.pid -S /var/run/gluster/f4d6c8f7c3f85b18.socket --brick-name /d/backends/vol01/brick0 -l /var/log/glusterfs/bricks/d-backends-vol01-brick0.log --xlator-option *-posix.glusterd-uuid=0db25f79-8880-4f2d-b1e8-584e751ff0b9 --process-name brick --brick-port 49153 --xlator-option patchy-vol01-server.listen-port=49153)
From /var/log/messages:

Jul 27 16:28:42 builder103 kernel: [ 2902]     0  2902  3777638   200036    2322        0             0 glusterfsd
...
Jul 27 16:28:42 builder103 kernel: Out of memory: Kill process 2902 (glusterfsd) score 418 or sacrifice child
Jul 27 16:28:42 builder103 kernel: Killed process 2902 (glusterfsd) total-vm:15110552kB, anon-rss:800144kB, file-rss:0kB, shmem-rss:0kB
Jul 27 16:30:01 builder103 systemd: Created slice User Slice of root.

Possible OOM kill? The timestamps line up: the kernel killed glusterfsd (pid
2902) at 16:28:42, which is exactly when glusterd logged the brick
disconnect, and with brick mux that single process hosts all the bricks.
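If needed, this could be confirmed on the builder with something along these
lines (a sketch; the log locations and the process snapshot are assumptions
about the CentOS builders, not taken from the job artifacts):

# Confirm the OOM kill and see which process the kernel picked:
grep -iE 'out of memory|killed process' /var/log/messages
dmesg -T | grep -iE 'oom-killer|out of memory|killed process'

# Watch the resident size of the multiplexed brick process while the test
# creates and mounts its volumes, to see how far it grows before the kill:
ps -C glusterfsd -o pid,rss,vsz,args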
Regards,
Nithya

> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t (Ref -
> https://build.gluster.org/job/regression-test-with-multiplex/814/console) -
> Test fails only in brick-mux mode. AI on Atin to look at it and get back.
>
> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> (https://build.gluster.org/job/regression-test-with-multiplex/813/console) -
> Seems to have failed just twice in the last 30 days as per
> https://fstat.gluster.org/failure/251?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
> Need help from the AFR team.
>
> tests/bugs/quota/bug-1293601.t
> (https://build.gluster.org/job/regression-test-with-multiplex/812/console) -
> Hasn't failed after 26 July; earlier it was failing regularly. Did we fix
> this test through any patch (Mohit?)
>
> tests/bitrot/bug-1373520.t
> (https://build.gluster.org/job/regression-test-with-multiplex/811/console) -
> Hasn't failed after 27 July; earlier it was failing regularly. Did we fix
> this test through any patch (Mohit?)
>
> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core; not
> sure whether brick mux is the culprit here. Ref -
> https://build.gluster.org/job/regression-test-with-multiplex/806/console .
> Seems to be a glustershd crash. Need help from AFR folks.
>
> ===========================================================================
> Fails for non-brick mux case too
> ===========================================================================
> tests/bugs/distribute/bug-1122443.t - Seems to be failing on my setup very
> often, without brick mux as well. Refer
> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText .
> There's an email on gluster-devel and BZ 1610240 for the same.
>
> tests/bugs/bug-1368312.t - Seems to be a new failure
> (https://build.gluster.org/job/regression-test-with-multiplex/815/console),
> but it has been seen in a non-brick-mux case too -
> https://build.gluster.org/job/regression-test-burn-in/4039/consoleText .
> Need some eyes from AFR folks.
>
> tests/00-geo-rep/georep-basic-dr-tarssh.t - This isn't specific to brick mux;
> it has failed in multiple default regression runs. Refer
> https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all .
> We need help from the geo-rep devs to root-cause this sooner rather than
> later.
>
> tests/00-geo-rep/georep-basic-dr-rsync.t - This isn't specific to brick mux;
> it has failed in multiple default regression runs. Refer
> https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all .
> We need help from the geo-rep devs to root-cause this sooner rather than
> later.
>
> tests/bugs/glusterd/validating-server-quorum.t
> (https://build.gluster.org/job/regression-test-with-multiplex/810/console) -
> Fails for non-brick-mux cases too:
> https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all .
> Atin has a patch, https://review.gluster.org/20584, which resolves it, but
> the patch is failing regression for a different, unrelated test.
>
> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> (Ref - https://build.gluster.org/job/regression-test-with-multiplex/809/console)
> - Fails for the non-brick-mux case too:
> https://build.gluster.org/job/regression-test-burn-in/4049/consoleText .
> Need some eyes from AFR folks.
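For the tests above that still need a root cause, re-running them locally is
often the quickest way to get a live reproducer (a sketch, assuming a built
glusterfs source tree; the binary and core paths are assumptions, and the CI
jobs may drive the runs differently):

# Re-run a single listed test from the repository root:
prove -vf tests/bugs/glusterd/validating-server-quorum.t

# For the remove-brick-testcases.t run that left a core, a backtrace from the
# builder's core file would confirm whether glustershd is what crashed:
gdb /build/install/sbin/glusterfs /path/to/core -batch -ex 'thread apply all bt'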
>
> _______________________________________________
> maintainers mailing list
> maintainers@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers