<div dir="ltr"><div><div>While writing a test for the patch that fixes BZ <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1560957">https://bugzilla.redhat.com/show_bug.cgi?id=1560957</a>, I can&#39;t get my test case to pass: a replace-brick commit force always fails on a multi-node cluster, and that&#39;s on the latest mainline code.<br><br></div><div><b>The fix is a one-liner:<br></b><br>atin@dhcp35-96:~/codebase/upstream/glusterfs_master/glusterfs$ gd HEAD~1<br>diff --git a/xlators/mgmt/glusterd/src/glusterd-utils.c b/xlators/mgmt/glusterd/src/glusterd-utils.c<br>index af30756c9..24d813fbd 100644<br>--- a/xlators/mgmt/glusterd/src/glusterd-utils.c<br>+++ b/xlators/mgmt/glusterd/src/glusterd-utils.c<br>@@ -5995,6 +5995,7 @@ glusterd_brick_start (glusterd_volinfo_t *volinfo,<br>                          * TBD: re-use RPC connection across bricks<br>                          */<br>                         if (is_brick_mx_enabled ()) {<br>+                                brickinfo-&gt;port_registered = _gf_true;<br>                                 ret = glusterd_get_sock_from_brick_pid (pid, socketpath,<br>                                                                         sizeof(socketpath));<br>                                 if (ret) {<br><br></div><div><br></div><br><br><b>The test does the following:</b><br><br>#!/bin/bash<br><br>. $(dirname $0)/../../include.rc<br>. $(dirname $0)/../../cluster.rc<br>. 
$(dirname $0)/../../volume.rc<br><br><br>cleanup;<br><br>TEST launch_cluster 3;<br><br>TEST $CLI_1 peer probe $H2;<br>EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count<br><br>TEST $CLI_1 peer probe $H3;<br>EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count<br><br>TEST $CLI_1 volume set all cluster.brick-multiplex on<br><br>TEST $CLI_1 volume create $V0 replica 3 $H1:$B1/${V0}1 $H2:$B2/${V0}1 $H3:$B3/${V0}1<br><br>TEST $CLI_1 volume start $V0<br>EXPECT_WITHIN $PROCESS_UP_TIMEOUT &quot;1&quot; brick_up_status_1 $V0 $H1 $B1/${V0}1<br>EXPECT_WITHIN $PROCESS_UP_TIMEOUT &quot;1&quot; brick_up_status_1 $V0 $H2 $B2/${V0}1<br>EXPECT_WITHIN $PROCESS_UP_TIMEOUT &quot;1&quot; brick_up_status_1 $V0 $H3 $B3/${V0}1<br><br>
<br>#bug-1560957 - replace-brick followed by an add-brick in a brick mux setup<br>#brings down one brick instance<br><br>kill_glusterd 3<br>EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count<br>TEST $CLI_1 volume replace-brick $V0 $H1:$B1/${V0}1 $H1:$B1/${V0}1_new commit force<br><br><b>This is where the test always fails, with &quot;volume replace-brick: failed: Commit failed on localhost. Please check log file for details.&quot;<br></b><br>TEST $glusterd_3<br>EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count<br><br>TEST $CLI_1 volume add-brick $V0 replica 3 $H1:$B1/${V0}3 $H2:$B2/${V0}3 $H3:$B3/${V0}3 force<br><br>EXPECT_WITHIN $PROCESS_UP_TIMEOUT &quot;1&quot; brick_up_status_1 $V0 $H3 $B3/${V0}1<br>cleanup;<br><br>glusterd log from the first node:<br>[2018-03-27 13:11:58.630845] E [MSGID: 106053] [glusterd-utils.c:13889:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.replace-brick : Transport endpoint is not connected [Transport endpoint is not connected]<br><br></div>Requesting some help/attention from the AFR folks.<br></div>
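<div>For reference, the add-brick step in the test needs each host paired with its own brick root, which is easy to get wrong by hand. Below is a minimal bash sketch, assuming the cluster.rc layout where node N&#39;s brick root is held in $BN (as with $B1, $B2 and $B3 in the test). The helper name paired_bricks is hypothetical and is not part of include.rc or cluster.rc.</div>

```shell
#!/bin/bash
# Hypothetical helper (not part of include.rc/cluster.rc).
# Prints "host:brickroot/suffix" for each host argument, pairing the
# Nth host with the brick root stored in variable $BN (assumed layout).
paired_bricks () {
    local suffix=$1
    shift
    local i=1 host root_var
    local out=()
    for host in "$@"; do
        root_var="B$i"
        # ${!root_var} is bash indirect expansion: the value of $B1, $B2, ...
        out+=("$host:${!root_var}/$suffix")
        i=$((i + 1))
    done
    # Join the entries with single spaces so they can be passed to the CLI.
    echo "${out[*]}"
}
```

<div>In the test this could be used as, for example: TEST $CLI_1 volume add-brick $V0 replica 3 $(paired_bricks ${V0}3 $H1 $H2 $H3) force</div>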