[Gluster-devel] replace-brick commit force fails in multi node cluster

Atin Mukherjee amukherj at redhat.com
Tue Mar 27 13:55:13 UTC 2018


While writing a test for the fix of BZ
https://bugzilla.redhat.com/show_bug.cgi?id=1560957, I can't get my
test case to pass: a replace-brick commit force always fails on a
multi-node cluster, and that's on the latest mainline code.


*The fix is a one-liner:*
atin at dhcp35-96:~/codebase/upstream/glusterfs_master/glusterfs$ gd HEAD~1
diff --git a/xlators/mgmt/glusterd/src/glusterd-utils.c b/xlators/mgmt/glusterd/src/glusterd-utils.c
index af30756c9..24d813fbd 100644
--- a/xlators/mgmt/glusterd/src/glusterd-utils.c
+++ b/xlators/mgmt/glusterd/src/glusterd-utils.c
@@ -5995,6 +5995,7 @@ glusterd_brick_start (glusterd_volinfo_t *volinfo,
                          * TBD: re-use RPC connection across bricks
                          */
                         if (is_brick_mx_enabled ()) {
+                                brickinfo->port_registered = _gf_true;
                                 ret = glusterd_get_sock_from_brick_pid (pid, socketpath,
                                                                         sizeof(socketpath));
                                 if (ret) {
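To make the effect of the one-liner concrete, here is a toy C model of only the branch touched by the diff. Everything in it is hypothetical: toy_brickinfo_t stands in for glusterd's brickinfo, the mux_enabled parameter stands in for is_brick_mx_enabled(), and the assumption that this branch is reached only when a brick process with a known pid is already running is inferred from the surrounding context, not stated in this mail. It shows nothing more than that, with multiplexing on, the flag is now set before the socket lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for glusterd's brickinfo: only the field
 * touched by the one-line fix is modelled. */
typedef struct {
    const char *path;
    bool port_registered;
} toy_brickinfo_t;

/* Toy model of the diffed branch. Assumption: it is reached when a brick
 * process with `pid` is already running; `mux_enabled` stands in for
 * is_brick_mx_enabled(). Returns 0 on success, -1 if there is no process. */
static int toy_brick_start_running(toy_brickinfo_t *brickinfo, pid_t pid,
                                   bool mux_enabled)
{
    assert(brickinfo != NULL);

    if (pid <= 0)
        return -1; /* no running process: outside the modelled branch */

    if (mux_enabled) {
        /* The patched line: record the port as registered for a brick
         * served by the already-running multiplexed process. */
        brickinfo->port_registered = true;
        /* In glusterd, glusterd_get_sock_from_brick_pid() follows here. */
    }
    /* Without multiplexing this model leaves the flag untouched. */
    return 0;
}
```

Calling it with mux_enabled set to true flips port_registered from false to true; with mux_enabled set to false the flag stays false.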




*The test does the following:*

#!/bin/bash

. $(dirname $0)/../../include.rc
. $(dirname $0)/../../cluster.rc
. $(dirname $0)/../../volume.rc

cleanup;

TEST launch_cluster 3;

TEST $CLI_1 peer probe $H2;
EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count

TEST $CLI_1 peer probe $H3;
EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count

TEST $CLI_1 volume set all cluster.brick-multiplex on

TEST $CLI_1 volume create $V0 replica 3 $H1:$B1/${V0}1 $H2:$B2/${V0}1 $H3:$B3/${V0}1

TEST $CLI_1 volume start $V0
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H1 $B1/${V0}1
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H2 $B2/${V0}1
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H3 $B3/${V0}1

#bug-1560957 - replace brick followed by an add-brick in a brick mux setup
#brings down one brick instance

kill_glusterd 3
EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count
TEST $CLI_1 volume replace-brick $V0 $H1:$B1/${V0}1 $H1:$B1/${V0}1_new commit force


*This is where the test always fails, with "volume replace-brick: failed:
Commit failed on localhost. Please check log file for details."*

TEST $glusterd_3
EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count


TEST $CLI_1 volume add-brick $V0 replica 3 $H1:$B1/${V0}3 $H2:$B2/${V0}3 $H3:$B3/${V0}3 force


EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H3 $B3/${V0}1
cleanup;

glusterd log from the 1st node:
[2018-03-27 13:11:58.630845] E [MSGID: 106053] [glusterd-utils.c:13889:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.replace-brick : Transport endpoint is not connected [Transport endpoint is not connected]
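For readers less familiar with the error text: "Transport endpoint is not connected" is how glibc's strerror() renders errno ENOTCONN, so the log line means the xattr-setting call itself returned -1 with errno set to ENOTCONN. The sketch below reproduces only that reporting path using the Linux setxattr(2) call. try_set_xattr() is a hypothetical helper, its message only loosely imitates the glusterd log format, and it does not model how glusterd_handle_replicate_brick_ops() actually reaches the bricks.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

/* Hypothetical helper: try to set xattr `key` on `path`. On failure,
 * print a message loosely modelled on the glusterd log line and return
 * the errno value; return 0 on success. */
static int try_set_xattr(const char *path, const char *key, const char *value)
{
    assert(path != NULL && key != NULL && value != NULL);

    if (setxattr(path, key, value, strlen(value), 0) == 0)
        return 0;

    int err = errno; /* save errno before other libc calls can change it */
    const char *msg = strerror(err);
    fprintf(stderr, "Failed to set extended attribute %s : %s [%s]\n",
            key, msg, msg);
    return err;
}
```

Note that trusted.* attributes require CAP_SYS_ADMIN, so an unprivileged call on an existing file fails with EPERM rather than ENOTCONN, and a path that does not exist fails with ENOENT before any namespace check.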

Requesting some help/attention from the AFR folks.

