[Bugs] [Bug 1642597] tests/bugs/glusterd/ optimized-basic-testcases-in-cluster.t failing

bugzilla at redhat.com bugzilla at redhat.com
Wed Oct 24 19:21:49 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1642597



--- Comment #1 from Sanju <srakonde at redhat.com> ---
Reason why this test case is not failing on master:

-> test case
  1 #!/bin/bash
  2 
  3 . $(dirname $0)/../../include.rc
  4 . $(dirname $0)/../../cluster.rc
  5 . $(dirname $0)/../../volume.rc
  6 
  7 function peer_count {
  8 eval \$CLI_$1 peer status | grep 'Peer in Cluster (Connected)' | wc -l
  9 }
 10 
 11 cleanup;
 12 
 13 #bug-1454418 -  Setting Port number in specific range
 14 sysctl net.ipv4.ip_local_reserved_ports="24007-24008,32765-32768,49152-49156"
 15 
 16 TEST launch_cluster 3;
 17 
 18 #bug-1223213
 19 
 20 # Fool the cluster to operate with 3.5 version even though binary's op-version
 21 # is > 3.5. This is to ensure 3.5 code path is hit to test that volume status
 22 # works when a node is upgraded from 3.5 to 3.7 or higher as mgmt_v3 lock is
 23 # been introduced in 3.6 version and onwards
 24 
 25 GD1_WD=$($CLI_1 system getwd)
 26 $CLI_1 system uuid get
 27 Old_op_version=$(cat ${GD1_WD}/glusterd.info | grep operating-version | cut -d '=' -f 2)
 28 
 29 TEST sed -rnie "'s/(operating-version=)\w+/\130500/gip'" ${GD1_WD}/glusterd.info
 30 
 31 TEST kill_glusterd 1
 32 TEST start_glusterd 1
 33 
 34 TEST $CLI_1 peer probe $H2;
 35 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 1
 36 
 37 TEST `sed -i "s/"30500"/${Old_op_version}/g" ${GD1_WD}/glusterd.info`
 38 
 39 TEST kill_glusterd 1
 40 TEST start_glusterd 1
 41 
 42 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 1
 43 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 2
 44 
 45 #bug-1454418
 46 sysctl net.ipv4.ip_local_reserved_ports="
 47 "
 48 
 49 TEST $CLI_1 volume create $V0 $H1:$B1/$V0 $H2:$B2/$V0
 50 TEST $CLI_1 volume start $V0
 51 
 52 #bug-888752 - volume status --xml from peer in the cluster
 53 
 54 TEST $CLI_1 volume status $V0 $H2:$B2/$V0 --xml
 55 
 56 TEST $CLI_1 volume stop $V0
 57 TEST $CLI_1 volume delete $V0
 58 
 59 TEST $CLI_1 volume create $V0 $H1:$B1/$V0
 60 TEST $CLI_1 volume create $V1 $H1:$B1/$V1
 61 
 62 TEST $CLI_1 peer probe $H3;
 63 EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1
 64 
 65 TEST $CLI_1 volume start $V0
 66 TEST $CLI_1 volume start $V1
 67 
 68 #bug-1173414 - validate mgmt-v3-remote-lock-failure
 69 
 70 for i in {1..20}
 71 do
 72 $CLI_1 volume set $V0 diagnostics.client-log-level DEBUG &
 73 $CLI_1 volume set $V1 barrier on
 74 $CLI_2 volume set $V0 diagnostics.client-log-level DEBUG &
 75 $CLI_2 volume set $V1 barrier on
 76 done
 77 
 78 EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1
 79 TEST $CLI_1 volume status
 80 TEST $CLI_2 volume status
 81 
 82 #bug-1293414 - validate peer detach
 83 
 84 # peers hosting bricks cannot be detached
 85 TEST ! $CLI_2 peer detach $H1
 86 EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1
 87 
 88 # peer not hosting bricks should be detachable
 89 TEST $CLI_2 peer detach $H3
 90 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 1
 91 
 92 #bug-1344407 - deleting a volume when peer is down should fail
 93 
 94 TEST kill_glusterd 2
 95 TEST ! $CLI_1 volume delete $V0
 96 
 97 cleanup

At line number 59, we have 2 nodes in the cluster, and we are executing the
commands below.
 59 TEST $CLI_1 volume create $V0 $H1:$B1/$V0
 60 TEST $CLI_1 volume create $V1 $H1:$B1/$V1
 61 
 62 TEST $CLI_1 peer probe $H3;
 63 EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1

I executed the same steps on my setup:
[root@server1 glusterfs]# gluster pe stat
Number of Peers: 0
[root@server1 glusterfs]# gluster pe probe server2
peer probe: success.
[root@server1 glusterfs]# gluster v create test-vol1 server1:/tmp/b11
volume create: test-vol1: success: please start the volume to access data
[root@server1 glusterfs]# gluster v create test-vol2 server1:/tmp/b12
volume create: test-vol2: success: please start the volume to access data
[root@server1 glusterfs]# gluster pe probe server3
peer probe: success.
[root@server1 glusterfs]#

Now, checking the output of "gluster v info" and "gluster pe stat" from all 3
nodes in the cluster.
From node1:
[root@server1 glusterfs]# gluster v info

Volume Name: test-vol1
Type: Distribute
Volume ID: be908175-34bf-4376-b28e-23f142457c67
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b11
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on

Volume Name: test-vol2
Type: Distribute
Volume ID: 882c245c-d435-4a23-98f6-399a7caedec0
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b12
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on
[root@server1 glusterfs]# gluster pe stat
Number of Peers: 2

Hostname: server2
Uuid: 917311b8-0b5c-4f22-aa1d-b1216ab192e5
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 6c68eb99-c9f4-4590-a022-7ef2081705b3
State: Peer in Cluster (Connected)
[root@server1 glusterfs]#

From node2:
[root@server2 glusterfs]# gluster v info

Volume Name: test-vol1
Type: Distribute
Volume ID: be908175-34bf-4376-b28e-23f142457c67
Status: Created
Snapshot Count: 0
Xlator 1: BD
Capability 1: thin
Capability 2: offload_copy
Capability 3: offload_snapshot
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b11
Brick1 VG: 
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on

Volume Name: test-vol2
Type: Distribute
Volume ID: 882c245c-d435-4a23-98f6-399a7caedec0
Status: Created
Snapshot Count: 0
Xlator 1: BD
Capability 1: thin
Capability 2: offload_copy
Capability 3: offload_snapshot
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b12
Brick1 VG: 
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on
[root@server2 glusterfs]# gluster pe stat
Number of Peers: 2

Hostname: server1
Uuid: 8a75c6c4-865d-4805-bbdf-403234e9b5e3
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 6c68eb99-c9f4-4590-a022-7ef2081705b3
State: Peer Rejected (Connected)
[root@server2 glusterfs]#

From node3:
[root@server3 glusterfs]# gluster v info

Volume Name: test-vol1
Type: Distribute
Volume ID: be908175-34bf-4376-b28e-23f142457c67
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b11
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
cluster.brick-multiplex: on

Volume Name: test-vol2
Type: Distribute
Volume ID: 882c245c-d435-4a23-98f6-399a7caedec0
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/tmp/b12
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
cluster.brick-multiplex: on
[root@server3 glusterfs]# gluster pe stat
Number of Peers: 2

Hostname: server1
Uuid: 8a75c6c4-865d-4805-bbdf-403234e9b5e3
State: Peer in Cluster (Connected)

Hostname: server2
Uuid: 917311b8-0b5c-4f22-aa1d-b1216ab192e5
State: Peer Rejected (Connected)
[root@server3 glusterfs]#

We are seeing BD xlator related entries in the output of "gluster v info" on
node2 because of https://bugzilla.redhat.com/show_bug.cgi?id=1635820.

Since we issued the peer probe to node3 from node1, node1's data is synced to
node3, so both are in the Connected state.

Because of https://bugzilla.redhat.com/show_bug.cgi?id=1635820, node2 has
caps=15 in the info files of its volumes. When node3 performs the handshake
with node2, the extra caps field in node2's info files sends node2 into the
Rejected state. So, when we issue peer status from node2/node3, we see
node3/node2 in the Rejected state.
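
For illustration, that caps entry lives in the volume's info file under
glusterd's working directory; on node2 it would look roughly like this (a
reconstructed excerpt, other keys omitted, values matching the repro above):

/var/lib/glusterd/vols/test-vol1/info on node2 (excerpt):
type=0
count=1
status=0
caps=15

/var/lib/glusterd/vols/test-vol1/info on node1 and node3 (excerpt, no caps line):
type=0
count=1
status=0

The peers compare this volume data (via checksums) during the handshake, so the
extra field on node2 produces a mismatch and the peer lands in the Rejected
state.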

Now, at line #85, we are issuing:
TEST ! $CLI_2 peer detach $H1

The above test passes because, when we issue "gluster pe detach node1" from
node2, it fails saying "peer detach: failed: One of the peers is probably down.
Check with 'peer status'", and we have a not (!) before the command. We see
that error because, in node2's peer status, node3 is in the Rejected state. We
expect the command to fail with "Brick(s) with the peer node1 exist in cluster"
instead.
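
Put concretely, on master today the step behaves like this (output reconstructed
from the messages above, using the hostnames from the manual run):

[root@server2 glusterfs]# gluster peer detach server1
peer detach: failed: One of the peers is probably down. Check with 'peer status'

whereas the failure the test actually wants to see is:

peer detach: failed: Brick(s) with the peer server1 exist in cluster

Either way the command fails, so the "TEST !" at line #85 passes on master.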

-> Now, why this test case is failing with
https://review.gluster.org/#/c/glusterfs/+/21336/:

https://review.gluster.org/#/c/glusterfs/+/21336/ addresses
https://bugzilla.redhat.com/show_bug.cgi?id=1635820, so all the nodes will be in
the Connected state after executing the commands from line #59 to #63. We still
expect the peer detach at line #85 to fail with "Brick(s) with the peer node1
exist in cluster", but the peer detach succeeds, so the test case is failing.

-> Here's why the peer detach succeeds:
Patch https://review.gluster.org/#/c/glusterfs/+/19135/ optimised the glusterd
test cases by clubbing similar test cases into a single test case.

The test case
https://review.gluster.org/#/c/glusterfs/+/19135/15/tests/bugs/glusterd/bug-1293414-import-brickinfo-uuid.t
has been deleted and its checks were added as part of
tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t.

In the original test case, we create a volume with two bricks, each on a
separate node (node1 & node2). From another node in the cluster (node3), we try
to detach a node which is hosting bricks, and it fails as expected.
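
Roughly, the deleted test exercised something like the following (a sketch
reconstructed from the description above, reusing the helpers already shown in
the combined test; not a verbatim copy of bug-1293414-import-brickinfo-uuid.t):

cleanup;
TEST launch_cluster 3;
TEST $CLI_1 peer probe $H2;
TEST $CLI_1 peer probe $H3;
EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1

# bricks live on two different nodes
TEST $CLI_1 volume create $V0 $H1:$B1/$V0 $H2:$B2/$V0
TEST $CLI_1 volume start $V0

# detaching a peer that hosts a brick must fail, from any node in the cluster
TEST ! $CLI_3 peer detach $H1

cleanup;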

In the new test, we create the volume with a single brick on node1, and from
another node in the cluster we try to detach node1. We expect the peer detach to
fail, but it succeeds because node1 is hosting all the bricks of the volume
(glusterd allows detaching a peer that hosts every brick of a volume, treating
such volumes as stale; the detach is blocked only when the peer hosts some, but
not all, of a volume's bricks).

To fix this issue, we have to change the test case to reflect the original test
case scenario.
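
For example, the volume used for the detach check (created at line #59) could
get bricks on two nodes again, mirroring the original scenario; something along
these lines (one possible adjustment, not necessarily the final patch):

# line 59: bricks on two nodes instead of one
TEST $CLI_1 volume create $V0 $H1:$B1/$V0 $H2:$B2/$V0
...
# line 85: $H1 now hosts only part of the volume's bricks, so the detach must fail
TEST ! $CLI_2 peer detach $H1
EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count 1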
