[Bugs] [Bug 1760467] New: rebalance start is succeeding when quorum is not met
bugzilla at redhat.com
Thu Oct 10 15:20:25 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1760467
Bug ID: 1760467
Summary: rebalance start is succeeding when quorum is not met
Product: GlusterFS
Version: mainline
Hardware: x86_64
Status: NEW
Component: glusterd
Keywords: Regression
Severity: high
Assignee: bugs at gluster.org
Reporter: srakonde at redhat.com
CC: amukherj at redhat.com, bmekala at redhat.com,
bugs at gluster.org, rhs-bugs at redhat.com,
sheggodu at redhat.com, storage-qa-internal at redhat.com,
vbellur at redhat.com
Depends On: 1760261
Target Milestone: ---
Classification: Community
+++ This bug was initially created as a clone of Bug #1760261 +++
Description of problem:
On a three-node cluster with server quorum enabled on a replicated volume, performed an add-brick, stopped glusterd on one node, and then started a rebalance on the volume:
gluster vol rebalance testvol start
volume rebalance: testvol: success: Rebalance on testvol has been started
successfully. Use rebalance status command to check status of the rebalance
process.
ID: 86cfc8b1-1e24-4244-b8e0-6941f4684234
Rebalance start is succeeding when quorum is not met.
Version-Release number of selected component (if applicable):
glusterfs-server-6.0-15.el7rhgs.x86_64
How reproducible:
2/2
Steps to Reproduce:
1. On a three-node cluster, create a 1x3 replicate volume.
2. Set "cluster.server-quorum-type" to server and "cluster.server-quorum-ratio" to 90.
3. Perform an add-brick (3 bricks).
4. Stop glusterd on one node.
5. Run rebalance start.
Actual results:
gluster vol rebalance testvol start
volume rebalance: testvol: success: Rebalance on testvol has been started
successfully. Use rebalance status command to check status of the rebalance
process.
ID: 86cfc8b1-1e24-4244-b8e0-6941f4684234
Rebalance start succeeds even though quorum is not met.
Expected results:
Rebalance start should fail when quorum is not met.
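The expected behaviour can be sketched as a simple ratio test (a hypothetical illustration of the documented server-quorum semantics, not glusterd's actual code): with cluster.server-quorum-ratio set to 90 and only 2 of 3 glusterds up, 2/3 is about 66%, which is below 90%, so quorum is lost and the rebalance start should be rejected.

```shell
# Hypothetical sketch of the server-quorum check glusterd is expected to
# apply before allowing "rebalance start". Assumes the documented
# semantics: quorum holds iff active_peers/total_peers >= ratio/100.
quorum_met() {
  local active=$1 total=$2 ratio=$3
  # Integer arithmetic to avoid floating point:
  # active/total >= ratio/100  <=>  active*100 >= total*ratio
  [ $(( active * 100 )) -ge $(( total * ratio )) ]
}

# Reported setup: 3 peers, cluster.server-quorum-ratio 90, one glusterd down.
if quorum_met 2 3 90; then echo "quorum met"; else echo "quorum lost"; fi
```

With such a check in the rebalance-start path, the CLI would return an error instead of the success message seen above.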
Additional info:
#### gluster vol info
[root at dhcp35-11 ~]# gluster vol info
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: c9822762-7dac-47bd-8645-9cfee3d02b00
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.35.11:/bricks/brick4/testvol
Brick2: 10.70.35.7:/bricks/brick4/testvol
Brick3: 10.70.35.73:/bricks/brick4/testvol
Brick4: 10.70.35.73:/bricks/brick4/ht
Brick5: 10.70.35.11:/bricks/brick4/ht
Brick6: 10.70.35.7:/bricks/brick4/ht
Options Reconfigured:
cluster.server-quorum-type: server
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
cluster.server-quorum-ratio: 90
#### gluster vol status
gluster vol status
Status of volume: testvol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.35.11:/bricks/brick4/testvol 49152 0 Y 11039
Brick 10.70.35.7:/bricks/brick4/testvol 49152 0 Y 27266
Brick 10.70.35.73:/bricks/brick4/testvol 49152 0 Y 10746
Brick 10.70.35.73:/bricks/brick4/ht 49153 0 Y 11028
Brick 10.70.35.11:/bricks/brick4/ht 49153 0 Y 11338
Brick 10.70.35.7:/bricks/brick4/ht 49153 0 Y 27551
Self-heal Daemon on localhost N/A N/A Y 11363
Self-heal Daemon on 10.70.35.73 N/A N/A Y 11053
Self-heal Daemon on dhcp35-7.lab.eng.blr.redhat.com N/A N/A Y 27577
Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks
#### Volume status after stopping glusterd on one node
[root at dhcp35-11 ~]# gluster vol status
Status of volume: testvol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.35.11:/bricks/brick4/testvol N/A N/A N N/A
Brick 10.70.35.7:/bricks/brick4/testvol N/A N/A N N/A
Brick 10.70.35.11:/bricks/brick4/ht N/A N/A N N/A
Brick 10.70.35.7:/bricks/brick4/ht N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 11363
Self-heal Daemon on dhcp35-7.lab.eng.blr.redhat.com N/A N/A Y 27577
Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks
gluster vol rebalance testvol start
volume rebalance: testvol: success: Rebalance on testvol has been started
successfully. Use rebalance status command to check status of the rebalance
process.
ID: 86cfc8b1-1e24-4244-b8e0-6941f4684234
[root at dhcp35-11 ~]# gluster vol rebalance testvol status
Node                             Rebalanced-files  size    scanned  failures  skipped  status  run time in h:m:s
-------------------------------  ----------------  ------  -------  --------  -------  ------  -----------------
dhcp35-7.lab.eng.blr.redhat.com  0                 0Bytes  0        0         0        failed  0:00:00
localhost                        0                 0Bytes  0        0         0        failed  0:00:00
volume rebalance: testvol: success
#### glusterd log after stopping glusterd on one of the nodes
[2019-10-10 09:19:00.361314] I [MSGID: 106004]
[glusterd-handler.c:6521:__glusterd_peer_rpc_notify] 0-management: Peer
<10.70.35.73> (<53117ee2-5182-42c6-8c74-26f43b075a0c>), in state <Peer in
Cluster>, has disconnected from glusterd.
[2019-10-10 09:19:00.361553] W [glusterd-locks.c:807:glusterd_mgmt_v3_unlock]
(-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x24f6a) [0x7fe6a4b4df6a]
-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x2f790) [0x7fe6a4b58790]
-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0xf3883) [0x7fe6a4c1c883]
) 0-management: Lock for vol testvol not held
[2019-10-10 09:19:00.361570] W [MSGID: 106117]
[glusterd-handler.c:6542:__glusterd_peer_rpc_notify] 0-management: Lock not
released for testvol
[2019-10-10 09:19:00.361607] C [MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management:
Server quorum lost for volume testvol. Stopping local bricks.
[2019-10-10 09:19:00.361825] I [MSGID: 106542]
[glusterd-utils.c:8775:glusterd_brick_signal] 0-glusterd: sending signal 15 to
brick with pid 11039
[2019-10-10 09:19:01.362068] I [socket.c:871:__socket_shutdown] 0-management:
intentional socket shutdown(16)
[2019-10-10 09:19:01.362680] I [MSGID: 106542]
[glusterd-utils.c:8775:glusterd_brick_signal] 0-glusterd: sending signal 15 to
brick with pid 11338
[2019-10-10 09:19:02.362982] I [socket.c:871:__socket_shutdown] 0-management:
intentional socket shutdown(20)
[2019-10-10 09:19:02.363239] I [MSGID: 106143]
[glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick
/bricks/brick4/testvol on port 49152
[2019-10-10 09:19:02.368590] I [MSGID: 106143]
[glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick
/bricks/brick4/ht on port 49153
[2019-10-10 09:19:02.375567] I [MSGID: 106499]
[glusterd-handler.c:4502:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume testvol
[2019-10-10 09:19:25.717254] I [MSGID: 106539]
[glusterd-utils.c:12461:glusterd_generate_and_set_task_id] 0-management:
Generated task-id 86cfc8b1-1e24-4244-b8e0-6941f4684234 for key rebalance-id
[2019-10-10 09:19:30.751060] I [rpc-clnt.c:1014:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2019-10-10 09:19:30.751284] E [MSGID: 106061]
[glusterd-utils.c:11159:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd:
failed to get index from rsp dict
[2019-10-10 09:19:35.761694] E [MSGID: 106061]
[glusterd-utils.c:11159:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd:
failed to get index from rsp dict
[2019-10-10 09:19:35.767505] I [MSGID: 106172]
[glusterd-handshake.c:1085:__server_event_notify] 0-glusterd: received defrag
status updated
[2019-10-10 09:19:35.773243] I [MSGID: 106007]
[glusterd-rebalance.c:153:__glusterd_defrag_notify] 0-management: Rebalance
process for volume testvol has disconnected.
[2019-10-10 09:19:39.436119] E [MSGID: 106061]
[glusterd-utils.c:11159:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd:
failed to get index from rsp dict
[2019-10-10 09:19:39.436978] E [MSGID: 106061]
[glusterd-utils.c:11159:glusterd_volume_rebalance_use_rsp_dict] 0-glusterd:
failed to get index from rsp dict
[2019-10-10 09:31:36.682991] I [MSGID: 106488]
[glusterd-handler.c:1564:__glusterd_handle_cli_get_volume] 0-management:
Received get vol req
[2019-10-10 09:31:36.684006] I [MSGID: 106488]
[glusterd-handler.c:1564:__glusterd_handle_cli_get_volume] 0-management:
Received get vol req
--- Additional comment from RHEL Product and Program Management on 2019-10-10
15:06:22 IST ---
This bug is automatically being proposed for the next minor release of Red Hat
Gluster Storage by setting the release flag 'rhgs-3.5.0' to '?'.
If this bug should be proposed for a different release, please manually change
the proposed release flag.
--- Additional comment from Bala Konda Reddy M on 2019-10-10 15:16:01 IST ---
Setup is in same state for further debugging.
Ip: 10.70.35.11
credentials: root/1
Regards,
Bala
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1760261
[Bug 1760261] rebalance start is succeeding when quorum is not met
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
More information about the Bugs mailing list