[Bugs] [Bug 1728766] New: Volume start failed when shd is down in one of the node in cluster
bugzilla at redhat.com
Wed Jul 10 15:57:22 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1728766
Bug ID: 1728766
Summary: Volume start failed when shd is down in one of the
node in cluster
Product: GlusterFS
Version: mainline
OS: Linux
Status: NEW
Component: glusterd
Severity: urgent
Assignee: bugs at gluster.org
Reporter: rkavunga at redhat.com
CC: amukherj at redhat.com, anepatel at redhat.com,
bmekala at redhat.com, bugs at gluster.org,
nchilaka at redhat.com, rhs-bugs at redhat.com,
rkavunga at redhat.com, sankarshan at redhat.com,
srakonde at redhat.com, storage-qa-internal at redhat.com,
vbellur at redhat.com, vdas at redhat.com
Depends On: 1726219
Blocks: 1696809
Target Milestone: ---
Classification: Community
+++ This bug was initially created as a clone of Bug #1726219 +++
Description of problem:
Volume info output is not consistent across the cluster: two nodes report the
volume as stopped, whereas one node reports it as started.
Node1:
[root at dhcp35-50 ~]# gluster v info test3
Volume Name: test3
Type: Replicate
Volume ID: 11e30537-ce20-42d6-8a5e-a2668dc6b983
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.35.50:/bricks/brick1/tes3
Brick2: 10.70.46.216:/bricks/brick1/tes3
Brick3: 10.70.46.132:/bricks/brick1/tes3
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
[root at dhcp35-50 ~]# gluster v status test3
Staging failed on 10.70.46.216. Error: Volume test3 is not started
Staging failed on 10.70.46.132. Error: Volume test3 is not started
Node 2:
[root at dhcp46-216 ~]# gluster v info test3 | egrep 'Volume ID|Status'
Volume ID: 11e30537-ce20-42d6-8a5e-a2668dc6b983
Status: Stopped
Node3:
[root at dhcp46-132 ~]# gluster v info test3 | egrep 'Volume ID|Status'
Volume ID: 11e30537-ce20-42d6-8a5e-a2668dc6b983
Status: Stopped
==================================================
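The divergence above can be spotted mechanically. A minimal sketch (function name and node addresses are illustrative, taken from the outputs in this report) that extracts the Status field from `gluster v info` output:

```shell
# parse_status: read `gluster v info <vol>` output on stdin and print the
# value of the Status field (e.g. "Started" or "Stopped").
parse_status() {
    awk -F': ' '/^Status:/ { print $2 }'
}

# Illustrative use over ssh (ssh access between peers is an assumption):
# for node in 10.70.35.50 10.70.46.216 10.70.46.132; do
#     printf '%s: %s\n' "$node" \
#         "$(ssh "$node" gluster v info test3 | parse_status)"
# done
```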
Version-Release number of selected component (if applicable):
How reproducible:
2/2
Steps to Reproduce:
1. Create two replica 3 volumes
2. Stop one volume; execute the command on node 1 (35.50)
[root at dhcp35-50 ~]# gluster v stop test3
Stopping volume will make its data inaccessible. Do you want to continue? (y/n)
y
volume stop: test3: success
3. Kill shd on one node
kill -15 5928
4. Check #gluster v info from all 3 nodes
The volume is in the stopped state, as seen in the output of all three nodes
5. Now start volume from node 1
# gluster v start test3
volume start: test3: failed: Commit failed on localhost. Please check log file
for details.
The output says the volume start failed.
6. Now check vol info o/p on all three nodes
Node1:
[root at dhcp35-50 ~]# gluster v info test3 | egrep 'Volume ID|Status'
Volume ID: 11e30537-ce20-42d6-8a5e-a2668dc6b983
Status: Started
Node2:
[root at dhcp46-216 ~]# gluster v info test3 | egrep 'Volume ID|Status'
Volume ID: 11e30537-ce20-42d6-8a5e-a2668dc6b983
Status: Stopped
Node3:
[root at dhcp46-132 ~]# gluster v info test3 | egrep 'Volume ID|Status'
Volume ID: 11e30537-ce20-42d6-8a5e-a2668dc6b983
Status: Stopped
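Step 3 above kills shd with a hardcoded PID (5928). A reusable sketch reads the pid from a pidfile instead; the pidfile path varies between glusterfs versions and setups, so it is passed in explicitly here and the example path is an assumption:

```shell
# kill_shd: send SIGTERM (the signal used in step 3) to the pid recorded in
# a glustershd pidfile. On many installs the pidfile lives under
# /var/run/gluster/, but the exact path differs by version, so the caller
# supplies it.
kill_shd() {
    pidfile=$1
    [ -r "$pidfile" ] || { echo "cannot read pidfile: $pidfile" >&2; return 1; }
    kill -15 "$(cat "$pidfile")"
}

# Example (path is an assumption, adjust to your install):
# kill_shd /var/run/gluster/glustershd/glustershd.pid
```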
Actual results:
As described in Steps to Reproduce above: the volume start fails, and the
reported volume state is inconsistent across the nodes.
Expected results:
1. The volume should start without any error (confirmed that the volume starts
on the older release, glusterfs-fuse-3.12.2-47.2.el7rhgs.x86_64)
2. Command output should be consistent when executed from any node (all
automation cases randomly pick a node as the master for command execution)
3. Volume start force should bring shd back up on the node where it was killed
(confirmed on the older release glusterfs-fuse-3.12.2-47.2.el7rhgs.x86_64)
Additional info:
There is also a discrepancy in the output of vol status when it is executed
from different nodes.
[root at dhcp35-50 ~]# gluster v status test3
Staging failed on 10.70.46.132. Error: Volume test3 is not started
Staging failed on 10.70.46.216. Error: Volume test3 is not started
[root at dhcp46-132 ~]# gluster v status test3
Volume test3 is not started
[root at dhcp46-216 ~]# gluster v status test3
Volume test3 is not started
[root at dhcp46-216 ~]# gluster v start test3 force
volume start: test3: failed: Commit failed on dhcp35-50.lab.eng.blr.redhat.com.
Please check log file for details.
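Since the expected results note that automation may run commands from any node, a consistency check across peers is the natural test. A sketch with an injectable runner, so the transport (ssh, pdsh, ...) and the node names stay assumptions rather than baked-in details:

```shell
# consistent_output: run the same command on every listed node via "runner"
# and return non-zero as soon as one node's output differs from the first
# node's. "runner" is any command/function invoked as: runner <node> <cmd>.
consistent_output() {
    runner=$1; cmd=$2; shift 2
    first=__unset__
    for node in "$@"; do
        out=$("$runner" "$node" "$cmd")
        if [ "$first" = __unset__ ]; then
            first=$out
        elif [ "$out" != "$first" ]; then
            echo "divergent output on $node" >&2
            return 1
        fi
    done
}

# Illustrative use (addresses from this report; ssh access is an assumption):
# consistent_output ssh "gluster v info test3" \
#     10.70.35.50 10.70.46.216 10.70.46.132
```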
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1726219
[Bug 1726219] Volume start failed when shd is down in one of the node in
cluster