[Bugs] [Bug 1703343] Bricks fail to come online after node reboot on a scaled setup
bugzilla at redhat.com
Fri Apr 26 07:48:16 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1703343
--- Comment #1 from Mohit Agrawal <moagrawa at redhat.com> ---
Multiple brick processes are spawned on a node if that node is rebooted
while volumes are being started from another node in the cluster.
Reproducer steps
1) Setup a cluster of 3 nodes
2) Enable brick_mux and create and start 50 volumes from node 1
3) Stop all the volumes from any node
4) Start all the volumes from node 2 with a 1-second delay between starts
for i in {1..50}; do gluster v start testvol$i --mode=script; sleep 1; done
5) While the volumes are starting on node 2, run this command on node 1
pkill -f gluster; glusterd
6) Wait for the volume startups to finish, then check the number of
glusterfsd processes running on node 1.
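For step 6, a quick way to count brick processes is to grep a process
listing. The helper below is illustrative (the /usr/sbin/glusterfsd path
and the sample listing are assumptions, not taken from the report): with
brick multiplexing enabled, a healthy node should report a single
glusterfsd, so a count greater than 1 indicates the bug.

```shell
# Count glusterfsd brick processes from a ps-style command listing
# (e.g. the output of `ps -eo args`). Hypothetical helper for checking
# step 6; with brick_mux on, expect exactly 1 on a healthy node.
count_glusterfsd() {
  printf '%s\n' "$1" | grep -c '^/usr/sbin/glusterfsd' || true
}

# Simulated listing from an affected node: duplicate brick processes.
listing='/usr/sbin/glusterfsd -s node1 --volfile-id testvol1.node1.bricks
/usr/sbin/glusterfsd -s node1 --volfile-id testvol1.node1.bricks
/usr/sbin/glusterd'

count_glusterfsd "$listing"
```

On a live node the listing would come from `ps -eo args`; the simulated
listing above shows two glusterfsd entries, i.e. the failure case.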
RCA: When glusterd starts, it receives a friend update request from a
peer node carrying version changes for the volumes that were started
while the node was down. glusterd deletes the volfiles and the
references to the old-version volumes from its internal data structures
and creates new volfiles. glusterd was not able to attach the volumes
because these data structure changes happened after the brick start, so
the data sent over RPC in the attach request was not correct. The brick
process then sent a disconnect to glusterd, glusterd tried to spawn a
new brick, and as a result multiple brick processes were spawned.
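The ordering problem in the RCA can be sketched as a small shell race
(this is a hedged illustration of the timing, not glusterd's actual
code: the state file, version strings, and delays are all invented).
The attach request snapshots volume data before the friend update
rewrites it, so the RPC carries stale data, the brick rejects it, and
glusterd falls back to spawning a fresh brick process.

```shell
# Illustrative race: an "attach" reads shared state, then a concurrent
# "friend update" rewrites that state before the attach is processed.
simulate_race() {
  state=$(mktemp)
  echo "version=1" > "$state"

  # Brick attach: snapshot the state now, deliver the request later.
  (
    read -r snap < "$state"
    sleep 0.2                              # attach RPC in flight
    if [ "$snap" = "version=2" ]; then
      echo "attach accepted"
    else
      echo "attach rejected: stale $snap"  # brick disconnects
    fi
  ) &

  sleep 0.1
  echo "version=2" > "$state"              # friend update bumps version
  wait
  rm -f "$state"
}

simulate_race
```

Because the snapshot is taken before the rewrite lands, the attach is
always rejected here, mirroring how the stale attach request leads
glusterd to spawn an extra brick.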
Regards,
Mohit Agrawal
--
You are receiving this mail because:
You are on the CC list for the bug.