[Gluster-users] [IMPORTANT, PLEASE READ] replace-brick problem with all releases till now
Pranith Kumar Karampuri
pkarampu at redhat.com
Thu Oct 1 09:24:14 UTC 2015
hi,
In all releases to date, from day one of replication, there is a
corner-case bug that can wipe out the data on every brick in a replica
set when a disk/brick is replaced.
Here are the steps that could lead to that situation:
0) Clients are operating on the volume and are actively pumping data.
1) Execute the replace-brick command, OR take down the brick whose disk
needs to be replaced or re-formatted, and then bring the brick back up.
2) A client creates a file/directory directly under the root of the
volume; this succeeds on the new brick but fails on the bricks that
have been online (perhaps because the entry already existed on the
good copies).
3) Now when self-heal is triggered from the client/self-heal-daemon, it
treats the just-replaced brick as the correct copy of the directory and
deletes the files/directories from the bricks that hold the actual data
(a concrete sketch of this sequence follows below).
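To make the failure sequence concrete, here is a minimal sketch against
a hypothetical 1x2 replica volume r2 with bricks server1:/bricks/b1 and
server2:/bricks/b2 (all volume/brick names are illustrative, not from a
real setup):

    # Step 1: swap in a fresh, empty disk for the second brick.
    gluster volume replace-brick r2 server2:/bricks/b2 \
        server2:/bricks/b2-new commit force
    # (Or: kill the brick process, re-format the disk, and restart
    # the brick with "gluster volume start r2 force".)

    # Step 2: meanwhile a client creates an entry directly under the
    # volume root; it succeeds on the empty b2-new but can fail on b1,
    # e.g. because the entry already exists there.
    mkdir /mnt/r2/somedir

    # Step 3: trigger a heal. Because the entry-create succeeded only
    # on the new brick, AFR can conclude the empty brick holds the good
    # copy of the root and remove entries from b1.
    gluster volume heal r2

    # The pending-change counters in the trusted.afr.* xattrs on each
    # brick root show which copy AFR considers out of date:
    getfattr -d -m . -e hex /bricks/b1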
I have been working on AFR for almost 4 years now and never saw any user
complain about this problem. We were working on a document describing an
official way to replace a brick/disk, but it never occurred to us that
this could happen until recently. I am going to get a proper document out
by the end of this week on replacing bricks/disks safely, and I will keep
you posted about fixes to prevent this from happening entirely.
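Until that document is ready, here is a rough sketch of the general
idea only, not the official procedure: while the brick being replaced
is down, perform a dummy metadata operation on the volume root from a
client mount, so the surviving bricks record pending changes against
the down brick and heal flows from the good copies to the new one.
(Again, the volume/brick names are illustrative.)

    # With the old brick still down, mark pending changes on the root
    # by setting and removing a throwaway xattr from a client mount:
    setfattr -n trusted.non-existent-key -v abc /mnt/r2
    setfattr -x trusted.non-existent-key /mnt/r2

    # Now bring in the new, empty brick and heal towards it:
    gluster volume replace-brick r2 server2:/bricks/b2 \
        server2:/bricks/b2-new commit force
    gluster volume heal r2 full

    # Verify that heal is progressing in the right direction:
    gluster volume heal r2 info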
Pranith