[Gluster-users] [IMPORTANT, PLEASE READ] replace-brick problem with all releases till now
Steve Dainard
sdainard at spd1.com
Sat Oct 3 00:01:42 UTC 2015
On Thu, Oct 1, 2015 at 2:24 AM, Pranith Kumar Karampuri
<pkarampu at redhat.com> wrote:
> hi,
> In all releases till now, from day 1 of replication, there is a
> corner-case bug which can wipe out all the bricks in that replica set
> when the disk/brick(s) are replaced.
>
> Here are the steps that could lead to that situation:
> 0) Clients are operating on the volume and are actively pumping data.
> 1) Execute the replace-brick command, (OR) take down the brick that needs
> the disk replaced or re-formatted, and bring the brick back up.
So the better course of action would be to remove-brick <vol> replica
<n-1>, replace the disk, and then add-brick <vol> replica <n+1>?
Perhaps it would also be wise to un-peer the host before adding the
brick back?
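Just to make sure I'm picturing that right, for a simple 2-brick
replica 2 volume I imagine the sequence would look roughly like the
following (volume name, hostnames and brick paths are only
placeholders, and I'm assuming a replica-count change needs 'force'
rather than 'start'):

    # drop the brick whose disk failed, reducing the replica count to 1
    gluster volume remove-brick myvol replica 1 server2:/bricks/brick1 force

    # ... replace / re-format the disk on server2 ...

    # optionally detach and re-probe the peer before re-adding the brick
    gluster peer detach server2
    gluster peer probe server2

    # re-add the brick, raising the replica count back to 2, then kick
    # off a full self-heal so the fresh brick gets repopulated
    gluster volume add-brick myvol replica 2 server2:/bricks/brick1
    gluster volume heal myvol full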
Is there any chance that adding a 3rd replica to a 2-replica cluster
with active client writes could cause the same issue?
On 3.7.3 I recently lost 2 of 3 bricks, all the way down to the XFS
filesystem being corrupted, but I blamed that on the disk controller,
which was doing a RAID0 pass-through on 2 of the hosts but not on the
new 3rd host. This occurred some time afterwards though, and client
writes were blocked while the 3rd brick was being added (roughly the
sequence sketched below).
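For reference, the kind of add I mean is roughly the following (names
are placeholders again), with clients still writing to the volume:

    # probe the new host and extend the replica 2 volume to replica 3
    gluster peer probe server3
    gluster volume add-brick myvol replica 3 server3:/bricks/brick1

    # watch self-heal populate the new brick
    gluster volume heal myvol info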
> 2) A client creates a file/directory right at the root of the volume; the
> create succeeds on the new brick but fails on the bricks that have been
> online (maybe because the file already existed on the bricks that are
> good copies)
> 3) Now when self-heal is triggered from the client/self-heal daemon, it
> thinks the just-replaced brick has the correct directory and deletes the
> files/directories from the bricks that have the actual data.
>
> I have been working on afr for almost 4 years now and have never seen a
> user complain about this problem. We were working on a document describing
> an official way to replace a brick/disk, but it never occurred to us that
> this could happen until recently. I am going to put together a proper
> document by the end of this week on replacing bricks/disks safely, and
> will keep you posted about fixes to prevent this from happening entirely.
>
> Pranith
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users