[Gluster-users] ... i was able to produce a split brain...
Pranith Kumar Karampuri
pkarampu at redhat.com
Wed Feb 4 06:58:17 UTC 2015
On 02/03/2015 10:42 PM, Ted Miller wrote:
>
> On 1/31/2015 12:47 AM, Pranith Kumar Karampuri wrote:
>>
>> On 01/30/2015 06:28 PM, Jeff Darcy wrote:
>>>> Pranith and I had a discussion regarding this issue and here is
>>>> what we have
>>>> in our mind right now.
>>>>
>>>> We plan to provide the user commands to execute from mount so that
>>>> he can
>>>> access the files in split-brain. This way he can choose which copy
>>>> is to be
>>>> used as source. The user will have to perform a set of getfattrs and
>>>> setfattrs (on virtual xattrs) to decide which child to choose as
>>>> source and
>>>> inform AFR with his decision.
>>>>
>>>> A) To know the split-brain status :
>>>> getfattr -n trusted.afr.split-brain-status <path-to-file>
>>>>
>>>> This will provide user with the following details -
>>>> 1) Whether the file is in metadata split-brain
>>>> 2) Whether the file is in data split-brain
>>>>
>>>> It will also list the name of afr-children to choose from.
>>>> Something like :
>>>> Option0: client-0
>>>> Option1: client-1
>>>>
>>>> We also tell the user how to view the metadata/data of each copy;
>>>> e.g., stat to get the metadata.
>>>>
>>>> B) Now the user has to choose one of the options
>>>> (client-x/client-y..) to
>>>> inspect the files.
>>>> e.g., setfattr -n trusted.afr.split-brain-choice -v client-0
>>>> <path-to-file>
>>>> We save the read-child info in inode-ctx in order to provide the
>>>> user access
>>>> to the file in split-brain from that child. Once the user inspects
>>>> the file,
>>>> he proceeds to do the same from the other child of replica pair and
>>>> makes an
>>>> informed decision.
>>>>
>>>> C) Once the above steps are done, AFR is to be informed with the
>>>> final choice
>>>> for source. This is achieved by -
>>>> (say the fresh copy is in client-0)
>>>> e.g., setfattr -n trusted.afr.split-brain-heal-finalize -v client-0
>>>> <path-to-file>
>>>> This child will be chosen as source and split-brain resolution will
>>>> be done.
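The three steps quoted above could be sketched end-to-end like this (a dry-run only: the mount path and file name are hypothetical, and 'run' merely echoes each command so the sketch works without a volume actually in split-brain; drop it on a real mount):

```shell
# Dry-run sketch of the proposed A/B/C split-brain resolution workflow.
run() { echo "$@"; }   # remove this wrapper to run for real

F=/mnt/gv0/hair-pulling.txt   # hypothetical mount path

# A: query split-brain status and the list of afr-children to choose from
run getfattr -n trusted.afr.split-brain-status "$F"

# B: inspect each copy in turn by selecting a read-child
run setfattr -n trusted.afr.split-brain-choice -v client-0 "$F"
run stat "$F"                                   # e.g. compare metadata
run setfattr -n trusted.afr.split-brain-choice -v client-1 "$F"
run stat "$F"

# C: inform AFR of the final choice of source (say client-0 is fresh)
run setfattr -n trusted.afr.split-brain-heal-finalize -v client-0 "$F"
```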
> May I suggest another possible way to get around the difficulty in
> determining which of the files is the one to keep?
>
> What if each of the files were to be renamed by appending the name of
> the brick-host that it lives on?
> For example, in a replica 2 system:
> brick-1: data1
> host-1: host1
> brick-2: data1
> host-2: host2
> file name: hair-pulling.txt
>
> after running script/command to resolve split-brain, file system would
> have two files:
> hair-pulling.txt__host-1_data1
> hair-pulling.txt__host-2_data1
This doesn't seem so bad either. I will need to give it more thought to
see if there are any problems.
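The naming convention Ted proposes is simple enough to illustrate with a one-liner (the `__host_brick` suffix format is his suggestion, not an existing Gluster feature):

```shell
# Generate the post-resolution names for each replica of a file,
# appending the brick host and brick name per the proposed scheme.
f=hair-pulling.txt
for h in host-1 host-2; do
  printf '%s__%s_data1\n' "$f" "$h"
done
```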
>
> the user would then delete the unwanted file and rename the wanted
> file back to hair-pulling.txt.
>
> The only problem would come with a very large file with a large number
> of replicas (like the replica 5 system I am working with). You might
> run out of space for all the copies.
>
> Otherwise, this seems (to me) to present a user-friendly way to do
> this. If the user has doubts (and disk space), user can choose to
> keep the rejected file around for a while, "just in case" it happens
> to have something useful in it that is missing from the "accepted" file.
>
> ****************************************************************
> That brought another thought to mind (have not had reason to try it):
> How does gluster cope if you go behind its back and rename a
> "rejected" file? For instance, in my example above, what if I go
> directly on the brick and rename the host-2 copy of the file to
> hair-pulling.txt-dud? The ideal scenario would seem to be that if the
> user does a heal, it would treat the copy as a new file, see no dupe for
> hair-pulling.txt, and create a new dupe on host-2. Since
> hair-pulling.txt-dud is also a new file, a dupe would be created on
> host-1. User could then access files, verify correctness, and then
> delete hair-pulling.txt-dud.
>
> *****************************************************************
This one won't work because of the reason Joe gave about gfid-hardlinks.
> A not-officially-sanctioned way that I dealt with a split-brain a few
> versions back:
> 1. decided I wanted to keep file on host-2
> 2. log onto host-2
> 3. cp /brick/data1/hair-pulling.txt /gluster/data1/hair-pulling.txt-dud
> 4. rm /brick/data1/hair-pulling.txt
> 5. follow some Joe Julian blog stuff to delete the "invisible fork" of
> file
> 6. gluster volume heal data1 full
> I believe that this did work for me at that time. I have not had to
> do it on a recent gluster version.
This would work. You can check the document written by Ravi for this in
the official tree:
https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md
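The manual sequence Ted describes could be sketched roughly as below (again a dry-run: 'run' only echoes, the brick/mount paths are illustrative, and the .glusterfs gfid-link path must be resolved from the file's real trusted.gfid xattr before deleting anything, as the document above explains):

```shell
# Dry-run sketch of the manual split-brain recovery, run on host-2.
run() { echo "$@"; }   # remove this wrapper to run for real

BRICK=/brick/data1     # brick backend path (illustrative)
MNT=/gluster/data1     # fuse mount point (illustrative)
F=hair-pulling.txt

run cp "$BRICK/$F" "$MNT/$F-dud"        # 3: save the good copy via the mount
run rm "$BRICK/$F"                      # 4: remove the brick copy
run rm "$BRICK/.glusterfs/ab/cd/GFID"   # 5: remove the gfid hard link
                                        #    (placeholder path; look it up first)
run gluster volume heal data1 full      # 6: trigger a full heal
```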
Pranith
>
> Ted Miller
> Elkhart, IN
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users