[Gluster-users] ... i was able to produce a split brain...

Pranith Kumar Karampuri pkarampu at redhat.com
Wed Feb 4 06:58:17 UTC 2015


On 02/03/2015 10:42 PM, Ted Miller wrote:
>
> On 1/31/2015 12:47 AM, Pranith Kumar Karampuri wrote:
>>
>> On 01/30/2015 06:28 PM, Jeff Darcy wrote:
>>>> Pranith and I had a discussion regarding this issue and here is
>>>> what we have in mind right now.
>>>>
>>>> We plan to provide the user commands to execute from the mount so
>>>> that he can access the files in split-brain. This way he can choose
>>>> which copy is to be used as the source. The user will have to
>>>> perform a set of getfattrs and setfattrs (on virtual xattrs) to
>>>> decide which child to choose as the source and inform AFR of his
>>>> decision.
>>>>
>>>> A) To know the split-brain status :
>>>> getfattr -n trusted.afr.split-brain-status <path-to-file>
>>>>
>>>> This will provide the user with the following details -
>>>> 1) Whether the file is in metadata split-brain
>>>> 2) Whether the file is in data split-brain
>>>>
>>>> It will also list the names of the afr-children to choose from,
>>>> something like:
>>>> Option0: client-0
>>>> Option1: client-1
>>>>
>>>> We also tell the user how to view the metadata/data info, e.g.,
>>>> stat to get the metadata, etc.
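>>>>
>>>> For illustration, on a file in data split-brain the query might
>>>> return something like this (the output format here is only a
>>>> sketch, not final):
>>>>
>>>> # getfattr -n trusted.afr.split-brain-status file1
>>>> # file: file1
>>>> trusted.afr.split-brain-status="data-split-brain:yes metadata-split-brain:no Choices:client-0,client-1"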
>>>>
>>>> B) Now the user has to choose one of the options
>>>> (client-x/client-y..) to inspect the files.
>>>> e.g., setfattr -n trusted.afr.split-brain-choice -v client-0
>>>> <path-to-file>
>>>> We save the read-child info in the inode-ctx in order to provide
>>>> the user access to the file in split-brain from that child. Once
>>>> the user inspects the file, he proceeds to do the same from the
>>>> other child of the replica pair and makes an informed decision.
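>>>>
>>>> A (hypothetical) inspection round would then be:
>>>>
>>>> # setfattr -n trusted.afr.split-brain-choice -v client-0 file1
>>>> # stat file1; md5sum file1    # reads served from client-0's copy
>>>> # setfattr -n trusted.afr.split-brain-choice -v client-1 file1
>>>> # stat file1; md5sum file1    # compare with client-1's copy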
>>>>
>>>> C) Once the above steps are done, AFR is to be informed of the
>>>> final choice of source. This is achieved by -
>>>> (say the fresh copy is on client-0)
>>>> e.g., setfattr -n trusted.afr.split-brain-heal-finalize -v client-0
>>>> <path-to-file>
>>>> This child will be chosen as the source and split-brain resolution
>>>> will be done.
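>>>>
>>>> Putting A, B and C together, the whole flow on the mount would be
>>>> something like (commands as proposed above; exact names not final):
>>>>
>>>> # getfattr -n trusted.afr.split-brain-status file1
>>>> # setfattr -n trusted.afr.split-brain-choice -v client-0 file1
>>>> #   ... inspect file1, repeat the choice for client-1 ...
>>>> # setfattr -n trusted.afr.split-brain-heal-finalize -v client-0 file1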
> May I suggest another possible way to get around the difficulty in 
> determining which of the files is the one to keep?
>
> What if each of the files were to be renamed by appending the name of 
> the brick-host that it lives on?
> For example, in a replica 2 system:
> brick-1: data1
> host-1: host1
> brick-2: data1
> host-2: host2
> file name: hair-pulling.txt
>
> after running the script/command to resolve the split-brain, the file 
> system would have two files:
> hair-pulling.txt__host1_data1
> hair-pulling.txt__host2_data1
This doesn't seem so bad either. I will need to give it more thought to 
see if there are any problems.
>
> The user would then delete the unwanted file and rename the wanted 
> file back to hair-pulling.txt.
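>
> On the mount that cleanup would just be (illustrative paths):
>
> $ rm /gluster/data1/hair-pulling.txt__host2_data1
> $ mv /gluster/data1/hair-pulling.txt__host1_data1 \
>      /gluster/data1/hair-pulling.txt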
>
> The only problem would come with a very large file with a large number 
> of replicas (like the replica 5 system I am working with). You might 
> run out of space for all the copies.
>
> Otherwise, this seems (to me) to present a user-friendly way to do 
> this.  If the user has doubts (and disk space), the user can choose 
> to keep the rejected file around for a while, "just in case" it 
> happens to have something useful in it that is missing from the 
> "accepted" file.
>
> ****************************************************************
> That brought another thought to mind (I have not had reason to try it):
> How does gluster cope if you go behind its back and rename a 
> "rejected" file?  For instance, in my example above, what if I go 
> directly onto the brick and rename the host-2 copy of the file to 
> hair-pulling.txt-dud?  The ideal scenario would seem to be that if 
> the user does a heal it would treat the copy as a new file, see no 
> dupe for hair-pulling.txt, and create a new dupe on host-2.  Since 
> hair-pulling.txt-dud is also a new file, a dupe would be created on 
> host-1.  The user could then access both files, verify correctness, 
> and then delete hair-pulling.txt-dud.
>
> *****************************************************************
This one won't work because of the reason Joe gave about gfid-hardlinks: 
every file on a brick also has a hard link under the brick's .glusterfs/ 
directory, named by the file's gfid, so renaming the file directly on 
the brick does not create a new file in gluster's eyes; the heal still 
resolves it to the same gfid.
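
You can see the link on a brick like this (illustrative gfid; brick 
path taken from Ted's example):

# getfattr -n trusted.gfid -e hex /brick/data1/hair-pulling.txt
# file: brick/data1/hair-pulling.txt
trusted.gfid=0xa53b4b9f2a5c4e8d9f0d2b1c3d4e5f60
# the same inode is also hard-linked at:
# /brick/data1/.glusterfs/a5/3b/a53b4b9f-2a5c-4e8d-9f0d-2b1c3d4e5f60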

> A not-officially-sanctioned way that I dealt with a split-brain a few 
> versions back:
> 1. decided I wanted to keep the file on host-2
> 2. log onto host-2
> 3. cp /brick/data1/hair-pulling.txt /gluster/data1/hair-pulling.txt-dud
> 4. rm /brick/data1/hair-pulling.txt
> 5. follow some Joe Julian blog stuff to delete the "invisible fork" of 
> the file
> 6. gluster volume heal data1 full
> I believe that this did work for me at that time.  I have not had to 
> do it on a recent gluster version.
This would work. You can check the document written by Ravi for this in 
the official tree: 
https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md
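
For reference, the "invisible fork" in Ted's step 5 is exactly the gfid 
hard link shown above; removing it looks roughly like this (illustrative 
gfid, read with getfattr before step 4 deletes the brick copy):

# on host-2:
rm /brick/data1/.glusterfs/a5/3b/a53b4b9f-2a5c-4e8d-9f0d-2b1c3d4e5f60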

Pranith
>
> Ted Miller
> Elkhart, IN
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users


