[Gluster-users] Split brain; which file to choose for repair?

Anand Babu Periasamy ab at gluster.com
Wed May 4 14:32:05 UTC 2011


On Wed, May 4, 2011 at 7:29 PM, Joe Landman <landman at scalableinformatics.com> wrote:

> On 05/04/2011 08:24 AM, Martin Schenker wrote:
>
>> Hi all!
>>
>> Is there anybody who can give some pointers regarding which file to choose
>> in a "split brain" condition?
>>
>> What tests do I need to run?
>>
>
> MD5sums.  Did the logs indicate a split brain?  Or are the signatures
> simply different?
>
>
>
>> What does the hex AFR code actually show? Is there a way to pinpoint the
>> "better/worse" file for deletion?
>>
>> On pserver12:
>>
>> # file: mnt/gluster/brick0/storage/pserver3-19
>> trusted.afr.storage0-client-5=0x3f0000010000000000000000
>>
>> On pserver13:
>>
>> # file: mnt/gluster/brick0/storage/pserver3-19
>> trusted.afr.storage0-client-4=0xd70000010000000000000000
>>
>> These are test files, but I'd like to know what to do in a LIVE situation,
>> which will be just around the corner.
>>
>> The Timestamps show the same values, so I'm a bit puzzled HOW to choose a
>> file.
>>
>
> File sizes and time stamps the same?
>
> Hmmm ... this sounds like an underlying caching issue with the base machine
> (data probably not flushed completely/properly on one or more of the units
> before reboot).  Check the battery backup on the RAID and make sure it is
> functional.
>
> Also, run a file system check on the underlying backend storage.
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics, Inc.
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>       http://scalableinformatics.com/sicluster
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>


Yes, I agree with Joe. I would check the underlying disk filesystem as a
first step. Look at the kernel logs and dmesg for file system errors. Even if
you do not find any, try running a forced fsck on the backend filesystems.
Another possible cause is silent data corruption. If everything checks out,
then it is likely a GlusterFS bug.
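
As a rough sketch of the checksum comparison Joe suggests, the Python below
hashes a file in chunks so that large files do not have to fit in memory. The
function names (file_digest, copies_match) are made up for illustration and
are not part of GlusterFS. Since the two copies live on different servers, in
practice you would run file_digest on each brick's backend path and compare
the resulting strings by hand.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`.

    The file is read in fixed-size chunks so memory use stays bounded.
    `algorithm` is any name accepted by hashlib.new (e.g. "md5", "sha256").
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copies_match(path_a, path_b, algorithm="md5"):
    """Return True if both files have the same digest (hypothetical helper)."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

A matching digest only says the contents are identical; it does not say which
copy is the good one when they differ.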

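On the question of what the hex AFR value shows: the sketch below splits a
trusted.afr.* value into three numbers, ASSUMING the layout usually described
for AFR changelog attributes, namely three big-endian 32-bit counters of
pending data, metadata and entry operations. That layout is not confirmed
anywhere in this thread and may differ between GlusterFS versions, so treat
the result as a hint only. The function name decode_afr_changelog is
hypothetical.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a 12-byte trusted.afr.* value into three pending counters.

    ASSUMPTION: the value is three big-endian unsigned 32-bit integers in
    the order data, metadata, entry. This is not confirmed by the thread.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


if __name__ == "__main__":
    # Values as posted by Martin for pserver12 and pserver13.
    for value in ("0x3f0000010000000000000000", "0xd70000010000000000000000"):
        print(value, decode_afr_changelog(value))
```

If that assumption holds, both values posted above carry a nonzero data
counter and zero metadata and entry counters.
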
-- 
Anand Babu Periasamy
Blog [http://www.unlocksmith.org]

Imagination is more important than knowledge --Albert Einstein