[Gluster-devel] all-null pending matrix again

Anand Avati avati at gluster.org
Mon Sep 30 16:43:17 UTC 2013


On Sun, Sep 29, 2013 at 11:27 PM, Amar Tumballi <atumball at redhat.com> wrote:

> On 09/30/2013 09:25 AM, Emmanuel Dreyfus wrote:
>
>> Hi
>>
>> I tested 3.4.1 on a long run and got a spurious split brain with all-null
>> pending matrix again:
>>
>> [2013-09-30 00:25:12.962127] E
>> [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
>> 0-gfs341-replicate-1: Unable to self-heal contents of
>> '/manu/netbsd/usr/src/lib/libpuffs/obj/dispatcher.pico' (possible
>> split-brain). Please delete the file from all but the preferred
>> subvolume.-
>> Pending matrix:  [ [ 0 0 ] [ 0 0 ] ]
>>
>> The bug is described with log files here:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1005526
>>
>> We previously thought that it was caused by a heterogeneous setup (i386 +
>> amd64), but that is not the case in my latest test. The only known
>> workaround so far is to disable eager locks.
>>
>> The split brain is real, as shown below. What I do not really understand
>> is why I have 4 copies of the file, as this is a 2x2 striped-replicated
>> volume. The first set of replicas is fine; the second set has two files
>> filled with zeros, one with the correct size, the other truncated.
>>
>> silo:/export/wd2a/manu/netbsd/usr/src/lib/libpuffs/obj/dispatcher.pico
>> -rw-r--r--  2 manu  manu  11916 Sep 30 02:23 dispatcher.pico
>> SHA1(dispatcher.pico) = b512c2924194ab9d001aec402ef037225f9a6e6d
>>
>> hangar:/export/wd1a/manu/netbsd/usr/src/lib/libpuffs/obj/dispatcher.pico
>> -rw-r--r--  2 manu  manu  11916 Sep 30 02:23 dispatcher.pico
>> SHA1(dispatcher.pico) = b512c2924194ab9d001aec402ef037225f9a6e6d
>>
>> hangar:/export/wd3a/manu/netbsd/usr/src/lib/libpuffs/obj/dispatcher.pico
>> -rw-r--r--  2 manu  manu  11916 Sep 30 02:23 dispatcher.pico
>> SHA1(dispatcher.pico) = 3bdf1048eff02260594f7385449dcc37bd09b78f
>> NB: filled with zeroes
>>
>> debacle:/export/wd1a/manu/netbsd/usr/src/lib/libpuffs/obj/dispatcher.pico
>> -rw-r--r--  2 manu  manu  11049 Sep 30 02:23 dispatcher.pico
>> SHA1(dispatcher.pico) = 5357efe9fa5299b59a279b32ce804767c6ffc116
>> NB: filled with zeroes
>>
>> Is it possible to have the same file on different stripes in situations
>> other than during a rename operation?
>>
>
> I guess the solution to these issues is in the works:
> http://review.gluster.org/6010
>
> I would advise running the first QA bits from the 3.5.0 branch :-)
>

This seems to be a striped-replicated configuration. I'm not sure what the
exact issue is. It may not be NetBSD specific either, because we have not
tested a striped-replicated configuration in the way described by Emmanuel.
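
For anyone trying to reproduce this, the per-brick comparison Emmanuel did by
hand (size, SHA1, "filled with zeroes") could be scripted roughly as below.
This is only a sketch: describe_copy and compare_replicas are hypothetical
helper names, not part of GlusterFS, and the caller is assumed to have already
read the file contents from each brick's backend path.

```python
import hashlib


def describe_copy(data: bytes) -> dict:
    """Summarize one brick's copy: size, SHA1, and whether it is all zero bytes."""
    return {
        "size": len(data),
        "sha1": hashlib.sha1(data).hexdigest(),
        # An empty file is not reported as zero-filled.
        "zero_filled": len(data) > 0 and data.count(0) == len(data),
    }


def compare_replicas(copies: dict) -> dict:
    """copies maps a brick label (hypothetical, e.g. 'hangar:wd3a') to file contents.

    Returns the per-brick summaries and whether every copy has the same SHA1.
    """
    summaries = {brick: describe_copy(data) for brick, data in copies.items()}
    digests = {summary["sha1"] for summary in summaries.values()}
    return {"summaries": summaries, "consistent": len(digests) <= 1}
```

Feeding it the contents of the two copies from one replica pair should show
at a glance whether they differ and whether either copy is zero-filled.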

Avati