[Gluster-users] Gluster does not seem to detect a split-brain situation

Sjors Gielen sjors at sjorsgielen.nl
Mon Jun 8 09:07:29 UTC 2015


Ah, that's really weird. I'm pretty sure that nothing ever wrote directly
to /export on either machine, so I wonder how the hard links ended up
being split. I'll indeed clean up the .glusterfs directory and keep close
tabs on Gluster's repair.
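
To keep tabs on the repair, I'll probably just poll the heal status and
watch the self-heal daemon log, something like this (assuming the default
log location):

  bonaire# watch -n 60 'gluster volume heal data info'
  bonaire# tail -f /var/log/glusterfs/glustershd.log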

Glustershd.log and the client mount logs (data.log and gluster.log, at
least) on the client are empty, and nothing appears in them when I read
the mismatching studies.dat file.
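
I'll also try running stat on the file through the client mount, since as
far as I understand a fresh lookup can trigger a per-file self-heal check:

  mallorca# stat /data/Case/21000355/studies.dat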

Thanks for your help!
Sjors

On Sun, 7 Jun 2015 at 22:10, Joe Julian <joe at julianfamily.org> wrote:

>  (oops... I hate when I reply off-list)
>
> That warning should, imho, be an error. That's saying that the handle,
> which should be a hardlink to the file, doesn't have a matching inode. It
> should if it's a hardlink.
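>
> A quick way to verify that for this particular file, directly on the
> brick, would be something like this (the gfid path below is derived from
> the trusted.gfid you posted; adjust if yours differs):
>
>     stat -c '%i %h %n' /export/sdb1/data/Case/21000355/studies.dat \
>         /export/sdb1/data/.glusterfs/fb/34/fb345749-74cf-4804-b8b8-0789738c0f81
>
> If the handle is a proper hardlink, both paths report the same inode
> number and a link count of at least 2.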
>
> If it were me, I would:
>
>     find /export/sdb1/data/.glusterfs -type f -links 1 -print0 \
>         | xargs -0 /bin/rm
>
> This would clean up any handles that are not hardlinked where they should
> be and will allow gluster to repair them.
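>
> If you want to see what that would remove before deleting anything, the
> same find without the rm makes a reasonable dry run:
>
>     find /export/sdb1/data/.glusterfs -type f -links 1 -print
>
> Anything it lists is a handle that no longer has a second hardlink on
> the brick.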
>
> Btw, the self-heal errors would be in glustershd.log and/or the client
> mount log(s), not (usually) the brick logs.
>
>
> On 06/07/2015 12:21 PM, Sjors Gielen wrote:
>
> Oops! I accidentally ran the command as non-root on Curacao; that's why
> there was no output. The actual output is:
>
>  curacao# getfattr -m . -d -e hex
> /export/sdb1/data/Case/21000355/studies.dat
> getfattr: Removing leading '/' from absolute path names
> # file: export/sdb1/data/Case/21000355/studies.dat
> trusted.afr.data-client-0=0x000000000000000000000000
> trusted.afr.data-client-1=0x000000000000000000000000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.gfid=0xfb34574974cf4804b8b80789738c0f81
>
>  For reference, the output on bonaire:
>
>  bonaire# getfattr -m . -d -e hex
> /export/sdb1/data/Case/21000355/studies.dat
> getfattr: Removing leading '/' from absolute path names
> # file: export/sdb1/data/Case/21000355/studies.dat
> trusted.gfid=0xfb34574974cf4804b8b80789738c0f81
>
>  On Sun, 7 Jun 2015 at 21:13, Sjors Gielen <sjors at sjorsgielen.nl> wrote:
>
>>  I'm reading about quorums; I haven't set up anything like that yet.
>>
>>  (In reply to Joe Julian, who responded off-list)
>>
>>  The output of getfattr on bonaire:
>>
>>  bonaire# getfattr -m . -d -e hex
>> /export/sdb1/data/Case/21000355/studies.dat
>> getfattr: Removing leading '/' from absolute path names
>> # file: export/sdb1/data/Case/21000355/studies.dat
>> trusted.gfid=0xfb34574974cf4804b8b80789738c0f81
>>
>>  On curacao, the command gives no output.
>>
>>  From `gluster volume status`, it seems that while the brick
>> "curacao:/export/sdb1/data" is online, it has no associated port number.
>> Curacao can connect to the port number provided by Bonaire just fine.
>> There are no firewalls on or between the two machines; they are on the
>> same subnet, connected by Ethernet cables and two switches.
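>>
>>  For completeness, this is roughly how I'm checking connectivity, with
>> PORT standing in for whatever port Bonaire's brick reports in volume
>> status:
>>
>>  curacao# gluster volume status data
>> curacao# nc -z bonaire PORT && echo reachable
>>
>>  Curacao's own brick shows no port there, as mentioned.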
>>
>>  By the way, warning messages have just started appearing in
>> /var/log/glusterfs/bricks/export-sdb1-data.log on Bonaire, saying
>> "mismatching ino/dev between file X and handle Y". They may only have
>> started just now, even though I started the full self-heal hours ago.
>>
>>  [2015-06-07 19:10:39.624393] W [posix-handle.c:727:posix_handle_hard]
>> 0-data-posix: mismatching ino/dev between file
>> /export/sdb1/data/Archive/S21/21008971/studies.dat (9127104621/2065) and
>> handle
>> /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd
>> (9190215976/2065)
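>>
>>  To get a rough idea of how many files are affected, I'm grepping the
>> brick log for that warning:
>>
>>  bonaire# grep -c 'mismatching ino/dev' \
>> /var/log/glusterfs/bricks/export-sdb1-data.log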
>>
>>  Thanks again!
>> Sjors
>>
>>  On Sun, 7 Jun 2015 at 19:13, Sjors Gielen <sjors at sjorsgielen.nl> wrote:
>>
>>> Hi all,
>>>
>>>  I work at a small, 8-person company that uses Gluster for its primary
>>> data storage. We have a volume called "data" that is replicated over two
>>> servers (details below). This worked perfectly for over a year, but lately
>>> we've been noticing some mismatches between the two bricks, so it seems
>>> there has been some split-brain situation that is not being detected or
>>> resolved. I have two questions about this:
>>>
>>>  1) I expected Gluster to (eventually) detect a situation like this;
>>> why doesn't it?
>>> 2) How do I fix this situation? I've tried an explicit 'heal', but that
>>> didn't seem to change anything.
>>>
>>>  Thanks a lot for your help!
>>> Sjors
>>>
>>>  ------8<------
>>>
>>>  Volume & peer info: http://pastebin.com/PN7tRXdU
>>> curacao# md5sum /export/sdb1/data/Case/21000355/studies.dat
>>> 7bc2daec6be953ffae920d81fe6fa25c
>>> /export/sdb1/data/Case/21000355/studies.dat
>>>  bonaire# md5sum /export/sdb1/data/Case/21000355/studies.dat
>>> 28c950a1e2a5f33c53a725bf8cd72681
>>> /export/sdb1/data/Case/21000355/studies.dat
>>>
>>>  # mallorca is one of the clients
>>> mallorca# md5sum /data/Case/21000355/studies.dat
>>> 7bc2daec6be953ffae920d81fe6fa25c  /data/Case/21000355/studies.dat
>>>
>>>  I expected an input/output error when reading this file, because of
>>> the split-brain situation, but got none. There are no entries in the
>>> GlusterFS logs of either bonaire or curacao.
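>>>
>>>  For reference, the command that I understand should list files in
>>> split-brain on 3.6 is the following; I can post its output as well if
>>> that helps:
>>>
>>>  bonaire# gluster volume heal data info split-brain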
>>>
>>>  bonaire# gluster volume heal data full
>>> Launching heal operation to perform full self heal on volume data has
>>> been successful
>>> Use heal info commands to check status
>>> bonaire# gluster volume heal data info
>>> Brick bonaire:/export/sdb1/data/
>>> Number of entries: 0
>>>
>>>  Brick curacao:/export/sdb1/data/
>>> Number of entries: 0
>>>
>>>  (Same output on curacao, and hours after this, the md5sums on both
>>> bricks still differ.)
>>>
>>>  curacao# gluster --version
>>> glusterfs 3.6.2 built on Mar  2 2015 14:05:34
>>> Repository revision: git://git.gluster.com/glusterfs.git
>>> (Same version on Bonaire)
>>>
>>
>
>  _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users