[Gluster-users] Gluster does not seem to detect a split-brain situation

Joe Julian joe at julianfamily.org
Sun Jun 7 20:09:21 UTC 2015


(oops... I hate when I reply off-list)

That warning should, imho, be an error. It's saying that the handle, 
which should be a hardlink to the file, doesn't have a matching inode 
number. It would if it were still a hardlink.
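
For reference, the handle path is derived from the gfid: the first 
two bytes of the gfid become two directory levels under .glusterfs. 
So for the gfid in your getfattr output, the handle should live at

     /export/sdb1/data/.glusterfs/fb/34/fb345749-74cf-4804-b8b8-0789738c0f81

and `stat` on the file and on that handle should report the same 
inode number.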

If it were me, I would:

     find /export/sdb1/data/.glusterfs -type f -links 1 -print0 \
         | xargs -0 /bin/rm

This would clean up any handles that are no longer hardlinked where 
they should be, allowing gluster to repair them.
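
If you want to preview what would be deleted before committing, run 
the same find without the rm:

     find /export/sdb1/data/.glusterfs -type f -links 1 -print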

Btw, the self-heal errors would be in glustershd.log and/or the client 
mount log(s), not (usually) the brick logs.
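
If it helps, heal info also has a split-brain mode (this should 
exist in 3.6):

     gluster volume heal data info split-brain

and you can grep the shd log directly, assuming the default log 
location:

     grep -iE 'self-heal|split-brain' /var/log/glusterfs/glustershd.log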

On 06/07/2015 12:21 PM, Sjors Gielen wrote:
> Oops! I accidentally ran the command as non-root on Curacao; that's 
> why there was no output. The actual output is:
>
> curacao# getfattr -m . -d -e hex 
> /export/sdb1/data/Case/21000355/studies.dat
> getfattr: Removing leading '/' from absolute path names
> # file: export/sdb1/data/Case/21000355/studies.dat
> trusted.afr.data-client-0=0x000000000000000000000000
> trusted.afr.data-client-1=0x000000000000000000000000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.gfid=0xfb34574974cf4804b8b80789738c0f81
>
> For reference, the output on bonaire:
>
> bonaire# getfattr -m . -d -e hex 
> /export/sdb1/data/Case/21000355/studies.dat
> getfattr: Removing leading '/' from absolute path names
> # file: export/sdb1/data/Case/21000355/studies.dat
> trusted.gfid=0xfb34574974cf4804b8b80789738c0f81
>
> On Sun, 7 Jun 2015 at 21:13, Sjors Gielen <sjors at sjorsgielen.nl> wrote:
>
>     I'm reading about quorums; I haven't set up anything like that yet.
>
>     (In reply to Joe Julian, who responded off-list)
>
>     The output of getfattr on bonaire:
>
>     bonaire# getfattr -m . -d -e hex
>     /export/sdb1/data/Case/21000355/studies.dat
>     getfattr: Removing leading '/' from absolute path names
>     # file: export/sdb1/data/Case/21000355/studies.dat
>     trusted.gfid=0xfb34574974cf4804b8b80789738c0f81
>
>     On curacao, the command gives no output.
>
>     From `gluster volume status`, it seems that while the "brick
>     curacao:/export/sdb1/data" is online, it has no associated port
>     number. Curacao can connect to the port number provided by Bonaire
>     just fine. There are no firewalls on/between the two machines,
>     they are on the same subnet connected by Ethernet cables and two
>     switches.
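>
>     (The connectivity check was just a manual probe of the brick
>     port; with PORT standing in for whatever port `gluster volume
>     status` reports for the bonaire brick, something like
>
>     curacao# nc -z -v bonaire PORT
>
>     connects fine.)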
>
>     By the way, warning messages have just started appearing in
>     /var/log/glusterfs/bricks/export-sdb1-data.log on Bonaire, saying
>     "mismatching ino/dev between file X and handle Y". Oddly, they may
>     only have started now, even though I launched the full self-heal
>     hours ago.
>
>     [2015-06-07 19:10:39.624393] W
>     [posix-handle.c:727:posix_handle_hard] 0-data-posix: mismatching
>     ino/dev between file
>     /export/sdb1/data/Archive/S21/21008971/studies.dat
>     (9127104621/2065) and handle
>     /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd
>     (9190215976/2065)
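>
>     (To double-check, the inode numbers can be compared directly:
>
>     bonaire# stat -c '%i %n' \
>         /export/sdb1/data/Archive/S21/21008971/studies.dat \
>         /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd
>
>     An intact hardlink would show the same inode number on both
>     lines; these differ, matching the 9127104621 vs. 9190215976 in
>     the warning.)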
>
>     Thanks again!
>     Sjors
>
>     On Sun, 7 Jun 2015 at 19:13, Sjors Gielen
>     <sjors at sjorsgielen.nl> wrote:
>
>         Hi all,
>
>         I work at a small, 8-person company that uses Gluster for its
>         primary data storage. We have a volume called "data" that is
>         replicated over two servers (details below). This worked
>         perfectly for over a year, but lately we've been noticing some
>         mismatches between the two bricks, so it seems there has been
>         some split-brain situation that is not being detected or
>         resolved. I have two questions about this:
>
>         1) I expected Gluster to (eventually) detect a situation like
>         this; why doesn't it?
>         2) How do I fix this situation? I've tried an explicit 'heal',
>         but that didn't seem to change anything.
>
>         Thanks a lot for your help!
>         Sjors
>
>         ------8<------
>
>         Volume & peer info: http://pastebin.com/PN7tRXdU
>         curacao# md5sum /export/sdb1/data/Case/21000355/studies.dat
>         7bc2daec6be953ffae920d81fe6fa25c
>         /export/sdb1/data/Case/21000355/studies.dat
>         bonaire# md5sum /export/sdb1/data/Case/21000355/studies.dat
>         28c950a1e2a5f33c53a725bf8cd72681
>         /export/sdb1/data/Case/21000355/studies.dat
>
>         # mallorca is one of the clients
>         mallorca# md5sum /data/Case/21000355/studies.dat
>         7bc2daec6be953ffae920d81fe6fa25c /data/Case/21000355/studies.dat
>
>         I expected an input/output error when reading this file,
>         because of the split-brain situation, but got none. There are
>         no related entries in the GlusterFS logs on either bonaire or
>         curacao.
>
>         bonaire# gluster volume heal data full
>         Launching heal operation to perform full self heal on volume
>         data has been successful
>         Use heal info commands to check status
>         bonaire# gluster volume heal data info
>         Brick bonaire:/export/sdb1/data/
>         Number of entries: 0
>
>         Brick curacao:/export/sdb1/data/
>         Number of entries: 0
>
>         (Same output on curacao, and hours after this, the md5sums on
>         both bricks still differ.)
>
>         curacao# gluster --version
>         glusterfs 3.6.2 built on Mar  2 2015 14:05:34
>         Repository revision: git://git.gluster.com/glusterfs.git
>         (Same version on Bonaire)
