[Gluster-users] Gluster does not seem to detect a split-brain situation

Sjors Gielen sjors at sjorsgielen.nl
Sun Jun 7 19:13:11 UTC 2015


I'm reading about quorums; I haven't set up anything like that yet.

(In reply to Joe Julian, who responded off-list)

The output of getfattr on bonaire:

bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81

On curacao, the command gives no output.
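
To compare this kind of output between bricks, here is a minimal Python sketch that parses `getfattr -d -e hex` text and lists any replication changelog attributes. It assumes, as far as I understand AFR, that pending operations are recorded in `trusted.afr.*` attributes holding three big-endian 32-bit counters (data, metadata, entry). The function names are made up for this example.

```python
import struct

def parse_getfattr_hex(text):
    """Parse `getfattr -d -e hex` output into {attribute name: raw bytes}."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the "# file: ..." header, and non-attribute lines.
        if not line or line.startswith("#") or "=0x" not in line:
            continue
        name, _, value = line.partition("=0x")
        attrs[name] = bytes.fromhex(value)
    return attrs

def afr_pending(attrs):
    """Decode trusted.afr.* values, ASSUMING a 12-byte data/metadata/entry layout."""
    pending = {}
    for name, raw in attrs.items():
        if name.startswith("trusted.afr.") and len(raw) == 12:
            data, metadata, entry = struct.unpack(">III", raw)
            pending[name] = {"data": data, "metadata": metadata, "entry": entry}
    return pending

# The output from bonaire quoted above.
bonaire_output = """\
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81
"""
attrs = parse_getfattr_hex(bonaire_output)
print(sorted(attrs))       # ['trusted.gfid']
print(afr_pending(attrs))  # {} : no trusted.afr.* attributes on this file
```

For the bonaire output this finds only `trusted.gfid` and no changelog attributes, so under the assumption above neither brick has anything recorded that would mark the file as needing a heal.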

From `gluster volume status`, it seems that while the brick
"curacao:/export/sdb1/data" is online, it has no associated port number.
Curacao can connect to the port number provided by Bonaire just fine. There
are no firewalls on or between the two machines; they are on the same subnet,
connected by Ethernet cables and two switches.

By the way, warning messages have just started appearing in
/var/log/glusterfs/bricks/export-sdb1-data.log on Bonaire, saying
"mismatching ino/dev between file X and handle Y". They seem to have begun
only now, even though I started the full self-heal hours ago.

[2015-06-07 19:10:39.624393] W [posix-handle.c:727:posix_handle_hard]
0-data-posix: mismatching ino/dev between file
/export/sdb1/data/Archive/S21/21008971/studies.dat (9127104621/2065) and
handle
/export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd
(9190215976/2065)
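
The warning compares the inode and device numbers of a file with those of its handle under .glusterfs. Below is a minimal Python sketch of the same comparison. It assumes the handle of a regular file is a hard link at .glusterfs/<first two hex digits>/<next two>/<dashed GFID>, which matches the path in the log above. `handle_path` and `same_inode` are names made up for this example.

```python
import os
import uuid

def handle_path(brick_root, gfid_hex):
    """Build the assumed .glusterfs handle path for a GFID given as 32 hex digits."""
    gfid = str(uuid.UUID(hex=gfid_hex))  # canonical dashed, lowercase form
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

def same_inode(path_a, path_b):
    """Return True if both paths refer to the same inode on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_ino, a.st_dev) == (b.st_ino, b.st_dev)

# GFID taken from the handle path in the warning, with the dashes removed.
print(handle_path("/export/sdb1/data", "97c2a65d36e04566a5c15925f97af1fd"))
```

Run on the brick itself, `same_inode(file, handle)` for the two paths in the warning should return False if the warning is accurate, meaning the file is no longer a hard link to its handle.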

Thanks again!
Sjors

On Sun, Jun 7, 2015 at 19:13, Sjors Gielen <sjors at sjorsgielen.nl> wrote:

> Hi all,
>
> I work at a small, 8-person company that uses Gluster for its primary data
> storage. We have a volume called "data" that is replicated over two servers
> (details below). This worked perfectly for over a year, but lately we've
> been noticing some mismatches between the two bricks, so it seems there has
> been some split-brain situation that is not being detected or resolved. I
> have two questions about this:
>
> 1) I expected Gluster to (eventually) detect a situation like this; why
> doesn't it?
> 2) How do I fix this situation? I've tried an explicit 'heal', but that
> didn't seem to change anything.
>
> Thanks a lot for your help!
> Sjors
>
> ------8<------
>
> Volume & peer info: http://pastebin.com/PN7tRXdU
> curacao# md5sum /export/sdb1/data/Case/21000355/studies.dat
> 7bc2daec6be953ffae920d81fe6fa25c  /export/sdb1/data/Case/21000355/studies.dat
> bonaire# md5sum /export/sdb1/data/Case/21000355/studies.dat
> 28c950a1e2a5f33c53a725bf8cd72681  /export/sdb1/data/Case/21000355/studies.dat
>
> # mallorca is one of the clients
> mallorca# md5sum /data/Case/21000355/studies.dat
> 7bc2daec6be953ffae920d81fe6fa25c  /data/Case/21000355/studies.dat
>
> I expected an input/output error after reading this file, because of the
> split-brain situation, but got none. There are no entries in the GlusterFS
> logs of either bonaire or curacao.
>
> bonaire# gluster volume heal data full
> Launching heal operation to perform full self heal on volume data has been successful
> Use heal info commands to check status
> bonaire# gluster volume heal data info
> Brick bonaire:/export/sdb1/data/
> Number of entries: 0
>
> Brick curacao:/export/sdb1/data/
> Number of entries: 0
>
> (Same output on curacao, and hours after this, the md5sums on both bricks
> still differ.)
>
> curacao# gluster --version
> glusterfs 3.6.2 built on Mar  2 2015 14:05:34
> Repository revision: git://git.gluster.com/glusterfs.git
> (Same version on Bonaire)
>