[Gluster-users] GlusterFS 3.3.1 split-brain rsync question

Pete Smith pete at realisestudio.com
Wed Apr 10 15:41:57 UTC 2013


Hi Dan

I've come up against this recently whilst trying to delete large numbers
of files from our cluster.

I'm resolving it with the method from
http://comments.gmane.org/gmane.comp.file-systems.gluster.user/1917

With Fabric as a helping hand, it's not too tedious.
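In short, the method from that thread is: on the "bad" brick, delete the
split-brain copy AND its hard link under .glusterfs, then let self-heal
recreate the file from the good copy. A rough sketch, assuming a 3.3.x
brick layout (the brick and file paths are illustrative only):

```shell
#!/bin/bash
# Map a 32-hex-char GFID to its hard-link path under .glusterfs:
# the first two hex pairs are the directory levels, followed by the
# GFID in canonical UUID form (8-4-4-4-12).
gfid_path() {
  local g=$1
  printf '.glusterfs/%s/%s/%s-%s-%s-%s-%s' \
    "${g:0:2}" "${g:2:2}" \
    "${g:0:8}" "${g:8:4}" "${g:12:4}" "${g:16:4}" "${g:20:12}"
}

# Usage on the bad brick (run as root; paths are examples, not ours):
#   BRICK=/export/brick0
#   FILE=some/dir/afile
#   GFID=$(getfattr -n trusted.gfid -e hex --only-values "$BRICK/$FILE" | cut -c3-)
#   rm -f "$BRICK/$FILE" "$BRICK/$(gfid_path "$GFID")"
# Then stat the file through a client mount to trigger self-heal from
# the good brick.
```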

Not sure about the level of glustershd compatibility, but it's working for
me.
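For pulling the GFIDs out of glustershd.log, something along these lines
works for me; the exact log wording varies between 3.3.x releases, so
treat the pattern as an assumption rather than gospel:

```shell
#!/bin/bash
# List the unique GFIDs mentioned in a glustershd log by matching the
# canonical UUID form (8-4-4-4-12 hex digits). The default path is the
# usual 3.3.x location.
extract_gfids() {
  local logfile=${1:-/var/log/glusterfs/glustershd.log}
  grep -o '[0-9a-f]\{8\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{12\}' \
    "$logfile" | sort -u
}
```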

HTH

Pete


On 10 April 2013 11:44, Daniel Mons <daemons at kanuka.com.au> wrote:

> Our production GlusterFS 3.3.1GA setup is a 3x2 distribute-replicate,
> with 100TB usable for staff.  This is one of 4 identical GlusterFS
> clusters we're running.
>
> Very early in the life of our production Gluster rollout, we ran
> Netatalk 2.X to share files with MacOSX clients (due to slow negative
> lookup on CIFS/Samba for those pesky resource fork files in MacOSX's
> Finder).  Netatalk 2.X wrote its CNID_DB files back to Gluster, which
> caused enormous IO, locking up many nodes at a time (lots of "hung
> task" errors in dmesg/syslog).
>
> We've since moved to Netatalk 3.X which puts its CNID_DB files
> elsewhere (we put them on local SSD RAID), and the lockups have
> vanished.  However, our split-brain files number in the tens of
> thousands due to those previous lockups, and the damage isn't always
> predictable (i.e. it's not always the case that brick0 is "good" and
> brick1 is "bad").  Manually fixing the files is far too time consuming.
>
> I've written a rudimentary script that trawls
> /var/log/glusterfs/glustershd.log for split-brain GFIDs, tracks each
> one down on the matching pair of bricks, and figures out via a few
> rules (size tends to be a good indicator for us, as bigger files tend
> to be more recent ones) which is the "good" file.  This works for about 80%
> of files, which will dramatically reduce the amount of data we have to
> manually check.
>
> My question is: what should I do from here?  Options are:
>
> Option 1) Delete the file from the "bad" brick
>
> Option 2)  rsync the file from the "good" brick to the "bad" brick
> with -aX flag (preserve everything, including trusted.afr.$server and
> trusted.gfid xattrs)
>
> Option 3) rsync the file from "good" to "bad", and then setfattr -x
> trusted.* on the bad brick.
>
> Which of these is considered the better (more glustershd compatible)
> option?  Or alternatively, is there something else that's preferred?
>
> Normally I'd just test this on our backup gluster, however as it was
> never running Netatalk, it has no split-brain problems, so I can't
> test the functionality.
>
> Thanks for any insight provided,
>
> -Dan
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>



-- 
Pete Smith
DevOp/System Administrator
Realise Studio
12/13 Poland Street, London W1F 8QB
T. +44 (0)20 7165 9644

realisestudio.com

