[Gluster-users] Split Brain?
Vlad
vladt at broadsign.com
Thu Oct 21 15:43:11 UTC 2010
Count Zero <countz at ...> writes:
>
> Hi Craig,
>
> The md5sum of the files was indeed the same, which is why I thought it was
> very peculiar.
>
> Basically, all my servers are clients & bricks, in an AFR configuration. The
> goal is to replicate the data as much as possible so it's available locally
> on all machines.
>
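For reference, an AFR setup like that on glusterfs 3.0.x is usually expressed
as a client volfile with one protocol/client volume per brick, stacked under a
cluster/replicate translator. A rough sketch only; the hostnames, volume names
and exported subvolume name below are placeholders:

    # Sketch of a 3.0.x replicated client volfile (names are placeholders)
    volume remote1
      type protocol/client
      option transport-type tcp
      option remote-host server1.example.com
      option remote-subvolume brick    # name of the volume exported in the server volfile
    end-volume

    volume remote2
      type protocol/client
      option transport-type tcp
      option remote-host server2.example.com
      option remote-subvolume brick
    end-volume

    volume mirror-0
      type cluster/replicate
      subvolumes remote1 remote2
    end-volume
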
> However - I went through the configuration files on all the machines, and it
> seems I've neglected to remove some of the bricks from the volumes, so there
> was a situation where some of the clients kept trying to save on bricks they
> were no longer supposed to be a part of.
>
> I have now ensured proper separation and configuration, and I am no longer
> getting the split-brain messages in the log, at least for now.
>
> - I'm using the Gluster version that comes with Ubuntu 10.04 Server, which is
>   version 3.0.2. It's exactly the same OS version & kernel, and Gluster
>   version, on all of my machines.
> - This is my kernel: Linux some.machine.com 2.6.32-24-server #38-Ubuntu SMP
>   Mon Jul 5 10:29:32 UTC 2010 x86_64 GNU/Linux
> - The on-disk FS is ext4.
> - I use glusterfs via a mount line in /etc/fstab
>
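For what it's worth, the fstab entry for a 3.0.x glusterfs mount normally
points at the client volfile; something along these lines, where the volfile
path and mount point are just placeholders:

    # /etc/fstab -- glusterfs client mount (paths are placeholders)
    /etc/glusterfs/client.vol  /mnt/glusterfs  glusterfs  defaults,_netdev  0  0
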
> What I have yet to figure out is how to get the files to be read at "local"
> speed. I mean, seeing as the files are all physically in /data/export/ on ALL
> my servers, is there something I can do to make the reads really fast? The
> writes are very rare, so I really don't care much about writes.
>
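On the local-read question: if I remember correctly, the 3.0.x replicate
translator accepts a read-subvolume option, so each machine's client volfile
can point reads at the brick that is local to it. Treat this as a sketch
rather than gospel; the subvolume names follow the earlier placeholder example:

    volume mirror-0
      type cluster/replicate
      # prefer this machine's local brick for reads; the name must match
      # one of the subvolumes listed below
      option read-subvolume remote1
      subvolumes remote1 remote2
    end-volume
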
> Thanks!
>
> On Aug 4, 2010, at 6:04 AM, Craig Carl wrote:
>
> > CountZ -
> > Some questions to help us troubleshoot -
> >
> > What version of Gluster?
> > What is your server /version/kernel?
> > What is the on-disk filesystem?
> > Can you md5sum the file on all the storage servers? Is it the same
> > everywhere?
> >
> > How are you exporting Gluster (Gluster FS, NFS, CIFS, FTP, etc?)
> > What version of Gluster on the clients, if using it?
> > What are your clients' distro/version/kernel?
> > If you md5sum the file from a couple of clients, are the results the same
> > across the clients and the servers?
> >
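A quick way to answer the md5sum questions in one pass; the hostnames, backend
path and mount point below are placeholders, and the file path is the one from
the log message further down:

    # on the storage servers: checksum the copy in the backend export dir
    for h in server1 server2; do
        ssh $h md5sum /data/export/lib/wms-server.jar
    done

    # on a couple of clients: checksum the same file through the mount
    md5sum /mnt/glusterfs/lib/wms-server.jar
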
> > Thanks for your help.
> >
> > Craig
> > 408-829-9953
> > Gluster Inc.
> >
> >
> > Sent from a mobile device, please excuse my tpyos.
> >
> > On Aug 3, 2010, at 20:32, Count Zero <countz at ...> wrote:
> >
> >> It's local and is a standard AFR setup. I believe all the files are
> >> actually the same, but I'll verify this again. It just does this for a LOT
> >> of files, and they are all the same files (nothing has changed really).
> >>
> >> About WAN: I have mostly given up on WAN replication at the moment, so I
> >> use glusterfs for local groups of machines that are on the same switch,
> >> and I use a separate solution to sync between WAN glusters.
> >>
> >> So how do I delete without erasing the file from the entire gluster?
> >>
> >> I'm assuming I need to:
> >>
> >> 1) Unmount all the clients
> >> 2) Erase and recreate /data/export on all nodes other than the chosen
> >>    "master"
> >> 3) Remount the clients, and access the files
> >>
> >> Is that right?
> >>
> >>
> >> On Aug 4, 2010, at 4:14 AM, Tejas N. Bhise wrote:
> >>
> >>> Is this over the WAN replicated setup ? Or a local setup ?
> >>>
> >>> ----- Original Message -----
> >>> From: "Count Zero" <countz at ...>
> >>> To: "Gluster General Discussion List" <gluster-users at ...>
> >>> Sent: Wednesday, August 4, 2010 8:38:02 AM
> >>> Subject: [Gluster-users] Split Brain?
> >>>
> >>> I am seeing a lot of those in my cluster client's log file:
> >>>
> >>> [2010-08-04 04:06:30] E [afr-self-heal-data.c:705:afr_sh_data_fix]
> >>> replicate: Unable to self-heal contents of '/lib/wms-server.jar'
> >>> (possible split-brain). Please delete the file from all but the
> >>> preferred subvolume.
> >>>
> >>> How do I recover from this without losing my files?
> >>>
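For what it's worth, that message is meant literally: the usual manual fix on
3.0.x is to remove the bad copy from the backend directory of the brick you do
not trust (keeping a copy aside first), then access the file through the
glusterfs mount so self-heal recreates it from the preferred copy. A rough
sketch, with placeholder backend and mount paths:

    # on the brick whose copy you do NOT trust -- backend path, not the mount
    cp /data/export/lib/wms-server.jar /root/wms-server.jar.bak   # keep a copy
    rm /data/export/lib/wms-server.jar

    # from any client, read the file through the mount to trigger self-heal
    head -c1 /mnt/glusterfs/lib/wms-server.jar > /dev/null
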
> >>> Thanks,
> >>> CountZ
> >>>
>
I have just run into similar behavior on one of my glusterfs clients.
Accessing a file with cat (I was able to ls it) was throwing an I/O error:

cat: /path_to/file: Input/output error

I also noticed that the timestamp on this file on the problematic client
was different from the one on the other clients. The glusterfs log file
was full of error messages like this:

... mirror-0: Unable to self-heal contents of '/path_to/file' (possible
split-brain). Please delete the file from all but the preferred subvolume.

I have two bricks in a mirror. All my clients are running Ubuntu 10.04
32-bit with glusterfs compiled from source (3.0.5-1). My bricks are running
Ubuntu 10.04 64-bit with glusterfs installed from glusterfs_3.0.5-1_amd64.deb.
Unmounting and remounting the volume on the affected client fixes the
problem, but since my glusterfs clients are production app servers, getting
to the bottom of this issue is very important for us.
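
In case it helps narrow this down, the AFR changelog attributes on the two
backend copies usually show which side the replicate translator considers
stale; when the counters on both copies blame each other, it reports a
possible split-brain. Something like this, run as root on each brick against
the backend path (the path below is a placeholder):

    getfattr -d -m . -e hex /data/export/path_to/file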
Best regards,
Vlad