[Gluster-users] [Gluster-devel] Random and frequent split brain

Sat Jul 19 17:36:08 UTC 2014

On Sat, Jul 19, 2014 at 08:23:29AM +0530, Pranith Kumar Karampuri wrote:
> Guys,
>      Does anyone know why the device-id can be different even though it
> is all a single XFS filesystem?
> We see the following log in the brick-log.
> 
> [2014-07-16 00:00:24.358628] W [posix-handle.c:586:posix_handle_hard]
> 0-home-posix: mismatching ino/dev between file

The device-id (major:minor number) of a block-device can change between 
activations, but it will not change while the device is in use. 
Device-mapper (DM) is the part of the block-layer stack that multipath 
and LVM build on (there are other users, but these are the most common). 
The stack of block-devices is built dynamically, and the device-id is 
assigned when a block-device is made active. The order in which devices 
are made active can change, and the device-id can change with it. It is 
also possible to deactivate some logical-volumes and activate them again 
in a different order. (You cannot deactivate a dm-device while it is in 
use, for example while it is mounted.)
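
To see what the brick currently reports, one could stat both names from 
the warning and compare them. The following is only a minimal sketch in 
Python, not Gluster code; the function names identity() and same_file() 
are made up for illustration. It prints nothing by itself and simply 
compares the device-id and inode number of two paths, the same pair of 
values the "mismatching ino/dev" message refers to.

```python
import os


def identity(path):
    """Return (major, minor, inode) for a path, without following symlinks."""
    st = os.lstat(path)
    return os.major(st.st_dev), os.minor(st.st_dev), st.st_ino


def same_file(path, handle):
    """True when both names refer to the same inode on the same device.

    A hard link (such as a gfid handle is expected to be for a regular
    file) should give True; anything else gives False.
    """
    a = os.lstat(path)
    b = os.lstat(handle)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

Calling same_file() with the file under /data/gluster/home/... and its 
handle under /data/gluster/home/.glusterfs/... from the brick log, and 
identity() on each of them, should show which of the two values differs.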

Without device-mapper in the io-stack, disks can be re-ordered too, but 
that requires a little more (advanced sysadmin) work.

So, the main questions I'd ask would be:
1. What kind of block storage is used, LVM, multipath, ...?
2. Were there any issues on the block-layer, scsi-errors, reconnects?
3. Were there changes in the underlying disks or their structure? Disks 
   added, removed or new partitions created.
4. Were disks deactivated+activated again, for example for creating 
   backups or snapshots on a level below the (XFS) filesystem?
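
For question 1, a rough way to find out which block-device backs the 
brick is to look the brick path up in /proc/self/mountinfo. The sketch 
below (Python, again only an illustration with made-up function names) 
parses that file's text and picks the mount whose mount point is the 
longest prefix of a given path. It assumes the standard Linux mountinfo 
layout, with the major:minor pair in the third field and the filesystem 
type and source after the " - " separator.

```python
import os


def parse_mountinfo(text):
    """Parse the text of /proc/self/mountinfo into a list of dicts."""
    mounts = []
    for line in text.splitlines():
        # Fields before " - " are per-mount, fields after describe the fs.
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        lf, rf = left.split(), right.split()
        if len(lf) < 5 or len(rf) < 2:
            continue
        mounts.append({
            "dev": lf[2],          # major:minor of the mounted device
            "mount_point": lf[4],
            "fstype": rf[0],
            "source": rf[1],       # e.g. a /dev/... node
        })
    return mounts


def backing_mount(path, mounts):
    """Return the mount whose mount point is the longest prefix of path."""
    path = os.path.abspath(path)
    best = None
    best_len = -1
    for m in mounts:
        mp = m["mount_point"].rstrip("/") or "/"
        if mp == "/" or path == mp or path.startswith(mp + "/"):
            # ">=" lets a later mount on the same mount point win.
            if len(mp) >= best_len:
                best, best_len = m, len(mp)
    return best
```

Reading /proc/self/mountinfo into a string, passing it to 
parse_mountinfo() and then calling backing_mount("/data/gluster/home", 
...) should return the entry for the filesystem holding the brick. A 
source under /dev/mapper/ would point at device-mapper (LVM, multipath), 
while something like /dev/md* would point at software RAID.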

HTH,
Niels

> /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
> (1077282838/2431) and handle
> /data/gluster/home/.glusterfs/ae/f0/aef0404b-e084-4501-9d0f-0e6f5bb2d5e0
> (1077282836/2431)
> [2014-07-16 00:00:24.358646] E [posix.c:823:posix_mknod] 0-home-posix:
> setting gfid on
> /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
> failed
> 
> 
> Pranith
> On 07/17/2014 07:06 PM, Nilesh Govindrajan wrote:
> >log1 was the log from client of node2. The filesystems are mounted
> >locally. /data is a raid10 array and /data/gluster contains 4 volumes,
> >one of which is home which is a high read/write one (the log of which
> >was attached here).
> >
> >On Thu, Jul 17, 2014 at 11:54 AM, Pranith Kumar Karampuri
> ><pkarampu at redhat.com> wrote:
> >>On 07/17/2014 08:41 AM, Nilesh Govindrajan wrote:
> >>>log1 and log2 are brick logs. The others are client logs.
> >>I see a lot of logs like the ones below in the 'log1' you attached. It
> >>seems that the device ID of the location where the file is actually
> >>stored and that of the location where the gfid-link of the same file is
> >>stored, i.e. inside <brick-dir>/.glusterfs/, are different.
> >>Which devices/filesystems are present inside the brick represented by
> >>'log1'?
> >>
> >>[2014-07-16 00:00:24.358628] W [posix-handle.c:586:posix_handle_hard]
> >>0-home-posix: mismatching ino/dev between file
> >>/data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
> >>(1077282838/2431) and handle
> >>/data/gluster/home/.glusterfs/ae/f0/aef0404b-e084-4501-9d0f-0e6f5bb2d5e0
> >>(1077282836/2431)
> >>[2014-07-16 00:00:24.358646] E [posix.c:823:posix_mknod] 0-home-posix:
> >>setting gfid on
> >>/data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
> >>failed
> >>
> >>Pranith
> >>
> >>
> >>>On Thu, Jul 17, 2014 at 8:08 AM, Pranith Kumar Karampuri
> >>><pkarampu at redhat.com> wrote:
> >>>>On 07/17/2014 07:28 AM, Nilesh Govindrajan wrote:
> >>>>>On Thu, Jul 17, 2014 at 7:26 AM, Nilesh Govindrajan <me at nileshgr.com>
> >>>>>wrote:
> >>>>>>Hello,
> >>>>>>
> >>>>>>I'm having a weird issue. I have this config:
> >>>>>>
> >>>>>>node2 ~ # gluster peer status
> >>>>>>Number of Peers: 1
> >>>>>>
> >>>>>>Hostname: sto1
> >>>>>>Uuid: f7570524-811a-44ed-b2eb-d7acffadfaa5
> >>>>>>State: Peer in Cluster (Connected)
> >>>>>>
> >>>>>>node1 ~ # gluster peer status
> >>>>>>Number of Peers: 1
> >>>>>>
> >>>>>>Hostname: sto2
> >>>>>>Port: 24007
> >>>>>>Uuid: 3a69faa9-f622-4c35-ac5e-b14a6826f5d9
> >>>>>>State: Peer in Cluster (Connected)
> >>>>>>
> >>>>>>Volume Name: home
> >>>>>>Type: Replicate
> >>>>>>Volume ID: 54fef941-2e33-4acf-9e98-1f86ea4f35b7
> >>>>>>Status: Started
> >>>>>>Number of Bricks: 1 x 2 = 2
> >>>>>>Transport-type: tcp
> >>>>>>Bricks:
> >>>>>>Brick1: sto1:/data/gluster/home
> >>>>>>Brick2: sto2:/data/gluster/home
> >>>>>>Options Reconfigured:
> >>>>>>performance.write-behind-window-size: 2GB
> >>>>>>performance.flush-behind: on
> >>>>>>performance.cache-size: 2GB
> >>>>>>cluster.choose-local: on
> >>>>>>storage.linux-aio: on
> >>>>>>transport.keepalive: on
> >>>>>>performance.quick-read: on
> >>>>>>performance.io-cache: on
> >>>>>>performance.stat-prefetch: on
> >>>>>>performance.read-ahead: on
> >>>>>>cluster.data-self-heal-algorithm: diff
> >>>>>>nfs.disable: on
> >>>>>>
> >>>>>>sto1/2 is alias to node1/2 respectively.
> >>>>>>
> >>>>>>As you see, NFS is disabled so I'm using the native fuse mount on both
> >>>>>>nodes.
> >>>>>>The volume contains files and php scripts that are served on various
> >>>>>>websites. When both nodes are active, I get split brain on many files
> >>>>>>and the mount on node2 returns 'input/output error' for many of them,
> >>>>>>which causes HTTP 500 errors.
> >>>>>>
> >>>>>>I delete the files from the brick using find -samefile. That fixes it
> >>>>>>for a few minutes and then the problem is back.
> >>>>>>
> >>>>>>What could be the issue? This happens even if I use the NFS mounting
> >>>>>>method.
> >>>>>>
> >>>>>>Gluster 3.4.4 on Gentoo.
> >>>>>And yes, network connectivity is not an issue between them as both of
> >>>>>them are located in the same DC. They're connected via 1 Gbit line
> >>>>>(common for internal and external network) but external network
> >>>>>doesn't cross 200-500 Mbit/s leaving quite a good window for gluster.
> >>>>>I also tried enabling quorum but that doesn't help either.
> >>>>>_______________________________________________
> >>>>>Gluster-users mailing list
> >>>>>Gluster-users at gluster.org
> >>>>>http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>>>hi Nilesh,
> >>>>        Could you attach the mount and brick logs so that we can inspect
> >>>>what is going on in the setup?
> >>>>
> >>>>Pranith
> >>
> 