[Gluster-users] Random and frequent split brain

Nilesh Govindrajan me at nileshgr.com
Thu Jul 17 13:36:27 UTC 2014


log1 was the log from the client of node2. The filesystems are mounted
locally. /data is a RAID10 array, and /data/gluster contains 4 volumes,
one of which is home, a high read/write volume (the log of which was
attached here).
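
For what it's worth, a quick check like the following on each node should
show whether anything else is mounted under the brick; both paths below
should land on the same RAID10 filesystem, and nothing else should be
mounted underneath /data:

  df /data/gluster/home /data/gluster/home/.glusterfs
  mount | grep /data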

On Thu, Jul 17, 2014 at 11:54 AM, Pranith Kumar Karampuri
<pkarampu at redhat.com> wrote:
>
> On 07/17/2014 08:41 AM, Nilesh Govindrajan wrote:
>>
>> log1 and log2 are brick logs. The others are client logs.
>
> I see a lot of log messages like the ones below in the 'log1' you attached.
> It seems that the device ID of the location where the file is actually stored
> and the device ID of where the gfid-link of the same file is stored, i.e.
> inside <brick-dir>/.glusterfs/, are different. Which devices/filesystems are
> present inside the brick represented by 'log1'?
>
> [2014-07-16 00:00:24.358628] W [posix-handle.c:586:posix_handle_hard]
> 0-home-posix: mismatching ino/dev between file
> /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
> (1077282838/2431) and handle
> /data/gluster/home/.glusterfs/ae/f0/aef0404b-e084-4501-9d0f-0e6f5bb2d5e0
> (1077282836/2431)
> [2014-07-16 00:00:24.358646] E [posix.c:823:posix_mknod] 0-home-posix:
> setting gfid on
> /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old
> failed
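>
> For example, comparing the two paths from that log message with stat (just a
> sketch; substitute the full _index.html.old path) shows whether it is the
> device or the inode that differs. The handle under .glusterfs is meant to be
> a hard link to the file, so both fields should be identical:
>
>   stat -c 'dev=%d ino=%i  %n' \
>       <full-path-to>/_index.html.old \
>       /data/gluster/home/.glusterfs/ae/f0/aef0404b-e084-4501-9d0f-0e6f5bb2d5e0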
>
> Pranith
>
>
>>
>> On Thu, Jul 17, 2014 at 8:08 AM, Pranith Kumar Karampuri
>> <pkarampu at redhat.com> wrote:
>>>
>>> On 07/17/2014 07:28 AM, Nilesh Govindrajan wrote:
>>>>
>>>> On Thu, Jul 17, 2014 at 7:26 AM, Nilesh Govindrajan <me at nileshgr.com>
>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I'm having a weird issue. I have this config:
>>>>>
>>>>> node2 ~ # gluster peer status
>>>>> Number of Peers: 1
>>>>>
>>>>> Hostname: sto1
>>>>> Uuid: f7570524-811a-44ed-b2eb-d7acffadfaa5
>>>>> State: Peer in Cluster (Connected)
>>>>>
>>>>> node1 ~ # gluster peer status
>>>>> Number of Peers: 1
>>>>>
>>>>> Hostname: sto2
>>>>> Port: 24007
>>>>> Uuid: 3a69faa9-f622-4c35-ac5e-b14a6826f5d9
>>>>> State: Peer in Cluster (Connected)
>>>>>
>>>>> Volume Name: home
>>>>> Type: Replicate
>>>>> Volume ID: 54fef941-2e33-4acf-9e98-1f86ea4f35b7
>>>>> Status: Started
>>>>> Number of Bricks: 1 x 2 = 2
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: sto1:/data/gluster/home
>>>>> Brick2: sto2:/data/gluster/home
>>>>> Options Reconfigured:
>>>>> performance.write-behind-window-size: 2GB
>>>>> performance.flush-behind: on
>>>>> performance.cache-size: 2GB
>>>>> cluster.choose-local: on
>>>>> storage.linux-aio: on
>>>>> transport.keepalive: on
>>>>> performance.quick-read: on
>>>>> performance.io-cache: on
>>>>> performance.stat-prefetch: on
>>>>> performance.read-ahead: on
>>>>> cluster.data-self-heal-algorithm: diff
>>>>> nfs.disable: on
>>>>>
>>>>> sto1/2 are aliases for node1/2 respectively.
>>>>>
>>>>> As you can see, NFS is disabled, so I'm using the native FUSE mount on
>>>>> both nodes.
>>>>> The volume contains files and PHP scripts that are served on various
>>>>> websites. When both nodes are active, I get split brain on many files,
>>>>> and the mount on node2 returns 'input/output error' on many of them,
>>>>> which causes HTTP 500 errors.
>>>>>
>>>>> I delete the affected files from the brick using find -samefile. That
>>>>> fixes things for a few minutes, and then the problem comes back.
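>>>>>
>>>>> Roughly, the cleanup on a brick looks something like this (path shortened
>>>>> here; the exact file varies):
>>>>>
>>>>>   find /data/gluster/home -samefile /data/gluster/home/<affected-file> -print -delete
>>>>>
>>>>> which removes the file together with its gfid hard link under .glusterfs,
>>>>> so the surviving copy can be healed back from the other brick.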
>>>>>
>>>>> What could be the issue? This happens even if I use the NFS mounting
>>>>> method.
>>>>>
>>>>> Gluster 3.4.4 on Gentoo.
>>>>
>>>> And yes, network connectivity is not an issue between them, as both
>>>> nodes are located in the same DC. They're connected via a 1 Gbit line
>>>> (shared by the internal and external networks), but external traffic
>>>> doesn't exceed 200-500 Mbit/s, leaving quite a good margin for gluster.
>>>> I also tried enabling quorum, but that doesn't help either.
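>>>> (The quorum setting I tried was along the lines of
>>>>    gluster volume set home cluster.quorum-type auto
>>>> though with only the two bricks it made no difference.)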
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>
>>> hi Nilesh,
>>>        Could you attach the mount and brick logs so that we can inspect
>>> what is going on in the setup?
>>>
>>> Pranith
>
>


