[Gluster-users] Random and frequent split brain

Nilesh Govindrajan me at nileshgr.com
Thu Jul 17 03:11:26 UTC 2014


log1 and log2 are brick logs. The others are client logs.

On Thu, Jul 17, 2014 at 8:08 AM, Pranith Kumar Karampuri
<pkarampu at redhat.com> wrote:
>
> On 07/17/2014 07:28 AM, Nilesh Govindrajan wrote:
>>
>> On Thu, Jul 17, 2014 at 7:26 AM, Nilesh Govindrajan <me at nileshgr.com>
>> wrote:
>>>
>>> Hello,
>>>
>>> I'm having a weird issue. I have this config:
>>>
>>> node2 ~ # gluster peer status
>>> Number of Peers: 1
>>>
>>> Hostname: sto1
>>> Uuid: f7570524-811a-44ed-b2eb-d7acffadfaa5
>>> State: Peer in Cluster (Connected)
>>>
>>> node1 ~ # gluster peer status
>>> Number of Peers: 1
>>>
>>> Hostname: sto2
>>> Port: 24007
>>> Uuid: 3a69faa9-f622-4c35-ac5e-b14a6826f5d9
>>> State: Peer in Cluster (Connected)
>>>
>>> Volume Name: home
>>> Type: Replicate
>>> Volume ID: 54fef941-2e33-4acf-9e98-1f86ea4f35b7
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: sto1:/data/gluster/home
>>> Brick2: sto2:/data/gluster/home
>>> Options Reconfigured:
>>> performance.write-behind-window-size: 2GB
>>> performance.flush-behind: on
>>> performance.cache-size: 2GB
>>> cluster.choose-local: on
>>> storage.linux-aio: on
>>> transport.keepalive: on
>>> performance.quick-read: on
>>> performance.io-cache: on
>>> performance.stat-prefetch: on
>>> performance.read-ahead: on
>>> cluster.data-self-heal-algorithm: diff
>>> nfs.disable: on
>>>
>>> sto1 and sto2 are aliases for node1 and node2 respectively.
>>>
>>> As you can see, NFS is disabled, so I'm using the native FUSE mount on
>>> both nodes. The volume contains files and PHP scripts that are served
>>> on various websites. When both nodes are active, I get split-brain on
>>> many files, and the mount on node2 returns 'input/output error' for
>>> many of them, which causes HTTP 500 errors.
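
For reference, split-brain on a 3.4 replica can be confirmed from either server with the heal commands (a sketch; `home` is the volume from this thread):

```shell
# List files pending self-heal on the 'home' volume.
gluster volume heal home info

# List only the files that are actually in split-brain.
gluster volume heal home info split-brain
```
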
>>>
>>> I delete the affected files from the brick using find -samefile. That
>>> fixes it for a few minutes, and then the problem comes back.
>>>
>>> What could be the issue? This happens even if I use the NFS mounting
>>> method.
>>>
>>> Gluster 3.4.4 on Gentoo.
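
The deletion method described above can be sketched as follows. This is a hypothetical example: the brick path is the one from the volume info, but `site/info.php` is an invented file name, and which copy is stale must be judged from the `trusted.afr.*` xattrs before deleting anything:

```shell
# Run on the brick holding the stale copy. On Gluster 3.x, every brick
# file also has a gfid hardlink under .glusterfs, so 'find -samefile'
# catches both links; removing only the visible path would leave the
# gfid link behind and block the heal.
BRICK=/data/gluster/home
FILE="$BRICK/site/info.php"   # hypothetical affected file

# Inspect the AFR changelog xattrs first: non-zero trusted.afr.* counters
# indicate pending operations and identify which copy to keep.
getfattr -m . -d -e hex "$FILE"

# Remove the stale copy together with its .glusterfs hardlink.
find "$BRICK" -samefile "$FILE" -print -delete

# Trigger a heal so the surviving copy is replicated back.
gluster volume heal home full
```
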
>>
>> And yes, network connectivity between them is not an issue, as both
>> nodes are located in the same DC. They're connected via a 1 Gbit line
>> (shared by the internal and external networks), but external traffic
>> doesn't exceed 200-500 Mbit/s, leaving plenty of headroom for Gluster.
>> I also tried enabling quorum, but that doesn't help either.
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
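
On the quorum attempt mentioned above: a minimal sketch of the client-side quorum settings available in 3.4, assuming the `home` volume from this thread. Note that on a 1 x 2 replica, `auto` only permits writes when both bricks are up (or the first brick alone), so it trades availability for consistency rather than preventing split-brain outright:

```shell
# Client-side quorum: 'auto' requires more than half of the replica set.
gluster volume set home cluster.quorum-type auto

# Alternatively, pin a fixed brick count (here: both bricks must be up).
gluster volume set home cluster.quorum-type fixed
gluster volume set home cluster.quorum-count 2
```
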
>
> hi Nilesh,
>       Could you attach the mount and brick logs so that we can inspect
> what is going on in the setup?
>
> Pranith
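
For anyone following along, the logs Pranith is asking for usually live under `/var/log/glusterfs` on a default install; exact paths can differ per distro (a Gentoo build may configure another log directory), and the client log name is derived from the mount point with slashes replaced by dashes:

```shell
# Client (FUSE mount) logs: one file per mount point,
# e.g. a mount at /home typically logs to /var/log/glusterfs/home.log
ls /var/log/glusterfs/

# Brick logs: one file per brick path,
# e.g. /data/gluster/home -> bricks/data-gluster-home.log
ls /var/log/glusterfs/bricks/
```
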
-------------- next part --------------
A non-text attachment was scrubbed...
Name: log1.xz
Type: application/x-xz
Size: 634120 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140717/f2508d69/attachment.xz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: log2
Type: application/octet-stream
Size: 305784 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140717/f2508d69/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node1client.log.gz
Type: application/x-gzip
Size: 253387 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140717/f2508d69/attachment.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node2client.log.gz
Type: application/x-gzip
Size: 388098 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140717/f2508d69/attachment-0001.gz>

