[Gluster-users] Gluster - replica - Unable to self-heal contents of '/' (possible split-brain)
Alexandru Coseru
alex.coseru at simplus.ro
Mon Dec 9 13:51:31 UTC 2013
Hello,
I'm trying to build a replica volume on two servers.
The servers are blade6 and blade7 (there is also blade1 in the peer list, but it holds no bricks).
The volume seems OK, but I cannot mount it over NFS.
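For reference, this is roughly how the volume was created and how I try to mount it (reconstructed from the volume info below; the mount target and NFS options are assumptions, not copied from my shell history):

gluster volume create stor1 replica 2 blade7.xen:/gluster/stor1 blade6.xen:/gluster/stor1
gluster volume set stor1 nfs.port 2049
gluster volume start stor1

# on the client (Gluster's built-in NFS server speaks NFSv3 over TCP)
mount -t nfs -o vers=3,tcp blade6.xen:/stor1 /mnt/stor1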
Here are some logs:
[root at blade6 stor1]# df -h
/dev/mapper/gluster_stor1 882G 200M 837G 1% /gluster/stor1
[root at blade7 stor1]# df -h
/dev/mapper/gluster_fast 846G 158G 646G 20% /gluster/stor_fast
/dev/mapper/gluster_stor1 882G 72M 837G 1% /gluster/stor1
[root at blade6 stor1]# pwd
/gluster/stor1
[root at blade6 stor1]# ls -lh
total 0
[root at blade7 stor1]# pwd
/gluster/stor1
[root at blade7 stor1]# ls -lh
total 0
[root at blade6 stor1]# gluster volume info
Volume Name: stor_fast
Type: Distribute
Volume ID: ad82b554-8ff0-4903-be32-f8dcb9420f31
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: blade7.xen:/gluster/stor_fast
Options Reconfigured:
nfs.port: 2049
Volume Name: stor1
Type: Replicate
Volume ID: 6bd88164-86c2-40f6-9846-b21e90303e73
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: blade7.xen:/gluster/stor1
Brick2: blade6.xen:/gluster/stor1
Options Reconfigured:
nfs.port: 2049
[root at blade7 stor1]# gluster volume info
Volume Name: stor_fast
Type: Distribute
Volume ID: ad82b554-8ff0-4903-be32-f8dcb9420f31
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: blade7.xen:/gluster/stor_fast
Options Reconfigured:
nfs.port: 2049
Volume Name: stor1
Type: Replicate
Volume ID: 6bd88164-86c2-40f6-9846-b21e90303e73
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: blade7.xen:/gluster/stor1
Brick2: blade6.xen:/gluster/stor1
Options Reconfigured:
nfs.port: 2049
[root at blade6 stor1]# gluster volume status
Status of volume: stor_fast
Gluster process                                Port    Online  Pid
------------------------------------------------------------------------------
Brick blade7.xen:/gluster/stor_fast            49152   Y       1742
NFS Server on localhost                        2049    Y       20074
NFS Server on blade1.xen                       2049    Y       22255
NFS Server on blade7.xen                       2049    Y       7574
There are no active volume tasks

Status of volume: stor1
Gluster process                                Port    Online  Pid
------------------------------------------------------------------------------
Brick blade7.xen:/gluster/stor1                49154   Y       7562
Brick blade6.xen:/gluster/stor1                49154   Y       20053
NFS Server on localhost                        2049    Y       20074
Self-heal Daemon on localhost                  N/A     Y       20079
NFS Server on blade1.xen                       2049    Y       22255
Self-heal Daemon on blade1.xen                 N/A     Y       22260
NFS Server on blade7.xen                       2049    Y       7574
Self-heal Daemon on blade7.xen                 N/A     Y       7578
There are no active volume tasks

[root at blade7 stor1]# gluster volume status
Status of volume: stor_fast
Gluster process                                Port    Online  Pid
------------------------------------------------------------------------------
Brick blade7.xen:/gluster/stor_fast            49152   Y       1742
NFS Server on localhost                        2049    Y       7574
NFS Server on blade6.xen                       2049    Y       20074
NFS Server on blade1.xen                       2049    Y       22255
There are no active volume tasks

Status of volume: stor1
Gluster process                                Port    Online  Pid
------------------------------------------------------------------------------
Brick blade7.xen:/gluster/stor1                49154   Y       7562
Brick blade6.xen:/gluster/stor1                49154   Y       20053
NFS Server on localhost                        2049    Y       7574
Self-heal Daemon on localhost                  N/A     Y       7578
NFS Server on blade1.xen                       2049    Y       22255
Self-heal Daemon on blade1.xen                 N/A     Y       22260
NFS Server on blade6.xen                       2049    Y       20074
Self-heal Daemon on blade6.xen                 N/A     Y       20079
There are no active volume tasks
[root at blade6 stor1]# gluster peer status
Number of Peers: 2
Hostname: blade1.xen
Port: 24007
Uuid: 194a57a7-cb0e-43de-a042-0ac4026fd07b
State: Peer in Cluster (Connected)
Hostname: blade7.xen
Port: 24007
Uuid: 574eb256-30d2-4639-803e-73d905835139
State: Peer in Cluster (Connected)
[root at blade7 stor1]# gluster peer status
Number of Peers: 2
Hostname: blade6.xen
Port: 24007
Uuid: a65cadad-ef79-4821-be41-5649fb204f3e
State: Peer in Cluster (Connected)
Hostname: blade1.xen
Uuid: 194a57a7-cb0e-43de-a042-0ac4026fd07b
State: Peer in Cluster (Connected)
[root at blade6 stor1]# gluster volume heal stor1 info
Gathering Heal info on volume stor1 has been successful
Brick blade7.xen:/gluster/stor1
Number of entries: 0
Brick blade6.xen:/gluster/stor1
Number of entries: 0
[root at blade7 stor1]# gluster volume heal stor1 info
Gathering Heal info on volume stor1 has been successful
Brick blade7.xen:/gluster/stor1
Number of entries: 0
Brick blade6.xen:/gluster/stor1
Number of entries: 0
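Since heal info shows zero entries on both bricks, my understanding is that only the metadata of '/' itself can be in split-brain. I assume this could be confirmed with (not run here):

gluster volume heal stor1 info split-brain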
When I try to mount the volume over NFS, I get the following errors:
[2013-12-09 13:20:52.066978] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-stor1-replicate-0: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ]
[2013-12-09 13:20:52.067386] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-stor1-replicate-0: background meta-data self-heal failed on /
[2013-12-09 13:20:52.067452] E [mount3.c:290:mnt3svc_lookup_mount_cbk] 0-nfs: error=Input/output error
[2013-12-09 13:20:53.092039] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-stor1-replicate-0: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ]
[2013-12-09 13:20:53.092497] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-stor1-replicate-0: background meta-data self-heal failed on /
[2013-12-09 13:20:53.092559] E [mount3.c:290:mnt3svc_lookup_mount_cbk] 0-nfs: error=Input/output error
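If I read the pending matrix [ [ 0 2 ] [ 2 0 ] ] correctly, each brick accuses the other of pending metadata operations on '/', which is why the NFS mount of the root fails with an input/output error. As far as I understand, the AFR changelog attributes on each brick root can be inspected, and cleared on the brick chosen as stale, roughly like this (the stor1-client-0/1 attribute names are the usual ones for a two-brick replica, not taken from my own output):

# on blade6 and blade7, inspect the changelog xattrs of the brick root
getfattr -d -m . -e hex /gluster/stor1

# on the brick treated as stale, reset the pending counters for '/'
setfattr -n trusted.afr.stor1-client-0 -v 0x000000000000000000000000 /gluster/stor1
setfattr -n trusted.afr.stor1-client-1 -v 0x000000000000000000000000 /gluster/stor1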
What am I doing wrong?
PS: Volume stor_fast works like a charm.
Best Regards,