[Gluster-users] Failed to mount nfs due to split-brain and Input/Output Error

Anh Vo vtqanh at gmail.com
Wed Jul 4 15:45:52 UTC 2018


If I run "sudo gluster volume heal gv0 split-brain latest-mtime /" I get
the following:

Lookup failed on /:Invalid argument.
Volume heal failed.

node2 was not connected at the time, because whenever we connect it to the
system, gluster becomes almost unusable within a few minutes and many of our
jobs fail. This morning I reconnected it and ran heal info, and we have about
30000 entries to heal (15K from gfs-vm000 and 15K from gfs-vm001; 80% are bare
gfids, 20% have file names). It's not feasible for us to check the individual
gfids, so we are mostly relying on gluster self-heal to handle them (a rough
sketch of how a single gfid could be mapped back to a path is below). The
split-brain on "/" is a bigger concern because it prevents us from mounting
over NFS. We do need NFS for some of our management tasks, because the gluster
fuse mount is much slower than NFS for recursive operations like 'du'.
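
(For reference, if we did need to map one of those gfid entries back to a real
path, my understanding is that it has to be done against a brick directly. A
rough sketch only, assuming our brick path and a gfid that refers to a regular
file; directory gfids are symlinks under .glusterfs rather than hard links:

    # example gfid taken from the heal info output quoted below
    GFID=81289110-867b-42ff-ba3b-1373a187032b
    BRICK=/gluster/brick/brick0
    # the backend entry is a hard link at .glusterfs/<first 2 chars>/<next 2 chars>/<gfid>
    sudo find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" \
        -not -path '*/.glusterfs/*'

Doing that for ~30000 entries is exactly why it isn't practical for us.)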

Do you have any suggestions for healing the metadata on '/'?
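
To be explicit about what we are considering (and have not run): would it be
sane, as a fallback, to manually clear node2's pending afr xattrs so that
node0/node1 are treated as the source for '/', or is there a better way? A
sketch only, based on our possibly wrong understanding of the afr pending
xattrs, to be run against node2's brick:

    sudo setfattr -n trusted.afr.gv0-client-0 -v 0x000000000000000000000000 /gluster/brick/brick0
    sudo setfattr -n trusted.afr.gv0-client-1 -v 0x000000000000000000000000 /gluster/brick/brick0
    # then let self-heal pick it up
    sudo gluster volume heal gv0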

Thanks
Anh

On Tue, Jul 3, 2018 at 8:02 PM, Ravishankar N <ravishankar at redhat.com>
wrote:

> Hi,
>
> What version of gluster are you using?
>
> 1. The afr xattrs on '/' indicate a meta-data split-brain. You can resolve
> it using one of the policies listed in
> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
> (further examples for points 1 and 2 are below, after point 3).
>
> For example, "gluster volume heal gv0 split-brain latest-mtime / "
> 2. Is the file corresponding to the other gfid (81289110-867b-42ff-ba3b-1373a187032b)
> present in all bricks? What do the getfattr outputs for this file indicate?
>
> 3. As for the discrepancy in the output of heal info, is node2 connected to
> the other nodes? Does heal info still print the details of all 3 bricks
> when you run it on node2?
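>
> (Further examples, assuming the brick paths shown in your heal info output.
> For point 2, the gfid's backend link can be checked on each brick with:
>
>     getfattr -d -m . -e hex /gluster/brick/brick0/.glusterfs/81/28/81289110-867b-42ff-ba3b-1373a187032b
>
> For point 1, if latest-mtime is not appropriate, the docs also list a
> source-brick policy where you pick the winning copy explicitly, e.g.:
>
>     gluster volume heal gv0 split-brain source-brick gfs-vm000:/gluster/brick/brick0 /
>
> substituting whichever brick you decide holds the correct metadata.)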
> -Ravi
>
>
> On 07/04/2018 01:47 AM, Anh Vo wrote:
>
> Actually we just discovered that the heal info command returns different
> output when executed on different nodes of our 3-replica setup.
> When we execute it on node2 we do not see "/" reported as in split-brain,
> but when I execute it on node0 and node1 I see:
>
> x at gfs-vm001:~$ sudo gluster volume heal gv0 info | tee heal-info
> Brick gfs-vm000:/gluster/brick/brick0
> <gfid:81289110-867b-42ff-ba3b-1373a187032b>
> / - Is in split-brain
>
> Status: Connected
> Number of entries: 2
>
> Brick gfs-vm001:/gluster/brick/brick0
> / - Is in split-brain
>
> <gfid:81289110-867b-42ff-ba3b-1373a187032b>
> Status: Connected
> Number of entries: 2
>
> Brick gfs-vm002:/gluster/brick/brick0
> / - Is in split-brain
>
> Status: Connected
> Number of entries: 1
>
>
> I ran getfattr -d -m . -e hex /gluster/brick/brick0 on all three nodes,
> and I see that node2 has slightly different xattrs:
> node0:
> sudo getfattr -d -m . -e hex /gluster/brick/brick0
> getfattr: Removing leading '/' from absolute path names
> # file: gluster/brick/brick0
> trusted.afr.gv0-client-2=0x000000000000000100000000
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
>
> node1:
> sudo getfattr -d -m . -e hex /gluster/brick/brick0
> getfattr: Removing leading '/' from absolute path names
> # file: gluster/brick/brick0
> trusted.afr.gv0-client-2=0x000000000000000100000000
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
>
> node2:
> sudo getfattr -d -m . -e hex /gluster/brick/brick0
> getfattr: Removing leading '/' from absolute path names
> # file: gluster/brick/brick0
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.gv0-client-0=0x000000000000000200000000
> trusted.afr.gv0-client-1=0x000000000000000200000000
> trusted.afr.gv0-client-2=0x000000000000000000000000
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
>
> Where do I go from here? Thanks
>
>
>