[Gluster-users] Failed to mount nfs due to split-brain and Input/Output Error
Anh Vo
vtqanh at gmail.com
Wed Jul 4 15:50:08 UTC 2018
I forgot to mention that we're using gluster 3.12.10.
On Wed, Jul 4, 2018 at 8:45 AM, Anh Vo <vtqanh at gmail.com> wrote:
> If I run "sudo gluster volume heal gv0 split-brain latest-mtime /" I get
> the following:
>
> Lookup failed on /:Invalid argument.
> Volume heal failed.
>
> node2 was not connected at that time, because within a few minutes of
> connecting it to the system gluster becomes almost unusable and many of our
> jobs start failing. This morning I reconnected it and ran heal info, and we
> have about 30000 entries to heal (15k from gfs-vm000 and 15k from
> gfs-vm001; 80% are bare gfids, 20% have file names). It's not feasible for
> us to check the individual gfids, so we mostly rely on gluster self-heal to
> handle them. The "/" entry is the main concern because it prevents us from
> mounting nfs. We do need to mount nfs for some of our management tasks,
> because the gluster fuse mount is much slower than nfs for recursive
> operations like 'du'.
>
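> (For reference on the gfid-only entries: as far as I understand, on each
> brick a gfid maps to a hard link under
> <brick>/.glusterfs/<first 2 hex chars>/<next 2>/<full gfid>, so a single
> entry could in principle be traced back to its real path with something
> like
>
>     find /gluster/brick/brick0 -samefile \
>         /gluster/brick/brick0/.glusterfs/81/28/81289110-867b-42ff-ba3b-1373a187032b \
>         -not -path "*/.glusterfs/*"
>
> for regular files, but doing that for ~30000 entries is exactly what we
> want to avoid, hence leaving them to self-heal.)
>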
> Do you have any suggestion for healing the metadata on '/' ?
>
> Thanks
> Anh
>
> On Tue, Jul 3, 2018 at 8:02 PM, Ravishankar N <ravishankar at redhat.com>
> wrote:
>
>> Hi,
>>
>> What version of gluster are you using?
>>
>> 1. The afr xattrs on '/' indicate a meta-data split-brain. You can
>> resolve it using one of the policies listed in
>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
>>
>> For example, "gluster volume heal gv0 split-brain latest-mtime / "
>> 2. Is the file corresponding to the other gfid
>> (81289110-867b-42ff-ba3b-1373a187032b) present in all bricks? What do
>> the getfattr outputs for this file indicate? (See the sketch after point 3
>> below for locating it on a brick.)
>>
>> 3. As for the discrepancy in output of heal info, is node2 connected to
>> the other nodes? Does heal info still print the details of all 3 bricks
>> when you run it on node2?
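>>
>> To make 1-3 a bit more concrete, here is a rough sketch. It assumes the
>> brick path from your getfattr output below (/gluster/brick/brick0) and the
>> standard .glusterfs layout on the bricks; adjust names as needed.
>>
>> For 1: each trusted.afr.gv0-client-N value packs three 32-bit counters
>> (data / metadata / entry). In your output, node0 and node1 have a non-zero
>> metadata counter for client-2, while node2 has non-zero metadata counters
>> for client-0 and client-1, i.e. the bricks blame each other only for
>> metadata, which is why '/' shows up as a meta-data split-brain. If
>> latest-mtime does not work, you can also name a source brick explicitly,
>> for example:
>>
>>     gluster volume heal gv0 split-brain source-brick \
>>         gfs-vm000:/gluster/brick/brick0 /
>>
>> (gfs-vm000 is only an illustration; pick whichever brick has the root
>> ownership/permissions/xattrs you want to keep.) The same page also covers
>> resolving it from a fuse mount via the replica.split-brain-choice /
>> replica.split-brain-heal-finalize virtual xattrs, if the CLI route gives
>> trouble on '/'.
>>
>> For 2: if it is a regular file, the gfid is hard-linked on each brick at
>> <brick>/.glusterfs/<first 2 hex chars>/<next 2>/<full gfid>, so something
>> like this on all three nodes would let you compare its afr xattrs the same
>> way:
>>
>>     getfattr -d -m . -e hex \
>>         /gluster/brick/brick0/.glusterfs/81/28/81289110-867b-42ff-ba3b-1373a187032b
>>
>> For 3: 'gluster peer status' and 'gluster volume status gv0' run on node2
>> should show whether it currently sees the other two peers and all three
>> bricks.
>>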
>> -Ravi
>>
>>
>> On 07/04/2018 01:47 AM, Anh Vo wrote:
>>
>> Actually, we just discovered that the heal info command returns different
>> things when executed on the different nodes of our 3-replica setup.
>> When we execute it on node2 we did not see "/" reported as being in
>> split-brain, but if I execute it on node0 and node1 I see:
>>
>> x at gfs-vm001:~$ sudo gluster volume heal gv0 info | tee heal-info
>> Brick gfs-vm000:/gluster/brick/brick0
>> <gfid:81289110-867b-42ff-ba3b-1373a187032b>
>> / - Is in split-brain
>>
>> Status: Connected
>> Number of entries: 2
>>
>> Brick gfs-vm001:/gluster/brick/brick0
>> / - Is in split-brain
>>
>> <gfid:81289110-867b-42ff-ba3b-1373a187032b>
>> Status: Connected
>> Number of entries: 2
>>
>> Brick gfs-vm002:/gluster/brick/brick0
>> / - Is in split-brain
>>
>> Status: Connected
>> Number of entries: 1
>>
>>
>> I ran getfattr -d -m . -e hex /gluster/brick/brick0 on all three nodes,
>> and node2 has slightly different xattrs:
>> node0:
>> sudo getfattr -d -m . -e hex /gluster/brick/brick0
>> getfattr: Removing leading '/' from absolute path names
>> # file: gluster/brick/brick0
>> trusted.afr.gv0-client-2=0x000000000000000100000000
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
>>
>> node1:
>> sudo getfattr -d -m . -e hex /gluster/brick/brick0
>> getfattr: Removing leading '/' from absolute path names
>> # file: gluster/brick/brick0
>> trusted.afr.gv0-client-2=0x000000000000000100000000
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
>>
>> node2:
>> sudo getfattr -d -m . -e hex /gluster/brick/brick0
>> getfattr: Removing leading '/' from absolute path names
>> # file: gluster/brick/brick0
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.gv0-client-0=0x000000000000000200000000
>> trusted.afr.gv0-client-1=0x000000000000000200000000
>> trusted.afr.gv0-client-2=0x000000000000000000000000
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
>>
>> Where do I go from here? Thanks
>>
>>
>>
>