[Gluster-users] Failed to mount nfs due to split-brain and Input/Output Error
Ravishankar N
ravishankar at redhat.com
Wed Jul 4 16:01:41 UTC 2018
On 07/04/2018 09:20 PM, Anh Vo wrote:
> I forgot to mention we're using 3.12.10
>
> On Wed, Jul 4, 2018 at 8:45 AM, Anh Vo <vtqanh at gmail.com> wrote:
>
> If I run "sudo gluster volume heal gv0 split-brain latest-mtime /"
> I get the following:
>
> Lookup failed on /:Invalid argument.
> Volume heal failed.
>
Can you share the glfsheal-<volname>.log on the node where you ran this
failed command?
>
>
>     node2 was not connected at that time, because if we connect it to
>     the system, gluster becomes almost unusable within a few minutes
>     and many of our jobs fail. This morning I reconnected it and ran
>     heal info; we have about 30000 entries to heal (15K from
>     gfs-vm000 and 15K from gfs-vm001; 80% are bare gfids, 20% have
>     file names). It's not feasible for us to check the individual
>     gfids, so we mostly rely on gluster self-heal to handle them.
>     The "/" entry is a concern because it prevents us from mounting
>     nfs. We do need nfs for some of our management tasks, because the
>     gluster fuse mount is much slower than nfs for recursive
>     operations like 'du'.
>
> Do you have any suggestion for healing the metadata on '/' ?
>
You can manually delete the afr xattrs on the third node (node2, the one
blaming the other two) as a workaround:
setfattr -x trusted.afr.gv0-client-0 /gluster/brick/brick0
setfattr -x trusted.afr.gv0-client-1 /gluster/brick/brick0
This should remove the split-brain on root.
HTH,
Ravi
>
>
> Thanks
> Anh
>
>     On Tue, Jul 3, 2018 at 8:02 PM, Ravishankar N
>     <ravishankar at redhat.com> wrote:
>
> Hi,
>
> What version of gluster are you using?
>
> 1. The afr xattrs on '/' indicate a meta-data split-brain. You
> can resolve it using one of the policies listed in
>         https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
>
>         For example: "gluster volume heal gv0 split-brain latest-mtime /"
>
> 2. Is the file corresponding to the other gfid
> (81289110-867b-42ff-ba3b-1373a187032b) present in all bricks?
> What do the getfattr outputs for this file indicate?
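(Side note for checking that gfid: a bare gfid from heal info corresponds to a hard link under the brick's hidden .glusterfs directory, bucketed by the gfid's first two byte pairs. A small sketch of the standard backend path layout, so you can stat/getfattr the file directly on each brick; the brick path below is the one from this thread:)

```python
# Build the backend .glusterfs path for a gfid reported by "heal info".
# Standard GlusterFS brick layout: <brick>/.glusterfs/<aa>/<bb>/<gfid>,
# where aa and bb are the first two byte pairs of the gfid string.
import os

def gfid_backend_path(brick_root: str, gfid: str) -> str:
    """Return the .glusterfs hard-link path for a gfid on a brick."""
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

print(gfid_backend_path("/gluster/brick/brick0",
                        "81289110-867b-42ff-ba3b-1373a187032b"))
# -> /gluster/brick/brick0/.glusterfs/81/28/81289110-867b-42ff-ba3b-1373a187032b
```

You can then run the same getfattr command against that path on each brick to compare the afr xattrs.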
>
> 3. As for the discrepancy in output of heal info, is node2
> connected to the other nodes? Does heal info still print the
> details of all 3 bricks when you run it on node2 ?
> -Ravi
>
>
> On 07/04/2018 01:47 AM, Anh Vo wrote:
>> Actually we just discovered that the heal info command was
>> returning different things when executed on the different
>> nodes of our 3-replica setup.
>> When we execute it on node2 we did not see the split brain
>> reported "/" but if I execute it on node0 and node1 I am seeing:
>>
>> x at gfs-vm001:~$ sudo gluster volume heal gv0 info | tee heal-info
>> Brick gfs-vm000:/gluster/brick/brick0
>> <gfid:81289110-867b-42ff-ba3b-1373a187032b>
>> / - Is in split-brain
>>
>> Status: Connected
>> Number of entries: 2
>>
>> Brick gfs-vm001:/gluster/brick/brick0
>> / - Is in split-brain
>>
>> <gfid:81289110-867b-42ff-ba3b-1373a187032b>
>> Status: Connected
>> Number of entries: 2
>>
>> Brick gfs-vm002:/gluster/brick/brick0
>> / - Is in split-brain
>>
>> Status: Connected
>> Number of entries: 1
>>
>>
>>     I ran getfattr -d -m . -e hex /gluster/brick/brick0 on all
>>     three nodes, and node2 has slightly different xattrs:
>> node0:
>> sudo getfattr -d -m . -e hex /gluster/brick/brick0
>> getfattr: Removing leading '/' from absolute path names
>> # file: gluster/brick/brick0
>> trusted.afr.gv0-client-2=0x000000000000000100000000
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
>>
>> node1:
>> sudo getfattr -d -m . -e hex /gluster/brick/brick0
>> getfattr: Removing leading '/' from absolute path names
>> # file: gluster/brick/brick0
>> trusted.afr.gv0-client-2=0x000000000000000100000000
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
>>
>> node2:
>> sudo getfattr -d -m . -e hex /gluster/brick/brick0
>> getfattr: Removing leading '/' from absolute path names
>> # file: gluster/brick/brick0
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.gv0-client-0=0x000000000000000200000000
>> trusted.afr.gv0-client-1=0x000000000000000200000000
>> trusted.afr.gv0-client-2=0x000000000000000000000000
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
>>
>> Where do I go from here? Thanks
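(Those trusted.afr values can be decoded by hand: each is 12 bytes holding three big-endian 32-bit counters for pending data, metadata, and entry operations, in that order. A minimal decoder, assuming that standard AFR changelog layout:)

```python
# Decode a trusted.afr.* changelog xattr as printed by "getfattr -e hex".
# Standard AFR layout: three big-endian 32-bit counters, in order:
# pending data ops, pending metadata ops, pending entry ops.
import struct

def decode_afr(hex_value: str) -> dict:
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    data, metadata, entry = struct.unpack(">III", raw[:12])
    return {"data": data, "metadata": metadata, "entry": entry}

# node2's xattr blaming clients 0 and 1:
print(decode_afr("0x000000000000000200000000"))  # metadata counter is 2
# node0/node1's xattr blaming client 2:
print(decode_afr("0x000000000000000100000000"))  # metadata counter is 1
```

Both sides carry a non-zero metadata counter blaming each other while the data and entry counters are zero, which is exactly the metadata split-brain on "/" diagnosed in this thread.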
>
>
>