[Gluster-users] Failed to mount nfs due to split-brain and Input/Output Error

Ravishankar N ravishankar at redhat.com
Wed Jul 4 16:01:41 UTC 2018



On 07/04/2018 09:20 PM, Anh Vo wrote:
> I forgot to mention we're using 3.12.10
>
>     On Wed, Jul 4, 2018 at 8:45 AM, Anh Vo <vtqanh at gmail.com> wrote:
>
>     If I run "sudo gluster volume heal gv0 split-brain latest-mtime /"
>     I get the following:
>
>     Lookup failed on /:Invalid argument.
>     Volume heal failed.
>

Can you share the glfsheal-<volname>.log on the node where you ran this 
failed command?
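
(It is normally written to /var/log/glusterfs/glfsheal-gv0.log,
assuming the default log directory; adjust the path if your install
logs elsewhere:

less /var/log/glusterfs/glfsheal-gv0.log
)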
>
>
>     node2 was not connected at that time: if we connect it to the
>     system, gluster becomes almost unusable within a few minutes and
>     many of our jobs fail. This morning I reconnected it and ran heal
>     info, and we have about 30000 entries to heal (15K from gfs-vm000
>     and 15K from gfs-vm001; 80% are bare gfids, 20% have file names).
>     It's not feasible for us to check each gfid individually, so we
>     mostly rely on gluster self-heal to handle those. The "/" entry
>     is a concern because it prevents us from mounting nfs. We do need
>     to mount nfs for some of our management tasks because the gluster
>     fuse mount is much slower than nfs for recursive operations like
>     'du'.
>
>     Do you have any suggestions for healing the metadata on '/'?
>
You can manually delete the afr xattrs on node2 (the brick that blames 
the other two) as a workaround:
setfattr -x trusted.afr.gv0-client-0 /gluster/brick/brick0
setfattr -x trusted.afr.gv0-client-1 /gluster/brick/brick0

This should remove the split-brain on root.
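
A minimal sketch of the full sequence, assuming the brick path from 
your getfattr output and that node2 is the brick carrying those xattrs:

# on node2 only: confirm which afr xattrs are present
getfattr -d -m trusted.afr -e hex /gluster/brick/brick0
# remove the pending changelogs that blame the other two bricks
setfattr -x trusted.afr.gv0-client-0 /gluster/brick/brick0
setfattr -x trusted.afr.gv0-client-1 /gluster/brick/brick0
# trigger a heal and verify "/" is no longer listed as split-brain
gluster volume heal gv0
gluster volume heal gv0 info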

HTH,
Ravi
>
>
>     Thanks
>     Anh
>
>     On Tue, Jul 3, 2018 at 8:02 PM, Ravishankar N
>     <ravishankar at redhat.com> wrote:
>
>         Hi,
>
>         What version of gluster are you using?
>
>         1. The afr xattrs on '/' indicate a meta-data split-brain. You
>         can resolve it using one of the policies listed in
>         https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
>
>         For example: "gluster volume heal gv0 split-brain
>         latest-mtime /"
>
>         2. Is the file corresponding to the other gfid
>         (81289110-867b-42ff-ba3b-1373a187032b) present on all bricks?
>         What do the getfattr outputs for this file indicate? (See the
>         sketch after these questions for locating it by gfid.)
>
>         3. As for the discrepancy in the output of heal info, is
>         node2 connected to the other nodes? Does heal info still
>         print the details of all 3 bricks when you run it on node2?
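>
>         (For question 2: every gfid maps to an entry under the
>         brick's .glusterfs directory, keyed by the first two pairs of
>         hex digits. A sketch, assuming the same brick path as in your
>         outputs:
>
>         ls -l /gluster/brick/brick0/.glusterfs/81/28/81289110-867b-42ff-ba3b-1373a187032b
>         getfattr -d -m . -e hex /gluster/brick/brick0/.glusterfs/81/28/81289110-867b-42ff-ba3b-1373a187032b
>
>         For a regular file this is a hard link to the file itself;
>         for a directory it is a symlink.)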
>         -Ravi
>
>
>         On 07/04/2018 01:47 AM, Anh Vo wrote:
>>         Actually, we just discovered that the heal info command
>>         returns different output depending on which node of our
>>         3-replica setup it runs on. When we execute it on node2 we
>>         do not see "/" reported as in split-brain, but when I
>>         execute it on node0 and node1 I see:
>>
>>         x@gfs-vm001:~$ sudo gluster volume heal gv0 info | tee heal-info
>>         Brick gfs-vm000:/gluster/brick/brick0
>>         <gfid:81289110-867b-42ff-ba3b-1373a187032b>
>>         / - Is in split-brain
>>
>>         Status: Connected
>>         Number of entries: 2
>>
>>         Brick gfs-vm001:/gluster/brick/brick0
>>         / - Is in split-brain
>>
>>         <gfid:81289110-867b-42ff-ba3b-1373a187032b>
>>         Status: Connected
>>         Number of entries: 2
>>
>>         Brick gfs-vm002:/gluster/brick/brick0
>>         / - Is in split-brain
>>
>>         Status: Connected
>>         Number of entries: 1
>>
>>
>>         I ran getfattr -d -m . -e hex /gluster/brick/brick0 on all
>>         three nodes, and node2 has different xattrs:
>>         node0:
>>         sudo getfattr -d -m . -e hex /gluster/brick/brick0
>>         getfattr: Removing leading '/' from absolute path names
>>         # file: gluster/brick/brick0
>>         trusted.afr.gv0-client-2=0x000000000000000100000000
>>         trusted.gfid=0x00000000000000000000000000000001
>>         trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>         trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
>>
>>         node1:
>>         sudo getfattr -d -m . -e hex /gluster/brick/brick0
>>         getfattr: Removing leading '/' from absolute path names
>>         # file: gluster/brick/brick0
>>         trusted.afr.gv0-client-2=0x000000000000000100000000
>>         trusted.gfid=0x00000000000000000000000000000001
>>         trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>         trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
>>
>>         node2:
>>         sudo getfattr -d -m . -e hex /gluster/brick/brick0
>>         getfattr: Removing leading '/' from absolute path names
>>         # file: gluster/brick/brick0
>>         trusted.afr.dirty=0x000000000000000000000000
>>         trusted.afr.gv0-client-0=0x000000000000000200000000
>>         trusted.afr.gv0-client-1=0x000000000000000200000000
>>         trusted.afr.gv0-client-2=0x000000000000000000000000
>>         trusted.gfid=0x00000000000000000000000000000001
>>         trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>         trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
>>
>>         Where do I go from here? Thanks
>
>
>
