[Gluster-users] glusterfs3.4.2-1 split-brain question

Khoi Mai KHOIMAI at UP.COM
Sat Sep 26 14:56:33 UTC 2015


I was checking the client fuse mount and saw this:


[2015-09-26 14:41:29.417267] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-devstatic-replicate-1: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 15 ] [ 1 0 ] ]
[2015-09-26 14:41:29.418063] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-devstatic-replicate-1: metadata self heal failed, on /
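
For reference, my understanding from the split-brain doc below (so treat this as a sketch) is that entry [i][j] of the pending matrix is the changelog count brick i holds against brick j, so here each brick of devstatic-replicate-1 was blaming the other. The counters behind the matrix can be dumped on each brick of the pair:

# run on both bricks of devstatic-replicate-1 (omhq1b4f and omdx1b51)
getfattr -d -m trusted.afr -e hex /static/content/
# mutually non-zero trusted.afr.devstatic-client-* values on the two
# bricks are what surface as the non-zero pending matrix above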

After reviewing your doc to build my understanding, 
https://github.com/gluster/glusterdocs/blob/master/Troubleshooting/split-brain.md
I found that 
http://thr3ads.net/gluster-users/2013/11/2710016-Unable-to-self-heal-contents-of-gfid-00000000-0000-0000-0000-000000000001
describes the same issue with '/'. 

Now I believe all is clear. 

Client log:
[2015-09-26 14:53:35.662325] I [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-devstatic-replicate-1: metadata self heal is successfully completed, metadata self heal from source devstatic-client-2 to devstatic-client-3, metadata - Pending matrix:  [ [ 0 0 ] [ 0 0 ] ], on /
[2015-09-26 14:53:35.667537] I [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-devstatic-replicate-0: metadata self heal is successfully completed, metadata self heal from source devstatic-client-0 to devstatic-client-1, metadata - Pending matrix:  [ [ 0 0 ] [ 0 0 ] ], on /

Gluster storage CLI output: `gluster volume heal devstatic info split-brain` is now clean.
[root@omdx1b51 ~]# gluster volume heal devstatic info split-brain
Gathering list of split brain entries on volume devstatic has been successful

Brick omhq1b4e:/static/content
Number of entries: 0

Brick omdx1b50:/static/content
Number of entries: 0

Brick omhq1b4f:/static/content
Number of entries: 0

Brick omdx1b51:/static/content
Number of entries: 0

Please let me know if there are any steps I've missed or additional areas 
to look at.

Thank you,

Khoi Mai
Union Pacific Railroad
Distributed Engineering & Architecture
Senior Project Engineer





From:   Ravishankar N <ravishankar at redhat.com>
To:     Khoi Mai <KHOIMAI at up.com>
Date:   09/26/2015 12:59 AM
Subject:        Re: [Gluster-users] glusterfs3.4.2-1 split-brain question



On 09/26/2015 10:37 AM, Khoi Mai wrote:
I'd like to run the afr attr reset on omdx1b51. Does that make omhq1b4f 
the winning source?
Yes. Resetting trusted.afr.devstatic-client-2 on omdx1b51 makes omhq1b4f 
the source, because omhq1b4f blames omdx1b51 via 
trusted.afr.devstatic-client-3.
Or do I run the commands on the server I want to be the source? For 
example, as below?

So, the intended change is:
On omdx1b51, for trusted.afr.devstatic-client-2:
0x000000000000000600000000 to 0x000000000000000000000000
Hence execute:
setfattr -n trusted.afr.devstatic-client-2 -v 0x000000000000000000000000 /static/content/
then:
gluster volume heal devstatic
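
Before the heal, I would confirm the reset took effect (a sketch, assuming the xattr values from your earlier dump):

# on omdx1b51, after the setfattr:
getfattr -d -m trusted.afr -e hex /static/content/
# expected: trusted.afr.devstatic-client-2=0x000000000000000000000000,
# leaving omhq1b4f as the only brick with a non-zero blame, i.e. the source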

Thank you for your help!

Khoi Mai
Union Pacific Railroad
Distributed Engineering & Architecture
Senior Project Engineer





From:        Ravishankar N <ravishankar at redhat.com>
To:        Khoi Mai <KHOIMAI at up.com>
Cc:        gluster-users at gluster.org
Date:        09/25/2015 09:04 PM
Subject:        Re: [Gluster-users] glusterfs3.4.2-1 split-brain question



On 09/25/2015 07:40 PM, Khoi Mai wrote:
I think I found it in your GitHub doc: the quota size does not match 
across the replicate pair. I don't know if that makes a difference. I 
apologize; I cannot use fpaste.org or pastebin.com due to policies at my 
company.

I'm not sure quota xattrs are handled by AFR in glusterfs-3.4. There 
doesn't seem to be any split-brain in the first replica pair, since its 
afr xattrs are all zero. The second replica pair is in metadata 
split-brain (though unlikely because of the quota-size xattr). You can 
pick one brick as the source, reset the appropriate afr xattr, and run 
`gluster v heal volname` once. 
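
For example, picking omhq1b4f as the source would mean clearing the blame recorded against it on omdx1b51; a sketch (verify the client index against the getfattr dumps below before running anything):

# on omdx1b51, the brick NOT chosen as source:
setfattr -n trusted.afr.devstatic-client-2 -v 0x000000000000000000000000 /static/content/
# then trigger a single heal from any node:
gluster volume heal devstatic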


[root@omhq1b4e ~]# getfattr -d -m . -e hex /static/content/
getfattr: Removing leading '/' from absolute path names
# file: static/content/
trusted.afr.devstatic-client-0=0x000000000000000000000000
trusted.afr.devstatic-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x0000018000000000ffffffffffffffff
trusted.glusterfs.quota.size=0x0000006f303e4e00
trusted.glusterfs.volume-id=0x75832afbf20e40188d748550a92233fc

[root@omdx1b50 ~]# getfattr -d -m . -e hex /static/content/
getfattr: Removing leading '/' from absolute path names
# file: static/content/
trusted.afr.devstatic-client-0=0x000000000000000000000000
trusted.afr.devstatic-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x0000018000000000ffffffffffffffff
trusted.glusterfs.quota.size=0x00000081bfca4e00
trusted.glusterfs.volume-id=0x75832afbf20e40188d748550a92233fc


[root@omhq1b4f ~]# getfattr -d -m . -e hex /static/content/
getfattr: Removing leading '/' from absolute path names
# file: static/content/
trusted.afr.devstatic-client-2=0x000000000000000000000000
trusted.afr.devstatic-client-3=0x000000000000000900000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x0000018000000000ffffffffffffffff
trusted.glusterfs.quota.size=0x00000076b9b20800
trusted.glusterfs.volume-id=0x75832afbf20e40188d748550a92233fc

[root@omdx1b51 ~]# getfattr -d -m . -e hex /static/content/
getfattr: Removing leading '/' from absolute path names
# file: static/content/
trusted.afr.devstatic-client-2=0x000000000000000600000000
trusted.afr.devstatic-client-3=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x0000018000000000ffffffffffffffff
trusted.glusterfs.quota.size=0x0000006eb4e0b000
trusted.glusterfs.volume-id=0x75832afbf20e40188d748550a92233fc
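
Reading the two non-zero values (my understanding of the afr changelog layout is three 4-byte big-endian counters, data/metadata/entry, so treat this breakdown as a sketch):

# omhq1b4f: trusted.afr.devstatic-client-3 = 0x 00000000 00000009 00000000
#                                                 data=0 metadata=9 entry=0
# omdx1b51: trusted.afr.devstatic-client-2 = 0x 00000000 00000006 00000000
#                                                 data=0 metadata=6 entry=0
# each brick blames the other in the metadata field only,
# which matches a metadata split-brain on '/'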





Khoi Mai
Union Pacific Railroad
Distributed Engineering & Architecture
Senior Project Engineer





From:        Khoi Mai/UPC
To:        Ravishankar N <ravishankar at redhat.com>
Cc:        gluster-users at gluster.org
Date:        09/25/2015 09:01 AM
Subject:        Re: [Gluster-users] glusterfs3.4.2-1 split-brain question


The gfid looks the same on both bricks. I'm not sure what gluster volume 
heal info split-brain is reporting when the GFID matches on all 4 nodes 
in the devstatic volume.

[root@omhq1b4f ~]# getfattr -h -d -m trusted.gfid -e hex /static/content/
getfattr: Removing leading '/' from absolute path names
# file: static/content/
trusted.gfid=0x00000000000000000000000000000001

[root@omhq1b4f ~]# stat /static/content/
  File: `/static/content/'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: fd02h/64770d    Inode: 536871040   Links: 90
Access: (0775/drwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-02-02 09:06:27.073528000 -0600
Modify: 2014-12-23 10:13:00.823641000 -0600
Change: 2015-09-25 08:42:44.524336543 -0500
[root@omhq1b4f ~]#

[root@omdx1b51 ~]# getfattr -h -d -m trusted.gfid -e hex /static/content/
getfattr: Removing leading '/' from absolute path names
# file: static/content/
trusted.gfid=0x00000000000000000000000000000001


[root@omdx1b51 ~]# stat /static/content/
  File: `/static/content/'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: fd02h/64770d    Inode: 536871040   Links: 90
Access: (0775/drwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-02-02 09:06:27.073528000 -0600
Modify: 2014-12-23 10:13:00.823641000 -0600
Change: 2015-09-25 08:42:44.526287950 -0500
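
A matching trusted.gfid only rules out a gfid split-brain; the metadata split-brain that heal info reports should live in the trusted.afr.* changelog xattrs, which the gfid query above does not show. To dump those instead:

getfattr -h -d -m trusted.afr -e hex /static/content/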





Khoi Mai
Union Pacific Railroad
Distributed Engineering & Architecture
Senior Project Engineer






From:        Ravishankar N <ravishankar at redhat.com>
To:        Khoi Mai <KHOIMAI at UP.COM>, gluster-users at gluster.org
Date:        09/25/2015 03:13 AM
Subject:        Re: [Gluster-users] glusterfs3.4.2-1 split-brain question



On 09/25/2015 07:48 AM, Khoi Mai wrote:
I have a 4-node distributed-replicated gluster farm.

Volume Name: devstatic
Type: Distributed-Replicate
Volume ID: 75832afb-f20e-4017-8d74-8550a92233fd
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: omhq1b4e:/static/content
Brick2: omdx1b50:/static/content
Brick3: omhq1b4f:/static/content
Brick4: omdx1b51:/static/content
Options Reconfigured:
features.quota-deem-statfs: on
server.allow-insecure: on
network.ping-timeout: 10
performance.lazy-open: off
performance.write-behind: on
features.quota: on
geo-replication.indexing: off
server.statedump-path: /tmp/
diagnostics.brick-log-level: CRITICAL


When I query heal split-brain info, I get the following.

[root@omhq1b4e ~]# gluster volume heal devstatic info split-brain
Gathering list of split brain entries on volume devstatic has been successful

Brick omhq1b4e:/static/content
Number of entries: 0

Brick omdx1b50:/static/content
Number of entries: 0

Brick omhq1b4f:/static/content
Number of entries: 43
at                    path on brick
-----------------------------------
2015-09-24 18:50:20 /
2015-09-24 18:50:20 /
2015-09-24 18:52:01 /
2015-09-24 19:10:22 /

Brick omdx1b51:/static/content
Number of entries: 42
at                    path on brick
-----------------------------------
2015-09-24 18:51:58 /
2015-09-24 18:51:59 /
2015-09-24 19:01:59 /
2015-09-24 19:11:59 /


With '/' being the reported entry on both bricks of the same replica 
pair, how would I safely resolve this? Is it really going to require me 
to delete the root of each node and heal? I hope not; the entire volume 
is about 1TB.

No, it is likely that the root is only in metadata split-brain. What does 
the getfattr output of '/' show on the bricks?  
https://github.com/gluster/glusterdocs/blob/master/Troubleshooting/split-brain.md
should tell you how to resolve split-brains.
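For example, on each brick:

getfattr -d -m . -e hex /static/content/

The trusted.afr.* values in that dump show which bricks blame each other.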
Thank you,





Khoi Mai
Union Pacific Railroad
Distributed Engineering & Architecture
Senior Project Engineer





_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users










