[Gluster-users] self heal errors on 3.1.1 clients

Burnash, James jburnash at knight.com
Wed Jan 26 20:10:45 UTC 2011


This seems curious - the values are inverted between the two mirrors - or perhaps that is simply a consequence of replication:

fs17:/export/read-only/g01# getfattr -d -e hex -m trusted.afr /export/read-only/g03
getfattr: Removing leading '/' from absolute path names
# file: export/read-only/g03
trusted.afr.pfs-ro1-client-4=0x000000000000000000000000
trusted.afr.pfs-ro1-client-5=0x000000000000000100000000

  
fs18:/var/tmp/hptools# getfattr -d -e hex -m trusted.afr /export/read-only/g03
getfattr: Removing leading '/' from absolute path names
# file: export/read-only/g03
trusted.afr.pfs-ro1-client-4=0x000000000000000100000000
trusted.afr.pfs-ro1-client-5=0x000000000000000000000000
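The mirrored pattern is easier to read once the value is split into its fields. As far as I know, the 24 hex digits after the 0x pack three 32-bit big-endian counters (data, metadata, and entry pending operations), so a small helper can decode them - the helper name is mine, not a Gluster tool:

```shell
# Hypothetical helper: split a trusted.afr changelog value into its three
# 32-bit counters (data | metadata | entry pending operations).
decode_afr() {
    v=${1#0x}
    echo "data=$((16#${v:0:8})) metadata=$((16#${v:8:8})) entry=$((16#${v:16:8}))"
}

decode_afr 0x000000000000000100000000
# -> data=0 metadata=1 entry=0
```

Read that way, each brick above is recording one pending metadata operation against its peer, and zero against itself.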

James Burnash, Unix Engineering

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
Sent: Wednesday, January 26, 2011 1:38 PM
To: 'David Lloyd'; gluster-users at gluster.org
Subject: Re: [Gluster-users] self heal errors on 3.1.1 clients

Hi David.

Here's an example of the getfattr from my server:

fs17:/var/tmp/hptools# getfattr -d -e hex -m trusted.afr /export/read-only/g01
getfattr: Removing leading '/' from absolute path names
# file: export/read-only/g01
trusted.afr.pfs-ro1-client-0=0x000000000000000000000000
trusted.afr.pfs-ro1-client-1=0x000000000000000100000000

The hex value is the same for all 10 of my directories.
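For what it's worth, a quick way to scan for trouble is to filter the getfattr output down to only the changelogs that are non-zero, i.e. the ones with pending self-heal work. A sketch (the helper name is mine; feed it the saved output of a `getfattr -d -e hex -m trusted.afr <brick-dir>` run):

```shell
# Keep only trusted.afr lines whose value is not all zeros
# (all-zeros means no pending self-heal work against that peer).
nonzero_afr() {
    grep '^trusted\.afr\.' | grep -v '=0x000000000000000000000000$'
}

printf '%s\n' \
    'trusted.afr.pfs-ro1-client-0=0x000000000000000000000000' \
    'trusted.afr.pfs-ro1-client-1=0x000000000000000100000000' |
    nonzero_afr
# -> prints only the client-1 line
```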

James Burnash, Unix Engineering

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of David Lloyd
Sent: Wednesday, January 26, 2011 12:24 PM
To: gluster-users at gluster.org
Subject: Re: [Gluster-users] self heal errors on 3.1.1 clients

I read on another thread about checking the getfattr output for each brick, but it tailed off before any explanation of what to do with that information.

We have 8 bricks in the volume. Config is:

g1:~ # gluster volume info glustervol1

Volume Name: glustervol1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: g1:/mnt/glus1
Brick2: g2:/mnt/glus1
Brick3: g3:/mnt/glus1
Brick4: g4:/mnt/glus1
Brick5: g1:/mnt/glus2
Brick6: g2:/mnt/glus2
Brick7: g3:/mnt/glus2
Brick8: g4:/mnt/glus2
Options Reconfigured:
performance.write-behind-window-size: 100mb
performance.cache-size: 512mb
performance.stat-prefetch: on


and the getfattr outputs are:

g1:~ # getfattr -d -e hex -m trusted.afr /mnt/glus1
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus1
trusted.afr.glustervol1-client-0=0x000000000000000000000000
trusted.afr.glustervol1-client-1=0x000000000000000000000000

g1:~ # getfattr -d -e hex -m trusted.afr /mnt/glus2
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus2
trusted.afr.glustervol1-client-4=0x000000000000000000000000
trusted.afr.glustervol1-client-5=0x000000000000000000000000

g2:~ # getfattr -d -e hex -m trusted.afr /mnt/glus1
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus1
trusted.afr.glustervol1-client-0=0x000000000000000000000000
trusted.afr.glustervol1-client-1=0x000000000000000000000000

g2:~ # getfattr -d -e hex -m trusted.afr /mnt/glus2
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus2
trusted.afr.glustervol1-client-4=0x000000000000000000000000
trusted.afr.glustervol1-client-5=0x000000000000000000000000

g3:~ # getfattr -d -e hex -m trusted.afr /mnt/glus1
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus1
trusted.afr.glustervol1-client-2=0x000000000000000000000000
trusted.afr.glustervol1-client-3=0x000000000000000100000000

g3:~ # getfattr -d -e hex -m trusted.afr /mnt/glus2
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus2
trusted.afr.glustervol1-client-6=0x000000000000000000000000
trusted.afr.glustervol1-client-7=0x000000000000000000000000

g4:~ # getfattr -d -e hex -m trusted.afr /mnt/glus1
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus1
trusted.afr.glustervol1-client-2=0x000000000000000100000000
trusted.afr.glustervol1-client-3=0x000000000000000000000000

g4:~ # getfattr -d -e hex -m trusted.afr /mnt/glus2
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus2
trusted.afr.glustervol1-client-6=0x000000000000000000000000
trusted.afr.glustervol1-client-7=0x000000000000000000000000


Hope someone can help. Things still seem to be working, but they have slowed down.

Cheers
David


On 26 January 2011 17:07, David Lloyd <david.lloyd at v-consultants.co.uk>wrote:

> We started getting the same problem at almost exactly the same time.
>
> I get one of these messages every time I access the root of the mounted
> volume (and nowhere else, I think).
> This is also 3.1.1
>
> I'm just starting to look in to it, I'll let you know if I get anywhere.
>
> David
>
> On 26 January 2011 16:38, Burnash, James <jburnash at knight.com> wrote:
>
>> These errors are appearing in the file
>> /var/log/glusterfs/<mountpoint>.log
>>
>> [2011-01-26 11:02:10.342349] I [afr-common.c:672:afr_lookup_done]
>> pfs-ro1-replicate-5: split brain detected during lookup of /.
>> [2011-01-26 11:02:10.342366] I [afr-common.c:716:afr_lookup_done]
>> pfs-ro1-replicate-5: background  meta-data data self-heal triggered.
>> path: /
>> [2011-01-26 11:02:10.342502] E
>> [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] pfs-ro1-replicate-2:
>> Unable to self-heal permissions/ownership of '/' (possible split-brain).
>> Please fix the file on all backend volumes
>>
>> Apparently the issue is the root of the storage pool, which in my
>> case on the backend storage servers is this path:
>>
>> /export/read-only - permissions are: drwxr-xr-x 12 root root
>> 4096 Dec 28 12:09 /export/read-only/
>>
>> Installation is GlusterFS 3.1.1 on servers and clients, servers
>> running CentOS 5.5, clients running CentOS 5.2.
>>
>> The volume info header is below:
>>
>> Volume Name: pfs-ro1
>> Type: Distributed-Replicate
>> Status: Started
>> Number of Bricks: 10 x 2 = 20
>> Transport-type: tcp
>>
>> Any ideas? I don't see a permission issue on the directory or its
>> subdirectories.
>>
>> James Burnash, Unix Engineering
>>
>>
>>
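For a metadata split-brain on the brick root itself, the approach usually suggested is to pick the copy you trust, zero the accusing changelog on the other brick with setfattr, and then stat the directory through a client mount to trigger self-heal. This is a sketch only - verify the client index and brick path against your own getfattr output before running anything (the values below are taken from the fs17/fs18 outputs earlier in the thread). The command-builder helper is mine; it only echoes the command so you can inspect it first:

```shell
# CAUTION: the generated setfattr command discards one side's pending
# changelog. Run it only on the brick whose copy you do NOT trust, for
# the xattr that blames the trusted peer.
reset_afr_cmd() {
    # $1 = client xattr suffix, $2 = brick directory
    echo "setfattr -n trusted.afr.$1 -v 0x000000000000000000000000 $2"
}

reset_afr_cmd pfs-ro1-client-5 /export/read-only/g03
# -> setfattr -n trusted.afr.pfs-ro1-client-5 -v 0x000000000000000000000000 /export/read-only/g03
```

After resetting, a simple `ls -l` of the volume root from a client mount should trigger the background self-heal and clear the log messages.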

