[Gluster-users] self heal errors on 3.1.1 clients

David Lloyd david.lloyd at v-consultants.co.uk
Thu Jan 27 00:25:09 UTC 2011


Well, I did this and it seems to have worked. I was just guessing really, as I
didn't have any documentation or advice from anyone in the know.

I just reset the attributes on the root directory of each brick where they were
not all zeroes.

I found it easier to dump the attributes without the '-e hex' option:

g4:~ # getfattr -d  -m trusted.afr /mnt/glus1 /mnt/glus2
getfattr: Removing leading '/' from absolute path names
# file: mnt/glus1
trusted.afr.glustervol1-client-2=0sAAAAAAAAAAEAAAAA
trusted.afr.glustervol1-client-3=0sAAAAAAAAAAAAAAAA

Then
setfattr -n trusted.afr.glustervol1-client-2 -v 0sAAAAAAAAAAAAAAAA /mnt/glus1

I did that on all the bricks whose values weren't all 'A's (the base64 form of
all zeroes).

The next time I stat-ed the root of the filesystem on the client, the self-heal
worked OK.
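For reference, the steps above can be sketched as the shell snippet below. This
is only a sketch of what I did, not official procedure; the brick paths and the
client-2 attribute name are from this volume's layout, so adjust them for your
own setup:

```shell
#!/bin/sh
# getfattr without '-e hex' prints values base64-encoded with a '0s'
# prefix. The non-zero value above, 0sAAAAAAAAAAEAAAAA, decodes to the
# 12-byte AFR changelog with a pending count set:
val=$(echo "AAAAAAAAAAEAAAAA" | base64 -d | od -An -tx1 | tr -d ' \n')
echo "$val"   # -> 000000000000000100000000 (not all zeroes)

# Dump the trusted.afr attributes on each brick root (only on bricks
# that exist on this server):
for brick in /mnt/glus1 /mnt/glus2; do
    if [ -d "$brick" ]; then
        getfattr -d -e hex -m trusted.afr "$brick"
    fi
done

# For each attribute that is not all zeroes, reset it to the 12-byte
# all-zero changelog (the hex form of 0sAAAAAAAAAAAAAAAA). Commented
# out here because it rewrites brick metadata -- run it deliberately:
# setfattr -n trusted.afr.glustervol1-client-2 \
#     -v 0x000000000000000000000000 /mnt/glus1
```

The hex and base64 forms are interchangeable to setfattr; the decode step just
makes it easy to see which values actually carry pending changes.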

I'm not comfortable advising you to do this as I'm really feeling my way
here, but it looks as though it worked for me.

David



On 26 January 2011 20:10, Burnash, James <jburnash at knight.com> wrote:

> This seems curious - the values are inverted between the two mirrors -
> or perhaps that is expected because of replication:
>
> Fs17:/export/read-only/g01# getfattr -d -e hex -m trusted.afr
> /export/read-only/g03
> getfattr: Removing leading '/' from absolute path names
> # file: export/read-only/g03
> trusted.afr.pfs-ro1-client-4=0x000000000000000000000000
> trusted.afr.pfs-ro1-client-5=0x000000000000000100000000
>
>
> fs18:/var/tmp/hptools# getfattr -d -e hex -m trusted.afr
> /export/read-only/g03
> getfattr: Removing leading '/' from absolute path names
> # file: export/read-only/g03
> trusted.afr.pfs-ro1-client-4=0x000000000000000100000000
> trusted.afr.pfs-ro1-client-5=0x000000000000000000000000
>
> James Burnash, Unix Engineering
>
> -----Original Message-----
> From: gluster-users-bounces at gluster.org [mailto:
> gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
> Sent: Wednesday, January 26, 2011 1:38 PM
> To: 'David Lloyd'; gluster-users at gluster.org
> Subject: Re: [Gluster-users] self heal errors on 3.1.1 clients
>
> Hi David.
>
> Here's an example of the getfattr from my server:
>
> fs17:/var/tmp/hptools# getfattr -d -e hex -m trusted.afr
> /export/read-only/g01
> getfattr: Removing leading '/' from absolute path names
> # file: export/read-only/g01
> trusted.afr.pfs-ro1-client-0=0x000000000000000000000000
> trusted.afr.pfs-ro1-client-1=0x000000000000000100000000
>
> The hex value is the same for all 10 of my directories.
>
> James Burnash, Unix Engineering
>
> -----Original Message-----
> From: gluster-users-bounces at gluster.org [mailto:
> gluster-users-bounces at gluster.org] On Behalf Of David Lloyd
> Sent: Wednesday, January 26, 2011 12:24 PM
> To: gluster-users at gluster.org
> Subject: Re: [Gluster-users] self heal errors on 3.1.1 clients
>
> I read on another thread about checking the getfattr output for each brick,
> but it tailed off before explaining what to do with that information.
>
> We have 8 bricks in the volume. Config is:
>
> g1:~ # gluster volume info glustervol1
>
> Volume Name: glustervol1
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: tcp
> Bricks:
> Brick1: g1:/mnt/glus1
> Brick2: g2:/mnt/glus1
> Brick3: g3:/mnt/glus1
> Brick4: g4:/mnt/glus1
> Brick5: g1:/mnt/glus2
> Brick6: g2:/mnt/glus2
> Brick7: g3:/mnt/glus2
> Brick8: g4:/mnt/glus2
> Options Reconfigured:
> performance.write-behind-window-size: 100mb
> performance.cache-size: 512mb
> performance.stat-prefetch: on
>
>
> and the getfattr outputs are:
>
> g1:~ # getfattr -d -e hex -m trusted.afr /mnt/glus1
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/glus1
> trusted.afr.glustervol1-client-0=0x000000000000000000000000
> trusted.afr.glustervol1-client-1=0x000000000000000000000000
>
> g1:~ # getfattr -d -e hex -m trusted.afr /mnt/glus2
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/glus2
> trusted.afr.glustervol1-client-4=0x000000000000000000000000
> trusted.afr.glustervol1-client-5=0x000000000000000000000000
>
> g2:~ # getfattr -d -e hex -m trusted.afr /mnt/glus1
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/glus1
> trusted.afr.glustervol1-client-0=0x000000000000000000000000
> trusted.afr.glustervol1-client-1=0x000000000000000000000000
>
> g2:~ # getfattr -d -e hex -m trusted.afr /mnt/glus2
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/glus2
> trusted.afr.glustervol1-client-4=0x000000000000000000000000
> trusted.afr.glustervol1-client-5=0x000000000000000000000000
>
> g3:~ # getfattr -d -e hex -m trusted.afr /mnt/glus1
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/glus1
> trusted.afr.glustervol1-client-2=0x000000000000000000000000
> trusted.afr.glustervol1-client-3=0x000000000000000100000000
>
> g3:~ # getfattr -d -e hex -m trusted.afr /mnt/glus2
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/glus2
> trusted.afr.glustervol1-client-6=0x000000000000000000000000
> trusted.afr.glustervol1-client-7=0x000000000000000000000000
>
> g4:~ # getfattr -d -e hex -m trusted.afr /mnt/glus1
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/glus1
> trusted.afr.glustervol1-client-2=0x000000000000000100000000
> trusted.afr.glustervol1-client-3=0x000000000000000000000000
>
> g4:~ # getfattr -d -e hex -m trusted.afr /mnt/glus2
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/glus2
> trusted.afr.glustervol1-client-6=0x000000000000000000000000
> trusted.afr.glustervol1-client-7=0x000000000000000000000000
>
>
> Hope someone can help. Things still seem to be working, but have slowed down.
>
> Cheers
> David
>
>
> On 26 January 2011 17:07, David Lloyd <david.lloyd at v-consultants.co.uk
> >wrote:
>
> > We started getting the same problem at almost exactly the same time.
> >
> > I get one of these messages every time I access the root of the mounted
> > volume (and nowhere else, I think).
> > This is also 3.1.1.
> >
> > I'm just starting to look into it; I'll let you know if I get anywhere.
> >
> > David
> >
> > On 26 January 2011 16:38, Burnash, James <jburnash at knight.com> wrote:
> >
> >> These errors are appearing in the file
> >> /var/log/glusterfs/<mountpoint>.log
> >>
> >> [2011-01-26 11:02:10.342349] I [afr-common.c:672:afr_lookup_done]
> >> pfs-ro1-replicate-5: split brain detected during lookup of /.
> >> [2011-01-26 11:02:10.342366] I [afr-common.c:716:afr_lookup_done]
> >> pfs-ro1-replicate-5: background  meta-data data self-heal triggered.
> >> path: /
> >> [2011-01-26 11:02:10.342502] E
> >> [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] pfs-ro1-replicate-2:
> >> Unable to self-heal permissions/ownership of '/' (possible split-brain).
> >> Please fix the file on all backend volumes
> >>
> >> Apparently the issue is the root of the storage pool, which in my
> >> case on the backend storage servers is this path:
> >>
> >> /export/read-only - permissions are:
> >> drwxr-xr-x 12 root root 4096 Dec 28 12:09 /export/read-only/
> >>
> >> Installation is GlusterFS 3.1.1 on servers and clients, servers
> >> running CentOS 5.5, clients running CentOS 5.2.
> >>
> >> The volume info header is below:
> >>
> >> Volume Name: pfs-ro1
> >> Type: Distributed-Replicate
> >> Status: Started
> >> Number of Bricks: 10 x 2 = 20
> >> Transport-type: tcp
> >>
> >> Any ideas? I don't see a permission issue on the directory or its
> >> subdirectories themselves.
> >>
> >> James Burnash, Unix Engineering
> >>
> >>
> >>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>