[Gluster-users] Brick pair file mismatch, self-heal problems?

Dan Bretherton d.a.bretherton at reading.ac.uk
Sat Aug 6 12:28:01 UTC 2011


> Try this to trigger self heal:
>
> find <gluster-mount> -noleaf -print0 -name <file name> | xargs --null stat >/dev/null
>
>
>
> On Sun, May 15, 2011 at 11:20 AM, Martin Schenker
> <martin.schenker at profitbricks.com> wrote:
> > Can someone enlighten me what's going on here? We have two peers, the file
> > 21313 is shown through the client mountpoint as "1Jan1970", attribs on
> > server pserver3 don't match but NO self-heal or repair can be triggered
> > through "ls -alR"?!?
> >
> > Checking the files through the server mounts shows that two versions are on
> > the system. But the wrong one (as with the "1Jan1970") seems to be the
> > preferred one by the client?!?
> >
> > Do I need to use setattr or what in order to get the client to see the RIGHT
> > version?!? This is not the ONLY file displaying this problematic behaviour!
> >
> > Thanks for any feedback.
> >
> > Martin
> >
> > pserver5:
> >
> > 0root@pserver5:~ # ls -al /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images
> >
> > -rwxrwx--- 1 libvirt-qemu vcb  483183820800 May 13 13:41 21313
> >
> > 0root@pserver5:~ # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
> > getfattr: Removing leading '/' from absolute path names
> > # file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
> > trusted.afr.storage0-client-2=0x000000000000000000000000
> > trusted.afr.storage0-client-3=0x000000000000000000000000
> >
> > 0root@pserver5:~ # ls -alR /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
> > -rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan  1  1970 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
> >
> > pserver3:
> >
> > 0root@pserver3:~ # ls -al /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images
> >
> > -rwxrwx--- 1 libvirt-qemu kvm  483183820800 Jan  1  1970 21313
> >
> > 0root@pserver3:~ # ls -alR /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
> > -rwxrwx--- 1 libvirt-qemu kvm 483183820800 Jan  1  1970 /opt/profitbricks/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
> >
> > 0root@pserver3:~ # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
> > getfattr: Removing leading '/' from absolute path names
> > # file: mnt/gluster/brick1/storage/images/2078/ebb83b05-3a83-9d18-ad8f-8542864da6ef/hdd-images/21313
> > trusted.afr.storage0-client-2=0x000000000000000000000000
> > trusted.afr.storage0-client-3=0x0b0000090900000000000000 <- mismatch,
> > should be targeted for self-heal/repair? Why is there a difference in the
> > views?
> >
> >
> > From the volfile:
> >
> > volume storage0-client-2
> >     type protocol/client
> >     option remote-host de-dc1-c1-pserver3
> >     option remote-subvolume /mnt/gluster/brick1/storage
> >     option transport-type rdma
> >     option ping-timeout 5
> > end-volume
> >
> > volume storage0-client-3
> >     type protocol/client
> >     option remote-host de-dc1-c1-pserver5
> >     option remote-subvolume /mnt/gluster/brick1/storage
> >     option transport-type rdma
> >     option ping-timeout 5
> > end-volume
> >
>
Hello All-
I am seeing similar behaviour in two of my volumes, now using GlusterFS 
version 3.2.2.  There are files dated 1st Jan 1970 on one brick, where 
the same files on the mirror brick have sensible date stamps.  In the 
cases I have investigated, the date shown at the mount point is 1st Jan 
1970.  However, unlike the problem initially reported in this thread, I 
have not seen any xattr mismatches, as illustrated by the example below.

[root@bdan4 glusterfs]# ls -l behemoth/aatsr/AT2_AVG_3PAARC19951229_D_nD2b.nc
-rw-r--r-- 1 resc essc 381894 Jan  1  1970 behemoth/aatsr/AT2_AVG_3PAARC19951229_D_nD2b.nc
[root@bdan4 glusterfs]# getfattr -R -d -e hex -m "trusted.afr." behemoth/aatsr/AT2_AVG_3PAARC19951229_D_nD2b.nc
# file: behemoth/aatsr/AT2_AVG_3PAARC19951229_D_nD2b.nc
trusted.afr.marine-client-2=0x000000000000000000000000
trusted.afr.marine-client-3=0x000000000000000000000000
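
For reference when reading these values, here is a minimal sketch that splits a trusted.afr value into its counters. It assumes the usual AFR changelog layout of 24 hex digits, read as three 32-bit big-endian pending counters (data, metadata, entry). The function name decode_afr is made up for illustration and is not part of GlusterFS.

```shell
# Sketch only: decode one trusted.afr.* value as printed by "getfattr -e hex".
# Assumes 24 hex digits = three 32-bit counters: data, metadata, entry.
decode_afr() {
    hex=${1#0x}                      # drop the leading "0x" if present
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got: $1" >&2
        return 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    # Shell arithmetic converts the hex fields to decimal.
    printf 'data=%d metadata=%d entry=%d\n' \
        "$((0x$data))" "$((0x$meta))" "$((0x$entry))"
}
```

An all-zero value decodes to "data=0 metadata=0 entry=0", i.e. nothing pending against that brick, which is what both bricks report in the example above.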

I have been using the following self heal method since it became the 
recommended method shown in the GlusterFS documentation.

find <gluster-mount> -noleaf -print0 -name <file name> | xargs --null stat >/dev/null

Is there a better way to trigger self-healing, which would catch these 
obvious modification time errors?
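
One way to at least locate the affected files is sketched below. It is only an illustration, assuming GNU find and GNU stat (for "stat -c") and file names without embedded newlines. The function names and both directory arguments are hypothetical. It lists files on a brick whose mtime falls within the first day after the Unix epoch (or earlier), then stats the same relative paths through the client mount, as the documented method does. Whether that stat actually repairs the time stamp when the changelog xattrs are all zero is not something this sketch can confirm.

```shell
# Sketch only: requires GNU find and GNU stat; pass directories without a
# trailing slash.

# Print every regular file under directory $1 whose modification time is
# less than one day after the Unix epoch (shown by ls as 1 Jan 1970).
list_epoch_files() {
    find "$1" -type f -exec stat -c '%Y %n' {} + |
        awk '$1 < 86400 { sub(/^[^ ]+ /, ""); print }'
}

# For each such file found under brick directory $1, stat the same relative
# path under client mount point $2.
stat_through_mount() {
    brick=$1
    mount=$2
    list_epoch_files "$brick" | while IFS= read -r path; do
        rel=${path#"$brick"/}        # path relative to the brick root
        stat "$mount/$rel" >/dev/null
    done
}
```

Usage would look like "stat_through_mount <brick-directory> <gluster-mount>", run on the server that holds the brick with the 1970 time stamps.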

-Dan.

-- 
Mr. D.A. Bretherton
Computer System Manager
Environmental Systems Science Centre
Harry Pitt Building
3 Earley Gate
University of Reading
Reading, RG6 6AL
UK

Tel. +44 118 378 5205
Fax: +44 118 378 6413
