[Gluster-users] AFR self-heal problem

artur.k a.kaminski at o2.pl
Wed Jan 7 10:49:55 UTC 2009


We have a problem with GlusterFS. We are using two servers and a couple of clients with AFR. Some files return errors when read on the clients:

on-client:/var/www#  cat blogclient8x/production/files/skins/img/view1451.jpg | head -1
cat: blogclient8x/production/files/skins/img/view1451.jpg: Input/output error

bad !!!

on-server:/var/storage/glusterfs# cat blogclient8x/production/files/skins/img/view1451.jpg | head -1
˙Ř˙ŕJFIF˙ţ>CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), default quality 

ok !!!

In the client's glusterfs log we have:

2009-01-07 11:16:36 W [afr-self-heal-common.c:1005:afr_self_heal] afr: performing self heal on /blogclient8x/production/files/skins/img/view1451.jpg (metadata=0 data=1 entry=0)
2009-01-07 11:16:36 E [afr-self-heal-data.c:777:afr_sh_data_fix] afr: Unable to resolve conflicting data of /blogclient8x/production/files/skins/img/view1451.jpg. Please resolve manually by deleting the file /blogclient8x/production/files/skins/img/view1451.jpg from all but the preferred subvolume
2009-01-07 11:16:36 W [afr-self-heal-data.c:70:afr_sh_data_done] afr: self heal of /blogclient8x/production/files/skins/img/view1451.jpg completed
2009-01-07 11:16:36 W [afr.c:595:afr_open] afr: returning EIO, file has to be manually corrected in backend
2009-01-07 11:16:36 E [fuse-bridge.c:662:fuse_fd_cbk] glusterfs-fuse: 597734: OPEN() /blogclient8x/production/files/skins/img/view1451.jpg => -1 (Input/output error)
2009-01-07 11:16:36 W [afr.c:595:afr_open] afr: returning EIO, file has to be manually corrected in backend
2009-01-07 11:16:36 E [fuse-bridge.c:662:fuse_fd_cbk] glusterfs-fuse: 597735: OPEN() /blogclient8x/production/files/skins/img/view1451.jpg => -1 (Input/output error)
2009-01-07 11:16:37 W [afr.c:595:afr_open] afr: returning EIO, file has to be manually corrected in backend
2009-01-07 11:16:37 E [fuse-bridge.c:662:fuse_fd_cbk] glusterfs-fuse: 597736: OPEN() /blogclient8x/production/files/skins/img/view1451.jpg => -1 (Input/output error)
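
To see how many files are affected, the "Unable to resolve conflicting data" messages can be pulled out of the client log. The following is only a minimal sketch: the regular expression is an assumption based on the single message shown above, not on any documented log format, and the function name conflicted_paths is made up for illustration. It expects each log entry on one line, as in the real log file rather than the wrapped copy in an e-mail.

```python
import re

# Pattern inferred from the one AFR error message quoted in this post;
# other GlusterFS versions may word the message differently.
CONFLICT_RE = re.compile(
    r"Unable to resolve conflicting data of (?P<path>\S+?)\. Please resolve manually"
)


def conflicted_paths(log_text):
    """Return the unique paths flagged as conflicting, in first-seen order."""
    seen = []
    for match in CONFLICT_RE.finditer(log_text):
        path = match.group("path")
        if path not in seen:
            seen.append(path)
    return seen


# One unwrapped log entry, copied from the log excerpt in this post.
sample = (
    "2009-01-07 11:16:36 E [afr-self-heal-data.c:777:afr_sh_data_fix] afr: "
    "Unable to resolve conflicting data of "
    "/blogclient8x/production/files/skins/img/view1451.jpg. Please resolve "
    "manually by deleting the file "
    "/blogclient8x/production/files/skins/img/view1451.jpg from all but the "
    "preferred subvolume\n"
)

for path in conflicted_paths(sample):
    print(path)
```

In practice the text would come from reading the client log file; its location depends on how glusterfs was started, so no path is assumed here.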

Removing the file from one of the GlusterFS servers doesn't help. Even if I disable one of the servers and try to cat the file on the client, the problem persists, with the same error messages in the log file.
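
Since the log asks for the file to be deleted from all but the preferred subvolume, it may help to first check whether the two backend copies really differ. Below is a minimal sketch, assuming it is run separately on each server against the backend directory (/var/storage/glusterfs in the server config below) and the digests are compared by hand; the helper name file_md5 is hypothetical, and this is not a GlusterFS tool.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, calling file_md5("/var/storage/glusterfs/blogclient8x/production/files/skins/img/view1451.jpg") on both servers and comparing the two strings shows whether the copies have diverged.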



glusterfs 1.4.0rc3 built on Dec 17 2008 15:34:25
Repository revision: glusterfs--mainline--3.0--patch-777

Linux www 2.6.18-6-xen-amd64
Debian etch 4.0

client:

volume client1
  type protocol/client
  option transport-type tcp/client
  option remote-host xxx
  option remote-port 6996
  option remote-subvolume brick
end-volume

volume client2
  type protocol/client
  option transport-type tcp/client
  option remote-host xxx
  option remote-port 6996
  option remote-subvolume brick
end-volume

volume afr
  type cluster/afr
  subvolumes client1 client2
  option entry-self-heal on
  option data-self-heal on
  option metadata-self-heal off
end-volume

volume wh
  type performance/write-behind
  option flush-behind on
  subvolumes afr
end-volume

volume io-cache
  type performance/io-cache
  option cache-size 64MB
  option page-size 1MB
  option force-revalidate-timeout 2
  subvolumes wh
end-volume

volume iot
  type performance/io-threads
  subvolumes io-cache
  option thread-count 4
  option cache-size 64MB
end-volume


server:
volume posix
  type storage/posix
  option directory /var/storage/glusterfs
end-volume

volume p-locks
  type features/posix-locks
  subvolumes posix
  option mandatory on
end-volume

volume wh
  type performance/write-behind
  option flush-behind on
  subvolumes p-locks
end-volume

volume brick
  type performance/io-threads
  subvolumes wh
  option thread-count 2
  option cache-size 64MB
end-volume

volume server
  type protocol/server
  subvolumes brick
  option transport-type tcp/server
  option auth.addr.brick.allow 10.*.*.*
end-volume






