[Gluster-users] self-heal failed

Tue Jan 8 20:02:51 UTC 2013

Liang,

I don't claim to know the answer to your question, and my knowledge of zfs
is minimal at best so I may be way off base here, but it seems to me that
your attempted random corruption with this command:

   dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480

is likely going to corrupt the underlying zfs filesystem metadata, not
just file data, and I wouldn't expect gluster to be able to fix a
brick's corrupted filesystem.  Perhaps you now have to take the brick
offline, fix any zfs filesystem errors if possible, bring the brick back
online and see what then happens with self-heal.
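
To put rough numbers on that: `bs=1024 count=20480` with no `seek=` writes 1024 * 20480 = 20,971,520 bytes (20 MiB) starting at offset 0 of /dev/sda6. The POSIX sh sketch below compares that byte range with the usual ZFS vdev layout. The layout constants (four 256 KiB labels, two at each end, plus a 3.5 MiB boot reserve after the front pair) are assumptions taken from the ZFS on-disk format, and the 100 GiB partition size is a made-up stand-in, so treat the output as illustrative only.

```sh
#!/bin/sh
# Sketch: which regions of a ZFS vdev does `dd bs=1024 count=20480` overwrite?
# ASSUMPTIONS (not from this thread): labels L0 and L1 (256 KiB each) at the
# front, a 3.5 MiB boot reserve after them, labels L2 and L3 at the end, and a
# hypothetical partition size that is a multiple of 256 KiB.

KIB=1024
MIB=$((1024 * KIB))
LABEL=$((256 * KIB))
BOOT_RESERVE=$((3584 * KIB))

# hit NAME LO HI: print "NAME BYTES" if the dd range [DD_START, DD_END)
# (globals set by dd_hits) overlaps the region [LO, HI).
hit() {
    a=$(( DD_START > $2 ? DD_START : $2 ))
    b=$(( DD_END < $3 ? DD_END : $3 ))
    if [ "$b" -gt "$a" ]; then
        echo "$1 $((b - a))"
    fi
}

# dd_hits SIZE BS COUNT: regions of a SIZE-byte vdev hit by
# `dd bs=BS count=COUNT` with no seek, i.e. starting at offset 0.
dd_hits() {
    DD_START=0
    DD_END=$(($2 * $3))
    data_start=$((2 * LABEL + BOOT_RESERVE))   # 4 MiB
    data_end=$(($1 - 2 * LABEL))
    hit L0 0 "$LABEL"
    hit L1 "$LABEL" $((2 * LABEL))
    hit boot_reserve $((2 * LABEL)) "$data_start"
    hit allocatable "$data_start" "$data_end"
    hit L2 "$data_end" $((data_end + LABEL))
    hit L3 $((data_end + LABEL)) "$1"
}

# Hypothetical 100 GiB /dev/sda6, overwritten with 20480 blocks of 1024 bytes.
dd_hits $((100 * 1024 * MIB)) 1024 20480
```

Under those assumptions the write wipes both front labels and the whole boot reserve, plus about 16 MiB of allocatable space, which holds ZFS's own metadata as well as file contents. The trailing labels L2 and L3 are untouched, which may be why the pool still imports and reports ONLINE.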

--
Todd Pfaff <pfaff at mcmaster.ca>
http://www.rhpcs.mcmaster.ca/

On Tue, 8 Jan 2013, Liang Ma wrote:

> Hi There,
> 
> I'd like to test and understand the self-heal feature of glusterfs. This is
> what I did with 3.3.1-ubuntu1~precise4 on Ubuntu 12.04.1 LTS.
> 
> gluster volume create gtest replica 2 gluster3:/zfs-test gluster4:/zfs-test
> where zfs-test is a zfs pool on partition /dev/sda6 on both nodes.
> 
> To simulate random corruption on node gluster3:
> 
> dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480
> 
> Now zfs detected the corrupted files:
>
>   pool: zfs-test
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://zfsonlinux.org/msg/ZFS-8000-8A
>  scan: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         zfs-test   ONLINE       0     0 2.29K
>           sda6     ONLINE       0     0 4.59K
> 
> errors: Permanent errors have been detected in the following files:
>
>         /zfs-test/<xattrdir>/trusted.gfid
>         /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46
>         /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46/<xattrdir>/trusted.gfid
>         /zfs-test/.glusterfs/dd/8c/dd8c6797-18c3-4f3b-b1ca-86def2b578c5/<xattrdir>/trusted.gfid
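
The `.glusterfs` entries in that list follow a pattern visible in the paths themselves: the first two and the next two hex digits of a file's GFID (the UUID kept in its `trusted.gfid` xattr) become two directory levels under `<brick>/.glusterfs/`. A small POSIX sh sketch of that mapping, assuming the layout holds for this GlusterFS release; `gfid_from_hex` and `gfid_link_path` are hypothetical helpers, not GlusterFS commands.

```sh
#!/bin/sh
# Sketch: locate a file's GFID entry under a brick's .glusterfs directory.
# ASSUMPTION: the <brick>/.glusterfs/<aa>/<bb>/<gfid> layout read off the
# zpool error list in this thread. Both functions are hypothetical helpers.

# gfid_from_hex 0x<32 hex digits>: turn a hex trusted.gfid value into the
# canonical 8-4-4-4-12 UUID form (input of another length is left unchanged).
gfid_from_hex() {
    printf '%s\n' "$1" | sed 's/^0x//' | tr 'A-F' 'a-f' |
        sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/'
}

# gfid_link_path BRICK GFID: print BRICK/.glusterfs/<aa>/<bb>/<gfid>.
gfid_link_path() {
    gfid=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    case $gfid in
        *[!0-9a-f-]*) echo "not a canonical GFID: $2" >&2; return 1 ;;
        ????????-????-????-????-????????????) ;;
        *) echo "not a canonical GFID: $2" >&2; return 1 ;;
    esac
    d1=$(printf '%s' "$gfid" | cut -c1-2)   # first directory level
    d2=$(printf '%s' "$gfid" | cut -c3-4)   # second directory level
    printf '%s/.glusterfs/%s/%s/%s\n' "${1%/}" "$d1" "$d2" "$gfid"
}
```

For instance, `gfid_link_path /zfs-test b01ec17c-14cc-4999-938b-b4a71e358b46` prints the second path in the list. On a healthy brick, `getfattr -n trusted.gfid -e hex /zfs-test/K.iso` shows the raw value as `trusted.gfid=0x...` for `gfid_from_hex` to convert, though on this brick the xattr itself may be among the damaged objects. Manual repair recipes for a bad replica generally involve this entry as well as the file itself, so check the GlusterFS documentation for your release before removing anything.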
> 
> Now the gluster log file shows that self-heal can't fix the corruption:
> [2013-01-08 12:46:03.371214] W [afr-common.c:1196:afr_detect_self_heal_by_iatt] 2-gtest-replicate-0: /K.iso: gfid different on subvolume
> [2013-01-08 12:46:03.373539] E [afr-self-heal-common.c:1419:afr_sh_common_lookup_cbk] 2-gtest-replicate-0: Missing Gfids for /K.iso
> [2013-01-08 12:46:03.385701] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 2-gtest-replicate-0: background  gfid self-heal failed on /K.iso
> [2013-01-08 12:46:03.385760] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 11901: LOOKUP() /K.iso => -1 (No data available)
> 
> where K.iso is one of the sample files affected by the dd command.
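
All four log lines concern one path. With a busier client log, it can help to pull out just the warnings and errors together with the path each one mentions. A rough POSIX sh sketch, assuming the `[timestamp] LEVEL [file.c:line:function] translator: message` layout seen here and treating the first `/`-prefixed token of the message as the path (so names containing spaces or colons get cut short); `heal_problems` is a hypothetical name.

```sh
#!/bin/sh
# Sketch: list warning (W) and error (E) lines of a GlusterFS client log by path.
# ASSUMPTIONS: the "[timestamp] LEVEL [file.c:line:function] translator: message"
# layout quoted in this thread, and that the first "/"-prefixed token in the
# message is the affected path. heal_problems is a hypothetical helper.

# heal_problems < LOGFILE: print "PATH LEVEL FUNCTION" for each W or E line;
# lines that do not match the layout are skipped.
heal_problems() {
    sed -n 's|^\[[^]]*] \([EW]\) \[[^]:]*:[0-9]*:\([^]]*\)] [^:]*: [^/]*\(/[^ :]*\).*$|\3 \1 \2|p'
}
```

Fed the four lines quoted here, it prints `/K.iso` once per line, with levels W, E, E, W and the reporting function names. For a whole log, `heal_problems < client.log | cut -d' ' -f1 | sort | uniq -c` (where `client.log` is a placeholder for the client's mount log) gives a per-path count.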
> 
> So could anyone tell me the best way to repair the simulated
> corruption?
> 
> Thank you.
> 
> Liang