[Gluster-devel] afr logic

Kevan Benson kbenson at a-1networks.com
Wed Oct 17 16:38:25 UTC 2007


Alexey Filin wrote:
> Hi Kevan,
>
> Consistency of afr'ed files is also an important question when the
> backend fs fails. afr is a remedy for node failures, not backend fs
> failures (at least not directly). In the latter case files can be
> changed "legally", bypassing glusterfs, by fsck after a hw/sw failure,
> and those changes have to be handled for the corrupted replica;
> otherwise reads of the same file can return different data (especially
> with the forthcoming load-balanced reads of replicas). Fortunately,
> rsync'ing the original should create a consistent replica in this case
> too (if cluster/stripe under afr works the same way on each replica).
> Unfortunately, extended attributes aren't rsync'ed (I tested it), and
> they may be required during repair.
>
> It seems glusterfs could try to handle hw/sw failures in the backend fs
> with checksums stored in extended attributes. The checksums would have
> to be calculated per file chunk (because a single checksum requires
> full recalculation after appending or changing one byte in a gigabyte
> file). In that case glusterfs has to either recalculate the checksums
> of all files on the corrupted fs (which may take far too long, as with
> rsync'ing) or get a list of corrupted files from the backend fs in
> some way (e.g. from a flag set by fsck in extended attributes). Maybe
> some kind of distributed raid is a better solution; a first step in
> that direction was already made by cluster/stripe (unfortunately one
> implementation, DDRaid http://sources.redhat.com/cluster/ddraid/ by
> Daniel Phillips, seems to be suspended). Perhaps it is too
> computation- and network-intensive, and raid under the backend fs is
> the best solution even taking the disk space overhead into account.
>
> I'd be very interested to hear the glusterfs developers' thoughts on
> this, to clear up any misunderstanding on my part.

The rsync case can probably be handled by a separate pass that finds the 
relevant extended attributes on the source and sets them on the target.  
A simple bash or perl script could do this in a few lines.
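
As a rough illustration of that separate pass, here is a minimal sketch in 
Python rather than bash/perl. The function name copy_xattrs, the "trusted." 
prefix filter, and the injectable listx/getx/setx parameters are assumptions 
made for this example, not anything glusterfs provides. It relies on the 
Linux-only os.listxattr, os.getxattr, and os.setxattr calls, and reading or 
writing trusted.* attributes normally requires root.

```python
import os

def copy_xattrs(src_root, dst_root, prefix="trusted.",
                listx=None, getx=None, setx=None):
    """Copy extended attributes whose names start with `prefix` from each
    directory and file under src_root to the same relative path under
    dst_root. Returns the number of attributes copied."""
    # Default to the Linux-only os.*xattr calls; the parameters exist only
    # so the function can be exercised without real xattr support.
    listx = listx or os.listxattr
    getx = getx or os.getxattr
    setx = setx or os.setxattr
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Handle the directory itself, then the files inside it.
        for src in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            rel = os.path.relpath(src, src_root)
            dst = os.path.normpath(os.path.join(dst_root, rel))
            if not os.path.lexists(dst):
                continue  # path was not replicated (e.g. rsync skipped it)
            for attr in listx(src):
                if attr.startswith(prefix):
                    setx(dst, attr, getx(src, attr))
                    copied += 1
    return copied
```

Run after the rsync has finished, for example as 
copy_xattrs("/export/brick-a", "/export/brick-b"), where both paths are 
placeholders for the backend directories.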

The fsck case is more interesting. If you could get fsck to report the 
names of the files and directories that have problems without fixing them, 
it would be easy to pipe that list to a script that removes the 
trusted.afr.version attribute from those files, after which AFR should 
heal them itself.
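
A minimal sketch of such a script, again in Python. It assumes the problem 
paths arrive one per line, which is a hypothetical format since fsck output 
varies by filesystem. The function name clear_afr_version and the removex 
parameter are made up for this example; os.removexattr is Linux-only and 
needs root for trusted.* attributes. Whether AFR then heals the file is the 
behaviour described above, not something this code checks.

```python
import os

# Attribute name as given in the message above.
AFR_VERSION_ATTR = "trusted.afr.version"

def clear_afr_version(paths, removex=None):
    """Remove the AFR version attribute from each path in `paths`
    (an iterable of lines, one path per line).
    Returns (cleared, skipped) lists of paths."""
    # Injectable so the logic can be tried without real xattr support.
    removex = removex or os.removexattr
    cleared, skipped = [], []
    for raw in paths:
        path = raw.rstrip("\n")
        if not path:
            continue  # ignore blank lines in the list
        try:
            removex(path, AFR_VERSION_ATTR)
            cleared.append(path)
        except OSError:
            # Attribute absent, path missing, or insufficient privileges.
            skipped.append(path)
    return cleared, skipped
```

To use it in a pipeline, pass sys.stdin as the paths argument and print the 
skipped list for manual inspection.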

Checksums would of course give you much better tracking of corrupted 
files, but I imagine the CPU load and the loss of speed would make them 
infeasible.
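
To make the per-chunk idea from the quoted message concrete, here is a small 
in-memory sketch. The chunk size, the choice of SHA-1, and the function 
names are arbitrary assumptions for illustration; nothing here reflects how 
glusterfs stores or would store checksums. It only shows that with one 
checksum per chunk, a changed byte affects a single checksum, and comparing 
the lists from two replicas locates the differing chunk.

```python
import hashlib

CHUNK_SIZE = 1 << 20  # 1 MiB, an arbitrary choice for this sketch

def chunk_checksums(data, chunk_size=CHUNK_SIZE):
    """Return one SHA-1 hex digest per fixed-size chunk of `data` (bytes)."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def differing_chunks(sums_a, sums_b):
    """Return the indices of chunks whose checksums differ between two
    replicas, counting chunks present in only one list as differing."""
    longest = max(len(sums_a), len(sums_b))
    return [i for i in range(longest)
            if i >= len(sums_a) or i >= len(sums_b) or sums_a[i] != sums_b[i]]
```

The cost concern still applies: every write has to rehash at least the 
chunk it touches, and a full scan after fsck has to read every chunk of 
every file.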

-- 

-Kevan Benson
-A-1 Networks




