[Gluster-devel] bug in self healing in latest git 2.0.0pre33 and RC7

ender ender at enderzone.com
Tue Mar 31 17:22:04 UTC 2009

Yes, self heal seems to be broken. It appears to trust the first child in the list of subvolumes regardless of its state. That is, with AFR over three children (node1, node2, node3): if node2 or node3 blows up (a failed hard drive, or even just being pulled for security updates), everything is fine when it is added back. But if you have to pull node1 for any reason, everything breaks when it is re-added.
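For reference, a client-side volfile for the setup described above might look like the sketch below. The volume and host names are illustrative, not taken from the report; the point is only that node1 is listed first in the replicate subvolumes, making it the child the broken self-heal treats as authoritative.

```
# Hypothetical client volfile sketch for a 3-way AFR setup.
volume node1
  type protocol/client
  option remote-host node1
  option remote-subvolume brick
end-volume

# (node2 and node3 declared the same way)

volume afr
  type cluster/replicate
  # node1 comes first, so a self-heal that blindly trusts the
  # first child will use node1 as the source even when its
  # disk has just been replaced and is empty.
  subvolumes node1 node2 node3
end-volume
```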

I created the following screen capture to show this more clearly. 

"killall glusterfsd and rm -rf /tank" = harddrive failure (very common and the point of AFR.

If AFR does not protect against hardware failure, what is it for?
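The failure mode described above can be illustrated with a small simulation. This is not GlusterFS source code, just a sketch of what a self-heal that always trusts the first listed subvolume would do; the replicas are modelled as plain dicts.

```python
# Hypothetical sketch of the buggy behaviour: self-heal that always
# treats the FIRST listed subvolume as the source of truth, instead
# of picking the replica whose changelog marks it as up to date.

def naive_self_heal(subvolumes):
    """Copy the first subvolume's contents onto all the others,
    regardless of which replica actually holds good data."""
    source = subvolumes[0]
    for child in subvolumes[1:]:
        child.clear()
        child.update(source)
    return subvolumes

# node1..node3 start out in sync.
node1 = {"TEST1": "data"}
node2 = {"TEST1": "data"}
node3 = {"TEST1": "data"}

# node1's disk is replaced ("rm -rf /tank"), so it comes back empty.
node1.clear()

naive_self_heal([node1, node2, node3])
# Instead of restoring node1 from the good copies, the heal wipes
# node2 and node3 to match the empty first child.
print(node2)  # {}
```

A correct heal would consult the AFR changelog extended attributes to decide which child is the source; the sketch shows why skipping that step destroys data exactly when the first-listed node is the one that failed.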


nicolas prochazka wrote:
> hello,
> I found a bug in self healing:
> Two AFR servers also act as clients.
> gluster mount point /mnt/vdisk
> gluster backend point /mnt/disk
> 1 - touch /mnt/vdisk/TEST1  :  OK on both servers
> 2a - rm /mnt/disk/TEST1 on the first server defined in the AFR translator
>       -> ls -l /mnt/vdisk shows nothing on all servers : OK
> 2b - (instead of 2a): rm /mnt/disk/TEST1 on the second server defined in the AFR translator
>       -> ls -l /mnt/vdisk still shows TEST1 on all servers : not OK
> This is the first bug. I think the problem is that load balancing is
> not working: commands are always executed on the same server, the
> first one defined. The same problem appears with read-subvolumes,
> which does not work either.
> 3a - (with the second server defined as favorite-child): no
> synchronisation, TEST1 is never recreated (normal, since operations
> are always performed from server 1).
>    Now I write some data to /mnt/disk/TEST1 on the second server,
> then touch /mnt/vdisk/TEST1 again => TEST1 is synchronised on both
> servers with server 2's content : OK
> In my point of view, ls /mnt/vdisk should not always read from the
> same server, should it?
> I can work around the problem by touching every file in /mnt/vdisk
> from server backend 2; ls /mnt/vdisk then reports a file size of 0,
> but favorite-child resynchronises with the correct content.
> To summarise:
> if I reinstall a new server from scratch and, in my client config
> file, that server appears first in the afr subvolumes declaration,
> it cannot be synchronised with the second server.
> Regards,
> Nicolas Prochazka.
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel

More information about the Gluster-devel mailing list