[Gluster-devel] Strange behavior of AFR

Wed May 21 13:22:57 UTC 2008

Hello everyone,

I've noticed some strange behavior with AFR'ed files, and I would like to
discuss it.

From my point of view, AFR is something like RAID1: when one node goes
down, the volume keeps working on the remaining node. When the failed
node comes back up, it is resynced from the working node in the
background. Mounted clients should not notice this operation.

In fact, it works differently. When the failed node comes back up, it
starts to resync. From the client's point of view, the file suddenly
becomes 0 bytes long and then grows back to the length it had before.
When GlusterFS serves a lot of small files, the whole resync process is
almost invisible. When GlusterFS hosts big files (20-40 GB), the resync
over a 1 Gbit LAN takes a few minutes.
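A rough lower bound shows why "a few minutes" is to be expected: a
1 Gbit/s link carries at most 125 MB/s of payload, so copying a 20-40 GB
image takes at least about 160-320 seconds. A minimal sketch of that
arithmetic, assuming decimal units (1 GB = 10^9 bytes) and ignoring TCP
and GlusterFS protocol overhead:

```python
def min_resync_seconds(size_gb: float, link_gbit: float = 1.0) -> float:
    """Lower bound on the time to copy size_gb gigabytes over a link_gbit link.

    Assumes decimal units (1 GB = 10**9 bytes, 1 Gbit = 10**9 bits) and a
    link saturated at line rate with no protocol overhead, so a real resync
    will take longer than this.
    """
    bits = size_gb * 8e9              # gigabytes -> bits
    return bits / (link_gbit * 1e9)   # bits / (bits per second)


for size in (20, 40):
    secs = min_resync_seconds(size)
    print(f"{size} GB: >= {secs:.0f} s (~{secs / 60:.1f} min)")
```

This prints 160 s (~2.7 min) for 20 GB and 320 s (~5.3 min) for 40 GB,
which is the window during which a domU image is being rewritten.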

Now imagine we have a Xen-based VPS farm. The domU filesystems are
actually files with an ext3 filesystem inside. These files are placed on
GlusterFS storage with AFR between 2 servers. One dom0 node was rebooted
for maintenance. All domUs were migrated from it to the second server
before the reboot, then migrated back. After a few minutes, GlusterFS
started to resync the files in AFR, and every domU that tried to write
something to its disk found that the disk was actually missing and
remounted its filesystem read-only.

Is this correct behavior for AFR? Is there any way to force the resync
process without affecting the domUs? Maybe I needed to run:
head /path/to/domu.img > /dev/null
from the rebooted server before migrating the domUs back to it? Wouldn't
such forcing be visible on all mounted clients? I mean: when GlusterFS
does a resync, does the file become 0 bytes long and then grow back on
all mount points, or does it affect only the server where GlusterFS was
just mounted?
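The `head` idea above can be generalized to a whole directory tree. A
minimal sketch, assuming (as the question presumes, not something this
thread confirms) that opening a file through the GlusterFS mount is what
makes AFR compare the replicas and start self-heal. The function name and
mount path are hypothetical, and whether the resulting resync is visible
to other clients is exactly the open question, so it would only make
sense before the domUs are migrated back:

```python
import os


def touch_all_files(mount_point: str) -> int:
    """Open every regular file under mount_point and read its first byte.

    Assumption: AFR self-heal is triggered when a file is opened via the
    GlusterFS mount, so this mimics running `head` on each file.
    Returns the number of files that were opened.
    """
    count = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and other non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                f.read(1)  # one byte is enough to force an open + read
            count += 1
    return count
```

Hypothetical usage on the rebooted server: `touch_all_files("/mnt/glusterfs")`.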

-- 
Best regards
Anton Khalikov