[Gluster-devel] Strange behavior of AFR

gordan at bobich.net
Wed May 21 13:58:28 UTC 2008


On Wed, 21 May 2008, Anton Khalikov wrote:

> From my point of view AFR is something like RAID1. When one node goes
> down, it continues to work using only one node. When the failed node
> comes back up, it resyncs from the working node to the failed one in the
> background. Mounted clients should not see this operation.
>
> In fact it works differently. When the failed node comes up, it starts to
> resync. From the client's point of view the file becomes 0 bytes long
> and then grows back to the length it had before. When gluster serves a
> lot of small files, the resync process is almost invisible. When one uses
> glusterfs to host big files (20-40 GB), the resync over a 1 Gbit LAN
> takes a few minutes.

Yes, this has been raised before. There was a long thread here about 
possible ways to work around it with journalling, so that only the deltas 
get transmitted on resync, or even an rsync-type file sync for large 
files.
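
To make the delta idea concrete, here is a rough Python sketch (not 
GlusterFS code; the block size and file paths are arbitrary): compare 
per-block checksums of the stale copy against the good copy and rewrite 
only the blocks that differ. In a real implementation only the checksums 
would cross the network, which is what makes this attractive for 20-40 GB 
images that change in only a few places.

import hashlib
import os

BLOCK = 1024 * 1024  # 1 MiB blocks, an arbitrary choice for the sketch

def delta_sync(good_path, stale_path):
    """Rewrite only the blocks of stale_path that differ from good_path."""
    copied = 0
    with open(good_path, "rb") as good, open(stale_path, "r+b") as stale:
        offset = 0
        while True:
            g = good.read(BLOCK)
            if not g:
                break
            stale.seek(offset)
            s = stale.read(len(g))
            if hashlib.md5(g).digest() != hashlib.md5(s).digest():
                stale.seek(offset)
                stale.write(g)
                copied += len(g)
            offset += len(g)
        stale.truncate(offset)  # drop anything trailing in the stale copy
    return copied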

> Now imagine we have a Xen-based VPS farm. The domUs' filesystems are
> actually files with an ext3 fs inside. These files are placed on glusterfs
> storage with AFR between 2 servers.

Another thing that will likely be a stumbling block is that GlusterFS 
doesn't currently support sparse files. It will resync the virtual size 
(as reported by ls) rather than the actual on-disk size (which is 
typically much smaller unless the virtual disk is full).
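
The difference is visible in stat: st_blocks counts what is actually 
allocated, while st_size is the apparent size that ls reports. As a rough 
illustration (assuming a platform that supports os.SEEK_DATA/os.SEEK_HOLE; 
this is not what GlusterFS does today), a copy that preserves the holes 
might look like this:

import os

def is_sparse(path):
    st = os.stat(path)
    # st_blocks is in 512-byte units; fewer allocated bytes than st_size means holes
    return st.st_blocks * 512 < st.st_size

def copy_preserving_holes(src_path, dst_path):
    """Copy only the allocated regions of src_path, recreating the holes."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        offset = 0
        while offset < size:
            try:
                data = os.lseek(src.fileno(), offset, os.SEEK_DATA)
            except OSError:
                break                          # nothing but a hole up to EOF
            hole = os.lseek(src.fileno(), data, os.SEEK_HOLE)
            src.seek(data)
            dst.seek(data)                     # seeking past EOF leaves a hole
            dst.write(src.read(hole - data))   # copies one data extent at a time
            offset = hole
        dst.truncate(size)                     # restore the apparent (ls) size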

> One dom0 node was rebooted for
> maintenance. All domUs were migrated from it to the second server before
> rebooting, then migrated back. After a few minutes gluster started to
> resync the files in AFR, and every domU that tried to write something
> to its hdd found that the hdd was actually missing and remounted its fs
> read-only.

This may be to do with the posix locking. Currently the posix lock server 
is the first server in the AFR list (and the order of servers in AFR 
should be the same on all nodes, or else the locking won't work properly). 
When the primary (lock) server goes away, all the locks disappear, too. 
There was also another thread here discussing lock / metadata distribution 
across the cluster with quorum locking, but that is as yet unimplemented.
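
As a toy illustration of the quorum idea (nothing like the real translator 
code; lock_on() and unlock_on() are hypothetical helpers), a lock would 
only count as held if a strict majority of the configured servers grant 
it, so losing the first server in the list no longer takes every lock with 
it:

def acquire_quorum_lock(servers, lock_on, unlock_on):
    """Try to take the lock on every server; keep it only with a majority.

    servers   -- list of server addresses (order no longer matters)
    lock_on   -- hypothetical callable: lock_on(server) -> bool
    unlock_on -- hypothetical callable: unlock_on(server)
    """
    granted = [s for s in servers if lock_on(s)]
    if len(granted) * 2 > len(servers):   # strict majority
        return granted                    # caller releases these later
    for s in granted:                     # no quorum: back out
        unlock_on(s)
    return None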

> Is this the correct behavior for AFR? Is there any way to force the resync
> process without affecting the domUs? Maybe I need to run:
> head /path/to/domu.img > /dev/null
> on the rebooted server before migrating the domUs back to it? Wouldn't
> such a forced resync be visible on all mounted clients?

If it's timeout related, then yes, head-ing the images will solve the 
problem. If it's lock related, it won't make any difference.
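
If it does turn out to be timeout/self-heal related, the head-ing can be 
scripted so you don't have to do it per image by hand. A small Python 
sketch (the mount point path is just an example): read the first byte of 
every file through the GlusterFS mount, which is what prompts AFR to 
self-heal that file:

import os

def trigger_self_heal(mountpoint, nbytes=1):
    """Read the first byte of every file under the GlusterFS mount point."""
    for root, _dirs, files in os.walk(mountpoint):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, "rb") as f:
                    f.read(nbytes)
            except OSError:
                pass  # skip files that disappear or can't be read

trigger_self_heal("/mnt/glusterfs")  # example mount point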

> I mean, when gfs does a resync, does the file become 0 bytes long and
> then grow back on all mount points, or does it affect only the server
> where gfs was just mounted?

It only affects the servers that have an out-of-date version of the 
file (or don't have the file at all). The currently online servers that 
are up to date will keep showing the correct information.

For the particular use case you are describing, you may find that 
something like DRBD would fit better, with a separate DRBD device per 
VM.

Gordan