[Gluster-users] Geo-replicated disk images losing sparseness

Brian Camp brian.camp at itfreedom.com
Mon Jun 29 13:46:49 UTC 2015


Hi,

I have a libvirt/KVM environment that uses three Gluster servers for
disk storage.  Two of the servers house the replicated volume with the
VM disk images, and the third is offsite and used for geo-replication.
All of the hardware is the same and fairly high end (RAID10, SAS, dual
Xeons), running CentOS 7 with XFS and Gluster 3.6.2 from the CentOS
Storage SIG.  The VM images themselves are about 1.5TB of sparse files
that take up around 430GB on disk.

After things were set up, I noticed that geo-replication was taking
much longer than expected, even though the amount of on-disk change
was small and the link is 100Mbit.  The primary bottleneck seems to
be rsync on both ends of the geo-replication, taking up CPU as it
checksums the disk images.  Several optimizations helped, such as
setting a high sync_jobs and changing rsync's compression level, but
not enough for geo-replication to keep up.
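Roughly the commands involved, for reference (volume and slave names
are placeholders, and the exact option spellings may differ between
Gluster versions):

    # increase the number of parallel sync workers
    gluster volume geo-replication vmvol slavehost::vmvol-slave config sync_jobs 6

    # pass a lower compression level through to rsync
    gluster volume geo-replication vmvol slavehost::vmvol-slave config rsync-options "--compress-level=1"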

The disk images are losing sparseness when geo-replicated.  On both
replicas the disk images take up 430GB, but on the geo-replicated
copy they take up the full 1.5TB.  I've tried several different
configs on the servers and blown away/restarted the replication
several times, but the slave copies always end up at the full 1.5TB.
This means the full 1.5TB is read in and checksummed on each sync,
which is very slow.  Worse, the two sides of the sync seem to take
turns: rsync will chew away on one of the replicas for a while with
the geo-replication server idle, and then vice versa.
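To make the difference concrete, comparing allocated size against
apparent size on each side shows the mismatch (the paths below are
just examples):

    # on one of the master replicas: allocated vs. apparent size
    du -sh /data/brick/vmvol/images
    du -sh --apparent-size /data/brick/vmvol/images

    # the same two numbers on the geo-replication slave
    du -sh /data/brick/vmvol-slave/images
    du -sh --apparent-size /data/brick/vmvol-slave/images

On the replicas the two numbers differ (roughly 430GB allocated vs.
1.5TB apparent); on the slave both come back as 1.5TB.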

I notice that rsync is being called with both the sparse flag (-S)
and --inplace.  The rsync manual, under the --sparse section, says
the two options are incompatible, but it doesn't specify what happens
when they are combined anyway.
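As a quick way to see what the combination actually does outside of
Gluster, a throwaway sparse file can be copied by hand (file name and
size are arbitrary):

    # create a 1GB sparse file that allocates almost nothing on disk
    truncate -s 1G /tmp/sparsetest
    du -h --apparent-size /tmp/sparsetest    # reports 1.0G
    du -h /tmp/sparsetest                    # reports ~0

    # copy it the way geo-replication does, with both flags set
    rsync -aS --inplace /tmp/sparsetest /tmp/sparsetest.copy

    # check whether the copy kept its sparseness
    du -h /tmp/sparsetest.copy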

Is the loss of sparseness a bug, or have I missed something in the
configuration?

Thanks

-Brian

