[Gluster-users] VMDK file replication issue
marco.trevisan at cardinis.com
Thu Aug 7 17:19:08 UTC 2008
I'm in the process of evaluating GlusterFS as a clustered file system. I
like it very much because, among other cool features, it's very easy
to configure and it allows me to reuse the filesystems I already know as
Before trying it on expensive hardware, I decided to try it on a very
low-end hardware configuration:
- 2 old PCs (one P4-class CPU, IDE drives, one 100 Mbps Ethernet card)
and a 100 Mbps switch.
The OS is Debian 'lenny' on both nodes. 'Lenny' comes with FUSE v2.7.2.
I then compiled glusterfs 1.3.10 on both nodes and set up a server-side,
single-process AFR configuration (the file content is reported below).
I did NOT use the glusterfs-patched FUSE library.
On top of that I've put some VMware Server virtual machines. Each
virtual machine image is split into a few 2 GB "vmdk" files (not
I was successful in starting up and running my virtual machines (with
only an additional line in their configuration files), so I was very
happy with it.
The problem now is that, after putting a virtual machine under "intense"
I/O, when I rebooted it today I found its root filesystem (= the vmdk)
was corrupted. It lost some important directories (e.g. the kernel
modules directory under /lib/modules).
Just to give you a little more detail about the behaviour under I/O: when
the virtual machine is doing I/O to the VMDK file, iptraf shows the
corresponding traffic on the ethernet link at about 15-50 Mbps, so it
looks like only the modified portions of the file are being sent to the
other AFR node. In fact, if I simulate a failure by powering off the other
AFR node, at reboot I see 90 Mbps (link saturation) traffic as I try to
open the VMDK file, and that operation blocks until full synchronization
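
To put the blocking open in perspective, here is a back-of-the-envelope
estimate of how long a full resynchronization of one extent might take.
It is only a sketch under stated assumptions: the function name
`resync_seconds` is made up for illustration, the extent is taken to be
exactly 2 GiB, and the link is assumed to sustain the observed 90 Mbps
with no protocol overhead.

```python
def resync_seconds(file_bytes: int, link_mbps: float) -> float:
    """Seconds needed to copy file_bytes over a link sustaining link_mbps.

    link_mbps is in megabits per second (10**6 bits/s), as reported by
    tools such as iptraf. Protocol overhead is ignored (assumption).
    """
    return file_bytes * 8 / (link_mbps * 1_000_000)


# Assumption: one vmdk extent of exactly 2 GiB, link sustaining 90 Mbps.
extent_bytes = 2 * 1024**3
seconds = resync_seconds(extent_bytes, 90.0)
print(f"{seconds:.0f} s (about {seconds / 60:.1f} min) per 2 GB extent")
```

Under these assumptions a single 2 GB extent needs roughly three minutes,
and a virtual machine made of several extents proportionally longer.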
The glusterfs.log content is as follows:
2008-08-06 12:20:24 E [posix-locks.c:1148:pl_lk] gfs-ds-locks: returning
2008-08-06 12:20:24 E [afr.c:3190:afr_lk_cbk] gfs-ds-afr:
2008-08-06 12:42:25 E [posix-locks.c:1148:pl_lk] gfs-ds-locks: returning
2008-08-06 12:42:25 E [afr.c:3190:afr_lk_cbk] gfs-ds-afr:
2008-08-07 11:42:17 E [posix-locks.c:1180:pl_forget] gfs-ds-locks:
Active locks found!
The above log does not seem to justify such file corruption... there is
nothing related to "vmdk" files.
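
One way to check whether the two AFR copies have actually diverged is to
checksum the same vmdk file directly in the backend directory on each
node (with the virtual machine shut down) and compare the results. The
following is a minimal sketch, not a GlusterFS tool: the helper name
`sha256_of` and any file path passed to it are hypothetical; only the
backend directory /mnt/hda7/gfs-ds comes from the configuration reported
in this message.

```python
import hashlib
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks.

    Reading in chunks keeps memory use flat even for 2 GB vmdk extents.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Run on each node against the backend copy, then compare the output.
    for path in sys.argv[1:]:
        print(sha256_of(path), path)
```

If the digests differ while no client is writing to the file, the two
replicas are out of sync, which would narrow the problem down to
replication rather than to the guest filesystem alone.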
Is the hardware configuration too slow for AFR to work reliably?
Are there mistakes in the configuration file?
Any help is really appreciated.
----------GlusterFS config file ----------------
# dataspace on storage1
option directory /mnt/hda7/gfs-ds
# posix locks
option thread-count 1
option cache-size 32MB
option transport-type tcp/server
# storage network access only
option auth.ip.gfs-ds-threads.allow *
option auth.ip.gfs-ds-afr.allow *
# dataspace on storage2
option transport-type tcp/client
option remote-host <the other node's IP> # storage network
option remote-subvolume gfs-ds-threads
option transport-timeout 10 # value in seconds; it should be set relatively low
# automatic file replication translator for dataspace
subvolumes gfs-ds-locks gfs-storage2-ds # local and remote
option aggregate-size 128kB
option page-size 64kB
option page-count 16