[Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?
Jake Grimmett
jog at mrc-lmb.cam.ac.uk
Thu Jul 19 09:14:00 UTC 2012
Dear Pranith / Anand,
Update on our progress with using KVM & Gluster:
We built a two-server (Dell R710) cluster, each box has...
a 5 x 500 GB SATA RAID5 array (software RAID)
an Intel 10GbE network adapter.
One box has 8GB RAM, the other 48GB
both have 2 x E5520 Xeon
CentOS 6.3 installed
Gluster 3.3 installed from the RPMs on the Gluster site
1) create a replicated gluster volume (on top of xfs) - command sketch after this list
2) setup qemu/kvm with a gluster volume (mounts localhost:/gluster-vol)
3) sanlock configured (this is evil!)
4) build a virtual machine with a 30GB qcow2 image and 1GB RAM
5) clone this VM into 4 machines
6) check that live migration works (OK)
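For reference, steps 1, 2, 4 and 5 were roughly along these lines; the
volume name, brick paths and image names below are illustrative, not
necessarily the exact ones we used:

  # step 1: replicated volume across the two hosts, bricks on xfs
  gluster volume create gluster-vol replica 2 \
      r710-1:/export/brick1 r710-2:/export/brick1
  gluster volume start gluster-vol

  # step 2: each host mounts the volume locally for qemu/kvm
  mount -t glusterfs localhost:/gluster-vol /var/lib/libvirt/images

  # step 4: 30GB qcow2 image for the first guest
  qemu-img create -f qcow2 /var/lib/libvirt/images/vm1.qcow2 30G

  # step 5: clone it (repeated for vm2..vm4)
  virt-clone --original vm1 --name vm2 \
      --file /var/lib/libvirt/images/vm2.qcow2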
Start basic test cycle (rough commands after the list):
a) migrate all machines to host #1, then reboot host #2
b) watch logs for self-heal to complete
c) migrate VMs to host #2, reboot host #1
d) check logs for self-heal
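The migration and heal checks in a)-d) amounted to commands like these
(host and guest names are placeholders):

  # a) / c) live-migrate a guest between the hosts
  virsh migrate --live vm1 qemu+ssh://r710-1/system

  # b) / d) watch self-heal after the rebooted host comes back
  gluster volume heal gluster-vol info
  tail -f /var/log/glusterfs/glustershd.log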
The above cycle can be repeated numerous times, and completes without
error, provided there is little or no load on the VMs.
If I give the VMs a workload, such as running "bonnie++" on each VM,
things start to break.
1) it becomes almost impossible to log in to each VM
2) the kernel on each VM starts giving timeout errors
i.e. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
3) top / uptime on the hosts shows load average of up to 24
4) dd write speed (block size 1K) to gluster is around 3MB/s on the host
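The figure in point 4 came from a dd run on the host along these lines
(the target path and count here are just examples, and flags such as
oflag=direct or conv=fsync would change the number):

  # 1K-block sequential writes through the FUSE mount on the host
  dd if=/dev/zero of=/var/lib/libvirt/images/ddtest bs=1K count=100000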
While I agree that running bonnie++ on four VMs is possibly unfair,
there are load spikes on quiet machines (yum updates etc.). I suspect
that the I/O of one VM starts blocking that of another VM, and the
pressure builds up rapidly on gluster, which does not seem to cope well
under pressure. Possibly this is the access pattern / block size of
qcow2 disks?
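If the qcow2 cluster size is part of the problem, it is at least easy to
inspect and vary, e.g. (the 2M value below is just an example):

  # show the cluster_size of an existing image
  qemu-img info /var/lib/libvirt/images/vm1.qcow2

  # create a test image with a larger cluster size and preallocated metadata
  qemu-img create -f qcow2 -o cluster_size=2M,preallocation=metadata \
      /var/lib/libvirt/images/test.qcow2 30G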
I'm (slightly) disappointed.
Though it doesn't corrupt data, the I/O performance is < 1% of my
hardware's capability. Hopefully work on buffering and other tuning will
fix this? Or maybe the work mentioned on getting QEMU talking directly to
Gluster will fix this?
best wishes
Jake
--
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Hills Road, Cambridge, CB2 0QH, UK.
Phone 01223 402219
Mobile 0776 9886539