[Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)
Fernando Frediani (Qube)
fernando.frediani at qubenet.net
Fri Jun 8 17:57:21 UTC 2012
Thanks for sharing that, Brian.
I wonder if the problem we see when trying to power up VMware ESXi VMs has the same cause.
Fernando
-----Original Message-----
From: Brian Candler [mailto:B.Candler at pobox.com]
Sent: 08 June 2012 17:47
To: Pranith Kumar Karampuri
Cc: olav johansen; gluster-users at gluster.org; Fernando Frediani (Qube)
Subject: Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)
On Thu, Jun 07, 2012 at 02:36:26PM +0100, Brian Candler wrote:
> I'm interested in understanding this, especially the split-brain
> scenarios (better to understand them *before* you're stuck in a
> problem :-)
>
> BTW I'm in the process of building a 2-node 3.3 test cluster right now.
FYI, I have got KVM working with a glusterfs 3.3.0 replicated volume as the image store.
There are two nodes, both running as glusterfs storage and as KVM hosts.
I built a 10.04 Ubuntu image using vmbuilder, stored on the replicated glusterfs volume:
vmbuilder kvm ubuntu --hostname lucidtest --mem 512 --debug --rootsize 20480 --dest /gluster/safe/images/lucidtest
I was able to fire it up (virsh start lucidtest), ssh into it, and then live-migrate it to another host:
brian at dev-storage1:~$ virsh migrate --live lucidtest qemu+ssh://dev-storage2/system
brian at dev-storage2's password:
brian at dev-storage1:~$ virsh list
Id Name State
----------------------------------
brian at dev-storage1:~$
And I live-migrated it back again, all without the ssh session being interrupted.
I then rebooted the second storage server. While it was rebooting I did some work in the VM which grew its image. When the second storage server came back, it resynchronised the image immediately and automatically. Here is the relevant entry from /var/log/glusterfs/glustershd.log on the first
(non-rebooted) machine:
[2012-06-08 17:08:40.817893] E [socket.c:1715:socket_connect_finish] 0-safe-client-1: connection to 10.0.1.2:24009 failed (Connection timed out)
[2012-06-08 17:09:10.698272] I [client-handshake.c:1636:select_server_supported_programs] 0-safe-client-1: Using Program GlusterFS 3.3.0, Num (1298437), Version (330)
[2012-06-08 17:09:10.700197] I [client-handshake.c:1433:client_setvolume_cbk] 0-safe-client-1: Connected to 10.0.1.2:24009, attached to remote volume '/disk/storage2/safe'.
[2012-06-08 17:09:10.700234] I [client-handshake.c:1445:client_setvolume_cbk] 0-safe-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2012-06-08 17:09:10.701901] I [client-handshake.c:453:client_set_lk_version_cbk] 0-safe-client-1: Server lk version = 1
[2012-06-08 17:09:14.699571] I [afr-common.c:1189:afr_detect_self_heal_by_iatt] 0-safe-replicate-0: size differs for <gfid:1f080b06-46f1-468e-b21a-12bf4a7c81ff>
[2012-06-08 17:09:14.699616] I [afr-common.c:1340:afr_launch_self_heal] 0-safe-replicate-0: background data self-heal triggered. path: <gfid:1f080b06-46f1-468e-b21a-12bf4a7c81ff>, reason: lookup detected pending operations
[2012-06-08 17:09:18.230855] I [afr-self-heal-algorithm.c:122:sh_loop_driver_done] 0-safe-replicate-0: diff self-heal on <gfid:1f080b06-46f1-468e-b21a-12bf4a7c81ff>: completed. (19 blocks of 3299 were different (0.58%))
[2012-06-08 17:09:18.232520] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-safe-replicate-0: background data self-heal completed on <gfid:1f080b06-46f1-468e-b21a-12bf4a7c81ff>
So at first glance this is extremely impressive. It's also very new and shiny, and I wonder how many edge cases remain to be debugged in live use, but I can't deny that it's very neat indeed!
Performance-wise:
(1) on the storage/VM host, which has the replicated volume mounted via FUSE:
root at dev-storage1:~# dd if=/dev/zero of=/gluster/safe/test.zeros bs=1024k count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 2.7086 s, 194 MB/s
(The bricks have a 12-disk md RAID10 array, far-2 layout, and there's probably scope for some performance tweaking here)
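Note that a plain dd like the one above writes through the client's page cache, which can flatter the figure. A sketch of a more conservative measurement, adding conv=fdatasync so dd waits for the data to be flushed before reporting (the /tmp path here is illustrative; substitute the gluster mount):

```shell
# Hedged sketch: conv=fdatasync forces dd to fdatasync() the file before
# printing its throughput summary, so the number reflects data actually
# flushed rather than data merely buffered in the page cache.
dd if=/dev/zero of=/tmp/gluster-ddtest.zeros bs=1M count=50 conv=fdatasync
rm -f /tmp/gluster-ddtest.zeros
```

On a replicated volume this flushed figure is usually noticeably lower than the cached one, and is the more honest baseline when comparing host vs. guest performance.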
(2) however from within the VM guest, performance was very poor (2.2MB/s).
I tried my usual tuning options:
<driver name='qemu' type='qcow2' io='native' cache='none'/>
...
<target dev='vda' bus='virtio'/>
<!-- delete <address type='drive' controller='0' bus='0' unit='0'/> -->
but glusterfs objected to the cache='none' option (presumably because qemu then opens the image with O_DIRECT, which the FUSE mount rejects?):
# virsh start lucidtest
error: Failed to start domain lucidtest
error: internal error process exited while connecting to monitor: char device redirected to /dev/pts/0
kvm: -drive file=/gluster/safe/images/lucidtest/tmpaJqTD9.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native: could not open disk image /gluster/safe/images/lucidtest/tmpaJqTD9.qcow2: Invalid argument
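If the FUSE mount really is rejecting O_DIRECT, any cache mode other than 'none' should avoid it. A sketch of the disk stanza with cache='writethrough' (my assumption as a workaround, not something I have benchmarked here; writethrough keeps write-ordering safety at some performance cost):

```xml
<!-- Hedged sketch: replace cache='none' (which implies O_DIRECT) with a
     cache mode the glusterfs FUSE mount accepts -->
<driver name='qemu' type='qcow2' io='native' cache='writethrough'/>
```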
The VM does boot with just io='native' and bus='virtio' (cache='none' removed), but performance is still very poor:
ubuntu at lucidtest:~$ dd if=/dev/zero of=/var/tmp/test.zeros bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 17.4095 s, 6.0 MB/s
This will need some further work.
The guest is lucid (10.04) only because, for some reason, I cannot get a 12.04 image built with vmbuilder to work (it spins at 100% CPU). This is not related to glusterfs and is something I need to debug separately. Maybe a 12.04 guest will also run better.
Anyway, just thought it was worth a mention. Keep up the good work guys!
Regards,
Brian.