[Gluster-users] Options to turn off/on for reliable virtual machine writes & write performance
RedShift
redshift at telenet.be
Sun Oct 6 07:40:19 UTC 2013
Hi all,

I'm building a cluster that serves virtual machine storage to ESXi hosts over NFS. The point of the cluster is that it should survive an unclean node death (test scenarios: hot-pulling disks, cutting power, etc.), which means I need to be sure all writes have completed on both nodes before Gluster returns the operation as complete. For now, I have this:
gluster> volume info ha-ds1
Volume Name: ha-ds1
Type: Replicate
Volume ID: da2fb668-2f3e-4839-a5da-4a51d5fcba05
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.255.255.1:/vol/gluster/ha-ds1
Brick2: 10.255.255.2:/vol/gluster/ha-ds1
Options Reconfigured:
cluster.self-heal-daemon: on
performance.flush-behind: Off
network.frame-timeout: 30
network.ping-timeout: 15
cluster.heal-timeout: 300
gluster> volume status all detail
Status of volume: ha-ds1
------------------------------------------------------------------------------
Brick : Brick 10.255.255.1:/vol/gluster/ha-ds1
Port : 49153
Online : Y
Pid : 2252
File System : ext4
Device : /dev/mapper/stor--node1-gluster
Mount Options : rw,noatime,nodiratime,journal_checksum,data=journal,errors=panic,nodelalloc
Inode Size : 256
Disk Space Free : 219.3GB
Total Disk Space : 269.1GB
Inode Count : 17924096
Free Inodes : 17923263
------------------------------------------------------------------------------
Brick : Brick 10.255.255.2:/vol/gluster/ha-ds1
Port : 49152
Online : Y
Pid : 2319
File System : ext4
Device : /dev/mapper/stor--node2-gluster
Mount Options : rw,noatime,nodiratime,journal_checksum,data=journal,errors=panic,nodelalloc
Inode Size : 256
Disk Space Free : 221.3GB
Total Disk Space : 269.1GB
Inode Count : 17924096
Free Inodes : 17923162
gluster>
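For reference, the non-default options above were applied with commands along these lines (reconstructed from the "Options Reconfigured" list, so the exact order may differ):

    # Apply the reliability-related options to the ha-ds1 volume
    gluster volume set ha-ds1 cluster.self-heal-daemon on
    gluster volume set ha-ds1 performance.flush-behind off
    gluster volume set ha-ds1 network.frame-timeout 30
    gluster volume set ha-ds1 network.ping-timeout 15
    gluster volume set ha-ds1 cluster.heal-timeout 300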
(I'd also like to draw your attention to the mount options in the status output above: are those OK, or can I do better?)
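For context, the corresponding fstab entry is something like this (node 1 shown; the mount point is illustrative):

    # ext4 brick mount with full data journalling, node 1
    /dev/mapper/stor--node1-gluster  /vol/gluster  ext4  rw,noatime,nodiratime,journal_checksum,data=journal,errors=panic,nodelalloc  0  2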
Is this enough to guarantee a proper cluster failover to the second node (data consistent at all times) without interrupting the virtual machines? In my testing it appears to be, but I want to be sure. Does anyone have something to add, or something to look out for?
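For what it's worth, my understanding (which I'd be glad to have corrected) is that the client-side write caches are the main risk here: with write-behind enabled, a write can be acknowledged before it has actually reached both bricks. That's why flush-behind is off above, and I'm considering disabling write-behind as well:

    # My assumption: turn off client-side write caching so an acknowledged
    # write has actually been sent to both replicas
    gluster volume set ha-ds1 performance.write-behind off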
Second, I'd like to improve the write performance of this cluster. Reads are good (> 110 MB/s; the ESXi servers are connected via gigabit, so that's the ceiling), but writes are only about half that (~60 MB/s). The hardware can definitely do more: a simple 16 GB dd file write to the underlying filesystem nets ~227 MB/s. I gathered some statistics during sequential write tests: the load climbs to ~15 and there is some CPU usage, but one CPU core seems to spend most of its time in I/O wait. I know the hardware can perform better. Where else should I start looking?
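For reference, the baseline write test was something along these lines (the target path is illustrative; conv=fdatasync makes dd include the final flush in its timing):

    # 16 GiB sequential write straight to the brick filesystem
    dd if=/dev/zero of=/vol/gluster/ddtest bs=1M count=16384 conv=fdatasync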
Thanks,
Glenn