[Gluster-users] Trying XenServer again with Gluster

Russell Purinton russell.purinton at gmail.com
Sun Mar 20 18:41:57 UTC 2016

Hi all,  Once again I’m trying to get XenServer working reliably with GlusterFS storage for the VHDs.  I’m mainly interested in the ability to have a pair of storage servers, where if one goes down, the VMs can keep running uninterrupted on the other server.  So, we’ll be using the replicate translator to make sure all the data resides on both servers.

Initially, I tried using the Gluster NFS server.  XenServer supports NFS out of the box, so this seemed like a good way to go without having to hack XenServer much.  However, I found some major performance issues with this approach.

I’m using a server with 12 SAS drives on a single RAID card, with dual 10GbE NICs.  Without Gluster, using the normal kernel NFS server, I can read and write to this server at over 400MB/sec, and VMs run well.  However, when I switch to Gluster for the NFS server, my write performance drops to 20MB/sec, while read performance remains high.  I found out this is due to XenServer’s use of O_DIRECT for VHD access.  Having DDR cache on the RAID card helped a lot, but on servers without it the performance was unusable.
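
For anyone who wants to reproduce the O_DIRECT write slowdown outside of XenServer, here is a minimal Python sketch that times sequential writes with and without O_DIRECT. It is not what XenServer does internally, just a rough comparison tool. The function name, block size and total size are my own choices, and os.O_DIRECT only exists on Linux.

```python
import mmap
import os
import sys
import time


def timed_write(path, total_bytes=64 * 1024 * 1024, block_size=1024 * 1024, direct=False):
    """Write total_bytes to path in block_size chunks.

    Returns (bytes_written, elapsed_seconds). block_size should be a
    multiple of the device sector size when direct=True.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if direct:
        # Linux-only flag; bypasses the page cache on the writing host.
        flags |= os.O_DIRECT
    fd = os.open(path, flags, 0o600)
    # An anonymous mmap is page-aligned, which O_DIRECT needs for its buffer.
    buf = mmap.mmap(-1, block_size)
    buf.write(b"\xa5" * block_size)
    written = 0
    start = time.perf_counter()
    try:
        while written < total_bytes:
            # Assumes full-block writes, which is the normal case for regular files.
            written += os.write(fd, buf)
        os.fsync(fd)
    finally:
        os.close(fd)
        buf.close()
    return written, time.perf_counter() - start


if __name__ == "__main__":
    target = sys.argv[1]  # e.g. a file on the NFS or FUSE mount under test
    for direct in (False, True):
        n, secs = timed_write(target, direct=direct)
        label = "O_DIRECT" if direct else "buffered"
        print("%s: %.1f MB/sec" % (label, n / secs / 1e6))
```

Run it with a path on the mount being tested. Some filesystems (tmpfs, for example) reject O_DIRECT at open time, so an error on the second pass does not necessarily point at Gluster.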

So I installed the gluster-client in XenServer itself and mounted the volume in dom0.  I then created an SR of type “file”.  Success, sort of!  I can do just about everything on that SR, VMs run nicely, and performance is acceptable at 270MB/sec, but I have a problem when I transfer an existing VM to it.  The transfer gets only so far along, then data stops moving.  XenServer still says it’s copying, but no data is being sent.  I have to force restart the XenServer host to clear the issue (and the VM isn’t moved).  Other file access to the FUSE mount still works, and other VMs are unaffected.
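
To pin down when a transfer stops moving, something like the following Python sketch could be left running in dom0 against the destination VHD file. It assumes the destination file on the FUSE mount grows while the copy is in progress, which I have not verified for every kind of XenServer copy. The function names, poll interval and threshold are arbitrary choices of mine.

```python
import os
import sys
import time


def stalled_for(samples):
    """Given [(timestamp, size), ...] oldest first, return how many
    seconds the size has stayed at its most recent value."""
    if not samples:
        return 0.0
    last_t, last_size = samples[-1]
    since = last_t
    for t, size in reversed(samples):
        if size != last_size:
            break
        since = t
    return last_t - since


def watch(path, interval=5.0, threshold=60.0, max_polls=None):
    """Poll the file size; return the flat duration once it reaches
    threshold seconds, or None if max_polls runs out first."""
    samples = []
    polls = 0
    while max_polls is None or polls < max_polls:
        samples.append((time.monotonic(), os.stat(path).st_size))
        flat = stalled_for(samples)
        if flat >= threshold:
            return flat
        polls += 1
        time.sleep(interval)
    return None


if __name__ == "__main__":
    flat = watch(sys.argv[1])
    print("no growth for %.0f seconds at %s" % (flat, time.ctime()))
```

The timestamp it prints gives a point to line up against the Gluster client and brick logs.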

I think the problem may involve file locks or perhaps a performance translator.  I’ve tried disabling as many performance translators as I can, but no luck.
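
For reference, this is roughly how the translators can be switched off one by one. The Python sketch below only prints the "gluster volume set" commands and does not run them. The option names are the ones I know of; which ones exist depends on the Gluster version ("gluster volume set help" lists them), and the volume name "vhdvol" is a placeholder.

```python
import shlex

# Client-side performance translators that can be toggled per volume.
PERFORMANCE_OPTIONS = [
    "performance.quick-read",
    "performance.read-ahead",
    "performance.io-cache",
    "performance.stat-prefetch",
    "performance.write-behind",
    "performance.open-behind",
]


def disable_commands(volume, options=PERFORMANCE_OPTIONS):
    """Return one 'gluster volume set <volume> <option> off' argv list per option."""
    return [["gluster", "volume", "set", volume, opt, "off"] for opt in options]


if __name__ == "__main__":
    # "vhdvol" is a hypothetical volume name.
    for cmd in disable_commands("vhdvol"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Applying the commands one at a time and retrying the transfer after each would show whether a single translator is responsible.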

I didn’t find anything interesting in the logs, and there were no crash dumps.  I tried to do a volume statedump to see the list of locks, but it seemed to only output some CPU stats in /tmp.
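
As I understand the Gluster statedump documentation, brick dumps are written to the directory set by the server.statedump-path option (commonly /var/run/gluster), and lock entries appear as lines such as "inodelk.inodelk[0](ACTIVE)=..." under a section header with a "path=" line. Assuming that layout, which may differ between versions, a small Python sketch like this could pull the lock lines out of a dump file. The regular expressions and field names are my own.

```python
import re
import sys

SECTION_RE = re.compile(r"^\[(?P<name>.+)\]\s*$")
# Matches e.g. "inodelk.inodelk[0](ACTIVE)=type=WRITE, ..." (assumed format).
LOCK_RE = re.compile(
    r"^(?P<kind>\w+)\.\w+\[\d+\]\((?P<state>ACTIVE|BLOCKED)\)=(?P<detail>.*)$"
)


def find_locks(text):
    """Return a list of dicts describing each ACTIVE/BLOCKED lock line,
    tagged with the enclosing section name and the last path= seen in it."""
    section = None
    path = None
    found = []
    for raw in text.splitlines():
        line = raw.strip()
        m = SECTION_RE.match(line)
        if m:
            section, path = m.group("name"), None
            continue
        if line.startswith("path="):
            path = line[len("path="):]
            continue
        m = LOCK_RE.match(line)
        if m:
            found.append({
                "section": section,
                "path": path,
                "kind": m.group("kind"),
                "state": m.group("state"),
                "detail": m.group("detail"),
            })
    return found


if __name__ == "__main__":
    with open(sys.argv[1]) as fh:
        for lock in find_locks(fh.read()):
            print(lock["state"], lock["kind"], lock["path"], lock["detail"])
```

BLOCKED entries on the VHD being copied would be the interesting ones, since they would show a lock request waiting behind another holder.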

Is there a generally accepted list of volume options to use with Gluster for volumes meant to store VHDs?  Has anyone else had a similar experience with VHD access locking up?   

