[Gluster-users] Slow NFS performance with Replication

Mon Jun 30 17:44:49 UTC 2014

Hi all,

I have been experimenting with Gluster as a VM storage backend for
VMware ESXi, using Gluster NFS to export the storage to the ESXi hosts.
Our current setup has 2 storage servers in a 1x2 replicated volume, each
with approximately 16TB of storage shared via Gluster.  The NFS servers
are connected to the ESXi systems via 10Gbps NICs, and we have dedicated
a cross-connected link between the storage servers for Gluster
replication.

After some initial testing, we are only getting approximately 160-200 MBps
write throughput.  If we drop a brick from the volume, so that no
replication takes place, we see writes on the order of 500-600 MBps.  We
would expect writes to stay in the 500 MBps range with replication turned
on; however, we are seeing less than half of that over 10Gbps links.
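
To put those numbers next to the link speed, here is a rough back-of-the-envelope sketch.  It assumes "MBps" means decimal megabytes per second and ignores all protocol overhead, so the percentages are only an upper-level sanity check, not a measurement.

```python
# Compare the reported write rates with the raw 10 Gbps line rate.
# Assumptions: decimal units (1 Gbps = 1000 Mbit/s), no protocol overhead.
LINK_GBPS = 10


def gbps_to_mbytes_per_s(gbps):
    """Convert gigabits per second to megabytes per second (8 bits per byte)."""
    return gbps * 1000 / 8


def utilization(rate_mbytes, line_rate_mbytes):
    """Fraction of the line rate that a given throughput represents."""
    return rate_mbytes / line_rate_mbytes


line_rate = gbps_to_mbytes_per_s(LINK_GBPS)

# Write rates taken from the message above, as (low, high) in MB/s.
observed = {
    "replicated (1x2)": (160, 200),
    "single brick": (500, 600),
}

for label, (low, high) in observed.items():
    print(
        f"{label}: {low}-{high} MB/s = "
        f"{utilization(low, line_rate):.0%}-{utilization(high, line_rate):.0%} "
        f"of {line_rate:.0f} MB/s"
    )
```

Under these assumptions, neither case gets close to saturating a single 10Gbps link.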

We also notice that write-heavy VMs spend a lot of time in I/O wait with
replication turned on.

We have increased thread counts with the performance.* options, but that
has not improved our situation.  When taking VMware out of the equation
(by mounting directly with an NFS client on a different server), we see
the same results.
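
For anyone who wants to reproduce the direct-mount test, a minimal sequential-write sketch is below.  This is not the tool we used; the function name and the example path are hypothetical, and it measures only a single-stream write followed by an fsync.

```python
import os
import time


def measure_write_mbytes_per_s(path, total_bytes=1 << 30, block_size=1 << 20):
    """Write total_bytes to path sequentially, fsync, and return MB/s.

    The block is random data, generated once and reused, so the payload
    is not all zeros.
    """
    block = os.urandom(block_size)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        # Flush Python's buffer, then ask the OS to commit the data, so the
        # timing includes the time the server takes to accept the writes.
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return written / elapsed / 1e6


# Example (hypothetical mount point for the Gluster NFS export):
# print(measure_write_mbytes_per_s("/mnt/gvol0/testfile"))
```

Running it once against the replicated volume and once with a brick removed should show the same gap, if the client is not the bottleneck.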

Is this a normal speed for a replicate volume over 10Gbps interconnects?

Here is our current Gluster config.  We are running Gluster 3.5.0:

Volume Name: gvol0
Type: Replicate
Volume ID: e88afc1c-50d3-4e2e-b540-4c2979219d12
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: nfs0g:/data/brick0/gvol0
Brick2: nfs1g:/data/brick0/gvol0
Options Reconfigured:
nfs.disable: 0
network.ping-timeout: 3
nfs.drc: off

----------
Brent Kolasinski
Computer Systems Engineer

Argonne National Laboratory
Decision and Information Sciences
ARM Climate Research Facility