[Gluster-users] gluster client performance

Mon Jul 25 22:12:52 UTC 2011

Hi-

I'm new to Gluster, but am trying to get it set up on a new compute
cluster we're building. We picked Gluster for one of our cluster file
systems (we're also using Lustre for fast scratch space), but Gluster
performance has been so poor that I suspect a configuration problem.
Perhaps we're missing a tuning parameter that would help, but I can't
find anything in the Gluster documentation; all the tuning information
I've found seems geared toward Gluster 2.x.

For some background, our compute cluster has 64 compute nodes. The
Gluster storage pool has 10 Dell PowerEdge R515 servers, each with
12 x 2 TB disks. We have another 16 Dell PowerEdge R515s used as Lustre
storage servers. The compute and storage nodes are all connected via QDR
InfiniBand, and both Gluster and Lustre are set to use RDMA over
InfiniBand. We are using OFED version 1.5.2-20101219, Gluster 3.2.2 and
CentOS 5.5 on both the compute and storage nodes.

Oddly, the bottleneck seems to be on the client side. For example, a
single compute node writing a 10 GB file sees only about 50 MB/s of
write throughput. If we run simultaneous writes from multiple compute
nodes to the same Gluster volume, each compute node still gets about
50 MB/s. However, running multiple writes from the same compute node
does not increase that node's throughput. The compute nodes have 48
cores and 128 GB RAM, so I don't think the issue is with the compute
node hardware.
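One way to check whether the ceiling is really per client is to time a
single-stream write and then several concurrent streams from the same
node, much like running dd against the mount. Below is a minimal
Python sketch, assuming the Gluster volume is already mounted on the
client. The function names, the ddtest.N file names and the default
parameters are hypothetical and not part of Gluster.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

MIB = 1024 * 1024


def write_one_file(path: str, total_mib: int, block_kib: int) -> int:
    """Write total_mib MiB of zeros to path in block_kib-sized writes; return bytes written."""
    block = memoryview(bytes(block_kib * 1024))
    remaining = total_mib * MIB
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while remaining > 0:
            # os.write may write fewer bytes than requested, so track what is left.
            remaining -= os.write(fd, block[: min(len(block), remaining)])
        # Flush to the file system before the timer stops, so the client's
        # page cache does not inflate the result.
        os.fsync(fd)
    finally:
        os.close(fd)
    return total_mib * MIB


def write_throughput_mib_s(directory: str, streams: int = 1,
                           total_mib: int = 10240, block_kib: int = 1024) -> float:
    """Aggregate MiB/s for `streams` concurrent writers, each writing its own file."""
    paths = [os.path.join(directory, f"ddtest.{i}") for i in range(streams)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=streams) as pool:
        # os.write releases the GIL, so threads give truly concurrent writes.
        written = sum(pool.map(lambda p: write_one_file(p, total_mib, block_kib), paths))
    elapsed = time.perf_counter() - start
    return written / MIB / elapsed
```

For example, write_throughput_mib_s("/mnt/gluster", streams=4) would
write four 10 GiB files concurrently, where /mnt/gluster is a
placeholder mount point. If the aggregate figure stays near the
single-stream figure as streams grows, that matches the per-node limit
described above.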

With Lustre, on the same hardware and the same version of OFED, writing
that same 10 GB file gives 476 MB/s for a single-stream write from one
compute node, and aggregate performance of roughly 2.4 GB/s when we run
simultaneous writes. That leads me to believe RDMA itself is not the
problem; otherwise Lustre, which also uses RDMA, should be similarly
affected.

We have tried both XFS and ext4 for the backend file system on the
Gluster storage nodes (we're currently using ext4). We went with a
distributed (not distributed striped) Gluster volume. The thought was
that if one of the storage nodes failed catastrophically, we'd lose only
the data on that node; presumably with distributed striped we'd lose any
file with stripes on that node, unless I have misinterpreted the
documentation.
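The failure-domain reasoning above can be illustrated with a toy
model: each file is placed either whole on one server (distribute) or
split into stripe units spread over several servers (distributed
striped), and we count the files that touch a failed server. This is a
sketch under simplifying assumptions, not how GlusterFS actually places
data: CRC32 stands in for Gluster's real placement hash, and the
128 KiB stripe unit and all function names are hypothetical.

```python
import zlib


def _start_server(filename: str, n_servers: int) -> int:
    # Toy stand-in for a name hash; GlusterFS uses its own hash, not CRC32.
    return zlib.crc32(filename.encode()) % n_servers


def place_distribute(filename: str, n_servers: int) -> set[int]:
    """Pure distribute: the whole file lives on exactly one server."""
    return {_start_server(filename, n_servers)}


def place_striped(filename: str, size_bytes: int, n_servers: int,
                  stripe_count: int, stripe_unit: int = 128 * 1024) -> set[int]:
    """Distributed striped (toy): consecutive stripe units go round-robin
    over `stripe_count` servers, so one file can touch several servers."""
    n_units = max(1, -(-size_bytes // stripe_unit))  # ceiling division
    first = _start_server(filename, n_servers)
    return {(first + i) % n_servers for i in range(min(n_units, stripe_count))}


def fraction_damaged(placements: list[set[int]], failed_server: int) -> float:
    """Fraction of files with at least one piece on the failed server."""
    return sum(failed_server in p for p in placements) / len(placements)
```

In this model, with 10 servers each file under pure distribute sits on
one server, so a single failure touches roughly one file in ten. With a
stripe count of k, a large file touches k servers, so roughly k/10 of
large files are affected, and all of them when k equals the number of
servers.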

So what is expected/normal throughput for Gluster over QDR IB to a
relatively large storage pool (10 servers / 120 disks)? Does anyone have
tuning tips for improving performance?

Thanks!

John

-- 

________________________________________________________

John Lalande
University of Wisconsin-Madison
Space Science & Engineering Center
1225 W. Dayton Street, Room 439, Madison, WI 53706
608-263-2268 / john.lalande at ssec.wisc.edu
