[Gluster-users] Improving Gluster performance through more hardware.

Thu May 7 18:36:08 UTC 2015

Hi all.

First, I have a specific question about what hardware should be used for 
Gluster, then after that I have a question about how Gluster does its 
multithreading/hyperthreading.

So, we have a new Gluster cluster (currently, two servers with one 
"replicated" volume) serving up our files for e-mail, which has for 
years been stored in Maildir format. That works pretty well except for 
the few clients who store all their old mail on our server, and their 
"cur" folder contains a few tens of thousands of messages. As others 
have noticed, this isn't something that Gluster handles well. But we 
value high availability and redundancy more than raw speed, and we 
don't yet have a large enough cluster to justify software that 
requires a metadata server, so we're going with Gluster. That doesn't 
mean we don't need better performance, though.

I've noticed that the resource Gluster consumes most in our use case 
isn't network or disk bandwidth - both of which remain *well* under 
full utilization - but CPU cycles. I can easily test this by running 
`ls -l` in a folder with ~20,000 files in it: CPU usage by glusterfsd 
jumps to between 40% and 200%, while the glusterfs process usually 
stays around 20-30%.
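
For anyone who wants to reproduce the measurement with a number attached, 
here is a minimal sketch of roughly what `ls -l` does: read the directory, 
then stat every entry. The function name `timed_listing` and the idea of 
passing the Maildir "cur" path on the command line are my own assumptions 
for illustration, not anything Gluster provides.

```python
import os
import sys
import time


def timed_listing(path):
    """Read a directory and stat every entry; return (entry_count, seconds)."""
    start = time.perf_counter()
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # On Linux this costs one stat system call per entry,
            # which is the per-file metadata lookup that `ls -l` also needs.
            entry.stat(follow_symlinks=False)
            count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


if __name__ == "__main__":
    # Hypothetical usage: pass a directory on the Gluster mount, e.g. a "cur" folder.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    n, secs = timed_listing(target)
    print(f"{n} entries in {secs:.3f}s")
```

Running it against the same folder before and after a hardware change 
should give a rough before/after comparison, keeping in mind that client-side 
caching can make repeated runs faster than the first one.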

Both of our Gluster servers are gen III Dell 2950s with dual Xeon 
E5345s (quad-core, 2.33 GHz), so each server has 8 cores to deal with 
this load. So far, we're only using a single mail server, but 
we'll be migrating to a load-balanced pair very soon. So my guess is 
that we can reduce the latency that's very noticeable in our webmail by 
upgrading to the fastest CPUs the 2950's can hold, evidently a 3.67 GHz 
quad-core.

It would be nice to know what other users have experienced with this 
kind of upgrade, or whether they've gotten better performance from other 
hardware upgrades.

Which leads to my second question. Does glusterfsd spawn multiple 
threads to handle other requests made of it? I don't see any evidence of 
this in the `top` program, but other clients don't notice at all that 
I'm running up the CPU usage with my one `ls` process. Smaller mail 
accounts can read their mail just as quickly as if the system were at 
near-idle while this operation is in progress. It's also hard for me to 
test this with only one mail server attached to the Gluster cluster. I 
can't tell whether the additional load from 20 or 100 other servers 
would make any difference to CPU usage, but we want to know what 
performance we can expect should we expand that far, and whether the 
answer will be more CPUs or simply faster ones.
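
One way to check the threading question directly, sketched here under the 
assumption of a Linux server: each thread of a process appears under 
`/proc/<pid>/task/`, with its accumulated CPU time in that thread's `stat` 
file. The helper name `thread_cpu_ticks` and the `proc_root` parameter are 
hypothetical, added only so the function can be pointed at a test directory. 
The pid would be whatever glusterfsd is running as on the server.

```python
import os


def thread_cpu_ticks(pid, proc_root="/proc"):
    """Return {tid: (name, utime + stime in clock ticks)} for each thread of pid."""
    result = {}
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                data = f.read()
        except FileNotFoundError:
            # The thread exited between listing the directory and reading it.
            continue
        # The name is wrapped in parentheses and may contain spaces,
        # so split around the first '(' and the last ')'.
        name = data[data.index("(") + 1 : data.rindex(")")]
        fields = data[data.rindex(")") + 2 :].split()
        # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
        utime, stime = int(fields[11]), int(fields[12])
        result[int(tid)] = (name, utime + stime)
    return result
```

Calling this twice a few seconds apart while the `ls -l` is running and 
comparing the tick counts should show whether the work is spread over several 
threads or concentrated in one. `top -H` gives a similar per-thread view 
interactively.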