[Gluster-users] Usage Case: just not getting the performance I was hoping for
D. Dante Lorenso
dante at lorenso.com
Thu Mar 15 04:09:28 UTC 2012
All,
For our project, we bought 8 new Supermicro servers. Each server has a
quad-core Intel CPU in a 2U chassis supporting 8 x 7200 RPM SATA drives.
To start out, we populated only 2 x 2TB enterprise drives in each
server and added all 8 peers, with their total of 16 drives as bricks,
to our Gluster pool as distributed-replicated (replica 2). The replica
pairs worked out as follows:
1.1 -> 2.1
1.2 -> 2.2
3.1 -> 4.1
3.2 -> 4.2
5.1 -> 6.1
5.2 -> 6.2
7.1 -> 8.1
7.2 -> 8.2
Where "1.1" above represents server 1, brick 1, etc...
We set up 4 gigabit network ports in each server (2 on the motherboard
and 2 on an Intel PRO dual-NIC PCI-Express card). The ports were bonded
in Linux to the switch, giving us 2 "bonded NICs" per server with a
theoretical 2 Gbps aggregate throughput per bonded pair. One bond was
the "SAN/NAS" network for Gluster communication; the other was the LAN
interface where Samba would run.
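The bonds were set up with the usual CentOS ifcfg files, along these
lines (bonding mode and addressing here are illustrative, not
necessarily exactly what we ran):

    # /etc/sysconfig/network-scripts/ifcfg-bond0 (the SAN/NAS bond)
    DEVICE=bond0
    BOOTPROTO=none
    ONBOOT=yes
    IPADDR=10.0.0.1
    NETMASK=255.255.255.0
    BONDING_OPTS="mode=802.3ad miimon=100"

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (one slave of bond0)
    DEVICE=eth0
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes

I realize that with most bonding modes a single TCP stream still tops
out at 1 Gbps; the 2 Gbps is only an aggregate across streams.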
After tweaking settings as best we could, we were able to copy files
from Mac and Win7 desktops across the network, but could only get 50-60
MB/s transfer speeds tops when sending large files (> 2GB) to Gluster.
When copying a directory of small files, we get <= 1 MB/s!
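For what it's worth, the kind of tweaking I mean is Gluster volume
options like these (the values are examples, not a recipe we settled
on):

    gluster volume set gvol performance.cache-size 256MB
    gluster volume set gvol performance.write-behind-window-size 4MB
    gluster volume set gvol performance.io-thread-count 32
    gluster volume set gvol performance.stat-prefetch on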
My question is ... is this right? Is this what I should expect from
Gluster, or is there something we did wrong? We aren't using super
expensive equipment, granted, but I was really hoping for better
performance than this, given that raw drive speed tests using dd show we
can write at 125+ MB/s to each 2TB "brick" disk.
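The per-brick dd test was a plain sequential write, roughly this (path,
block size, and count are illustrative):

    # sequential write straight to one brick's filesystem, bypassing
    # the page cache so we measure the disk rather than RAM
    dd if=/dev/zero of=/export/brick1/ddtest bs=1M count=4096 oflag=direct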
Our network switches are decent Layer 2 D-Link units (actually, 2 of
them stacked with a 10Gb cable), and we are only using 1GbE NICs in the
servers rather than InfiniBand or 10GbE. Overall, we spent about $22K
on the servers, and drives were more than 1/3 of that cost due to the
Thailand flooding.
My team and I have been tearing apart our entire network trying to see
where the performance is being lost. We've questioned switches, cables,
routers, firewalls, Gluster, Samba, and even things on the systems like
RAM and motherboards.
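Part of that has been measuring raw TCP throughput between peers with
iperf, roughly like this (hostname is illustrative):

    # on one peer
    iperf -s

    # from another peer, 4 parallel streams to exercise the bond
    iperf -c server1-san -P 4 -t 30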
When using a single Win2008 server with RAID 10 on 4 drives, shared to
the network with built-in CIFS, we get much better (nearly 2x)
performance than this 8-server Gluster setup using Samba for SMB/CIFS
and a total of 16 drives.
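In case it matters, our Gluster/Samba side is the standard arrangement
of the native FUSE mount exported as a share, roughly like this
(mountpoint and share name are illustrative):

    # mount the volume with the native FUSE client on the Samba host
    mount -t glusterfs server1:/gvol /mnt/gvol

    # smb.conf share of that mountpoint
    [storage]
        path = /mnt/gvol
        read only = no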
From your real-world usage, what kind of performance are you getting
from Gluster? Is what I'm seeing the best I can do, or do you think
I've configured something wrong and need to continue working with it?
If I can't get Gluster to work, our fall-back plan is to convert these 8
servers into iSCSI targets, mount the storage onto a Win2008 head, and
continue sharing to the network as before. Personally, I would rather
we continue moving toward CentOS 6.2 with Samba and Gluster, but I can't
justify the change unless I can deliver the performance.
What are your thoughts?
-- Dante
D. Dante Lorenso
dante at lorenso.com