[Gluster-users] GlusterFS Hardware Recommendations

Andy Pace APace at singlehop.com
Thu Jun 17 23:09:17 UTC 2010


Hello all, new to the list here. Figured I'd hop in to get some insight/opinion on the hardware requirements of Gluster in our environment.

We've been testing Gluster as the shared storage technology for a new Cloud product we're going to be launching. The primary role of the Gluster infrastructure is to house several hundred (potentially thousands of) Xen sparse image files.
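(Each image is just a sparse file sitting on the Gluster mount; for example, a 20GB sparse domU image can be created with a dd one-liner like the one below - the path and size here are only illustrative, not our actual layout.)

    dd if=/dev/zero of=/distributed/domU-example.img bs=1M count=0 seek=20480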

In our lab, we have the following servers/specs setup:

Bricks 1-4:
Dual Xeon 5410 (8 cores)
16GB RAM
4 x 1TB 7200RPM HDD in RAID5 (3ware controller)
Bond0 - 2Gbps

Cloud1 (client):
Q8200 2.33GHz
8GB RAM
1 x 1TB 7200RPM HDD
Eth0 - 1Gbps

It's primarily SuperMicro chassis and motherboards, Seagate drives, and standard RAM.

The Gluster client is configured to distribute across two replicated pairs of bricks (distribute over replicate) - a typical setup for users like us who need both scalability and redundancy. Until yesterday, things were working great: we were getting line-speed writes (about 55MB/s, since Gluster has to write everything to 2 bricks across the client's single 1Gbps link) and excellent reads. However, after running 5 concurrent iozone benchmarks, we noticed the bricks becoming loaded, primarily on CPU, with some I/O as well.
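For reference, the client volume is laid out roughly like the following minimal volfile sketch (hostnames and the export name are placeholders for this post, and the performance translators are left out):

    volume brick1
      type protocol/client
      option transport-type tcp
      option remote-host brick1.lab          # placeholder hostname
      option remote-subvolume export         # placeholder name of the export volume on the server
    end-volume

    # brick2, brick3 and brick4 are defined the same way, pointing at the other three servers

    volume mirror-0
      type cluster/replicate
      subvolumes brick1 brick2
    end-volume

    volume mirror-1
      type cluster/replicate
      subvolumes brick3 brick4
    end-volume

    volume dist
      type cluster/distribute
      subvolumes mirror-0 mirror-1
    end-volume

Here's the per-host top output from the run: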

Cloud1 saw a peak of 92% CPU utilization, with pdflush (the buffer cache flusher) reaching up to 12% CPU:

top - 13:53:12 up 21:44,  2 users,  load average: 1.72, 1.28, 0.72
Tasks: 102 total,   2 running, 100 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.7%us, 10.1%sy,  0.0%ni, 80.6%id,  0.0%wa,  0.0%hi,  1.2%si,  0.5%st
Mem:   8120388k total,  8078832k used,    41556k free,    30760k buffers
Swap:  4200988k total,      752k used,  4200236k free,  7606560k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2304 root      20   0  245m 7032 1388 R   91  0.1   8:27.03 glusterfs
 4409 root      20   0 46812  16m  172 S    1  0.2   0:01.86 iozone
 1159 root      15  -5     0    0    0 S    1  0.0   0:00.12 kjournald
 4410 root      20   0 46812  16m  172 S    1  0.2   0:02.22 iozone
 4411 root      20   0 46812  16m  172 S    1  0.2   0:01.32 iozone
 4412 root      20   0 46812  16m  172 S    1  0.2   0:01.96 iozone
 4413 root      20   0 46812  16m  172 S    1  0.2   0:02.00 iozone
    1 root      20   0 10312  752  620 S    0  0.0   0:00.16 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0

Brick1 saw a peak of 14% CPU utilization:

top - 13:52:39 up 8 days,  5:10,  1 user,  load average: 0.07, 0.05, 0.01
Tasks: 151 total,   2 running, 149 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  0.7%sy,  0.0%ni, 98.1%id,  0.0%wa,  0.1%hi,  0.6%si,  0.0%st
Mem:  16439672k total,  2705356k used, 13734316k free,   228624k buffers
Swap:  8388600k total,      156k used,  8388444k free,  2255200k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3413 root      15   0  187m 9332 1172 R 10.0  0.1  25:28.51 glusterfsd
    1 root      15   0 10348  692  580 S  0.0  0.0   0:02.15 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0

Brick2 saw a peak of 37% CPU utilization:

top - 13:55:29 up 8 days,  5:54,  2 users,  load average: 0.21, 0.14, 0.05
Tasks: 152 total,   1 running, 151 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  1.1%sy,  0.0%ni, 97.2%id,  0.0%wa,  0.1%hi,  0.6%si,  0.0%st
Mem:  16439672k total,  2698776k used, 13740896k free,   257288k buffers
Swap:  8388600k total,      152k used,  8388448k free,  2251084k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3404 root      15   0  187m 4996 1156 S 16.6  0.0  36:50.78 glusterfsd
10547 root      15   0     0    0    0 S  1.0  0.0   0:11.93 pdflush
    1 root      15   0 10348  692  580 S  0.0  0.0   0:02.18 init

Brick3:

top - 21:44:22 up 7 days, 23:36,  3 users,  load average: 0.74, 0.39, 0.19
Tasks: 110 total,   2 running, 108 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.7%us,  4.1%sy,  0.0%ni, 89.0%id,  0.0%wa,  0.4%hi,  2.8%si,  0.0%st
Mem:  16439808k total, 14291352k used,  2148456k free,   191444k buffers
Swap:  8388600k total,      120k used,  8388480k free, 13721644k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2953 root      15   0  187m 8380 1176 R 34.3  0.1  12:46.74 glusterfsd
    1 root      15   0 10348  692  580 S  0.0  0.0   0:01.75 init

Brick4:

top - 13:56:29 up 8 days,  3:57,  2 users,  load average: 0.89, 0.62, 0.35
Tasks: 153 total,   1 running, 152 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.3%us,  2.9%sy,  0.0%ni, 92.8%id,  0.0%wa,  0.3%hi,  1.7%si,  0.0%st
Mem:  16439672k total, 14405528k used,  2034144k free,   193340k buffers
Swap:  8388600k total,      144k used,  8388456k free, 13782972k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3318 root      15   0  253m 7812 1168 S 44.6  0.0  16:36.88 glusterfsd
30901 root      15   0 12740 1108  800 R  0.3  0.0   0:00.01 top
    1 root      15   0 10348  692  580 S  0.0  0.0   0:02.32 init


Command line used: /usr/sbin/iozone -R -l 5 -u 5 -r 4k -s 2048m -F /distributed/f1 /distributed/f2 /distributed/f3 /distributed/f4 /distributed/f5 -G

This is obviously very concerning: a single client can induce this sort of load across our Gluster infrastructure. So I'm thinking we get four of the following:

Dual Xeon 5520
32GB RAM (DDR3)
14 x 1TB 7200RPM HDD: 12 in RAID10, 2 in RAID1 (for the operating system)
MaxIQ or CacheCade SSD caching add-on, depending on whether we go with Adaptec or LSI
6 x 1GigE bonded to 6Gbps (see the bonding sketch below)
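For the bonded links, the plan would be the standard CentOS-style bonding setup, roughly like the sketch below (the mode, addresses and interface names are assumptions on my part, and 802.3ad needs LACP support on the switch). Worth noting that any single TCP stream still tops out at ~1Gbps with this kind of bonding; the aggregate only helps when several clients hit the same brick at once.

    # /etc/modprobe.conf
    alias bond0 bonding
    options bond0 mode=802.3ad miimon=100 xmit_hash_policy=layer3+4

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=10.0.0.11        # placeholder address
    NETMASK=255.255.255.0
    BOOTPROTO=none
    ONBOOT=yes

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (eth1 through eth5 look the same, apart from DEVICE)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    ONBOOT=yes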

Is this overkill? Here are some things to keep in mind:


* The majority of our customers will be hosting web applications, blogs, and forums, primarily database driven - lots of reads, some writes
* Each client server only has a 1Gbps connection to the brick(s)
* We're obviously trying to get the most bang for our buck, but not trying to spend $40k on something that's overpowered

Is that iozone benchmark more intense than what we're ACTUALLY going to see from our Xen instances?
Is there an advantage to using more bricks with half the horsepower each?
At 6Gbps to the bricks, does RAID10 make more sense because of its exceptional performance?
Has anyone on the list set up something like this before? If so, mind sharing your wisdom?

Thanks in advance everyone :)

Just for reference, here are the iozone benchmarking results:

Run began: Wed Jun 16 13:48:41 2010

    Excel chart generation enabled
    Record Size 4 KB
    File size set to 2097152 KB
    Using msync(MS_SYNC) on mmap files
    Command line used: /usr/sbin/iozone -R -l 5 -u 5 -r 4k -s 2048m -F /distributed/f1 /distributed/f2 /distributed/f3 /distributed/f4 /distributed/f5 -G
    Output is in Kbytes/sec
    Time Resolution = 0.000001 seconds.
    Processor cache size set to 1024 Kbytes.
    Processor cache line size set to 32 bytes.
    File stride size set to 17 * record size.
    Min process = 5
    Max process = 5
    Throughput test with 5 processes
    Each process writes a 2097152 Kbyte file in 4 Kbyte records

    Children see throughput for  5 initial writers     =   34936.65 KB/sec
    Parent sees throughput for  5 initial writers     =   31595.96 KB/sec
    Min throughput per process             =    6713.67 KB/sec
    Max throughput per process             =    7407.69 KB/sec
    Avg throughput per process             =    6987.33 KB/sec
    Min xfer                     = 1900672.00 KB

    Children see throughput for  5 rewriters     =   34336.49 KB/sec
    Parent sees throughput for  5 rewriters     =   33737.35 KB/sec
    Min throughput per process             =    6506.53 KB/sec
    Max throughput per process             =    7430.44 KB/sec
    Avg throughput per process             =    6867.30 KB/sec
    Min xfer                     = 1836416.00 KB

    Children see throughput for  5 readers         =  111977.31 KB/sec
    Parent sees throughput for  5 readers         =  111846.84 KB/sec
    Min throughput per process             =   20259.48 KB/sec
    Max throughput per process             =   25610.23 KB/sec
    Avg throughput per process             =   22395.46 KB/sec
    Min xfer                     = 1658992.00 KB

    Children see throughput for 5 re-readers     =  111582.38 KB/sec
    Parent sees throughput for 5 re-readers     =  111420.54 KB/sec
    Min throughput per process             =   20841.22 KB/sec
    Max throughput per process             =   25012.94 KB/sec
    Avg throughput per process             =   22316.48 KB/sec
    Min xfer                     = 1747440.00 KB

    Children see throughput for 5 reverse readers     =  110576.31 KB/sec
    Parent sees throughput for 5 reverse readers     =  110356.95 KB/sec
    Min throughput per process             =   18543.04 KB/sec
    Max throughput per process             =   23964.38 KB/sec
    Avg throughput per process             =   22115.26 KB/sec
    Min xfer                     = 1622784.00 KB

    Children see throughput for 5 stride readers     =    6513.98 KB/sec
    Parent sees throughput for 5 stride readers     =    6513.14 KB/sec
    Min throughput per process             =    1189.43 KB/sec
    Max throughput per process             =    1497.43 KB/sec
    Avg throughput per process             =    1302.80 KB/sec
    Min xfer                     = 1665796.00 KB

    Children see throughput for 5 random readers     =    3460.22 KB/sec
    Parent sees throughput for 5 random readers     =    3460.08 KB/sec
    Min throughput per process             =     622.45 KB/sec
    Max throughput per process             =     799.28 KB/sec
    Avg throughput per process             =     692.04 KB/sec
    Min xfer                     = 1633196.00 KB




