[Gluster-users] Recommendations for busy static web server replacement

John Mark Walker johnmark at redhat.com
Tue Feb 7 17:56:57 UTC 2012


Brian - thank you for sharing these configuration tips. I'd love to have that in a blog post :)

As a close second, perhaps you could post a mini Q&A on community.gluster.org? This is the type of information that's very useful for Google to index and make available.

Thanks,
JM



----- Original Message -----
> On Tue, Feb 07, 2012 at 09:59:44AM +0100, Carsten Aulbert wrote:
> > (1) two servers with raid0 over all 12 disks, each serving as a single
> > storage brick in a simple replicated setup.
> 
> I am doing some similar tests at the moment.
> 
> 1. What's your stripe size? If your files are typically 4MB, then using a
> 4MB or larger stripe size will mean that most requests are serviced from
> a single disk.  This will give higher latency for a single client but
> leave lots of spindles free for other concurrent clients, maximising your
> total throughput.
> 
> If you have a stripe size of 1MB, then each file read will need to seek
> on 4 disks.  This gives you longer rotational latency (on average close
> to a full rotation instead of 1/2 a rotation), but 1/4 of the transfer
> time.  This might be a good tradeoff for single clients, but could reduce
> your total throughput with many concurrent clients.
> 
> Anything smaller is likely to suck.
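> 
> (If you want to double-check the chunk size of an existing md array,
> something like this will report it; /dev/md0 is just a placeholder for
> your array:)
> 
> mdadm --detail /dev/md0 | grep -E 'Chunk Size|Layout'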
> 
> 2. Have you tried RAID10 in "far" mode? e.g.
> 
> mdadm --create /dev/md/raid10 -n 12 -c 4096 -l raid10 -p f2 -b internal /dev/sd{h..s}
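> # (that is: 12 member disks, a 4096KB = 4MB chunk, RAID10 in the 'far 2'
> # layout, and an internal write-intent bitmap)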
> 
> The advantage here is that all the data can be read off the first half of
> each disk, which means shorter seek times and also higher transfer rates
> (the MB/sec at the outside of the disk is about twice the MB/sec at the
> centre of the disk).
> 
> The downside is more seeking for writes, which may or may not pay off
> with your 3:1 ratio. As long as there is write-behind going on, I think
> it may.
> 
> Since each node has RAID10 disk protection, you could use a simple
> distributed setup on top of it (at the cost of losing the ability to take
> a whole storage node out of service). Or you could have twice as many
> disks.
> 
> 3. When you mount your XFS filesystems, do you provide the 'inode64'
> mount option?  This can be critical for filesystems >1TB to get decent
> performance, as I found out the hard way.
> http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_inode64_mount_option_for.3F
> 
> "noatime" and "nodiratime" can be helpful too.
> 
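> (For example, an fstab entry along these lines; the device and mount
> point are placeholders for wherever your bricks actually live:)
> 
> /dev/md/raid10  /export/brick1  xfs  inode64,noatime,nodiratime  0  0
> 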
> 4. Have you tuned read_ahead_kb and max_sectors_kb? On my system the
> defaults are 128 and 512 respectively.
> 
> for i in /sys/block/sd*/bdi/read_ahead_kb; do echo 1024 >"$i"; done
> for i in /sys/block/sd*/queue/max_sectors_kb; do echo 1024 >"$i"; done
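> # (these sysfs settings reset on reboot, so re-apply them from rc.local
> # or a udev rule if they turn out to help)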
> 
> 5. Have you tried apache or apache2 instead of nginx? Have you done any
> testing directly on the mount point, not using a web server?
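> 
> (A crude way to take the web server out of the picture is to time reads
> straight off the mount point; the path below is just an example:)
> 
> time dd if=/mnt/gluster/somefile of=/dev/null bs=1M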
> 
> > Ideally, I'd like to have a set-up where multiple relatively cheap
> > computers with say 4 disks each run in raid0 or raid 10 or no raid and
> > export this via glusterfs to our web server. Gluster's replication will
> > serve as a kind of fail-safe net and data redistribution will help when
> > we add more similar machines later on to counter increased usage.
> 
> I am currently building a similar test rig to yours, but with 24 disk
> bays per 4U server.  There are two LSI HBAs, one 16 port and one 8 port.
> 
> The HBAs are not the bottleneck (I can dd data to and from all the disks
> at once no problem), and the CPUs are never very busy.  One box has an
> i3-2130 3.4GHz processor (dual core, hyperthreaded), and the other a Xeon
> E3-1225 3.1GHz (quad core, no hyperthreading).
> 
> We're going this way because we need tons of storage packed into a rack
> in a constrained power budget, but you might also find that fewer big
> servers are better than lots of tiny ones. I'd consider at least 2U with
> 12 hot-swap bays.
> 
> I have yet to finish my testing, but here are two relevant results:
> 
> (1) with a single 12-disk RAID10 array with 1MB chunk size, shared using
> glusterfs over 10GE to another machine, serving files between 500k and
> 800k, from the client I can read 180 random files per second (117MB/s)
> with 20 concurrent processes, or 206 random files per second (134MB/s)
> with 30 concurrent processes.
> 
> For comparison, direct local access to the filesystem on the RAID10 array
> gives 291 files/sec (189MB/sec) and 337 files/sec (219MB/sec) with 20 or
> 30 concurrent readers.
> 
> However, the gluster performance at 1/2/5 concurrent readers tracks the
> direct RAID10 closely, but falls off above that.  So I think there may be
> some gluster concurrency tuning required.
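> 
> (One knob that might be worth experimenting with is something like
> "gluster volume set <volname> performance.io-thread-count 32", though I
> haven't tested whether it actually helps here.)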
> 
> (2) in another configuration, I have 6 disks in one server and 6 in the
> other, with twelve separate XFS filesystems, combined into a distributed
> replicated array (much like yours but with half the spindles).  The
> gluster volume is mounted on one of the servers, which is where I run the
> test, so 6 disks are local and 6 are remote.  Serving the same corpus of
> files I can read 177 random files per second (115MB/s) with 20 concurrent
> readers, or 198 files/sec (129MB/s) with 30 concurrent readers.
> 
> The corpus is 100K files, so about 65GB in total, and the machines have
> 8GB RAM.  Each test drops caches first: http://linux-mm.org/Drop_Caches
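> (i.e. roughly: sync; echo 3 > /proc/sys/vm/drop_caches)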
> 
> I have no web server layer in front of this - I'm using a ruby script
> which forks and fires off 'dd' processes to read the files from the
> gluster mountpoint.
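> 
> (Roughly the equivalent, expressed as shell rather than ruby; the file
> list and the concurrency of 20 are placeholders:)
> 
> # read 1000 random files from the list with 20 concurrent dd readers
> shuf filelist.txt | head -n 1000 | xargs -P 20 -I{} dd if={} of=/dev/null bs=1M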
> 
> However I am using low-performance 5940 RPM drives (Hitachi Deskstar
> 5K3000 HDS5C3030ALA630) because they are cheap, use little power, and are
> reputedly very reliable.  If you're using anything better than these you
> should be able to improve on my numbers.
> 
> I haven't compared to NFS, which might be an option for you if you can
> live without the node-to-node replication features of glusterfs.
> 
> Regards,
> 
> Brian.
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> 


