[Gluster-users] Gluster striping taking up too much space

Jeff Darcy jdarcy at redhat.com
Fri Jul 6 20:36:24 UTC 2012


On 07/06/2012 03:49 PM, Brian Candler wrote:
> On Fri, Jul 06, 2012 at 11:48:15AM -0700, Khawaja Shams wrote:
>>     Hello,
>>       We redid our setup with ext4 instead of xfs - and it seems to work
>>    fine. It sounds like there may be a bug in the integration of xfs +
>>    glusterfs.
> 
> It's XFS doing pre-allocation of space (i.e. it doesn't expect you to leave
> holes in the file, which is how glusterfs striping works). There's a mount
> option to disable preallocation.

What's going on is that with a 40-way stripe your first brick is writing 16KB
at offset 0, 16KB at offset 640K (16KB*40), 16KB at offset 1280K, etc.  XFS's
hyper-aggressive preallocation is filling each of those gaps, and then not
emptying them again even when you've clearly skipped over them, so you have
forty files mostly full of never-written (and never-asked-for) blocks.  That's
exactly the damage that the stripe-coalesce option was implemented to undo.
Instead of writing only one out of every stripe-count chunks and leaving holes
for the rest, each brick does the necessary offset calculations to write all of
its own chunks contiguously, not only reducing space used but in many cases
improving performance as well.
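
To make the offset arithmetic concrete, here is a rough sketch of the two
layouts. It is not the stripe translator's actual code; the function names are
made up for illustration, and the 16KB chunk size and 40-way stripe are just
the numbers from this thread.

```python
KB = 1024

def sparse_offset(file_offset, chunk_size, stripe_count):
    """Non-coalesced layout: the chunk lands on its brick at the same
    offset it has in the logical file, leaving holes in between."""
    chunk = file_offset // chunk_size
    brick = chunk % stripe_count
    return brick, file_offset

def coalesced_offset(file_offset, chunk_size, stripe_count):
    """Coalesced layout: each brick packs its own chunks back to back."""
    chunk = file_offset // chunk_size
    brick = chunk % stripe_count
    row = chunk // stripe_count          # how many earlier chunks this brick holds
    within = file_offset % chunk_size    # position inside the chunk
    return brick, row * chunk_size + within

if __name__ == "__main__":
    # The first three chunks that land on brick 0 in a 40-way, 16KB stripe.
    for off in (0, 640 * KB, 1280 * KB):
        print(off // KB,
              sparse_offset(off, 16 * KB, 40),
              coalesced_offset(off, 16 * KB, 40))
```

In the sparse case brick 0 sees writes at 0, 640K and 1280K, which is what
invites the preallocation; in the coalesced case the same three chunks sit at
0, 16K and 32K with no gaps to fill.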

> Are you sure that striping is what you want, rather than simple
> distribution?  It's appropriate if you have a small number of huge files,
> but if you are throwing a large number of small to medium-sized files
> around, distribution may be better. Each file then sits on a single
> filesystem; if you lose one brick, at least you haven't lost all your files.

Well, with replication underneath that wouldn't be an issue, but I see that's
not the case here.  Still, this seems like an excessive use of striping, which
rarely seems to provide more than a modest performance benefit even for
single-stream I/O (because of the overhead of splitting/recombining requests and
managing multiple connections).  Mostly striping is only used to allow files to
be larger than bricks, but with modern disk sizes that's rarely an issue in the
first place.  With 40 bricks, distribution is far more likely to be the better
approach.



