[Gluster-users] Ext4 safe for production use with gluster?

Wed Jul 7 21:11:13 UTC 2010

On 7/7/10, Jeff Darcy <jdarcy at redhat.com> wrote:
> A bunch of ext4/xfs/etc. maintainers are in my group.  The "party line"
> is that ext4 can be made suitable for production use *if* you have all
> of the latest patches (not just ext4 itself but block layer etc.) *and*
> set the right options.  IIRC the default versions and options shipped
> with most distributions - including older versions of RHEL and Fedora -
> are probably not the ones you want.  The downside is that if you want
> greater data safety you pay for it in performance, and some of the
> performance regressions associated with switching to safer defaults have

Data safety is paramount from my POV. Clients can usually accept
network downtime and hardware failures, but corrupted or lost data is
usually unacceptable. That said, they won't accept a new "expensive"
setup that is slower than the existing one or that cannot handle at
least twice the usage.

Normally I wouldn't care much about the performance impact; +/- 20%
usually doesn't make a difference to the kind of load I see on the
client servers. However, since I'm already virtualizing, that's
probably a 10~15% hit already, especially on the I/O side (I'm hoping
Intel VT-d will negate that). Add the increased latency from putting
storage on the network, and I'm starting to be a bit concerned that
I'll end up with, say, 20% of local file performance.
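
To put rough numbers on that worry: slowdowns from stacked layers
multiply rather than add. The sketch below is illustrative only. It
assumes the layers are independent, and the function names are
hypothetical. The 15% figure is the upper end of the virtualization
estimate above; the 20% network penalty is a made-up placeholder, not a
measurement.

```python
def remaining_fraction(hits):
    """Fraction of baseline throughput left after stacking independent slowdowns.

    Each entry in `hits` is a fractional loss, e.g. 0.15 for a 15% hit.
    """
    remaining = 1.0
    for hit in hits:
        if not 0.0 <= hit < 1.0:
            raise ValueError("each hit must be in the range [0, 1)")
        remaining *= 1.0 - hit
    return remaining


def required_extra_hit(known_hits, target):
    """Loss one further layer would need to bring throughput down to `target`."""
    return 1.0 - target / remaining_fraction(known_hits)


# Assumed inputs: 15% virtualization hit, 20% network hit (placeholder).
print(remaining_fraction([0.15, 0.20]))   # about 0.68 of local performance
# How bad would the network layer have to be to land at 20% of local?
print(required_extra_hit([0.15], 0.20))   # about 0.76, i.e. a 76% loss
```

Under these assumptions, ending up at 20% of local performance would
take a far larger network penalty than the virtualization overhead
alone; only a benchmark on the real setup can say where it lands.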

> been widely discussed.  If XFS is an option for you, it might be worth
> considering because it balances these safety and performance needs a
> little better.  Otherwise, I'd recommend careful research and
> configuration of ext4, because these are the kinds of problems you
> probably won't catch in a synthetic testing environment and you really
> don't want to be debugging data-integrity problems just after the Big
> Power Hit.

I read up on XFS after your recommendation, and it seems that ext4
borrows ideas from XFS, which also has the delayed allocation feature
and the zero-length file problem. Yet more reading says they sort of
fixed that in XFS, and that ext3 also delays writes, just with a much
shorter 5 sec commit interval, so less data is lost and files are not
left zero-length after a crash.
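
For reference, the zero-length file problem usually shows up when an
application rewrites a file and the rename reaches disk before the data
does. A common way for an application to protect itself, whatever the
filesystem, is to write to a temporary file, fsync it, and only then
rename it over the original. The following is a minimal sketch of that
pattern, assuming a POSIX system; `durable_write` is a hypothetical
helper name, not something taken from gluster, ext4 or XFS.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` so a crash leaves the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()               # push Python's buffer to the kernel
            os.fsync(tmp.fileno())    # ask the kernel to write the data to disk
        os.replace(tmp_path, path)    # rename over the target in one step
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory so the rename itself is recorded on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The cost is one or two fsync calls per file, which is exactly the
safety-versus-performance trade-off discussed above, and it matters
most for workloads that write many small files.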

I can't seem to find how long XFS takes to flush. Some XFS
documentation listing the options says the default metadata flush is
3.5 sec, but what about the actual file data?
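
On Linux, the timing of file data writeback is influenced by the
kernel's generic dirty-page settings under /proc/sys/vm, independently
of any filesystem-specific metadata interval. A minimal sketch for
inspecting them, assuming a Linux host; `writeback_settings` is a
hypothetical helper, and the `base` parameter exists only so the
function can be pointed at another directory:

```python
import os


def writeback_settings(base="/proc/sys/vm"):
    """Return the kernel dirty-page writeback timers in seconds.

    A value is None if the file is missing or unreadable (for example,
    on a non-Linux system).
    """
    names = ("dirty_expire_centisecs", "dirty_writeback_centisecs")
    result = {}
    for name in names:
        try:
            with open(os.path.join(base, name)) as f:
                # The kernel reports these values in hundredths of a second.
                result[name] = int(f.read().strip()) / 100.0
        except (OSError, ValueError):
            result[name] = None
    return result


if __name__ == "__main__":
    for name, seconds in writeback_settings().items():
        print(name, "=", seconds, "seconds")
```

dirty_expire_centisecs is how old dirty data must be before the kernel
writes it out, and dirty_writeback_centisecs is how often the flusher
threads wake up. Neither is a durability guarantee; only fsync gives
that.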

The sad thing is, the more I read up, the more worried I get, and I
haven't even got around to asking about fencing, or about the
performance difference between many small files (exim with maildir)
and updates to a single big file (mysql/postgres db).

Don't know if I'm worrying more than I should. I was sleeping easier
before I knew that ext3 delays writes too :D