[Gluster-users] XFS and MD RAID
landman at scalableinformatics.com
Wed Aug 29 13:06:28 UTC 2012
On 08/29/2012 03:48 AM, Brian Candler wrote:
> Does anyone have any experience running gluster with XFS and MD RAID as the
> backend, and/or LSI HBAs, especially bad experience?
It's pretty solid as long as your hardware/drivers/kernel revs are solid,
and that requires updated firmware. We've found that modern LSI HBA and
RAID gear has had issues with occasional "events" that seem to be
firmware or driver bugs more than anything else. The gear is stable under
very light usage, but when pushed hard (without driver/fw updates), it
does crash, hard, often with corruption.
> In a test setup (Ubuntu 12.04, gluster 3.3.0, 24 x SATA HD on LSI Megaraid
> controllers, MD RAID) I can cause XFS corruption just by throwing some
> bonnie++ load at the array - locally without gluster. This happens within
> hours. The same test run over a week doesn't corrupt with ext4.
Which kernel? I can't say I've ever seen XFS corruption from light use.
It usually takes some significant failure of some sort to cause this.
Iffy driver, bad disk, etc.
The ext4 comparison might not be apt. Ext4 isn't designed for parallel
IO workloads, while XFS is. Chances are you are tickling a
driver/kernel bug with the higher amount of work being done in XFS.
> I've just been bitten by this in production too on a gluster brick I hadn't
> converted to ext4. I have the details I can post separately if you wish,
> but the main symptoms were XFS timeout errors and stack traces in dmesg, and
> xfs corruption (requiring a reboot and xfs_repair showing lots of errors,
> almost certainly some data loss).
> However, this leaves me with some unpalatable conclusions and I'm not sure
> where to go from here.
> (1) XFS is a shonky filesystem, at least in the version supplied in Ubuntu
> kernels. This seems unlikely given its pedigree and the fact that it is
> heavily endorsed by Red Hat for their storage appliance.
Uh ... no. It's pretty much the best/only choice for large storage
systems out there. It's almost 20 years old at this point, making its first
appearance in IRIX in the 1994/1995 time frame and moving to Linux a few
years later. It's many things, but crappy ain't one of them.
> (2) Heavy write load in XFS is tickling a bug lower down in the stack
> (either MD RAID or LSI mpt2sas driver/firmware), but heavy write load in
> ext4 doesn't. This would have to be a gross error such as blocks queued for
> write being thrown away without being sent to the drive.
XFS is a parallel IO file system; ext4 is not. There is a very good
chance you are tickling a bug lower in the stack. Which LSI HBA or RAID
are you using? How have you set this up? What kernel rev, and what's the
> I guess this is plausible - perhaps the usage pattern of write barriers is
> different for example. However I don't want to point the finger there
> without direct evidence either. There are no block I/O error events logged
> in dmesg.
It's very different. XFS is pretty good about not corrupting things; the
file system shuts down if it detects that it is corrupt. So if the in-memory
image of the current state at the moment of sync is not matched by
what's on the platters/SSD chips, then chances are you have a problem in
> The only way I can think of pinning this down is to find out what's the
> smallest MD RAID array I can reproduce the problem with, then try to build a
> new system with a different controller card (as MD RAID + JBOD, and/or as a
> hardware RAID array)
This would be a good start.
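One way to organize that search, sketched in Python under stated assumptions: treat each test run as a pass/fail probe and binary-search over the disk count instead of stepping through every size. The `reproduces` callback is hypothetical; it stands in for building an n-disk MD array, running bonnie++ for a fixed window, and then checking the filesystem (e.g. with `xfs_repair -n`). The search also assumes the failure is monotonic in disk count, which may not hold for an intermittent bug, so a clean run is evidence rather than proof.

```python
from typing import Callable, Optional


def smallest_failing_size(
    reproduces: Callable[[int], bool], lo: int, hi: int
) -> Optional[int]:
    """Binary-search the smallest disk count in [lo, hi] for which
    the (hypothetical) probe `reproduces(n)` returns True.

    Assumes monotonicity: if the fault shows up with n disks, it also
    shows up with every larger count. Returns None if even `hi` is clean.
    """
    if not reproduces(hi):
        return None  # could not reproduce at the largest size tested
    while lo < hi:
        mid = (lo + hi) // 2
        if reproduces(mid):
            hi = mid  # fault seen: the answer is mid or smaller
        else:
            lo = mid + 1  # clean run: the answer is larger than mid
    return lo


if __name__ == "__main__":
    # Stub probe for illustration only: pretend corruption appears at >= 10 disks.
    print(smallest_failing_size(lambda n: n >= 10, 4, 24))
```

With 24 drives this needs about five or six multi-hour runs rather than twenty, which matters when each probe is a long bonnie++ soak.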
> However while I try to see what I can do for that, I would be grateful for
> any other experience people have in this area.
We've had lots of problems with LSI drivers/FW before rev 11.x.y.z.
FWIW: we have siCluster storage customers with exactly these types of
designs, with uptimes measurable in hundreds of days, using Gluster atop
XFS atop MD RAID on our units. We also have customers who tickle
obscure and hard-to-reproduce bugs, causing crashes. It's not frequent,
but it does happen. Not with the file system, though; usually it's with
the network drivers or overloaded NFS servers.
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615