[Gluster-users] XFS and MD RAID

Joe Landman landman at scalableinformatics.com
Wed Aug 29 13:06:28 UTC 2012


On 08/29/2012 03:48 AM, Brian Candler wrote:
> Does anyone have any experience running gluster with XFS and MD RAID as the

Lots

> backend, and/or LSI HBAs, especially bad experience?

It's pretty solid as long as your hardware/driver/kernel revs are solid,
and that requires updated firmware.  We've found that modern LSI HBA and
RAID gear has had issues with occasional "events" that seem to be
firmware or driver bugs more than anything else.  The gear is stable
under very light usage, but when pushed hard (without driver/FW updates)
it does crash, hard, often with corruption.
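
A quick way to see which firmware and driver revs you are actually
running (host number is a wildcard here; mpt2sas normally exposes these
attributes in sysfs, though the exact set varies by kernel):

	cat /sys/class/scsi_host/host*/version_fw
	cat /sys/class/scsi_host/host*/board_name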

>
> In a test setup (Ubuntu 12.04, gluster 3.3.0, 24 x SATA HD on LSI Megaraid
> controllers, MD RAID) I can cause XFS corruption just by throwing some
> bonnie++ load at the array - locally without gluster.  This happens within
> hours.  The same test run over a week doesn't corrupt with ext4.

Which kernel?  I can't say I've ever seen XFS corruption from light use.
It usually takes a significant failure of some sort to cause this: an
iffy driver, a bad disk, etc.

The ext4 comparison might not be apt.  Ext4 isn't designed for parallel
IO workloads, while XFS is.  Chances are you are tickling a
driver/kernel bug with the higher amount of work being done by XFS
versus ext4.
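
If you want to push on that theory, a few concurrent bonnie++ runs
against the bare MD/XFS volume (paths, user, and iteration count here
are purely illustrative) are usually enough load to expose it without
gluster in the picture:

	mkdir -p /mnt/brick-test/run{1,2,3,4}
	for i in 1 2 3 4; do
		bonnie++ -d /mnt/brick-test/run$i -u root -x 5 &
	done
	wait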

>
> I've just been bitten by this in production too on a gluster brick I hadn't
> converted to ext4.  I have the details I can post separately if you wish,
> but the main symptoms were XFS timeout errors and stack traces in dmesg, and
> xfs corruption (requiring a reboot and xfs_repair showing lots of errors,
> almost certainly some data loss).
>
> However, this leaves me with some unpalatable conclusions and I'm not sure
> where to go from here.
>
> (1) XFS is a shonky filesystem, at least in the version supplied in Ubuntu
> kernels.  This seems unlikely given its pedigree and the fact that it is
> heavily endorsed by Red Hat for their storage appliance.

Uh ... no.  It's pretty much the best (and about the only) choice for
large storage systems out there.  It's almost 20 years old at this
point, having made its first appearance in IRIX in the mid-1990s and
moved to Linux a few years later.  It's many things, but crappy ain't
one of them.

>
> (2) Heavy write load in XFS is tickling a bug lower down in the stack
> (either MD RAID or LSI mpt2sas driver/firmware), but heavy write load in
> ext4 doesn't.  This would have to be a gross error such as blocks queued for
> write being thrown away without being sent to the drive.

XFS is a parallel IO file system; ext4 is not.  There is a very good
chance you are tickling a bug lower in the stack.  Which LSI HBA or RAID
card are you using?  How have you set this up?  What kernel rev, and
what's the

	modinfo mpt2sas
	lspci
	uname -a

output?
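
The MD layout and anything the controller has been logging would help
too; nothing exotic here, just:

	cat /proc/mdstat
	dmesg | egrep -i 'mpt2sas|xfs'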

>
> I guess this is plausible - perhaps the usage pattern of write barriers is
> different for example.  However I don't want to point the finger there
> without direct evidence either.  There are no block I/O error events logged
> in dmesg.

It's very different.  XFS is pretty good about not corrupting things;
the file system shuts down if it detects that it is corrupt.  So if the
in-memory image of the current state at the moment of sync is not
matched by what's on the platters/SSD chips, then chances are you have a
problem in that pathway.
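
When it does shut itself down, a no-modify pass with xfs_repair before
you let it write anything is a reasonable way to gauge how bad the
damage actually is (device and mount point illustrative):

	umount /mnt/brick
	xfs_repair -n /dev/md0    # -n = no-modify, report problems only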

>
> The only way I can think of pinning this down is to find out what's the
> smallest MD RAID array I can reproduce the problem with, then try to build a
> new system with a different controller card (as MD RAID + JBOD, and/or as a
> hardware RAID array)

This would be a good start.
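
Something along these lines gives you a small array to beat on (device
names, RAID level, and drive count illustrative; defaults for chunk size
and mkfs options):

	mdadm --create /dev/md1 --level=6 --raid-devices=4 /dev/sd[b-e]
	mkfs.xfs /dev/md1
	mount /dev/md1 /mnt/test

Then repeat the same bonnie++ load against it, and again with the drives
hung off a different controller.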

>
> However while I try to see what I can do for that, I would be grateful for
> any other experience people have in this area.

We've had lots of problems with LSI drivers/FW before rev 11.x.y.z.
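
If you have LSI's tools installed, a quick way to see what rev you're at
(sas2flash for the IT-mode HBAs, MegaCli for the MegaRAID cards; exact
binary names vary by package):

	sas2flash -listall
	MegaCli64 -AdpAllInfo -aALL | grep -i fw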

FWIW:  We have siCluster storage customers with exactly these types of
designs, with uptimes measurable in hundreds of days, using Gluster atop
XFS atop MD RAID on our units.  We also have customers who tickle
obscure and hard-to-reproduce bugs, causing crashes.  It's not frequent,
but it does happen; not with the file system, but usually with the
network drivers or overloaded NFS servers.

>
> Many thanks,
>
> Brian.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615



