[Gluster-users] XFS and MD RAID

Brian Candler B.Candler at pobox.com
Mon Sep 3 10:25:48 UTC 2012


On Wed, Aug 29, 2012 at 09:06:28AM -0400, Joe Landman wrote:
> We've found modern LSI
> HBA and RAID gear have had issues with occasional "events" that seem
> to be more firmware bugs or driver bugs than anything else.  The
> gear is stable for very light usage, but when pushed hard (without
> driver/fw updates), it does crash, hard, often with corruption.

That's what I was afraid of :-(

Last week I set about reproducing this problem again on some test boxes, and
most annoyingly, I have been unable to.  The test ran for about 5 days
before one of the (Seagate) hard drives had an I/O error over the weekend,
and XFS shut down as you said it would.

I've just moved the remaining drives to another box, but after an hour it
hasn't failed either.  These boxes have identical specs to the production
boxes.

The production ones may get their filesystems wiped soon anyway, in which
case I can try reproducing on the actual same boxes.

> xfs is a parallel IO file system, ext4 is not.  There is a very good
> chance you are tickling a bug lower in the stack.  Which LSI HBA or
> RAID are you using?

HBAs, one 8 port and one 16 port.

root@dev-storage2:~# ./sas2flash -listall
LSI Corporation SAS2 Flash Utility
Version 12.00.00.00 (2011.11.08) 
Copyright (c) 2008-2011 LSI Corporation. All rights reserved 

	Adapter Selected is a LSI SAS: SAS2116_1(B1) 

Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
----------------------------------------------------------------------------

0  SAS2116_1(B1)   12.00.00.00    0c.00.00.01    07.23.01.00     00:02:00:00
1  SAS2008(B2)     12.00.00.00    0c.00.00.05    07.23.01.00     00:03:00:00

	Finished Processing Commands Successfully.
	Exiting SAS2Flash.
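
To check at a glance that both HBAs are on the same firmware, the listing can be filtered with awk. This is only a sketch: it assumes the column layout shown above (index, controller, FW version, ...), and the names `listing`, `fw_versions` and `distinct` are made up for illustration. The sample text is pasted in rather than read from the tool, so nothing here touches the hardware.

```shell
#!/bin/sh
# Sample rows copied from the sas2flash -listall output above.
listing='Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
----------------------------------------------------------------------------
0  SAS2116_1(B1)   12.00.00.00    0c.00.00.01    07.23.01.00     00:02:00:00
1  SAS2008(B2)     12.00.00.00    0c.00.00.05    07.23.01.00     00:03:00:00'

# Hypothetical helper: keep only controller rows (first field is a number)
# and print "controller firmware" for each one.
fw_versions() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 6 { print $2, $3 }'
}

printf '%s\n' "$listing" | fw_versions

# Count distinct firmware versions; more than 1 means the HBAs differ.
distinct=$(printf '%s\n' "$listing" | fw_versions | awk '{ print $2 }' | sort -u | wc -l | tr -d ' ')
echo "distinct firmware versions: $distinct"
```

On a live box the same filter could be fed from `./sas2flash -listall` directly, assuming the output format matches.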

> How have you set this up?

mdadm --create /dev/md/huge -n 24 -c 1024 -l raid0 /dev/sd{b..y}
mkfs.xfs -f -n size=16384 /dev/md/huge
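
For reference, a small sketch of the stripe geometry those two commands imply. mdadm takes the `-c` chunk size in KiB, and RAID0 has no parity, so all 24 members carry data. mkfs.xfs normally picks up the stripe unit and width from an md device by itself; the explicit `-d su=...,sw=...` line printed below is only the assumed equivalent, shown for comparison, and is echoed rather than executed. The variable names are illustrative.

```shell
#!/bin/sh
# Arithmetic only: nothing in this script creates an array or a filesystem.

ndisks=24        # from: mdadm ... -n 24
chunk_kib=1024   # from: mdadm ... -c 1024 (chunk size in KiB)

# RAID0: full stripe = number of members * chunk size.
stripe_kib=$((ndisks * chunk_kib))
stripe_mib=$((stripe_kib / 1024))

echo "chunk=${chunk_kib}KiB members=${ndisks} full_stripe=${stripe_kib}KiB (${stripe_mib}MiB)"

# Explicit form of the geometry (assumption: equivalent to what mkfs.xfs
# would detect on /dev/md/huge). Printed, not run.
mkfs_cmd="mkfs.xfs -f -d su=${chunk_kib}k,sw=${ndisks} -n size=16384 /dev/md/huge"
echo "$mkfs_cmd"
```

The geometry actually in use can be compared against this with `xfs_info` on the mounted filesystem, which reports sunit and swidth in filesystem blocks.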

> What kernel rev

Ubuntu 12.04, stock kernel 3.2.0-26 (a bit behind on updates; 3.2.0-29 is
the latest)

> and whats the
> 
> 	modinfo mpt2sas
> 	lspci
> 	uname -a
> 
> output?

root@dev-storage2:~# modinfo mpt2sas
filename:       /lib/modules/3.2.0-26-generic/kernel/drivers/scsi/mpt2sas/mpt2sas.ko
version:        10.100.00.00
license:        GPL
description:    LSI MPT Fusion SAS 2.0 Device Driver
author:         LSI Corporation <DL-MPTFusionLinux at lsi.com>
srcversion:     44529298D89618E1BA4A0EC
alias:          pci:v00001000d0000007Esv*sd*bc*sc*i*
alias:          pci:v00001000d0000006Esv*sd*bc*sc*i*
alias:          pci:v00001000d00000087sv*sd*bc*sc*i*
alias:          pci:v00001000d00000086sv*sd*bc*sc*i*
alias:          pci:v00001000d00000085sv*sd*bc*sc*i*
alias:          pci:v00001000d00000084sv*sd*bc*sc*i*
alias:          pci:v00001000d00000083sv*sd*bc*sc*i*
alias:          pci:v00001000d00000082sv*sd*bc*sc*i*
alias:          pci:v00001000d00000081sv*sd*bc*sc*i*
alias:          pci:v00001000d00000080sv*sd*bc*sc*i*
alias:          pci:v00001000d00000065sv*sd*bc*sc*i*
alias:          pci:v00001000d00000064sv*sd*bc*sc*i*
alias:          pci:v00001000d00000077sv*sd*bc*sc*i*
alias:          pci:v00001000d00000076sv*sd*bc*sc*i*
alias:          pci:v00001000d00000074sv*sd*bc*sc*i*
alias:          pci:v00001000d00000072sv*sd*bc*sc*i*
alias:          pci:v00001000d00000070sv*sd*bc*sc*i*
depends:        scsi_transport_sas,raid_class
intree:         Y
vermagic:       3.2.0-26-generic SMP mod_unload modversions 
parm:           logging_level: bits for enabling additional logging info (default=0)
parm:           max_sectors:max sectors, range 64 to 8192  default=8192 (ushort)
parm:           max_lun: max lun, default=16895  (int)
parm:           max_queue_depth: max controller queue depth  (int)
parm:           max_sgl_entries: max sg entries  (int)
parm:           msix_disable: disable msix routed interrupts (default=0) (int)
parm:           missing_delay: device missing delay , io missing delay (array of int)
parm:           mpt2sas_fwfault_debug: enable detection of firmware fault and halt firmware - (default=0)
parm:           disable_discovery: disable discovery  (int)
parm:           diag_buffer_enable: post diag buffers (TRACE=1/SNAPSHOT=2/EXTENDED=4/default=0) (int)

root@dev-storage2:~# lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 Processor Family DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:06.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b5)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b5)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation C204 Chipset Family LPC Controller (rev 05)
00:1f.2 IDE interface: Intel Corporation 6 Series/C200 Series Chipset Family 4 port SATA IDE Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
00:1f.5 IDE interface: Intel Corporation 6 Series/C200 Series Chipset Family 2 port SATA IDE Controller (rev 05)
01:00.0 Ethernet controller: Intel Corporation 82599EB 10 Gigabit TN Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82599EB 10 Gigabit TN Network Connection (rev 01)
02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02)
03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
04:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 02)
05:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 10)
06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
08:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

root@dev-storage2:~# uname -a
Linux dev-storage2.example.com 3.2.0-26-generic #41-Ubuntu SMP Thu Jun 14 17:49:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Anyway, many thanks for sharing your experience. This was definitely
reproducible before; I'll come back when I can reproduce it again :-(

Regards,

Brian.
