[Gluster-users] Gluster 3.3, brick crashed

Tue Jul 31 16:40:11 UTC 2012

On Tue, Jul 31, 2012 at 03:31:13PM +0200, Christian Wittwer wrote:
>    Thanks for your input. I checked dmesg, and that doesn't look good I
>    think.
...
>    [1511244.755144] EXT4-fs (sdb1): error count: 2
>    [1511244.761993] EXT4-fs (sdb1): initial error at 1343085498:
>    ext4_xattr_release_block:496

No, that's not good.
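
For what it's worth, the number after "initial error at" looks like a Unix
timestamp (seconds since 1970), so you can work out when the first error was
recorded. A minimal sketch, assuming GNU date; epoch_to_utc is just a helper
name I made up for illustration:

```shell
# Convert an epoch timestamp (seconds since 1970-01-01 UTC) to a readable UTC date.
# Assumes GNU date (the -d @N syntax); epoch_to_utc is a hypothetical helper name.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Timestamp taken from the dmesg line quoted above
epoch_to_utc 1343085498
```

If that date lines up with a reboot or with something in the controller's
event log, it gives you somewhere to start looking.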

>    I checked the raid (builtin hw controller from Dell), and all the disks
>    are ok.

Using MegaCli? Or some other way?

As it happens I've just been sorting this out on a Dell. Here is a potted
summary:

(0) Bookmark these links

http://en.community.dell.com/techcenter/os-applications/w/wiki/linux-raid-and-storage.aspx
http://tools.rapidsoft.de/perc/

(1) Check your controller version

    dmesg | grep PERC
    lspci -v | grep LSI

If it's a PERC 5 or later, continue. (The following was tested with a PERC 6/i.)

(2) Install MegaCli 8.x

Starting at http://www.lsi.com/support/Pages/download-search.aspx select
‘RAID Controllers’, ‘Megaraid SAS 9260-4i’, ‘All Asset Types’ and search.
(It doesn't matter if your controller is not a 9260-4i.)

Under Management Software and Tools you will find MegaCLI 5.3 (actually
8.04.07_MegaCLI.zip). Download it.

Inside that is CLI_Lin_8.04.07.zip. Inside that is MegaCliLin.zip, which
contains two RPMs:

 1588725  Defl:N  1587255   0%  05-17-11 09:57  dd81e0a7 Lib_Utils-1.00-09.noarch.rpm
 1514197  Defl:N  1496763   1%  05-28-12 12:36  8e5e2a64 MegaCli-8.04.07-1.noarch.rpm

If you are using Debian/Ubuntu, use ‘alien’ to convert these to .deb
packages: e.g.

    apt-get install alien
    alien --to-deb Lib_Utils-1.00-09.noarch.rpm 
    alien --to-deb MegaCli-8.04.07-1.noarch.rpm

Install them (dpkg -i *.deb)

(3) Use /opt/MegaRAID/MegaCli/MegaCli64 if you are on 64-bit Linux, or
/opt/MegaRAID/MegaCli/MegaCli if on 32-bit.
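
If you want a script to make that choice, something like the following might
do. It's only a sketch: pick_megacli is a made-up helper name, and treating
x86_64 as the only 64-bit case is my assumption.

```shell
# Pick the MegaCli binary for a given machine type (as printed by `uname -m`).
# pick_megacli is a hypothetical helper; treating only x86_64 as 64-bit is an assumption.
pick_megacli() {
    case "$1" in
        x86_64) echo /opt/MegaRAID/MegaCli/MegaCli64 ;;
        *)      echo /opt/MegaRAID/MegaCli/MegaCli ;;
    esac
}

MEGACLI=$(pick_megacli "$(uname -m)")
echo "Using $MEGACLI"
```

After that, "$MEGACLI" -AdpAllInfo -aALL and friends read the same on either
architecture.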

Proceed as per the cheat sheet: e.g.

# All adapters
/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL | less
# Event log
/opt/MegaRAID/MegaCli/MegaCli64 -AdpEventLog -GetEvents -f events.log -aALL && less events.log
# All enclosures
/opt/MegaRAID/MegaCli/MegaCli64 -EncInfo -aALL | less
# Logical drives
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | less
# Physical drives
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | less
# Battery status
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -aALL | less

The event log is typically the most useful; scroll to the end and then read
backwards.  It will tell you if drives are misbehaving.
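
A rough sketch of how I'd narrow a long event log down. The keyword list is
purely my guess at what the interesting entries might contain, not anything
taken from the MegaCli documentation, and suspicious_events is a made-up
helper name:

```shell
# Print lines of a saved event log that match some failure-ish keywords,
# case-insensitively and with line numbers. The keywords are guesses; adjust to taste.
suspicious_events() {
    grep -n -i -E 'error|fail|predictive' "$1"
}

# Usage, after saving the log with -AdpEventLog -GetEvents -f events.log:
#   suspicious_events events.log | tail -n 20
#   less +G events.log    # opens the full log at the end, ready to scroll back
```

Treat the matches as pointers only; the surrounding lines in the full log
usually carry the detail.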

I haven't gotten around to trying the SNMP package.

>    Next step would be to do a fsck first I guess.

I think it's pretty unusual for an ext4 filesystem to become corrupted
without some underlying hardware failure.  It's not impossible, but even if
the drives are OK it could indicate some other problem (e.g. RAM
corruption).

It's also worth checking which kernel you're running and whether it has
known problems.  For example, I'm told that the initial release of 3.4.0 was
very dodgy.
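
A quick way to see what you're running, as a sketch. kernel_is is a made-up
helper name, and the idea that distro kernels append a "-something" suffix to
the upstream version is my assumption about how your release string looks:

```shell
# Return success if a kernel release string is exactly the given version,
# or that version followed by a "-suffix" (e.g. 3.4.0-1-amd64).
# kernel_is is a hypothetical helper name.
kernel_is() {
    case "$1" in
        "$2"|"$2"-*) return 0 ;;
        *)           return 1 ;;
    esac
}

if kernel_is "$(uname -r)" 3.4.0; then
    echo "Running a 3.4.0 kernel: $(uname -r)"
else
    echo "Running kernel $(uname -r)"
fi
```

Whatever it prints, look that version up in your distribution's bug tracker
before blaming the hardware.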

Regards,

Brian.