[Gluster-users] Gluster 3.3, brick crashed

Thu Aug 2 11:23:11 UTC 2012

Hi Brian,
Thanks for your very detailed answer, awesome!
I checked the event log of the RAID controller; unfortunately the log
was cleared on 31st July.

Time: Tue Jul 31 15:35:43 2012

Code: 0x0000001e
Class: 0
Locale: 0x20
Event Description: Event log cleared

I'm going to reboot the server. Maybe I can run a diagnostic tool
during boot (to detect something like corrupted memory).

Cheers,
Christian

2012/7/31 Brian Candler <B.Candler at pobox.com>
>
> On Tue, Jul 31, 2012 at 03:31:13PM +0200, Christian Wittwer wrote:
> >    Thanks for your input. I checked dmesg, and that doesn't look good I
> >    think.
> ...
> >    [1511244.755144] EXT4-fs (sdb1): error count: 2
> >    [1511244.761993] EXT4-fs (sdb1): initial error at 1343085498:
> >    ext4_xattr_release_block:496
>
> No, that's not good.
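
The number after "initial error at" in the dmesg line above looks like a time in seconds since the Unix epoch, so it can be turned into a readable date to see when the first error was recorded. A minimal sketch, assuming GNU date (for the `-d @N` form); the function name epoch_to_utc is made up for illustration:

```shell
# Convert an epoch-seconds value (e.g. from an ext4 "initial error at N"
# dmesg line) into a UTC date string. Assumes GNU date.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Value taken from the dmesg excerpt quoted in this thread.
epoch_to_utc 1343085498
# prints 2012-07-23 23:18:18
```

If that reading is right, the first error was logged about a week before the controller event log was cleared.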
>
> >    I checked the raid (built-in hw controller from Dell), and all the disks
> >    are ok.
>
> Using MegaCli? Or some other way?
>
> As it happens I've just been sorting this out on a Dell. Here is a potted
> summary:
>
> (0) Bookmark these links
>
> http://en.community.dell.com/techcenter/os-applications/w/wiki/linux-raid-and-storage.aspx
> http://tools.rapidsoft.de/perc/
>
> (1) Check your controller version
>
>     dmesg | grep PERC
>     lspci -v | grep LSI
>
> If it's a PERC 5 or later, continue. (The following tested with PERC 6/i)
>
> (2) Install MegaCli 8.x
>
> Starting at http://www.lsi.com/support/Pages/download-search.aspx select
> ‘RAID Controllers’, ‘Megaraid SAS 9260-4i’, ‘All Asset Types’ and search.
> (It doesn't matter if your controller is not 9260-4i)
>
> Under Management Software and Tools you will find MegaCLI 5.3 (actually
> 8.04.07_MegaCLI.zip). Download it.
>
> Inside that is CLI_Lin_8.04.07.zip. Inside that is MegaCliLin.zip. Inside
> that are
>
>  1588725  Defl:N  1587255   0%  05-17-11 09:57  dd81e0a7 Lib_Utils-1.00-09.noarch.rpm
>  1514197  Defl:N  1496763   1%  05-28-12 12:36  8e5e2a64 MegaCli-8.04.07-1.noarch.rpm
>
> If you are using Debian/Ubuntu, use ‘alien’ to convert these to .deb
> packages: e.g.
>
>     apt-get install alien
>     alien --to-deb Lib_Utils-1.00-09.noarch.rpm
>     alien --to-deb MegaCli-8.04.07-1.noarch.rpm
>
> Install them (dpkg -i *.deb)
>
> (3) Choose /opt/MegaRAID/MegaCli/MegaCli64 if you are on 64-bit Linux or
> /opt/MegaRAID/MegaCli/MegaCli if on 32-bit.
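
The choice in step (3) could be scripted roughly as follows. This is only a sketch: the two paths are the ones given above, while the function name pick_megacli and the assumption that `uname -m` reports x86_64 on a 64-bit install are mine.

```shell
# Print the MegaCli binary path matching the machine architecture.
# An architecture string may be passed as $1; otherwise `uname -m` is used.
pick_megacli() {
    arch=${1:-$(uname -m)}
    case "$arch" in
        x86_64) echo /opt/MegaRAID/MegaCli/MegaCli64 ;;  # 64-bit
        *)      echo /opt/MegaRAID/MegaCli/MegaCli ;;    # anything else: 32-bit binary
    esac
}

MEGACLI=$(pick_megacli)
echo "$MEGACLI"
```

The commands in the cheat sheet can then be run as "$MEGACLI" followed by the same options.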
>
> Proceed as per the cheat sheet: e.g.
>
> # All adapters
> /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL | less
> # Event log
> /opt/MegaRAID/MegaCli/MegaCli64 -AdpEventLog -GetEvents -f events.log -aALL && less events.log
> # All enclosures
> /opt/MegaRAID/MegaCli/MegaCli64 -EncInfo -aALL | less
> # Logical drives
> /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | less
> # Physical drives
> /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | less
> # Battery status
> /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -aALL | less
>
> The event log is typically the most useful; scroll to the end and then read
> backwards.  It will tell you if drives are misbehaving.
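
To skim the end of the saved events.log without paging through it, something like the sketch below may help. It assumes every record has a "Time:" line and an "Event Description:" line, as in the excerpt quoted at the top of this message; records laid out differently would need the patterns adjusted. The function name summarize_events is hypothetical.

```shell
# summarize_events FILE [N]
# Print "time | description" for the last N events (default 20) in FILE.
summarize_events() {
    file=$1
    count=${2:-20}
    awk '
        # Remember the timestamp of the current record.
        /^Time:/ { sub(/^Time:[ \t]*/, ""); when = $0 }
        # Emit one line per event once its description is seen.
        /^Event Description:/ {
            sub(/^Event Description:[ \t]*/, "")
            print when " | " $0
            when = ""
        }
    ' "$file" | tail -n "$count"
}
```

For example, `summarize_events events.log 50` would list the 50 most recent events, oldest first.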
>
> I haven't gotten around to trying the SNMP package.
>
> >    Next step would be to do a fsck first I guess.
>
> I think it's pretty unusual for an ext4 filesystem to become corrupted
> without there being some underlying failure of the hardware.  It's not
> impossible; however, even if the drives are OK, it could indicate some other
> problem (e.g. RAM corruption).
>
> It's also worth checking the kernel you're running and if it has known
> problems.  For example, I'm told that the initial release of 3.4.0 was very
> dodgy.
>
> Regards,
>
> Brian.