[Gluster-users] Gluster 3.3, brick crashed
Christian Wittwer
wittwerch at gmail.com
Thu Aug 2 11:23:11 UTC 2012
Hi Brian,
Thanks for your very detailed answer, awesome!
I checked the event log of the RAID controller; unfortunately, the log
was cleared on 31st July.
Time: Tue Jul 31 15:35:43 2012
Code: 0x0000001e
Class: 0
Locale: 0x20
Event Description: Event log cleared
I'm going to reboot the server. Maybe I can run a diagnostic tool
during boot (to detect something like corrupted memory).
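
For skimming a longer export, here is a minimal sketch that prints one line per event. It assumes every record in the exported file has the "Time:" and "Event Description:" fields shown in the excerpt above; the function name summarize_events is made up for illustration, and records without a "Time:" line are reported as "unknown time".

```shell
# Hypothetical helper: condense an exported controller event log to
# "time | description" lines. Field names are taken from the excerpt above;
# real exports may contain other fields, which are simply ignored.
summarize_events() {
    # $1 = path to the exported event log (e.g. events.log)
    awk '
        BEGIN { t = "unknown time" }
        /^[ \t]*Time:/ {
            sub(/^[ \t]*Time:[ \t]*/, "")   # keep only the timestamp text
            t = $0
            next
        }
        /^[ \t]*Event Description:/ {
            sub(/^[ \t]*Event Description:[ \t]*/, "")
            print t " | " $0
            t = "unknown time"              # do not reuse a stale timestamp
        }
    ' "$1"
}
```

Running "summarize_events events.log | tail" would then show the newest entries last, which is where a cleared log or a misbehaving drive should show up.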
Cheers,
Christian
2012/7/31 Brian Candler <B.Candler at pobox.com>
>
> On Tue, Jul 31, 2012 at 03:31:13PM +0200, Christian Wittwer wrote:
> > Thanks for your input. I checked dmesg, and that doesn't look good I
> > think.
> ...
> > [1511244.755144] EXT4-fs (sdb1): error count: 2
> > [1511244.761993] EXT4-fs (sdb1): initial error at 1343085498:
> > ext4_xattr_release_block:496
>
> No, that's not good.
>
> > I checked the RAID (built-in hardware controller from Dell), and all the disks
> > are OK.
>
> Using MegaCli? Or some other way?
>
> As it happens I've just been sorting this out on a Dell. Here is a potted
> summary:
>
> (0) Bookmark these links
>
> http://en.community.dell.com/techcenter/os-applications/w/wiki/linux-raid-and-storage.aspx
> http://tools.rapidsoft.de/perc/
>
> (1) Check your controller version
>
> dmesg | grep PERC
> lspci -v | grep LSI
>
> If it's a PERC 5 or later, continue. (The following tested with PERC 6/i)
>
> (2) Install MegaCli 8.x
>
> Starting at http://www.lsi.com/support/Pages/download-search.aspx select
> ‘RAID Controllers’, ‘Megaraid SAS 9260-4i’, ‘All Asset Types’ and search.
> (It doesn't matter if your controller is not 9260-4i)
>
> Under Management Software and Tools you will find MegaCLI 5.3 (actually
> 8.04.07_MegaCLI.zip). Download it.
>
> Inside that is CLI_Lin_8.04.07.zip. Inside that is MegaCliLin.zip. Inside
> that are
>
> 1588725 Defl:N 1587255 0% 05-17-11 09:57 dd81e0a7 Lib_Utils-1.00-09.noarch.rpm
> 1514197 Defl:N 1496763 1% 05-28-12 12:36 8e5e2a64 MegaCli-8.04.07-1.noarch.rpm
>
> If you are using Debian/Ubuntu, use ‘alien’ to convert these to .deb
> packages: e.g.
>
> apt-get install alien
> alien --to-deb Lib_Utils-1.00-09.noarch.rpm
> alien --to-deb MegaCli-8.04.07-1.noarch.rpm
>
> Install them (dpkg -i *.deb)
>
> (3) Choose /opt/MegaRAID/MegaCli/MegaCli64 if you are on 64-bit linux or
> /opt/MegaRAID/MegaCli/MegaCli if on 32-bit.
>
> Proceed as per the cheat sheet: e.g.
>
> # All adapters
> /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL | less
> # Event log
> /opt/MegaRAID/MegaCli/MegaCli64 -AdpEventLog -GetEvents -f events.log -aALL && less events.log
> # All enclosures
> /opt/MegaRAID/MegaCli/MegaCli64 -EncInfo -aALL | less
> # Logical drives
> /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | less
> # Physical drives
> /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | less
> # Battery status
> /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -aALL | less
>
> The event log is typically the most useful; scroll to the end and then read
> backwards. It will tell you if drives are misbehaving.
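
The choice of binary in step (3) can be scripted. This is only a sketch: the function name megacli_path is hypothetical, and it assumes that "uname -m" reports x86_64 (or amd64) on a 64-bit install and that the packages were installed under /opt/MegaRAID/MegaCli as described above.

```shell
# Hypothetical helper: print the MegaCli binary matching the machine type.
# Assumption: anything other than x86_64/amd64 is treated as 32-bit.
megacli_path() {
    # $1 = machine name; defaults to what uname -m reports on this host
    case "${1:-$(uname -m)}" in
        x86_64|amd64) echo /opt/MegaRAID/MegaCli/MegaCli64 ;;
        *)            echo /opt/MegaRAID/MegaCli/MegaCli ;;
    esac
}
```

The cheat-sheet commands then work unchanged on either architecture, for example: "$(megacli_path)" -PDList -aALL | less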
>
> I haven't gotten around to trying the SNMP package.
>
> > Next step would be to do a fsck first I guess.
>
> I think it's pretty unusual for an ext4 filesystem to become corrupted
> without there being some underlying failure of the hardware. It's not
> impossible; however, even if the drives are OK, it could indicate some other
> problem (e.g. RAM corruption).
>
> It's also worth checking which kernel you're running and whether it has known
> problems. For example, I'm told that the initial release of 3.4.0 was very
> dodgy.
>
> Regards,
>
> Brian.
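
One more detail from the dmesg lines quoted above: the number in "initial error at 1343085498" appears to be a Unix timestamp (seconds since 1970-01-01 UTC). Assuming that reading is right and that GNU date is available (the -d @N form is a GNU extension), it can be converted to a readable time and compared with the controller's event log. The function name epoch_to_utc is made up for illustration.

```shell
# Hypothetical helper: convert seconds-since-epoch to a readable UTC time.
# Requires GNU date; other date implementations use different options.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# The value reported by EXT4-fs for the initial error on sdb1
epoch_to_utc 1343085498
```

This prints 2012-07-23 23:18:18 UTC, so the first recorded ext4 error is about a week older than the "Event log cleared" entry from 31st July.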