[Gluster-users] XFS and MD RAID

Brian Foster bfoster at redhat.com
Wed Aug 29 16:06:29 UTC 2012


On 08/29/2012 10:26 AM, Brian Candler wrote:
> On Wed, Aug 29, 2012 at 08:47:22AM -0400, Brian Foster wrote:
...
> 
> Running a couple of concurrent instances of
> 
>   while [ 1 ]; do bonnie++ -d /mnt/point -s 16384k -n 98:800k:500k:1000; done
> 
> was enough to make it fall over for me, when the underlying filesystem was
> XFS, but not with ext4 or btrfs.  This was on a system with 24 disks: 16 on
> an LSI 2116 controller and 8 on an LSI 2008.  It was MD RAID0:
> 
>   mdadm --create /dev/md/scratch -n 24 -c 1024 -l raid0 /dev/sd{b..y}
>   mkfs.xfs -n size=16384 /dev/md/scratch
>   mount -o inode64 /dev/md/scratch /mnt/point
> 

Thanks. I didn't see an obvious way to pass through physical disks in
the interface I have, but I set up a hardware RAID0 and a couple of
instances of bonnie++. This may not be close enough to your workload,
but it can't hurt to try.
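
In case it helps to compare notes, a minimal way to kick off a couple of
those loops in parallel looks something like this (mount point and
bonnie++ parameters taken from your command above; the rest is just a
placeholder sketch, not exactly what I ran):

  # start two background loops hammering the same filesystem
  # (bonnie++ needs -u <user> if run as root)
  for i in 1 2; do
      ( while true; do bonnie++ -d /mnt/point -s 16384k -n 98:800k:500k:1000; done ) &
  done
  wait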

> I'm in the process of testing with a cut-down configuration of 8 or 4 disks,
> using only one controller card, to see if I can get it to fail there. So far
> no failure after 3 hours.
> 
>> Could you collect the generic data and post it to linux-xfs? Somebody
>> might be able to read further into the problem via the stack traces. It
>> also might be worth testing an upstream kernel on your server, if possible.
> 
> I posted the tracebacks to the xfs at sgi.com mailing list (wasn't aware of
> linux-xfs): threads starting
> http://oss.sgi.com/pipermail/xfs/2012-May/019239.html
> http://oss.sgi.com/pipermail/xfs/2012-May/019417.html
> 

That's the list I was referring to. ;)

> (Note: the box I referred to as "storage2" turned out to have a separate
> hardware problem; it resets after a few days.)
> 
> Actually, thank you for reminding me of this. Looking back through these
> prior postings, I noted at one point that transfers had locked up to the
> point that even 'dd' couldn't read some blocks.  This points the finger away
> from XFS and MD RAID and more towards the LSI driver/firmware or the drives
> themselves.  I'm now using 12.04; when I next get a similar lockup I'll
> check for that again.
> 

I suppose so. That matches the suspicion so far in the feedback on the
xfs list as well.
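
When it next locks up, one quick check that takes the filesystem out of
the picture entirely is to read straight from the MD device with direct
I/O; if that also hangs, XFS is pretty much off the hook. Roughly (device
name and offset are just placeholders):

  # read 1GiB starting 100GiB into the array, bypassing the page cache
  dd if=/dev/md/scratch of=/dev/null bs=1M count=1024 skip=102400 iflag=direct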

I don't know that code well enough to readily decipher the situation
from the stack traces, but your last message refers to sysrq output in
the log at timestamp 250695, and the attached log only goes up to
250145.
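
If you get another shot at it, capturing dmesg immediately after
triggering sysrq makes it less likely that the interesting part falls
off the end of the log or the attachment. For example (output path is
arbitrary):

  echo w > /proc/sysrq-trigger      # dump stacks of blocked (uninterruptible) tasks
  dmesg > /tmp/sysrq-blocked.txt    # save it right away, before the ring buffer wraps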

Brian

> Regards,
> 
> Brian.
> 



