[Gluster-users] Can Gluster handle a filesystem failure gracefully?

Mon Mar 26 08:10:06 UTC 2012

Hi All,

I've been testing Gluster 3.3b2 across 2 nodes linked with 40 Gbit InfiniBand. Each server contains 12x 480GB SSDs, each configured as a single brick, with Gluster performing the replication (replica=2). I've also set up an rsync of the data to some large SATA storage so that a third copy exists. The storage is used for VM images of application (Citrix) servers running on KVM hosts.

It was all working great until one of the SSDs started playing up. Whilst I didn't lose any data, the drive failure was far from seamless. All the VMs located on the failed drive stopped responding after write failures, and even after they were restarted some of the VM images remained locked until glusterd was restarted on both servers. The drive itself was throwing XFS write errors, and I believe it kept blocking until I manually unmounted it.

Is there a way to make this kind of failure easier to recover from?
I'm currently using the default timeouts, which may be part of the problem. I'm trying not to resort to RAID, as it would either double my storage costs for the same amount of usable storage (RAID 10) or give me a tenth of the write IOPS (RAID 5).
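
To put rough numbers on the RAID 10 cost point, here is a minimal sketch of the capacity arithmetic using the figures above (2 servers, 12x 480GB SSDs each, replica=2). The function name and the "local RAID factor" parameter are just illustrative, and it ignores filesystem overhead and the rsync copy on SATA.

```python
# Figures taken from the setup described above.
SERVERS = 2
SSDS_PER_SERVER = 12
SSD_GB = 480
REPLICA = 2  # Gluster replica count


def usable_gb(local_raid_factor: float) -> float:
    """Usable capacity in GB after local RAID and Gluster replication.

    local_raid_factor is the fraction of raw disk space left after local
    RAID (1.0 for one brick per SSD, 0.5 for RAID 10 mirroring).
    """
    raw_gb = SERVERS * SSDS_PER_SERVER * SSD_GB
    return raw_gb * local_raid_factor / REPLICA


plain = usable_gb(1.0)   # one brick per SSD, as currently deployed
raid10 = usable_gb(0.5)  # RAID 10 underneath each brick

print(f"brick per SSD : {plain:.0f} GB usable")
print(f"RAID 10 bricks: {raid10:.0f} GB usable")
print(f"cost multiplier for the same usable space: {plain / raid10:.1f}x")
```

With the same hardware, RAID 10 under replica=2 halves the usable space, which is where the doubling of cost per usable GB comes from.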

Also, have granular locks been added to any of the Gluster 3.3 QA releases so that I can test them?

Thanks in advance
-Matt