[Gluster-devel] GlusterFS Spare Bricks?

7220022 7220022 at gmail.com
Tue Apr 10 17:20:11 UTC 2012



-----Original Message-----
From: gluster-devel-bounces+7220022=gmail.com at nongnu.org
[mailto:gluster-devel-bounces+7220022=gmail.com at nongnu.org] On Behalf Of
Gordan Bobic
Sent: Tuesday, April 10, 2012 3:45 PM
To: gluster-devel at nongnu.org
Subject: Re: [Gluster-devel] GlusterFS Spare Bricks?

On 10/04/2012 09:39, 7220022 wrote:
> Are there plans to add provisioning of spare bricks in a replicated
> (or distributed-replicated) configuration? E.g., when a brick in a
> mirror set dies, the system rebuilds it automatically on a spare,
> similar to how it's done by RAID controllers.
>
> Not only would it improve practical reliability, especially of large
> clusters, but it would also make it possible to build
> better-performing clusters from less expensive components. For
> example, instead of having slow RAID5 bricks on expensive RAID
> controllers, one uses cheap HBAs and stripes a few disks per brick in
> RAID0 - that's faster for writes than RAID 5/6 by an order of
> magnitude (and, by the way, should improve the rebuild times in
> Gluster that many are complaining about). A failure of one such
> striped brick is not catastrophic in a mirrored Gluster - but it's
> better to have spare bricks standing by, strewn across cluster heads.
>
> A more advanced setup at the hardware level involves creating "hybrid
> disks", whereby HDD vdisks are cached by enterprise-class SSDs. It
> works beautifully and makes HDDs amazingly fast for random
> transactions. The technology has become widely available on many $500
> COTS controllers. However, it is not widely known that the results
> with HDDs in RAID0 under SSD cache are 10 to 20 (!!) times better
> than with RAID 5 or 6.

On reads the difference should be negligible unless the array is degraded.
If it isn't negligible, your RAID controller is unfit for purpose.

[AS] I refer to random IOPS in the 70K to 200K range on vdisks in RAID 0
vs. 5 behind a large SSD cache. The behavior of such "hybrid vdisks"
differs from that of pure SSD- or HDD-based ones. Unlike a DDR RAM cache,
an SSD's total R+W bandwidth in MB/s is capped at the level of its max
read-only performance. Hence the front-end read performance is degraded by
the (sequential) write load passing through the cache upstream of the
HDDs. And vice versa: the write performance of the hybrid is degraded by
the slow write speed of a RAID 5/6 array behind the cache, especially at
larger queue depths. These limitations, when superposed by most
"real-world" test patterns, leave the array only marginally better for
both writes and reads than an HDD-based RAID10 array with the same number
of drives. Not quite sure why, but it's removing the write-speed limit of
the HDDs, by changing the RAID level from 5 to 0, that clears the
bottleneck: the relative difference for both reads and writes gets much
larger than the write-performance gap between pure HDD RAID 0 and RAID 5
vdisks.
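To make the superposition concrete, here is a back-of-the-envelope sketch
of the read/write trade-off on the cache SSD. All figures are invented for
illustration only; the point is just that reads and writes share one
bandwidth budget:

```shell
# Hypothetical figures, for illustration only: an SSD cache whose total
# R+W throughput is capped near its read-only maximum.
SSD_MAX=500    # MB/s: max read-only throughput of the cache SSD
WRITE_LOAD=200 # MB/s: sequential writes crossing the cache to the HDDs

# Front-end read bandwidth left over while those writes are in flight:
echo $(( SSD_MAX - WRITE_LOAD ))   # 300 MB/s
```

With a RAID 5/6 back end the destage writes linger longer, eating into
that budget for longer; RAID0 drains them faster and frees the SSD for
reads.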

Having said that, a lot of RAID controllers are pretty useless.

[AS] The newer LSI 2208-based ones seem okay, and the recent
firmware/drivers are finally stable. But I agree: we always skip the RAID
features apart from stripe or mirror and do everything in software. The
advanced features (FastPath, CacheCade), though, are fantastic if you use
SSDs, either standalone or as HDD cache. In fact, we use controllers
instead of simple HBAs only to take advantage of these features.

> There is no way to use RAID0 in commercial storage, the main reason
> being the absence of hot spares. If, on the other hand, the spares are
> handled by Gluster in the form of (cached, hardware-RAID0)
> pre-fabricated bricks, both very good performance and reasonably
> sufficient redundancy should be easily achieved.
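For reference, the manual equivalent of such a spare-brick rebuild with
today's gluster CLI might look something like this (volume name, hosts,
and brick paths are all hypothetical):

```shell
# Swap the failed brick for a pre-provisioned standby brick on another
# head. "commit force" replaces the brick immediately without migration.
gluster volume replace-brick myvol \
    server2:/bricks/b1 server3:/bricks/spare0 commit force

# Resync the new brick from its surviving replica peer(s).
gluster volume heal myvol full
```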

So why not use ZFS instead? The write performance is significantly better
than traditional RAID equivalents and you get vastly more flexibility than
with any hardware RAID solution. And it supports caching data onto SSDs.
[AS] Good point. We have no experience with it, though - we should try.
Do you know if it can be made distributed/"parallel" like Gluster, and
whether it supports RDMA transport for storage traffic between heads? The
main reason we've been looking into Gluster is cheap bandwidth: all our
servers and nodes are connected via a 40Gbit IB fabric - 2 ports per
server, 4 on some larger ones, non-blocking edge switches, directors at
floor level, etc. - which sits 80 to 90% idle. Can you make global spares
in ZFS?
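For what it's worth: ZFS hot spares are pool-wide (shared by all vdevs in
the pool), and an SSD read cache is added as an L2ARC device. ZFS itself
is a local filesystem, though, so the spares are "global" only within one
pool on one head; distribution across heads would still have to come from
Gluster layered on top. A sketch with hypothetical device names:

```shell
# Two mirrored vdevs, one pool-wide hot spare, and an SSD read cache
# (L2ARC). Device names are hypothetical.
zpool create tank mirror sda sdb mirror sdc sdd \
      spare sde \
      cache nvme0n1

# A spare can also be attached to an existing pool later:
zpool add tank spare sdf
```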

Gordan

_______________________________________________
Gluster-devel mailing list
Gluster-devel at nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel
