[Gluster-devel] GlusterFS Spare Bricks?
7220022
7220022 at gmail.com
Thu Apr 12 23:11:08 UTC 2012
http://www.lighthouse-partners.com/linux/presentations09/HPL09-Session6.pdf
From page 27 there is a discussion of storage redundancy issues - could be useful too.
-----Original Message-----
From: abperiasamy at gmail.com [mailto:abperiasamy at gmail.com] On Behalf Of
Anand Babu Periasamy
Sent: Thursday, April 12, 2012 7:56 PM
To: 7220022
Cc: gluster-devel at nongnu.org
Subject: Re: [Gluster-devel] GlusterFS Spare Bricks?
> -----Original Message-----
> From: abperiasamy at gmail.com [mailto:abperiasamy at gmail.com] On Behalf
> Of Anand Babu Periasamy
> Sent: Wednesday, April 11, 2012 10:13 AM
> To: 7220022
> Cc: gluster-devel at nongnu.org
> Subject: Re: [Gluster-devel] GlusterFS Spare Bricks?
>
> On Tue, Apr 10, 2012 at 1:39 AM, 7220022 <7220022 at gmail.com> wrote:
> >
> > Are there plans to add provisioning of spare bricks in a replicated (or
> > distributed-replicated) configuration? E.g., when a brick in a mirror set
> > dies, the system rebuilds it automatically on a spare, similar to how it's
> > done by RAID controllers.
> >
> >
> >
> > Not only would it improve practical reliability, especially of large
> > clusters, but it would also make it possible to build better-performing
> > clusters from less expensive components. For example, instead of slow
> > RAID5 bricks on expensive RAID controllers one could use cheap HBA-s and
> > stripe a few disks per brick in RAID0 - that is faster for writes than
> > RAID 5/6 by an order of magnitude (and, by the way, should improve the
> > Gluster rebuild times many are complaining about). A failure of one such
> > striped brick is not catastrophic in a mirrored Gluster - but it is better
> > to have spare bricks standing by, spread across cluster heads.
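> >
> > To make the idea concrete, such a brick could be assembled roughly as
> > follows (untested sketch; device names, mount point and filesystem
> > options are only examples):
> >
> >   #!/bin/bash
> >   # Stripe three HBA-attached disks into one RAID0 device and use it as a brick.
> >   set -e
> >   mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
> >   mkfs.xfs -i size=512 /dev/md0       # 512-byte inodes, as usually done for bricks
> >   mkdir -p /export/brick0
> >   mount /dev/md0 /export/brick0
> >   echo '/dev/md0 /export/brick0 xfs defaults 0 0' >> /etc/fstab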
> >
> >
> >
> > A more advanced setup at the hardware level involves creating "hybrid
> > disks", whereby HDD vdisks are cached by enterprise-class SSD-s. It works
> > beautifully and makes HDD-s amazingly fast for random transactions. The
> > technology has become widely available on many ~$500 COTS controllers.
> > What is not widely known, however, is that the results with HDD-s in
> > RAID0 under SSD cache are 10 to 20 (!!) times better than with RAID 5 or 6.
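> >
> > The same idea can also be approximated in software - a rough, untested
> > sketch with bcache (still an out-of-tree patch set on most current
> > kernels; device names are examples):
> >
> >   # SSD as the cache device, the RAID0 array as the backing device
> >   make-bcache -C /dev/sdf -B /dev/md0
> >   echo writeback > /sys/block/bcache0/bcache/cache_mode
> >   mkfs.xfs -i size=512 /dev/bcache0
> >   mount /dev/bcache0 /export/brick0
> >
> > The controller-based caching described above needs none of this, of course.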
> >
> >
> >
> > There is no way to use RAID0 in commercial storage, the main reason being
> > the absence of hot spares. If, on the other hand, the spares were handled
> > by Gluster in the form of pre-fabricated (cached hardware-RAID0) bricks,
> > both very good performance and reasonably sufficient redundancy should be
> > easy to achieve.
>
> Why not use the "gluster volume replace-brick ..." command? You can use
> external monitoring/management tools (e.g. freeipmi) to detect node
> failures and trigger replace-brick through a script. GlusterFS has the
> mechanism for hot spares, but the policy should be external.
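>
> A minimal sketch of such an external trigger (untested; the volume name,
> hosts and brick paths are made-up examples, and the IPMI check could just
> as well be any other health probe):
>
>   #!/bin/bash
>   # If a brick server stops answering IPMI pings, migrate its brick to a
>   # spare brick that was prepared in advance.
>   VOL=myvol
>   FAILED=server2:/export/brick0
>   SPARE=server9:/export/spare0
>
>   if ! ipmiping -c 3 server2-ipmi >/dev/null 2>&1; then
>       gluster volume replace-brick $VOL $FAILED $SPARE start
>       # later, once "... status" reports the migration is complete:
>       # gluster volume replace-brick $VOL $FAILED $SPARE commit
>   fi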
>
> [AS] That should work, but it would still be prone to human error. In our
> experience, had we not had hot spares (block storage), we would surely have
> experienced catastrophic failures. First off, COTS disks (and controllers,
> in the case of GlusterFS nodes) have a break-in period during which the bad
> ones fail under load within a few months. Secondly, a lot of our equipment
> is in remote telco facilities where power, cleanliness or air conditioning
> can be far from ideal - leading to increasing failure rates about 2 years
> after deployment. As a rule, we have at least 4 hot spares per two 24-bay
> enclosures, while our sister company with a similar use profile does 4-6
> spares per enclosure, as they run older and less uniform equipment.
>
> A node may come back online in 5 minutes; GlusterFS should not
> automatically make decisions.
> [AS] Good point - e.g., the node may simply be down for maintenance.
>
> I am wondering whether it makes sense to add hot-spare as a standard
> feature, since GlusterFS already detects failures.
>
> [AS] Given the reason above, it would be best if the feature could be
> turned on and off. Before attempting maintenance - turn it off. Once
> maintenance is complete and the node is up - issue the "turn hot-spare on"
> command, but have it queued until the reconstruction of the node begins,
> and taken into account there (don't attempt to sync to spare bricks if
> reconstruction onto other good bricks has already begun).
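>
> At the script level (while this is not a GlusterFS feature), the on/off
> switch could be as crude as a flag file gating the watchdog sketched
> earlier; the path below is made up:
>
>   #!/bin/bash
>   # touch /etc/gluster-hotspare.enabled  -> hot-spare handling on
>   # rm    /etc/gluster-hotspare.enabled  -> off, e.g. node down for maintenance
>   [ -e /etc/gluster-hotspare.enabled ] || exit 0
>   # ... failure detection and replace-brick as in the earlier sketch ...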
>
> In half the cases, disks and controllers fail randomly and temporarily
> (due to dust, bad power, etc.). Most of the time the root cause is unknown
> or impractical to debug in a live system. Block-storage SAN-s have more or
> less standard configuration tools that take that into account. Here's a
> brief description in their terminology, which may help in creating the
> logic in GlusterFS (a condensed sketch follows the list):
>
> 1. Drives can have the statuses Online, Unconfigured Good, Unconfigured
> Bad, Spare (LSP, a spare local to a drive group), Global Spare (GSP,
> shared across the system) and Foreign.
> 2. vDisks can be Optimal, Degraded, or Degraded/Rebuilding.
> 3. In the presence of spares, if a drive in a redundant vDisk fails, the
> system marks the drive as Unconfigured Bad, and the vDisk picks up a spare
> and enters Rebuilding mode.
> 4. The system won't let you make an Unconfigured Bad drive Online, but you
> can try a "make unconfigured good" command on it. If that succeeds, and
> the drive passes initialization and shows no trouble in SMART - include it
> in a new vDisk, make it a spare, etc. If it's bad - replace it.
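>
> Condensed into script form (a toy illustration of the state handling
> described above, using the controller terminology rather than anything
> GlusterFS has today):
>
>   #!/bin/bash
>   # Map a reported drive status to the action the spare logic would take.
>   case "$1" in
>     online)             echo "nothing to do" ;;
>     unconfigured_good)  echo "eligible: add to a vDisk or mark as spare" ;;
>     unconfigured_bad)   echo "try 'make unconfigured good'; if init and SMART are clean, reuse, else replace" ;;
>     spare|global_spare) echo "standing by; picked up automatically on a failure" ;;
>     foreign)            echo "carries config from another system; review before use" ;;
>     *)                  echo "unknown status: $1" >&2; exit 1 ;;
>   esac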
>
Very useful points. Took notes.
-ab