[Gluster-devel] Harddisk economy alternatives

Magnus Näslund magnus at arkivdigital.se
Wed Nov 9 19:09:25 UTC 2011

On 11/09/2011 06:51 PM, Gordan Bobic wrote:
> My main concern with such data volumes would be the error rates of
> modern disks. If your FS doesn't have automatic checking and block level
> checksums, you will suffer data corruption, silent or otherwise. Quality
> of modern disks is pretty appalling these days. One of my experiences is
> here:
> http://www.altechnative.net/?p=120
> but it is by no means the only one.

Interesting read, and I agree that RAID data corruption and hard disk 
untrustworthiness are huge problems. To combat this we're thinking of 
using a crude health-checking utility that would use checksum files, on 
top of whatever we end up using (glusterfs or otherwise). These scripts 
would be specific to our application and its file layout.

In glusterfs I believe it would be possible to do the checksum checking 
locally on the nodes, since the underlying filesystem is directly 
accessible on each brick.
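
As a rough illustration, a minimal sketch of the kind of health-checking 
script we have in mind might look like the following (Python; the .sha256 
sidecar naming and the scrub() helper are just our own assumptions for the 
example, not anything glusterfs provides):

import hashlib
import os
import sys

def sha256_of(path, bufsize=1 << 20):
    # Stream the file so large archive files don't exhaust memory.
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(bufsize), b''):
            h.update(chunk)
    return h.hexdigest()

def scrub(root):
    # Walk the tree and verify each data file against its .sha256 sidecar.
    corrupt = 0
    for dirpath, _, names in os.walk(root):
        for name in names:
            if name.endswith('.sha256'):
                continue
            data = os.path.join(dirpath, name)
            sidecar = data + '.sha256'
            if not os.path.exists(sidecar):
                continue  # not yet checksummed, skip
            expected = open(sidecar).read().split()[0]
            if sha256_of(data) != expected:
                corrupt += 1
                print('CORRUPT: %s' % data)
    return corrupt

if __name__ == '__main__':
    sys.exit(1 if scrub(sys.argv[1]) else 0)

Run against the brick's local mount point on each node (rather than 
through a glusterfs client mount), this would avoid pulling every file 
across the network just to verify it.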

> Currently the only FS that meets all of my reliability criteria is ZFS
> (and the linux port works quite well now), and it has saved me from data
> corruption, silent and otherwise, a number of times by now, in cases
> where normal RAID wouldn't have helped.

We're using OpenSolaris+ZFS in production today; if glusterfs works well 
on OpenSolaris that might very well be what we end up with.
We're a Linux shop, but we settled on OpenSolaris for ZFS alone.
Are you running glusterfs on Solaris and/or Linux in production?

>> So in my mind we could buy quality boxes with 24-36 disks run by 3-4
>> SATA controller cards (Marvell?),
> My experience with Marvell cards is limited. Do they have 8-port cards?
> I use 8-port LSI cards without any serious problems. The only issue I
> have seen is that they tend to reset the bus when the disk is slow to
> respond (specifically due to running a SMART self-test), which means
> that on one hand you lose the SMART short/long self-test option for
> monitoring, but this is mitigated by weekly ZFS scrubs which I trust
> more anyway.

We're using LSI cards for the Solaris servers now as well, IIRC.
We'd go with whichever cards have the best reputation.

> Anyway, to summarize:
> 1) With large volumes of data, you need something other than the disk's
> sector checksums to keep your data correct, i.e. a checksum checking FS.
> If you don't, expect to see silent data corruption sooner or later.

The silent corruption case can be mitigated in an application-specific 
way for us, as described above. Having that happen automatically via ZFS 
is definitely interesting in several ways. Does glusterfs have (or plan 
to have) scrubbing-like functionality that checks the data?

> 2) Don't use the same make of disk in all the servers - I have seen
> multiple disks from the same manufacturer fail minutes apart more than
> once.
> 3) Use WRV features if they are available.
> 4) Make sure your glfs bricks are mirrored between machines in such a
> way that the underlying disks are different (e.g. say you have 24 disks
> in each box, divided into 3x 8-disk RAIDZ3 volumes. Use each one of
> those 8-disk volumes as a brick, and mirror it to another similar
> machine so that the 8 disks on the other server are from a different
> manufacturer).
> The glfs part on top is relatively straightforward and will "just work"
> provided you use a reasonably sane configuration. It is the layers
> underneath that you will need to get right to keep your data healthy.
> Gordan

These are all excellent points.
Thank you for the input!
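
As a note to ourselves, point 4 might look something like this at volume 
creation time (the hostnames, volume name and brick paths below are made 
up; assuming each 8-disk RAIDZ set is mounted as one brick, and relying 
on glusterfs forming replica pairs from consecutive bricks in the list):

  gluster volume create archive replica 2 transport tcp \
    serverA:/bricks/z0 serverB:/bricks/z0 \
    serverA:/bricks/z1 serverB:/bricks/z1 \
    serverA:/bricks/z2 serverB:/bricks/z2

with serverA populated with disks from one manufacturer and serverB from 
another, so no replica pair depends on a single disk batch.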

