[Gluster-devel] Harddisk economy alternatives

Wed Nov 9 19:50:45 UTC 2011

On 09/11/2011 19:09, Magnus Näslund wrote:
> On 11/09/2011 06:51 PM, Gordan Bobic wrote:
>>
>> My main concern with such data volumes would be the error rates of
>> modern disks. If your FS doesn't have automatic checking and block level
>> checksums, you will suffer data corruption, silent or otherwise. Quality
>> of modern disks is pretty appalling these days. One of my experiences is
>> here:
>> http://www.altechnative.net/?p=120
>> but it is by no means the only one.
>>
>
> Interesting read, and I agree that RAID data corruption and hard disk
> untrustworthiness are a huge problem. To combat this we're
> thinking of using a crude health checking utility that would use
> checksum files, on top of whatever we end up using (glusterfs or
> otherwise). These scripts would be specific to our application, and file
> based.
>
> In glusterfs I believe that it would be possible to do the checksum
> checking locally on the nodes, since the underlying filesystem is
> accessible?

In some cases. If you are using striping, then that gets potentially
tricky. If you are using straight mirroring, then yes, you could easily
just check the underlying files. However, that would mean manually
correcting things. ZFS will check this for you every time a file is
accessed or scrubbed and, given a redundant pool, auto-correct the
corrupted blocks, so no corrupted data ever gets served.
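
As a rough illustration of the "check the underlying files" approach on a
mirrored setup, something like the following could be run locally against
each brick directory. This is only a sketch: the function names and the
in-memory manifest format are made up for the example, SHA-256 is an
arbitrary choice of hash, and it only reports mismatches. Repairing from
the good replica would still be a manual step.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (by relative path) to its SHA-256."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or whose hash differs."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(rel)
    return bad
```

The manifest would be built once when the data is known to be good and
stored somewhere other than the disks being checked. verify() can then be
run periodically on each node, and any path it returns compared against
the other replica by hand.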

>> Currently the only FS that meets all of my reliability criteria is ZFS
>> (and the linux port works quite well now), and it has saved me from data
>> corruption, silent and otherwise, a number of times by now, in cases
>> where normal RAID wouldn't have helped.
>>
>
> We're using OpenSolaris+ZFS today in production; if glusterfs works well
> on OpenSolaris that might very well be what we end up with.

I have no idea whether glfs works well on Solaris. It's worth trying, 
but given how much effort Emanuel has put into porting it to (Net)BSD, 
it may well not "just work", but perhaps one of the developers will be 
able to clarify whether glfs is tested/supported/expected to work on 
Solaris.

> We're a Linux shop, but we settled on OpenSolaris for ZFS alone.
> Are you running glusterfs on Solaris and/or Linux in production?

On Linux.

You may be interested to have a look here:

http://groups.google.com/a/zfsonlinux.org/group/zfs-discuss/browse_thread/thread/4d88218d6c8f67f0/78a7b633dd66157a?hl=en&lnk=gst&q=glusterfs#78a7b633dd66157a

>> Anyway, to summarize:
>> 1) With large volumes of data, you need something other than the disk's
>> sector checksums to keep your data correct, i.e. a checksum checking FS.
>> If you don't, expect to see silent data corruption sooner or later.
>
> The silent corruption case can be mitigated in an application-specific
> way for us, as described above. Having that done automatically by ZFS is
> definitely interesting in several ways. Does glusterfs have (or plan to
> have) a scrubbing-like functionality that checks the data?

I'd be interested to hear the developers' thoughts on this, but I think 
this would be extremely expensive. Doing "ls -laR" to check/auto-heal 
all files is already very time-consuming on large file systems.
Calculating md5s of all files as well would be orders of magnitude more 
expensive.
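
To put a rough number on that: checksumming every file means reading
every byte, so the total data size divided by the sustained read rate
gives a lower bound on how long a full pass takes. The sketch below does
that arithmetic. The function name and both figures (100 TB per node,
500 MB/s sustained) are hypothetical and chosen only for illustration;
they are not measurements from any real setup.

```python
def full_read_hours(total_bytes, read_bytes_per_sec):
    """Lower bound on hours needed to read every byte once at a sustained rate."""
    return total_bytes / read_bytes_per_sec / 3600.0


# Hypothetical figures: 100 TB on a node, read at a sustained 500 MB/s.
hours = full_read_hours(100e12, 500e6)
print(f"{hours:.1f} hours")  # prints "55.6 hours"
```

A metadata-only walk such as "ls -laR" never reads file contents, which
is why a full checksum pass costs so much more on top of it.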

Gordan