[Gluster-devel] GlusterFS Roadmap: Erasure codes.

gordan at bobich.net
Thu Apr 24 09:39:12 UTC 2008


On Thu, 24 Apr 2008, Rodney McDuff wrote:

> I was just looking at the GlusterFS Roadmap and thought that a nifty
> feature for the future would be an AFR-like translator that uses
> Reed-Solomon erasure codes for file replication instead. That would add
> many 9s to the reliability without adding much storage overhead.

You mean to have one file split across several servers, as in RAID[3-6]? 
That is a lot more complicated, and CPU intensive. It would also make 
writes across the servers quite expensive, because any write touching 
only part of a stripe requires the remaining segments of that stripe to 
be read back first, so that the Reed-Solomon parity can be recalculated.
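
To make that penalty concrete, here is a rough Python sketch (my own 
illustration, not anything from the GlusterFS code base) of the 
read-modify-write cycle for a single-parity stripe. Plain XOR parity is 
shown; a full Reed-Solomon n+m scheme replaces the XOR with Galois-field 
arithmetic but pays the same extra I/O:

def xor_blocks(a, b):
    # blocks are assumed to be equal-sized bytes objects
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(read_block, write_block, stripe, idx, new_data):
    # read_block/write_block are hypothetical I/O callbacks; in a
    # distributed setup each call is a network round trip
    old_data = read_block(stripe, idx)          # extra read 1
    old_parity = read_block(stripe, "parity")   # extra read 2
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    write_block(stripe, idx, new_data)          # write 1
    write_block(stripe, "parity", new_parity)   # extra write 2

So a one-block application write turns into two reads and two writes, 
each potentially hitting a different server.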

Fundamentally, I think it's just not what GlusterFS is intended to do. The 
nice thing about GlusterFS is that it is very similar to Coda, only:

1) without the limitations that reduce Coda's usefulness - most notably 
the 1MB limit on metadata per directory. Since that metadata includes 
file names, Coda is fundamentally unsuitable for Maildirs or any other 
application likely to see more than 1000-4000 files per directory (see 
the rough arithmetic after this list)

2) without the features that get in the way of clean integration into 
existing server clusters - e.g. Coda's permission system is ACL based 
rather than POSIX, which is great for the truly global file system Coda 
is designed to be, but annoying for more tightly coupled clusters.
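
On the 1000-4000 figure: it is just 1MB divided by a plausible per-entry 
size. The per-entry sizes below are my own assumptions, not documented 
Coda internals:

METADATA_LIMIT = 1 << 20                   # Coda's 1MB-per-directory cap
for entry_bytes in (256, 1024):            # assumed name + attribute size
    print(METADATA_LIMIT // entry_bytes)   # -> 4096 and 1024 entries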

A Reed-Solomon n+m RAID-type solution, on the other hand, is a lot less 
flexible, although it does yield more usable storage per unit of 
physical storage (see the quick arithmetic below). Recovery from total 
failure is also a lot more difficult. With GlusterFS, all the files are 
still there with their original content and their original names. Even 
in Coda, where the on-disk files have hash-numbered names and the real 
names live in the metadata store, the content is still that of the 
original file. All this makes recovery from an extensive failure a lot 
saner (you'll know what I mean if you have ever had to recover data from 
a RAID5 stripe in which two disks went bad).
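
For a sense of the storage-efficiency gap (my example figures, not 
measurements from either system):

def replication_efficiency(copies):
    # usable fraction of raw storage with whole-file replication
    return 1.0 / copies

def erasure_efficiency(n_data, m_parity):
    # usable fraction with n data + m parity segments (Reed-Solomon n+m)
    return n_data / float(n_data + m_parity)

print(replication_efficiency(2))   # 0.50 - 2-way mirror, survives 1 loss
print(erasure_efficiency(8, 2))    # 0.80 - 8+2 survives any 2 losses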

But if storage space effectiveness is important to you, there are at least 
two products/projects that provide this functionality already:

DDRAID
Network "RAID 3.5"; the number of nodes is limited to powers of 2. The 
project seems to be unmaintained.
http://sourceware.org/cluster/ddraid/

Cleversafe Dispersed Storage
Network RAIDn (n+m). Awesome idea, but sadly it's Java based (i.e. 
bloatware with the performance of a snail on sedatives)
http://www.cleversafe.org/dispersed-storage

I hope this helps.

Gordan