[Gluster-devel] GlusterFS Roadmap: Erasure codes.
gordan at bobich.net
Thu Apr 24 09:39:12 UTC 2008
On Thu, 24 Apr 2008, Rodney McDuff wrote:
> I was just looking at the GlusterFS Roadmap and thought that a nifty
> feature for the future would be an AFR-like translator that uses
> Reed-Solomon erasure codes instead for file replication. That would add
> many 9s to the reliability without adding much storage overhead.
You mean to have one file split across several servers like in RAID[3-6]?
That's a lot more complicated, and CPU intensive. It would also make
writes across the servers quite expensive, because each written block
would require the other segments of its stripe to be read first so that
the R-S checksums can be recalculated.
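To make the write cost concrete, here is a minimal sketch of that read-modify-write cycle. It uses plain XOR parity as a stand-in for Reed-Solomon (the I/O pattern is the same); the function name and data layout are illustrative, not anything GlusterFS actually implements:

```python
# Sketch of the read-modify-write cycle a parity-based translator would
# need on every small write. XOR parity stands in for Reed-Solomon here;
# either way, the old data and old parity must be read back before the
# new parity can be computed.

def update_stripe(data_segments, parity, index, new_block):
    """Overwrite one segment of a stripe and recompute parity.

    data_segments: list of equal-length bytes objects (one per data server)
    parity:        bytes, XOR of all data segments
    index:         which segment is being overwritten
    new_block:     replacement bytes for that segment
    """
    old_block = data_segments[index]  # extra read #1: old data segment
    # (over the network, fetching the old parity is extra read #2 --
    # another round trip to the parity server)
    new_parity = bytes(p ^ o ^ n
                       for p, o, n in zip(parity, old_block, new_block))
    data_segments[index] = new_block  # write #1: new data segment
    return data_segments, new_parity  # write #2: new parity segment

# One logical write thus costs two reads and two writes across servers.
segs = [b"\x01\x01", b"\x02\x02", b"\x04\x04"]
par = bytes(a ^ b ^ c for a, b, c in zip(*segs))  # initial parity
segs, par = update_stripe(segs, par, 1, b"\x08\x08")
```

Compare that with plain AFR replication, where the same logical write is simply sent to each replica with no reads on the write path at all.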
Fundamentally, I think it's just not what GlusterFS is intended to do. The
nice thing about GlusterFS is that it is very similar to Coda, only:
1) without the limitations that reduce Coda's usefulness (the limit of 1MB
of metadata per directory - and metadata includes file names, which makes
Coda fundamentally unsuitable for things like Maildirs or any application
likely to see more than 1000-4000 files per directory)
2) without the features that get in the way of clean integration into
existing server clusters - e.g. Coda's permission system is ACL based
rather than POSIX, which is great for a truly global file system that Coda
is designed to be, but annoying for more tightly coupled clusters.
A Reed-Solomon n+m RAID-type solution, on the other hand, is a lot less
flexible, although it does provide more usable storage per unit of
physical storage. Recovery from a total failure is also a lot more
difficult. With GlusterFS, all the files are still there with their
original content and their original names. In Coda the on-disk files have
hash-numbered names and the real names live in the metadata store, but the
content is still as per the original file. All this makes recovery from an
extensive failure a lot more sane (you'll know what I mean if you have
ever had to recover data from a RAID5 stripe in which 2 disks went bad).
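The storage-efficiency point is simple arithmetic. A quick sketch, using illustrative figures (a 2-way replica set and a hypothetical 10+4 code, not anything GlusterFS or Cleversafe actually ships with):

```python
# Physical bytes consumed per logical byte stored, for k-way replication
# versus an n+m erasure code (n data segments + m parity segments).
# Example parameters below are illustrative only.

def replication_overhead(copies):
    # Each logical byte is stored 'copies' times.
    return float(copies)

def erasure_overhead(n, m):
    # n logical bytes occupy n + m physical bytes.
    return (n + m) / n

# 2-way replication survives 1 server loss at 2.0x storage;
# a 10+4 code survives any 4 losses at only 1.4x storage.
rep = replication_overhead(2)       # 2.0
ec = erasure_overhead(10, 4)        # 1.4
```

So the n+m approach wins on raw capacity; the trade-off, as above, is the write path and the sanity of recovery.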
But if storage space effectiveness is important to you, there are at least
two products/projects that provide this functionality already:
DDRAID
Network RAID 3.5, with the node count limited to powers of 2. The project
seems to be unmaintained.
http://sourceware.org/cluster/ddraid/
Cleversafe Dispersed Storage
Network RAIDn (n+m). Awesome idea, but sadly it's Java based (i.e.
bloatware with the performance of a snail on sedatives)
http://www.cleversafe.org/dispersed-storage
I hope this helps.
Gordan