[Gluster-devel] Improving real world performance by moving files closer to their target workloads
derek at ximbiot.com
Mon Jun 2 17:17:04 UTC 2008
Martin Fick wrote:
> Except that as I pointed out in my other email, I do
> not think that this would actually make reading any
> better than today's caching translator, and if so,
> then simply improving the caching translator should
> suffice. And, it would make writes much more
> complicated and in fact probably slower than what
> unify does today. So where is the gain?
Hrm. Rereading your comments above, only #2 below is directly relevant
to cache efficiency, if that is all you are interested in, but this
design would have some other advantages, listed below. Why do you think
this model would slow down writes?
1. By being able to define a minimum redundancy level (let's call it
"R") instead of strict (and exactly sized) mirrors, you can extend the
disk space of your mirrors incrementally. I.e., adding a new AFR disk on
an existing (or new) server adds 1/R of its space to the total available
space in the array. Under the current model, this would require
replacing the disks on all R AFR mirrors with disks 1/R bigger than
previously available (or adding R disks, each 1/R the size of the space
you wished to add, as a new AFR array unified with the old). Similarly,
note that when more than R AFR servers are available in the minimum
redundancy model, the AFR "usable disk size" is no longer limited by the
smallest disk in the array but should approach 1/R * the total space on
all AFR servers, assuming that most of the disk space isn't concentrated
on fewer than R servers.
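The capacity arithmetic in point 1 can be illustrated with a small sketch
(the function name and disk sizes are hypothetical, not anything from
Gluster):

```python
def usable_capacity(disks, r):
    """Approximate usable space of a pool that keeps R copies of every
    file, assuming space is not concentrated on fewer than R servers."""
    return sum(disks) / r

# Four hypothetical servers with disks of 100, 100, 100, and 40 GB, R = 2:
pool = usable_capacity([100, 100, 100, 40], 2)   # 340 / 2 = 170.0 GB

# Strict pairwise mirrors of the same disks are each limited by the
# smaller disk in the pair: (100|100) + (100|40) = 140 GB usable.
strict = min(100, 100) + min(100, 40)
```

So in this contrived example the minimum-redundancy pool yields 170 GB
usable against 140 GB for fixed mirror pairs, and the gap grows as disk
sizes diverge.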
2. All known copies could be kept up-to-date via some sort of
differential algorithm (at least for appends, this would be easy).
With the current read cache, I think that if a large file gets written,
then the entire file must be recopied over the network to any caching
servers.
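The easy append-only case from point 2 could look something like the
following sketch, which copies only the bytes the replica is missing
(in-memory streams stand in for open files here; a real replication
algorithm would also have to handle truncation and in-place writes):

```python
import io

def sync_append(src, dst):
    """Bring an append-only replica up to date by transferring only the
    bytes past the replica's current length, instead of the whole file."""
    dst.seek(0, io.SEEK_END)   # find how much the replica already has
    offset = dst.tell()
    src.seek(offset)           # skip the prefix both sides share
    while chunk := src.read(1 << 16):
        dst.write(chunk)

# Usage with hypothetical in-memory copies of a primary and a stale replica:
primary = io.BytesIO(b"old data plus newly appended bytes")
replica = io.BytesIO(b"old data")
sync_append(primary, replica)
```

Only the appended suffix crosses the "network" here, which is the whole
point of the differential approach.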
3. If any one AFR host is lost in the minimum redundancy model, 1/R of
its available disk space would be lost to the array until it recovers.
However, any lost copies under the minimum redundancy threshold could
immediately and automatically be mirrored to other servers, restoring
the minimum redundancy before the downed host came back online, and
before any administrators even became aware of the problem, assuming
disk space remained available on the other hosts.
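The automatic repair in point 3 amounts to finding every file whose live
copy count fell below R and picking replacement servers for it. A rough
sketch (server names, file paths, and the planner function are all
hypothetical; this ignores free-space checks for brevity):

```python
def plan_repairs(placements, live_servers, r):
    """For each file with fewer than R live copies, choose servers that
    do not already hold a copy to re-mirror onto."""
    plan = {}
    for path, servers in placements.items():
        live = [s for s in servers if s in live_servers]
        needed = r - len(live)
        if needed > 0:
            candidates = [s for s in live_servers if s not in live]
            plan[path] = candidates[:needed]
    return plan

# Hypothetical scenario: server s2 just went down, R = 2.
placements = {"/a": ["s1", "s2"], "/b": ["s2", "s3"]}
plan = plan_repairs(placements, ["s1", "s3", "s4"], 2)
```

Each file that lost its s2 copy gets one new target server, restoring
R = 2 without waiting for s2 to return.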
4. When adding a new disk and/or host to the AFR array, nothing need be
immediately copied to it, saving bandwidth. The new disk space can be
used as it becomes convenient.
There may be a few more. Last week I started composing a summary of the
thread and of exactly which new features I thought Gluster would need to
support this design. I'll try to finish it in the next week or so;
maybe it will remind me of other advantages this design may have.
Derek R. Price
Ximbiot, LLC <http://ximbiot.com>
Get CVS and Subversion Support from Ximbiot!
v: +1 248.835.1260
f: +1 248.246.1176