[Gluster-users] Giving up [ was: Re: read-subvolume]
jdarcy at redhat.com
Wed Jul 10 12:58:10 UTC 2013
On 07/10/2013 07:01 AM, Allan Latham wrote:
> I have a simple scenario and it just simply doesn't work. Reading over
> the network when the file is available locally is plainly wrong. Our
> application cannot take the performance hit nor the extra network traffic.
Another victim of our release process. :( Code was added to choose the local
subvolume whenever possible in *June 2012* (commit 0baa65b6). Further fixes
and related changes, including a user-submitted patch to force this choice for
sites with more complex needs, have gone in since then. None of them have made
it into a release yet, since 3.4 is still in beta and the changes have not been
backported into 3.3.anything (including 3.3.1 which I see you were using). All
I can offer is an apology.
> 1. get a simple minimalist configuration working - 2 hosts and
> replication only.
> 2. make it bomb-proof.
> 2a. it must cope with network failures, random reboots etc.
> 2b. if it stops it has to auto-recover quickly.
> 2c. if it can't it needs thorough documentation and adequate logs so a
> reasonable sysop can rescue it.
This is one of my own pet peeves. I will personally be working on the
internals documentation soon, so users will at least have a chance of
understanding what the often-cryptic log messages really mean. Improvements to
logging, event reporting, and so on are also ongoing, albeit slowly and not
under my direct purview.
> 2d. it needs a fast validation scanner which verifies that data is where
> it should be and is identical everywhere (md5sum).
How fast is fast? What would be an acceptable time for such a scan on a volume
containing (let's say) ten million files?
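To make the question concrete, here is a rough sketch of the kind of scan item 2d describes. It is only an illustration in Python, not anything GlusterFS ships. The function names (file_md5, tree_checksums, compare_replicas) are made up for this example, and it assumes both replica trees are readable from one machine as ordinary directories (for instance, two brick directories). It reads every byte of every file, so its run time grows with the total amount of data as well as the file count.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_checksums(root):
    """Map each regular file's path (relative to root) to its MD5."""
    sums = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: root may be a brick directory, so skip the internal
        # .glusterfs metadata directory rather than comparing its contents.
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Only checksum regular files; symlinks and special files are ignored.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            sums[os.path.relpath(full, root)] = file_md5(full)
    return sums


def compare_replicas(root_a, root_b):
    """Return (missing_in_a, missing_in_b, mismatched) as sorted path lists."""
    a = tree_checksums(root_a)
    b = tree_checksums(root_b)
    missing_in_a = sorted(set(b) - set(a))
    missing_in_b = sorted(set(a) - set(b))
    mismatched = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing_in_a, missing_in_b, mismatched
```

Calling compare_replicas("/path/to/replica1", "/path/to/replica2") with two placeholder paths returns three lists: files present only in the second tree, files present only in the first, and files whose contents differ. A scan like this touches every file once, so it is the full-scan cost that any answer to "how fast is fast" has to be measured against.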
> 3. make it efficient (read local whenever possible - use rsync
> techniques - remove scalability obstacles so it doesn't get
> exponentially slower as more files are replicated)
Can you explain "exponentially"? The time for a full scan should increase
*linearly* with number of files. That's bad enough, and it's why we're
starting to get away from reliance on full scans in favor of logging or
journaling approaches, but if you're seeing exponential behavior then something
is wrong.
> 4. when that works expand to multiple hosts and clever distribution
That would be a fine sentiment for a new project, but it's not really an option
when there are already thousands of users relying on the "clever distribution
techniques" and many other features in production. We do have to fix their
bugs too, so we can't devote all of our resources to improving or
reimplementing replication. Believe me, I wish we could.
Thank you for your constructive feedback. I hope that we can use it to make
things better for everyone.