[Gluster-users] Giving up [ was: Re: read-subvolume]

Allan Latham alatham at flexsys-group.de
Wed Jul 10 13:20:58 UTC 2013


Hi Jeff

Thanks for the reply and all the great work you are doing. I know how
hard it is - believe me.

Where can I get a version that solves my 'read local if we have the
file here' problem?
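
(For reference, what I am hoping to be able to do once those changes
reach a release is roughly the following; the option names are the
ones the AFR documentation mentions, so treat the exact names, values
and the volume name "myvol" as unverified placeholders:)

    # prefer the brick on this machine when both replicas are clean
    gluster volume set myvol cluster.choose-local on

    # or pin reads to one replica explicitly (AFR children are
    # usually named <volume>-client-N)
    gluster volume set myvol cluster.read-subvolume myvol-client-0

'gluster volume set help' on the installed version should confirm
which of these are actually available.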

My use case is exactly two servers at a server farm with a 100Mbit
link between them. That 100Mbit is also shared with traffic to the
outside internet, hence the need to minimise use of this very limited
resource.

Writes are rare and we just have to live with that load on the network.
Reads are very common and I need to keep these off the network.
Read/Write ratio is probably 10000:1 or more.

Doing an md5sum on a local 500MB file takes 500ms (probably cached; I
would have expected 5 seconds or so for real reads). On gluster it
takes either 500ms or 18 seconds, and in the latter case it has hogged
the network for the full 18 seconds.
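
(The comparison is simply a timed checksum on each path; the paths
below are illustrative, not the real ones:)

    # on the underlying local filesystem
    time md5sum /data/brick/bigfile.bin

    # on the gluster mount of the replicated volume
    time md5sum /mnt/gluster/bigfile.bin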

I'm willing to give a new version a try.

We are still in evaluation. The current 'best' is what we are familiar
with: unison plus inotify. I don't like it because it's really only a
hack, but it works. If inotify misses a change due to a race condition,
unison gets run every five minutes anyway and picks it up.

It's an example of a failure mode which really does self-heal.
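
(Roughly what the hack looks like; the paths and host names below are
illustrative only:)

    # push changes as soon as inotify reports them
    while inotifywait -r -e close_write,create,delete,move /srv/data; do
        unison -batch /srv/data ssh://server2//srv/data
    done

    # plus a cron safety net: a full unison pass every five minutes
    # in case an event was missed
    */5 * * * *  unison -batch /srv/data ssh://server2//srv/data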

PS: my own use case is unlikely to hit exponential delays. We are
talking about a few GB and a few tens of thousands of files. I was
hoping to help with your roadmap. My preferred method is always 'get
the simple case working properly first', then optimise.

All the best and thanks again for all your efforts

Allan


On 10/07/13 14:58, Jeff Darcy wrote:
> On 07/10/2013 07:01 AM, Allan Latham wrote:
>> I have a simple scenario and it just simply doesn't work. Reading over
>> the network when the file is available locally is plainly wrong. Our
>> application cannot take the performance hit nor the extra network
>> traffic.
> 
> Another victim of our release process.  :(  Code was added to choose the
> local subvolume whenever possible in *June 2012* (commit 0baa65b6). 
> Further fixes and related changes, including a user-submitted patch to
> force this choice for sites with more complex needs, have gone in since
> then.  None of them have made it into a release yet, since 3.4 is still
> in beta and the changes have not been backported into 3.3.anything
> (including 3.3.1 which I see you were using).  All I can offer is an
> apology.
> 
>> 1. get a simple minimalist configuration working - 2 hosts and
>> replication only.
>> 2. make it bomb-proof.
>> 2a. it must cope with network failures, random reboots etc.
>> 2b. if it stops it has to auto-recover quickly.
>> 2c. if it can't it needs thorough documentation and adequate logs so a
>> reasonable sysop can rescue it.
> 
> This is one of my own pet peeves.  I will personally be working on the
> internals documentation soon, so users will at least have a chance of
> understanding what the often-cryptic log messages really mean. 
> Improvements to logging, event reporting, and so on are also ongoing,
> albeit slowly and not under my direct purview.
> 
>> 2d. it needs a fast validation scanner which verifies that data is where
>> it should be and is identical everywhere (md5sum).
> 
> How fast is fast?  What would be an acceptable time for such a scan on a
> volume containing (let's say) ten million files?
> 
>> 3. make it efficient (read local whenever possible - use rsync
>> techniques - remove scalability obstacles so it doesn't get
>> exponentially slower as more files are replicated)
> 
> Can you explain "exponentially"?  The time for a full scan should
> increase *linearly* with number of files.  That's bad enough, and it's
> why we're starting to get away from reliance on full scans in favor of
> logging or journaling approaches, but if you're seeing exponential
> behavior then something is amiss.
> 
>> 4. when that works expand to multiple hosts and clever distribution
>> techniques.
> 
> That would be a fine sentiment for a new project, but it's not really an
> option when there are already thousands of users relying on the "clever
> distribution techniques" and many other features in production.  We do
> have to fix their bugs too, so we can't devote all of our resources to
> improving or reimplementing replication.  Believe me, I wish we could.
> 
> Thank you for your constructive feedback.  I hope that we can use it to
> make things better for everyone.
> 



