[Gluster-devel] Re; Load balancing ...
Mickey Mazarick
mic at digitaltadpole.com
Wed Apr 30 15:58:46 UTC 2008
We have a workaround for this so that clients don't see a "blocked"
situation. We have a six-node cluster, and whenever a server node comes up
it lists all the files on its "slice" of the Gluster mount and then
does a find on those files on the main mounted volume. In other words,
each server is also a client, and it runs the heal process right after
starting up.
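
Roughly, the startup step amounts to something like this (a quick Python
sketch with placeholder paths, not our exact script): walk the local
"slice" directory and stat the same paths through the mounted volume, so
that opening each file kicks off self-heal.

import os

SLICE_DIR = "/export/slice"      # this node's local backend directory (placeholder)
GLUSTER_MOUNT = "/mnt/gluster"   # client mount of the whole volume (placeholder)

for dirpath, dirnames, filenames in os.walk(SLICE_DIR):
    for name in filenames:
        local_path = os.path.join(dirpath, name)
        rel_path = os.path.relpath(local_path, SLICE_DIR)
        try:
            # stat via the mount so the open triggers self-heal
            os.stat(os.path.join(GLUSTER_MOUNT, rel_path))
        except OSError:
            pass  # file may have been removed in the meantime
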
-Mic
Gareth Bult wrote:
> Ok, I'm afraid I'm not agreeing with some of the things you're saying, but that doesn't really get us anywhere.
>
> I think the bottom line is, having read or write operations block for any length of time in order to merge a node back into the cluster following an outage simply isn't "real world". "any length of time" will vary for different operators, for me it's seconds, for anyone running real-time systems or VoIP for example, it might be milliseconds.
>
> My feeling is that self-heal "on open" is "nice", easy to code and maintain, neat, and I can see "how it came to be".
>
> It's not, however, practical for a commercial solution; healing needs to be a background process, not a foreground process.
>
> The only question I really have is, what sort of timescale are we looking at before GlusterFS has the kind of capabilities (in context) that one would expect of a production clustering filesystem?
> (I think we all know hash/sync would be an interim solution)
>
> (I know it's "different", but just as a point of reference, DRBD syncs in the background, as does Linux software raid .. can you imagine anyone using either if they tried to make people wait while they did a foreground sync ??)
>
> Gareth.
>
>
> ----- Original Message -----
> From: gordan at bobich.net
> To: gluster-devel at nongnu.org
> Sent: Wednesday, April 30, 2008 1:56:12 PM GMT +00:00 GMT Britain, Ireland, Portugal
> Subject: Re: [Gluster-devel] Re; Load balancing ...
>
>
>
> On Wed, 30 Apr 2008, Gareth Bult wrote:
>
>
>>> It would certainly be beneficial in cases where the network speed
>>> is slow (e.g. WAN replication).
>>>
>> So long as it's server-side AFR and not client-side ...?
>>
>
> Sure.
>
>
>> I'm guessing there would need to be some server-side logic to ensure
>> that local servers generated their own hashes and only exchanged the
>> hashes over the network rather than the data?
>>
>
> Indeed - same as rsync does.
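>
> Something along these lines is what I mean (just a rough Python sketch to
> illustrate; the block size is arbitrary, and it uses fixed-block digests
> rather than the rolling checksum rsync really uses). Each side hashes its
> local copy in blocks and only the digests cross the wire:
>
> import hashlib
>
> BLOCK_SIZE = 128 * 1024  # arbitrary block size for the sketch
>
> def block_digests(path):
>     """Per-block digests of a local file; this is what gets exchanged."""
>     digests = []
>     with open(path, "rb") as f:
>         while True:
>             block = f.read(BLOCK_SIZE)
>             if not block:
>                 break
>             digests.append(hashlib.sha1(block).hexdigest())
>     return digests
>
> def changed_blocks(local_digests, remote_digests):
>     """Indices of blocks that differ and therefore need to be sent."""
>     changed = []
>     for i in range(max(len(local_digests), len(remote_digests))):
>         local = local_digests[i] if i < len(local_digests) else None
>         remote = remote_digests[i] if i < len(remote_digests) else None
>         if local != remote:
>             changed.append(i)
>     return changed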
>
>
>>> Journal per se wouldn't work, because that implies fixed size and write-ahead logging.
>>> What would be required here is more like the snapshot style undo logging.
>>>
>> A journal wouldn't work?!
>> You mean its effectiveness would be governed by its size?
>>
>
> Among other things. A "journal" just isn't suitable for this sort of
> thing.
>
>
>>> 1) Categorically establish whether each server is connected and up to date
>>> for the file being checked, and only log if the server has disconnected.
>>> This involves overhead.
>>>
>> Surely you would log anyway, as there could easily be latency between an
>> actual "down" and one's ability to detect it .. in which case detecting
>> whether a server has disconnected is a moot point.
>>
>
> Not really. A connected client/server will have a live/working TCP
> connection open. Read-locks don't matter as they can be served locally,
> but when a write occurs, the file gets locked. If a remote machine doesn't
> ack the lock, and/or its TCP connection resets, then it's safe to assume
> that it's not connected.
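>
> In rough terms (a hypothetical Python sketch, not the actual GlusterFS
> code), the check amounts to:
>
> import socket
>
> LOCK_ACK_TIMEOUT = 2.0  # seconds; arbitrary value for the sketch
>
> def peer_acked_lock(conn: socket.socket, lock_request: bytes) -> bool:
>     """Send a write-lock request over an already-open TCP connection.
>     No ack within the timeout, or a reset connection, means the peer is
>     treated as disconnected."""
>     try:
>         conn.settimeout(LOCK_ACK_TIMEOUT)
>         conn.sendall(lock_request)
>         ack = conn.recv(1)
>         return len(ack) > 0   # empty read means the peer closed the connection
>     except OSError:           # covers socket.timeout and connection resets
>         return False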
>
>
>> In terms of the
>> overhead of logging, I guess this would be a decision for the sysadmin
>> concerned, whether the overhead of logging to a journal was worthwhile
>> vs. the potential issues involved in recovering from an outage?
>>
>
> That complicates things further, then. You'd essentially have asynchronous
> logging/replication. At that point you pretty much have to log all writes
> all the time. That means potentially huge space and speed overheads.
>
>
>> From my point of view, if journaling halved my write performance (which
>> it wouldn't) I wouldn't even have to think about it.
>>
>
> Actually, saving an undo log a la snapshots, which is what would be
> required, _WOULD_ halve your write performance on all surviving servers if
> one server was out. If multiple servers were out, you could probably work
> around some of this by merging/splitting the undo logs for the various
> machines, so your write performance would generally stay around 1/2 of
> standard, rather than degrading to 1/(n+1), where n is the number of
> failed servers for which the logging needs to be done.
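>
> (To put numbers on that: with three failed servers, keeping a separate undo
> log per server would turn every application write into four writes, one to
> the live copy plus one per log, i.e. roughly 1/4 of normal speed; a merged
> log keeps it at two writes, i.e. roughly 1/2.)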
>
>
>>> The problem that arises then is that the fast(er) resyncs on small changes
>>> come at the cost of massive slowdown in operation when you have multiple
>>> downed servers. As the number of servers grows, this rapidly stops being a
>>> workable solution.
>>>
>> Ok, I don't know about anyone else, but my setups all rely on
>> consistency rather than peaks and troughs. I'd far rather run a journal
>> at half potential speed, and have everything run at that speed all the
>> time .. than occasionally have to stop the entire setup while the system
>> recovers, or essentially wait for 5-10 minutes while the system re-syncs
>> after a node is reloaded.
>>
>
> There may be a way to address the issue of halting the rest of the cluster
> during the sync, though. A read lock on a syncing file shouldn't stop other
> read locks. Of course, it will block writes while the file syncs and the
> reading app finishes the operation.
>
> Gordan
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>