[Gluster-devel] Re: Load balancing ...

Wed Apr 30 11:52:55 UTC 2008

On Wed, 30 Apr 2008, Gareth Bult wrote:

> Sorry, I'm trying to follow this but I'm coming a little unstuck ..
>
> Am I right in thinking the rolling hash / rsync solution would involve 
> syncing the file "on open" as per the current system .. and in order to 
> do this, the server would have to read through the entire file in order 
> to create the hashes?
> (indeed it would need to do this on two servers to create hashes for comparison?)

Yes.
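
To illustrate why both copies have to be read end to end, here is a minimal sketch of a block-wise checksum comparison. It is a simplified stand-in for the rsync approach (fixed block offsets only, no rolling weak checksum), and it is not GlusterFS code. The names block_hashes and differing_blocks and the tiny block size are hypothetical, chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny block size so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash every fixed-size block; note this touches every byte of the file."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def differing_blocks(a_hashes: list, b_hashes: list) -> list:
    """Return the indices of blocks whose hashes differ or exist on one side only."""
    out = []
    for i in range(max(len(a_hashes), len(b_hashes))):
        a = a_hashes[i] if i < len(a_hashes) else None
        b = b_hashes[i] if i < len(b_hashes) else None
        if a != b:
            out.append(i)
    return out


# Each server hashes its own copy in full; only the hash lists are compared.
good = b"aaaabbbbccccdddd"
stale = b"aaaaXbbbccccdddd"
print(differing_blocks(block_hashes(good), block_hashes(stale)))
```

Only the blocks reported as different would need to cross the network, but producing the hash lists still costs one full read of the file on each server.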

> So .. as a rough benchmark .. assume 50Mb/sec for a standard / modern 
> SATA drive, opening a crashed 20G file is going to take 400 seconds or 
> six minutes ... ? (which would also flatten two servers for the 
> duration)

It would certainly be beneficial in cases where the network is slow 
(e.g. WAN replication).
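
As a back-of-envelope check, the sketch below compares the full-scan time from the quoted estimate with the time to copy the whole file over a slow link. It assumes decimal units, that "50Mb/sec" means 50 megabytes per second of sustained sequential read, and a hypothetical 10 Mbit/s WAN link; the link speed is an illustrative number, not a measurement.

```python
MB = 10**6
GB = 10**9


def full_scan_seconds(file_bytes: int, disk_bytes_per_sec: float) -> float:
    """Time to read the whole file once to build the hashes."""
    return file_bytes / disk_bytes_per_sec


def full_copy_seconds(file_bytes: int, link_bits_per_sec: float) -> float:
    """Time to ship the whole file over the network instead."""
    return file_bytes / (link_bits_per_sec / 8)


file_size = 20 * GB
scan = full_scan_seconds(file_size, 50 * MB)       # disk-bound hashing pass
copy = full_copy_seconds(file_size, 10 * 10**6)    # hypothetical 10 Mbit/s WAN

print(f"scan: {scan:.0f} s, full copy over WAN: {copy:.0f} s")
```

Under these assumptions the scan takes 400 seconds (6 minutes 40 seconds) while the full copy takes 16000 seconds, which is where the hashing pass pays for itself. On a fast LAN the balance shifts the other way.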

> Whereas a journal replay of 10M is going to take < 1s and be effectively transparent.
> (I'm guessing this could also be done at open time ??)

A journal per se wouldn't work, because that implies a fixed size and 
write-ahead logging. What would be required here is more like 
snapshot-style undo logging.

The problem with this is that you have to:

1) Categorically establish whether each server is connected and up to date 
for the file being checked, and only log if the server has disconnected. 
This involves overhead.

2) For each server that is down at the time, every other server would have 
to start writing snapshot-style undo logs (which would have to be 
per server) for all the files being changed. This effectively multiplies 
the disk write traffic on every working, up-to-date server by the number 
of offline servers.

The problem that arises then is that the fast(er) resyncs on small changes 
come at the cost of a massive slowdown in normal operation when you have 
multiple downed servers. As the number of servers grows, this rapidly stops 
being a workable solution.
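
The scaling argument above can be put into a rough model. This is a sketch under stated assumptions, not a measurement: it assumes one undo log per offline server, that each undo-log entry is as large as the data it covers, and that the original write still has to hit the disk. The function name writes_per_working_server is hypothetical.

```python
def writes_per_working_server(data_bytes: int, offline_servers: int) -> int:
    """Bytes written to disk on one working server for data_bytes of client writes.

    Assumption: one undo log per offline server, each entry the same
    size as the data being overwritten.
    """
    if offline_servers < 0:
        raise ValueError("offline_servers must be non-negative")
    undo_log_bytes = data_bytes * offline_servers
    return data_bytes + undo_log_bytes


# Disk traffic for 100 bytes of client writes as more servers drop out.
for down in range(4):
    print(down, writes_per_working_server(100, down))
```

With no servers down there is no extra cost, but every additional offline server adds another full copy of the write traffic on each server that is still up.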

Gordan