[Gluster-devel] Re: Load balancing ...

Martin Fick mogulguy at yahoo.com
Fri Apr 25 15:44:36 UTC 2008


--- gordan at bobich.net wrote:
> On Fri, 25 Apr 2008, Gareth Bult wrote:
...
> > Moving to the other end of the scale, AFR can't
> > cope with large files either .. handling of sparse
> > files doesn't work properly and self-heal
> > has no concept of repairing part of a file .. so
> > sticking a 20Gb file on a GlusterFS is just asking
> > for trouble as every time you restart a
> > gluster server (or every time one crashes) it'll
> > crucify your network.
> 
> I thought about this, and there isn't really a way
> to do anything about this, unless you relax the
> constraints. You could do an rsync-type rolling
> checksum block-sync, but this would both take up
> more CPU time and result in theoretical scope for
> the file to not be the same on both ends. Whether
> this minute possibility of corruption that the
> hashing algorithm doesn't pick up is a reasonable
> trade-off, I don't know. Perhaps if such a thing
> were implemented it should be made optional.
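
(To make sure I follow the proposal, here is a
much-simplified sketch of that block-sync idea in
Python. It assumes fixed, aligned blocks and MD5 as the
strong hash; real rsync also rolls a weak checksum over
unaligned offsets, which this leaves out.)

    import hashlib

    BLOCK = 128 * 1024  # arbitrary block size for the sketch

    def block_digests(path):
        # Hash each aligned, fixed-size block of the file.
        digests = []
        with open(path, 'rb') as f:
            while True:
                data = f.read(BLOCK)
                if not data:
                    break
                digests.append(hashlib.md5(data).digest())
        return digests

    def stale_block_indices(fresh, stale):
        # In a real implementation each node would hash its own
        # replica locally and exchange only the digests.
        a = block_digests(fresh)
        b = block_digests(stale)
        count = max(len(a), len(b))
        return [i for i in range(count)
                if i >= len(a) or i >= len(b) or a[i] != b[i]]

Only the blocks whose indices come back would have to
cross the wire, and the corruption case you mention is
exactly a digest collision: two blocks that differ but
hash the same would be silently skipped.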

Hmm, I'm confused: surely in both these cases there is
not much more network overhead than there would have
been if the 20 GB file had been written normally in
the first place? If you have two nodes and the 20 GB
file only got written to node A while node B was down,
then when node B comes back up the whole 20 GB is
resynced to node B. Is that any more network usage
than if the 20 GB file had been written immediately to
node A and node B?
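
Spelling out the byte counts I have in mind for the
two-node case:

  B down at write time:  20 GB client -> A, plus 20 GB A -> B on resync
  B up at write time:    20 GB client -> A, plus 20 GB client -> B

Either way node B receives the full 20 GB exactly
once; the resync moves the same bytes, just later.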

Perhaps the issue is really that the cost comes at an
unexpected time, on node startup instead of when the
file was originally written? Would a throttling
mechanism for startup resyncs help here?
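
Something like a configurable bytes-per-second cap on
the self-heal traffic is what I have in mind. A rough
sketch (nothing like this exists in GlusterFS as far
as I know; limit_bps and the chunk size are made-up
knobs):

    import time

    def throttled_copy(src, dst, limit_bps, chunk=256 * 1024):
        # Copy src to dst, sleeping whenever the transfer
        # gets ahead of the allowed rate.
        start = time.time()
        sent = 0
        with open(src, 'rb') as fin, open(dst, 'wb') as fout:
            while True:
                data = fin.read(chunk)
                if not data:
                    break
                fout.write(data)
                sent += len(data)
                surplus = sent / limit_bps - (time.time() - start)
                if surplus > 0:
                    time.sleep(surplus)

At, say, limit_bps=10 * 1024 * 1024 the 20 GB resync
would take roughly half an hour instead of saturating
the link, which at least makes the startup cost
predictable.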


-Martin


