[Gluster-users] rebalance and volume commit hash

Piotr Misiak pmisiak at cloudferro.com
Tue Jan 17 13:42:31 UTC 2017

On 17 Jan 2017 14:28, Jeff Darcy <jdarcy at redhat.com> wrote:
> > Can you tell me please why every volume rebalance generates a new value 
> > for the volume commit hash? 
> > 
> > If I have a fully rebalanced cluster (or almost) with millions of
> > directories, then rebalance has to change the DHT xattr for every directory
> > only because there is a new volume commit hash value. It is pointless in
> > my opinion. Is there any reason behind this? As I observed, the volume
> > commit hash is set at the beginning of the rebalance, which totally destroys
> > the benefit of the lookup optimization algorithm for directories not
> > scanned/fixed yet by this rebalance run.
>
> It disables the optimization because the optimization would no longer
> lead to correct results.  There are plenty of distributed filesystems 
> that seem to have "fast but wrong" as a primary design goal; we're 
> not one of them. 
>
> The best way to think of the volume-commit-hash update is as a kind of
> cache invalidation.  Lookup optimization is only valid as long as we 
> know that the actual distribution of files within a directory is 
> consistent with the current volume topology.  That ceases to be the 
> case as soon as we add or remove a brick, leaving us with three choices. 
>
> (1) Don't do lookup optimization at all.  *Every* time we fail to find
> a file on the brick where hashing says it should be, look *everywhere* 
> else.  That's how things used to work, and still work if lookup 
> optimization is disabled.  The drawback is that every add/remove brick 
> operation causes a permanent and irreversible degradation of lookup 
> performance.  Even on a freshly created volume, lookups for files that 
> don't exist anywhere will cause every brick to be queried. 
>
> (2) Mark every directory as "unoptimized" at the very beginning of
> rebalance.  Besides being almost as slow as fix-layout itself, this 
> would require blocking all lookups and other directory operations 
> *anywhere in the volume* while it completes. 
>
> (3) Change the volume commit hash, effectively marking every
> directory as unoptimized without actually having to touch every one. 
> The root-directory operation is cheap and almost instantaneous. 
> Checking each directory commit hash isn't free, but it's still a 
> lot better than (1) above.  With upcalls we can enhance this even 
> further. 
>
> Now that you know a bit more about the tradeoffs, do "pointless"
> and "destroys the benefit" still seem accurate? 

Thank you, Jeff, for your response. I understand this optimisation clearly, but I don't understand why a new commit hash is generated for the volume during the rebalance process. I think it should be generated only on add/remove-brick events, not during rebalance.
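
To make the mechanism under discussion concrete, here is a toy Python model of option (3). It is only a sketch and not GlusterFS code: the class ToyVolume and its methods (mkdir, create, start_rebalance, fix_layout, lookup) are hypothetical names, the commit hash is a simple counter, and file migration and the real xattr format are not modelled. It assumes a miss on the hashed brick can be trusted only when the directory's commit hash matches the volume's.

```python
import zlib
from itertools import count


class ToyVolume:
    """Toy model of commit-hash based lookup optimization (not GlusterFS code)."""

    def __init__(self, n_bricks):
        # Each brick is modelled as a set of (directory, filename) pairs.
        self.bricks = [set() for _ in range(n_bricks)]
        self._hashes = count(1)
        self.volume_commit_hash = next(self._hashes)
        # Directory -> commit hash stamped when its layout was last fixed.
        self.dir_commit_hash = {}

    def mkdir(self, directory):
        # A new directory is laid out for the current topology.
        self.dir_commit_hash[directory] = self.volume_commit_hash

    def _hashed_brick(self, directory, name):
        return zlib.crc32(f"{directory}/{name}".encode()) % len(self.bricks)

    def create(self, directory, name):
        self.bricks[self._hashed_brick(directory, name)].add((directory, name))

    def start_rebalance(self):
        # Option (3): one cheap update marks every directory as unoptimized.
        self.volume_commit_hash = next(self._hashes)

    def fix_layout(self, directory):
        # Assumes the directory's files already sit on their hashed bricks.
        self.dir_commit_hash[directory] = self.volume_commit_hash

    def lookup(self, directory, name):
        """Return (found, number_of_bricks_queried)."""
        hashed = self._hashed_brick(directory, name)
        queried = 1
        if (directory, name) in self.bricks[hashed]:
            return True, queried
        if self.dir_commit_hash[directory] == self.volume_commit_hash:
            # Layout is known to be current, so the miss is authoritative.
            return False, queried
        # Layout may be stale: fall back to looking everywhere, as in option (1).
        for i, brick in enumerate(self.bricks):
            if i == hashed:
                continue
            queried += 1
            if (directory, name) in brick:
                return True, queried
        return False, queried


vol = ToyVolume(4)
vol.mkdir("/a")
vol.mkdir("/b")
print(vol.lookup("/a", "missing"))  # (False, 1): optimized miss
vol.start_rebalance()
print(vol.lookup("/a", "missing"))  # (False, 4): every brick queried
vol.fix_layout("/a")
print(vol.lookup("/a", "missing"))  # (False, 1): optimized again
print(vol.lookup("/b", "missing"))  # (False, 4): not scanned yet
```

In this toy model, start_rebalance changes the hash even though the number of bricks is unchanged, so every directory loses the optimization until fix_layout reaches it. That is the behaviour my question is about.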

