[Gluster-users] rebalance and volume commit hash

Jeff Darcy jdarcy at redhat.com
Tue Jan 17 13:28:03 UTC 2017


> Can you tell me please why every volume rebalance generates a new value
> for the volume commit hash?
> 
> If I have a fully (or almost fully) rebalanced cluster with millions
> of directories, then rebalance has to change the DHT xattr for every
> directory only because there is a new volume commit hash value. It is
> pointless in my opinion. Is there any reason behind this? As I
> observed, the volume commit hash is set at the beginning of the
> rebalance, which totally destroys the benefit of the lookup
> optimization algorithm for directories not scanned/fixed yet by this
> rebalance run.

It disables the optimization because the optimization would no longer
lead to correct results.  There are plenty of distributed filesystems
that seem to have "fast but wrong" as a primary design goal; we're
not one of them.

The best way to think of the volume-commit-hash update is as a kind of
cache invalidation.  Lookup optimization is only valid as long as we
know that the actual distribution of files within a directory is
consistent with the current volume topology.  That ceases to be the
case as soon as we add or remove a brick, leaving us with three choices.

(1) Don't do lookup optimization at all.  *Every* time we fail to find
a file on the brick where hashing says it should be, look *everywhere*
else.  That's how things used to work, and still work if lookup
optimization is disabled.  The drawback is that every add/remove brick
operation causes a permanent and irreversible degradation of lookup
performance.  Even on a freshly created volume, lookups for files that
don't exist anywhere will cause every brick to be queried.

(2) Mark every directory as "unoptimized" at the very beginning of
rebalance.  Besides being almost as slow as fix-layout itself, this
would require blocking all lookups and other directory operations
*anywhere in the volume* while it completes.

(3) Change the volume commit hash, effectively marking every
directory as unoptimized without actually having to touch each one.
Updating the hash on the root directory is cheap and almost
instantaneous.  Checking each directory's commit hash against it isn't
free, but it's still a lot better than (1) above.  With upcalls we can
improve this even further.
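
To make the cache-invalidation analogy concrete, here is a toy model of
choice (3).  It is only a sketch, not the actual DHT code: the class
and method names (Volume, Directory, lookup, add_brick, fix_layout) are
made up for illustration, placement is a plain CRC32 modulo the brick
count rather than real per-directory layouts, and the commit hash is
bumped directly in add_brick as a simplification.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Directory:
    # Commit hash recorded when this directory's layout was last fixed.
    commit_hash: int
    # File name -> index of the brick that actually holds the file.
    files: dict = field(default_factory=dict)


@dataclass
class Volume:
    brick_count: int
    commit_hash: int = 1
    queries: int = 0  # brick queries issued so far (illustration only)

    def hashed_brick(self, name):
        # Hypothetical placement rule: CRC32 of the name modulo brick count.
        return zlib.crc32(name.encode()) % self.brick_count

    def _query(self, directory, name, brick):
        # Ask one brick whether it holds the file, and count the query.
        self.queries += 1
        return directory.files.get(name) == brick

    def lookup(self, directory, name):
        first = self.hashed_brick(name)
        if self._query(directory, name, first):
            return first
        if directory.commit_hash == self.commit_hash:
            # Layout is known to match the topology, so a miss on the
            # hashed brick means the file does not exist anywhere.
            return None
        # Stale directory: fall back to looking everywhere else.
        for brick in range(self.brick_count):
            if brick != first and self._query(directory, name, brick):
                return brick
        return None

    def add_brick(self):
        self.brick_count += 1
        # One cheap update invalidates every directory at once.
        self.commit_hash += 1

    def fix_layout(self, directory):
        # Move files to where hashing now says they belong, then stamp
        # the directory with the current volume commit hash.
        for name in directory.files:
            directory.files[name] = self.hashed_brick(name)
        directory.commit_hash = self.commit_hash


vol = Volume(brick_count=4)
d = Directory(commit_hash=vol.commit_hash)
d.files["a"] = vol.hashed_brick("a")

vol.lookup(d, "missing")   # directory is current: one brick queried
vol.add_brick()            # commit hash changes, d is now stale
vol.lookup(d, "missing")   # falls back to querying all five bricks
vol.fix_layout(d)          # d is re-stamped with the new hash
vol.lookup(d, "missing")   # optimized again: one brick queried
```

In this model a lookup for a nonexistent file costs one brick query
while the directory's commit hash matches the volume's, and a query to
every brick once it doesn't.  Note that add_brick never touches any
directory; each one simply goes back to the fast path as fix_layout
reaches it.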

Now that you know a bit more about the tradeoffs, do "pointless"
and "destroys the benefit" still seem accurate?


