[Gluster-users] rebalance and volume commit hash
pmisiak at cloudferro.com
Tue Jan 17 13:42:31 UTC 2017
On 17 Jan 2017 at 14:28, Jeff Darcy <jdarcy at redhat.com> wrote:
> > Can you tell me please why every volume rebalance generates a new value
> > for the volume commit hash?
> > If I have a fully rebalanced cluster (or almost) with millions of
> > directories, then rebalance has to change the DHT xattr for every directory
> > only because there is a new volume commit hash value. It is pointless in
> > my opinion. Is there any reason behind this? As I observed, the volume
> > commit hash is set at the beginning of rebalance, which totally destroys
> > the benefit of the lookup optimization algorithm for directories not
> > yet scanned/fixed by this rebalance run.
> It disables the optimization because the optimization would no longer
> lead to correct results. There are plenty of distributed filesystems
> that seem to have "fast but wrong" as a primary design goal; we're
> not one of them.
> The best way to think of the volume-commit-hash update is as a kind of
> cache invalidation. Lookup optimization is only valid as long as we
> know that the actual distribution of files within a directory is
> consistent with the current volume topology. That ceases to be the
> case as soon as we add or remove a brick, leaving us with three choices.
> (1) Don't do lookup optimization at all. *Every* time we fail to find
> a file on the brick where hashing says it should be, look *everywhere*
> else. That's how things used to work, and still work if lookup
> optimization is disabled. The drawback is that every add/remove brick
> operation causes a permanent and irreversible degradation of lookup
> performance. Even on a freshly created volume, lookups for files that
> don't exist anywhere will cause every brick to be queried.
> (2) Mark every directory as "unoptimized" at the very beginning of
> rebalance. Besides being almost as slow as fix-layout itself, this
> would require blocking all lookups and other directory operations
> *anywhere in the volume* while it completes.
> (3) Change the volume commit hash, effectively marking every
> directory as unoptimized without actually having to touch every one.
> The root-directory operation is cheap and almost instantaneous.
> Checking each directory commit hash isn't free, but it's still a
> lot better than (1) above. With upcalls we can enhance this even
> Now that you know a bit more about the tradeoffs, do "pointless"
> and "destroys the benefit" still seem accurate?
Thank you, Jeff, for your response. I understand this optimisation clearly, but I don't understand why a new commit hash is generated for the volume during the rebalance process. I think it should be generated only on add/remove-brick events, not during rebalance.
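
To make the cache-invalidation idea in option (3) concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not Gluster's DHT code: the class `Volume`, its methods (`bump_commit_hash`, `fix_directory`, `lookup`), the use of CRC32 modulo the brick count as the hash, and the integer commit hash are all hypothetical names and simplifications. The sketch deliberately keeps "add a brick" and "change the commit hash" as separate steps, since when the hash should change is exactly the question raised above.

```python
import zlib


class Volume:
    """Toy model of a distributed volume; NOT Gluster's actual DHT code."""

    def __init__(self, bricks):
        self.bricks = list(bricks)
        self.commit_hash = 1  # hypothetical volume-wide value
        # path -> {"commit_hash": int, "files": {name: brick holding it}}
        self.dirs = {}

    def hashed_brick(self, name):
        # Assumption: a simple hash modulo the number of bricks.
        return self.bricks[zlib.crc32(name.encode()) % len(self.bricks)]

    def mkdir(self, path):
        # A new directory starts out consistent with the current topology.
        self.dirs[path] = {"commit_hash": self.commit_hash, "files": {}}

    def create(self, path, name):
        self.dirs[path]["files"][name] = self.hashed_brick(name)

    def add_brick(self, brick):
        # Topology change: existing files may no longer sit where hashing says.
        self.bricks.append(brick)

    def bump_commit_hash(self):
        # One cheap update that marks every directory as unoptimized at once.
        self.commit_hash += 1

    def fix_directory(self, path):
        # Move files to their hashed bricks, then stamp the directory as current.
        d = self.dirs[path]
        for name in d["files"]:
            d["files"][name] = self.hashed_brick(name)
        d["commit_hash"] = self.commit_hash

    def lookup(self, path, name):
        """Return (brick or None, number of bricks queried)."""
        d = self.dirs[path]
        hashed = self.hashed_brick(name)
        if d["files"].get(name) == hashed:
            return hashed, 1  # found where hashing said it would be
        if d["commit_hash"] == self.commit_hash:
            return None, 1  # directory is current: the miss can be trusted
        # Directory is stale: fall back to querying every brick.
        return d["files"].get(name), len(self.bricks)
```

In this model, a lookup for a nonexistent file in an up-to-date directory queries one brick. After `add_brick` followed by `bump_commit_hash`, the same lookup queries every brick until `fix_directory` has been run on that directory, at which point it is back to one.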