[Gluster-users] dht hashing based on basename only?
jklein at physi.uni-heidelberg.de
Tue Aug 14 14:36:25 UTC 2012
Many thanks for the explanation. In that case there seems to be a problem with
how the layout is updated during rebalancing: for me, all directories *do have*
the same mappings for the hash intervals, which is exactly what creates this problem.
I checked another volume which had several bricks from the beginning, and
there I see different mappings. This volume was with version 3.2.5, but I
just confirmed with 3.3.0 that creating directories on a 3-brick
distributed volume produces different mappings.
So, at the moment a rebalance operation sets the same mapping intervals
for all directories instead of shuffling them. Thus, it cannot achieve a
proper distribution of the files, which I would consider a bug (at least I
hope it's not a feature).
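The effect described above can be sketched in a few lines of Python. Here `hashlib.md5` stands in for GlusterFS's actual Davies-Meyer hash, and the brick count, layout, and names are made up for illustration:

```python
import hashlib

BRICKS = 3
RANGE = 2**32 // BRICKS  # size of each hash interval

def hash32(name: str) -> int:
    # Stand-in for GlusterFS's Davies-Meyer hash of the basename
    # (illustration only, not the real hash function).
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

def brick_for(basename: str, layout: list[int]) -> int:
    # layout[i] is the brick that owns the i-th hash interval.
    return layout[min(hash32(basename) // RANGE, BRICKS - 1)]

# If every directory ends up with the same layout after a rebalance ...
identical_layout = [0, 1, 2]
# ... then the basename alone decides the brick, in every directory:
for d in ("dir1", "dir2", "dir3"):
    print(d, "->", "brick", brick_for("data.txt", identical_layout))
```

All three directories report the same brick, so many copies of the same filename in different directories pile onto one brick instead of spreading out.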
Should I then file a bug report? Or what's the best way to proceed?
PS: For me, too, identical mappings combined with a hash that takes the path
into account would seem favourable. That would also make it easier to adjust
the intervals, e.g. to accommodate different brick sizes (which had seemed
too complicated, but would be interesting for us).
On Tue, 14 Aug 2012, Jeff Darcy wrote:
> On 08/14/2012 03:44 AM, Jochen Klein wrote:
>> Looking at the implementation in the dht translator and checking
>> calculated hashes it seems that only the basename is used for the hash
>> calculation of a given file. With all directories having the same
>> mappings for the hash intervals to bricks, this would explain our
>> observation if only this file hash is used. However, I also see hashes
>> calculated for directories, but it's not clear to me what they are used for.
>> Am I missing something here? Is this behaviour intended? Is there a
>> (supported) way to still distribute the files homogeneously to all
>> bricks? E.g. by using the full path for the hashing (which is actually
>> what I understood from the manual), or by shuffling the hash intervals
>> per directory?
> I tripped over the same issue a while ago. Yes, the file hashes use only the
> basename. However, it's not (or at least shouldn't be) true that all
> directories have the same mappings for the hash intervals. The same ranges are
> used, but rotated into different orders. So, using letters for hash values,
> different directories might have:
> A-I on brick1, J-R on brick2, S-Z on brick3
> A-I on brick2, J-R on brick3, S-Z on brick1
> A-I on brick3, J-R on brick1, S-Z on brick2
> I just ran a quick test creating a bunch of directories on a simple two-brick
> distributed volume. Sure enough, about half of the directories got one order,
> and the other half got another. If this isn't working the same way for you
> (check using "getfattr -e hex -n trusted.glusterfs.dht" on each per-brick copy of
> each directory) then it's probably a bug and we'll have to figure out why.
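The rotation Jeff describes can be sketched the same way (again with `hashlib.md5` standing in for the real Davies-Meyer hash; the directory names and brick indices are illustrative):

```python
import hashlib

BRICKS = 3
RANGE = 2**32 // BRICKS

def hash32(name: str) -> int:
    # Stand-in for GlusterFS's Davies-Meyer hash (illustration only).
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

def brick_for(basename: str, layout: list[int]) -> int:
    return layout[min(hash32(basename) // RANGE, BRICKS - 1)]

# Same hash ranges, rotated into a different order per directory,
# as in the A-I / J-R / S-Z example above:
layouts = {
    "dir1": [0, 1, 2],
    "dir2": [1, 2, 0],
    "dir3": [2, 0, 1],
}
for d, layout in layouts.items():
    print(d, "->", "brick", brick_for("data.txt", layout))
```

Because the three layouts are distinct rotations, an identical basename lands on a different brick in each directory, which is what restores the overall distribution.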
> Personally, I'd prefer if all directories *did* have the same hash layout, so
> that those layouts could be inherited instead of having to be set separately on
> each and every directory of a potentially-petabyte volume. That would require
> that the hash include some directory-specific value (such as the parent GFID)
> as well as the basename, but that seems a small price to pay. In other words,
> though right now it's a bit non-obvious how the layouts and hashing work, some
> day they might work as you (and I) had expected.
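Jeff's suggestion of one inherited layout with a directory-specific value mixed into the hash could be sketched like this (once more, `hashlib.md5` stands in for the real hash, and the GFID strings are made up for illustration):

```python
import hashlib

BRICKS = 3
RANGE = 2**32 // BRICKS
SHARED_LAYOUT = [0, 1, 2]  # one layout, inherited by every directory

def hash32(data: str) -> int:
    # Stand-in for GlusterFS's Davies-Meyer hash (illustration only).
    return int.from_bytes(hashlib.md5(data.encode()).digest()[:4], "big")

def brick_for(parent_gfid: str, basename: str) -> int:
    # Mixing the parent directory's GFID into the hash makes the same
    # basename hash differently under different parents, so a single
    # shared layout still spreads the files across bricks.
    h = hash32(parent_gfid + "/" + basename)
    return SHARED_LAYOUT[min(h // RANGE, BRICKS - 1)]

gfids = [f"gfid-{i:04d}" for i in range(20)]  # hypothetical directory GFIDs
placements = {brick_for(g, "data.txt") for g in gfids}
print(placements)  # with enough directories, more than one brick appears
```

The trade-off is exactly the one Jeff notes: layouts become cheap to inherit, at the cost of making the hash depend on the parent as well as the basename.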