[Gluster-devel] Faster hashing for DHT

Anand Avati avati at gluster.com
Thu Jan 7 04:00:07 UTC 2010


On Tue, Jan 5, 2010 at 10:42 PM, Jeff Darcy <jdarcy at redhat.com> wrote:
> While looking at the DHT code, I noticed that it's using a 10-round
> Davies-Meyer construction to generate the hashes used for file
> placement.  A little surprised, by this, I ran it by a couple of friends
> who are experts in both cryptography and distributed data storage.  The
> consensus seems to be that the hash used for this purpose needs to be
> collision resistant but not cryptographically strong.  One theorized
> that the choice made in DHT is probably based on prior examples (e.g.
> Freenet and Mojo Nation) where cryptographically strong hashes were
> chosen, but that the requirements driving those decisions probably don't
> apply to GlusterFS.  This is a non-trivial issue because these hashes
> are used quite frequently and the current one is quite computationally
> expensive.  I note that Hsieh's SuperFastHash is already implemented in
> GlusterFS and is used for other purposes.  It's about 3x as fast as the
> DM hash, and has better collision resistance as well.  MurmurHash
> (http://murmurhash.googlepages.com/) is even faster and more collision
> resistant.  For future releases, I suggest dropping the DM hash and
> switching to one of these others.
>

Before we chose the DM hash we did benchmark many hashing algorithms
both for performance and distribution. Among the lot DM gave the best
hash distribution. Computational time of any of these hashing
algorithms is too insignificant to cause any measurable performance
gain. Even if we bring in a new algorithm which is 10x or even 100x
faster, the overall gain is still a very tiny negligible fraction.
Besides the computational savings, it is worth noting that the hash is
computed just once per inode and the location is cached for all
further access.

Avati





More information about the Gluster-devel mailing list