[Gluster-users] Regarding the issues gluster DHT and Layouts of bricks

Thu May 21 12:47:26 UTC 2015

Comments inline.

----- Original Message -----
> From: "Subrata Ghosh" <subrata.ghosh at ericsson.com>
> To: gluster-devel at gluster.org, gluster-users at gluster.org
> Cc: "Nobin Mathew" <nobin.mathew at ericsson.com>, "Susant Palai" <spalai at redhat.com>, "Vijay Bellur"
> <vbellur at redhat.com>
> Sent: Thursday, 21 May, 2015 4:26:05 PM
> Subject: Regarding the issues gluster DHT and  Layouts of bricks
> 
> 
> Hi  All,
> 
> Could you please guide us in solving the following DHT and brick layout
> problem we are dealing with? Questions are marked in bold.
> 
> Problem statement :
> 
> 
> 1.      We have a requirement to achieve maximum write and read performance,
> and we have to meet some committed performance metrics.
> 
>                Our goal is to place each file on a different brick to get
>                optimal performance and also to observe the nature of the
>                throughput. Hence we need a mechanism to generate
>                different hashes using gluster's glusterfs.gf_dm_hashfn
> (assuming number of files: N, number of bricks: N) so that the files are
> placed on separate bricks.
> 
> 
> -        How can we make sure each file has a different hash and falls on a
> different brick?
> 
> 
> 
> -        To put the question another way: if I know the range of the brick
> layout, or more precisely the hex value of the desired hash (so that the
> file will be placed on the desired brick) that we need to generate from the
> Davies-Meyer algorithm used in gluster, can we create a file name that
> produces it? That would also solve our problem to some extent.
> 
> 
> 2.      We experimented to see how gluster decides to place a file on a
> particular brick using glusterfs.gf_dm_hashfn, and
> took some ideas from
>        articles such as
>        http://gluster.readthedocs.org/en/latest/Features/dht/ and
>        https://joejulian.name/blog/dht-misses-are-expensive/, which
>        describe the layout for a brick and how to calculate a hash for a file.
> 
> 
>         The aim is to minimize collisions, that is, to generate hashes such
>         that each file is placed on a different brick (file 1 => brick A,
>         file 2 => brick B, file 3 => brick C, file 4 => brick D).
> 
>                We use a script similar to this one to get the hash value for a file:
> 
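> import ctypes
> import sys
> 
> # Assumed library name; adjust to the installed libglusterfs soname.
> glusterfs = ctypes.cdll.LoadLibrary("libglusterfs.so.0")
> 
> def gf_dm_hashfn(filename):
>     # Call the C hash function and wrap the result as an unsigned 32-bit value.
>     return ctypes.c_uint32(glusterfs.gf_dm_hashfn(
>         filename,
>         len(filename)))
> 
> if __name__ == "__main__":
>     print hex(gf_dm_hashfn(sys.argv[1]).value)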
> 
> We can then calculate the hash for a filename:
> # python gf_dm_hash.py file1
> 0x99d1b6fL
> 
> 
> The extended attribute is then fetched to check the range, which we try to
> match against the hash value generated above.
> 
> getfattr -n trusted.glusterfs.dht -e hex file1
> 
> 
>       However, beyond this point we are not able to follow exactly how the
>       hash value is matched to one of the layout assignments to yield what we
>       call a hashed location.
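
On the matching step: the hashed location is the brick whose layout range contains the file's hash. Below is a minimal sketch of that comparison. It assumes the trusted.glusterfs.dht value is 16 bytes holding four big-endian 32-bit words, with the last two being the start and end of the range (as described in the blog post linked above), and that the attribute is read from the parent directory on each brick rather than from the file itself. The function names and the four-brick layout are hypothetical, for illustration only.

```python
import struct

def parse_layout(xattr_hex):
    """Return (start, stop) from a trusted.glusterfs.dht hex string.

    Assumption: the value is four big-endian uint32 words and the
    last two words are the inclusive start and stop of the hash range.
    """
    digits = xattr_hex[2:] if xattr_hex.startswith("0x") else xattr_hex
    raw = bytes.fromhex(digits)
    if len(raw) != 16:
        raise ValueError("expected a 16-byte layout value")
    _word0, _word1, start, stop = struct.unpack(">IIII", raw)
    return start, stop

def hashed_brick(file_hash, layouts):
    """Return the brick whose range contains file_hash, or None."""
    for brick, xattr_hex in layouts.items():
        start, stop = parse_layout(xattr_hex)
        if start <= file_hash <= stop:
            return brick
    return None

# Hypothetical layout: four bricks, each owning a quarter of the hash space.
layouts = {
    "brickA": "0x" "00000001" "00000000" "00000000" "3fffffff",
    "brickB": "0x" "00000001" "00000000" "40000000" "7fffffff",
    "brickC": "0x" "00000001" "00000000" "80000000" "bfffffff",
    "brickD": "0x" "00000001" "00000000" "c0000000" "ffffffff",
}

# 0x99d1b6f is the hash printed for file1 above.
print(hashed_brick(0x99d1b6f, layouts))
```

With real data, the layouts dictionary would be filled from the getfattr output for the same directory on every brick.
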
> 
> 
> -        My question is: if I know the range of a brick's layout (say
> 0xc0000000 to 0xffffffff) and select a hash in that range (say 0xc0070000)
> where the next file should be placed, can we generate the file name (a kind
> of reverse of gluster's glusterfs.gf_dm_hashfn)?

I am not aware of any such mechanism. You will have to generate file names manually and run each one through your script to check whether its hash falls in the brick's range.
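
That trial-and-error search could be sketched roughly as follows. The hash function is passed in as a parameter so that the gf_dm_hashfn wrapper from your script can be plugged in. The stand-in used here is zlib.crc32, which is NOT the Gluster hash and is only there so the example runs without libglusterfs. The function names, the name prefix, and the try limit are all hypothetical.

```python
import itertools
import zlib

def find_name_in_range(hash_fn, start, stop, prefix="file", max_tries=1000000):
    """Try prefix0, prefix1, ... and return the first name whose hash
    lies in [start, stop], or None if max_tries candidates all miss."""
    for i in itertools.islice(itertools.count(), max_tries):
        name = "%s%d" % (prefix, i)
        if start <= hash_fn(name) <= stop:
            return name
    return None

def standin_hash(name):
    # Stand-in only: CRC32 is NOT gf_dm_hashfn. Replace this with a call
    # into libglusterfs to get hashes that match a real volume.
    return zlib.crc32(name.encode()) & 0xffffffff

# Look for a name that lands in the example range 0xc0000000..0xffffffff.
name = find_name_in_range(standin_hash, 0xc0000000, 0xffffffff)
print(name, hex(standin_hash(name)))
```

Since a range covering a quarter of the hash space is hit by roughly one candidate in four, the search normally ends after a handful of tries. Narrower ranges need proportionally more.
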

> 
> PS: Susant, can you throw some light on this or suggest a method for what we
> are trying to solve?
> 
> Thanks for your time.
> 
> 
> Best Regards,
> Subrata Ghosh
> 
> 
> 
> 
> 
> 
> 
Regards,
Susant