[Gluster-devel] EHT / DHT

Poornima Gurusiddaiah pgurusid at redhat.com
Wed Nov 26 05:16:07 UTC 2014


Out of curiosity, what back end and deduplication solution are you using? 

Regards, 
Poornima 

----- Original Message -----

From: "Jan H Holtzhausen" <janh at holtztech.info> 
To: "Anand Avati" <avati at gluster.org>, "Shyam" <srangana at redhat.com>, gluster-devel at gluster.org 
Sent: Wednesday, November 26, 2014 3:43:36 AM 
Subject: Re: [Gluster-devel] EHT / DHT 

Yes, we have deduplication at the filesystem layer.

BR 
Jan 

From: Anand Avati <avati at gluster.org>
Date: Wednesday 26 November 2014 at 12:11 AM
To: Jan H Holtzhausen <janh at holtztech.info>, Shyam <srangana at redhat.com>, <gluster-devel at gluster.org>
Subject: Re: [Gluster-devel] EHT / DHT 

Unless there is some sort of de-duplication under the covers happening in the brick, or the files are hardlinks to each other, there is no cache benefit whatsoever by having identical files placed on the same server. 

Thanks, 
Avati 

On Tue Nov 25 2014 at 12:59:25 PM Jan H Holtzhausen <janh at holtztech.info> wrote:


As for the why: filesystem cache hits.
Files with the same name tend to be the same files.

Regards 
Jan 




On 2014/11/25, 8:42 PM, "Jan H Holtzhausen" <janh at holtztech.info> wrote:

>So in a distributed cluster, the GFID tells all bricks what a file's
>preceding directory structure looks like?
>Where the physical file is saved is a function of the filename ONLY.
>Therefore my requirement should be met by default, or am I being dense?
> 
>BR 
>Jan 
> 
> 
> 
>On 2014/11/25, 8:15 PM, "Shyam" <srangana at redhat.com> wrote:
> 
>>On 11/25/2014 03:11 PM, Jan H Holtzhausen wrote: 
>>> STILL doesn’t work … exact same file ends up on 2 different bricks … 
>>> I must be missing something. 
>>> All I need is for: 
>>> /directory1/subdirectory2/foo 
>>> And 
>>> /directory2/subdirectoryaaa999/foo
>>> 
>>> 
>>> To end up on the same brick…. 
>> 
>>This is not possible, which is what I was attempting to state in the
>>previous mail. The regex filter is not for this purpose.
>> 
>>The hash is always based on the name of the file, but the location is 
>>based on the distribution/layout of the directory, which is different 
>>for each directory based on its GFID. 
>> 
>>So there are no options in the code to enable what you seek at present. 
>> 
>>Why is this needed? 
>> 
>>Shyam 
> 
>_______________________________________________
>Gluster-devel mailing list
>Gluster-devel at gluster.org
>http://supercolony.gluster.org/mailman/listinfo/gluster-devel
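
Editorial sketch of Shyam's point above: the hash depends only on the file name, but the brick is chosen by looking that hash up in the parent directory's layout, and each directory has its own layout. The code below is a toy model under stated assumptions, not Gluster's implementation: zlib.crc32 stands in for the real hash, the brick names are made up, and rotating the brick order by the directory's GFID is a hypothetical way to make layouts differ per directory.

```python
import uuid
import zlib

# Hypothetical brick names, for illustration only.
BRICKS = ["brick0", "brick1", "brick2", "brick3"]
HASH_SPACE = 2 ** 32  # the file-name hash is modelled as a 32-bit value


def name_hash(filename: str) -> int:
    """Hash of the file name only. crc32 is a stand-in, NOT Gluster's hash."""
    return zlib.crc32(filename.encode("utf-8"))


def directory_layout(gfid: uuid.UUID) -> list:
    """Toy per-directory layout: equal hash ranges, brick order rotated by GFID.

    Assumption: the rotation rule is invented here to mimic the fact that
    every directory assigns the hash ranges to bricks differently.
    """
    start = gfid.int % len(BRICKS)
    order = BRICKS[start:] + BRICKS[:start]
    width = HASH_SPACE // len(BRICKS)
    return [(i * width, (i + 1) * width - 1, brick)
            for i, brick in enumerate(order)]


def locate(gfid: uuid.UUID, filename: str) -> str:
    """Return the brick whose range in this directory's layout holds the hash."""
    h = name_hash(filename)
    for low, high, brick in directory_layout(gfid):
        if low <= h <= high:
            return brick
    raise ValueError("hash outside layout")


# Two directories with different (made-up) GFIDs, same file name.
dir_a = uuid.UUID(int=1)  # stands for /directory1/subdirectory2
dir_b = uuid.UUID(int=2)  # stands for /directory2/subdirectoryaaa999

print(name_hash("foo"))      # identical for both directories
print(locate(dir_a, "foo"))  # brick chosen by dir_a's layout
print(locate(dir_b, "foo"))  # brick chosen by dir_b's layout
```

In this model "foo" hashes to the same value in both directories, yet it lands on different bricks because the two layouts map that value to different bricks. That is why placing files on a brick by name alone is not available through existing options.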

