[Gluster-devel] regarding GF_CONTENT_KEY and dht2 - perf with small files

Wed Feb 3 10:12:17 UTC 2016

>> The file data would be located based on its GFID, so before the *first*
>> lookup/stat for a file, there is no way to know its GFID.
>> NOTE: Instead of a name hash the GFID hash is used, to get immunity
>> against renames and the like, as a name hash could change the location
>> information for the file (among other reasons).
>
> Another manner of achieving the same when the GFID of the file is 
> known (from a readdir) is to wind the lookup and read of size to the 
> respective MDS and DS, where the lookup would be responded to once the 
> MDS responds, and the DS response is cached for the subsequent 
> open+read case. So on the wire we would have a fan out of 2 FOPs, but 
> still satisfy the quick read requirements.

A tar-like workload doesn't have this problem, because we know the GFID
after readdirp.
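
To make the fan-out concrete, here is a minimal sketch of the known-GFID
case, written in Python purely for illustration. Everything in it is an
assumption and not dht2 code: mds_lookup, ds_read, and the Client class
are hypothetical stand-ins. The lookup returns as soon as the MDS
answers, while the DS read stays in flight and is picked up by the later
open+read.

```python
import asyncio


# Hypothetical stand-ins for the metadata server (MDS) and data server (DS).
async def mds_lookup(gfid):
    await asyncio.sleep(0.01)           # simulated network round trip
    return {"gfid": gfid, "size": 5}


async def ds_read(gfid, size):
    await asyncio.sleep(0.02)           # simulated network round trip
    return b"hello"[:size]


class Client:
    def __init__(self):
        self.pending_reads = {}         # gfid -> in-flight DS read task

    async def lookup(self, gfid, read_size):
        # Fan out both FOPs at once; the GFID is already known (readdirp).
        self.pending_reads[gfid] = asyncio.create_task(ds_read(gfid, read_size))
        # The lookup is answered as soon as the MDS responds.
        return await mds_lookup(gfid)

    async def open_and_read(self, gfid, read_size):
        # Reuse the read issued during lookup, if there was one.
        task = self.pending_reads.pop(gfid, None)
        if task is None:
            return await ds_read(gfid, read_size)
        return await task


async def demo():
    client = Client()
    meta = await client.lookup("gfid-1", 4096)
    data = await client.open_and_read("gfid-1", 4096)
    return meta, data
```

Calling asyncio.run(demo()) drives one lookup followed by one open+read.
A real client would also have to bound and invalidate the cached
responses, which this sketch ignores.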

>
> I would assume the above resolves the problem posted, are there cases 
> where we do not know the GFID of the file? i.e no readdir performed 
> and client knows the file name that it wants to operate on? Do we have 
> traces of the webserver workload to see if it generates names on the 
> fly or does a readdir prior to that?
>
The problem is with workloads that know which files to read without
doing a readdir, such as hyperlinks (webserver), Swift objects, etc.
These are the two I know of that will have this problem, and they can't
be improved because we don't have metadata and data co-located. I have
been trying to think of a solution for the past few days. Nothing good
is coming up :-/
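
A small sketch of why the name-only case costs an extra hop, again in
Python and again only an illustration. The hash (SHA-1 modulo the server
count), the server list, and the MDS class are all assumptions made for
the example, not how dht2 actually places data. It shows that data
placed by GFID hash survives a rename, and that starting from a name
needs the MDS answer before the DS can even be chosen.

```python
import hashlib
import uuid

DATA_SERVERS = ["ds0", "ds1", "ds2", "ds3"]   # hypothetical data servers


def place(key: bytes) -> str:
    # Assumed placement function: hash the key and pick a server.
    h = int.from_bytes(hashlib.sha1(key).digest()[:8], "big")
    return DATA_SERVERS[h % len(DATA_SERVERS)]


class MDS:
    """Toy metadata server mapping names to GFIDs."""

    def __init__(self):
        self.names = {}

    def create(self, name):
        gfid = uuid.uuid4()
        self.names[name] = gfid
        return gfid

    def rename(self, old, new):
        self.names[new] = self.names.pop(old)

    def lookup(self, name):
        return self.names[name]


def read_by_name(mds, name):
    # Hop 1: ask the MDS for the GFID. Hop 2: only now can the DS be chosen.
    gfid = mds.lookup(name)
    return place(gfid.bytes), 2     # (data server, serial hops)


def read_by_gfid(gfid):
    # GFID known up front: MDS and DS requests can go out in parallel,
    # so only one hop of latency is on the critical path.
    return place(gfid.bytes), 1
```

With this toy model, renaming a file leaves place(gfid.bytes) unchanged,
while read_by_name always reports two serial hops against one for
read_by_gfid.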

Pranith