[Gluster-devel] regarding GF_CONTENT_KEY and dht2 - perf with small files
Pranith Kumar Karampuri
pkarampu at redhat.com
Wed Feb 3 10:12:17 UTC 2016
>> The file data would be located based on its GFID, so before the *first*
>> lookup/stat for a file, there is no way to know its GFID.
>> NOTE: Instead of a name hash the GFID hash is used, to get immunity
>> against renames and the like, as a name hash could change the location
>> information for the file (among other reasons).
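To make the rename point above concrete, here is a minimal sketch. It assumes a simple SHA-1 based bucket function and a fixed bucket count; both are stand-ins for illustration, not the actual DHT2 hash or layout.

```python
import hashlib
import uuid
from dataclasses import dataclass

NUM_BUCKETS = 8  # hypothetical number of data-server buckets


@dataclass(frozen=True)
class File:
    name: str
    gfid: uuid.UUID


def bucket_for(key: bytes, buckets: int = NUM_BUCKETS) -> int:
    """Map a key to a bucket with a stable hash (stand-in for the real hash)."""
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def place_by_name(f: File) -> int:
    # Placement derived from the name: may move when the file is renamed.
    return bucket_for(f.name.encode("utf-8"))


def place_by_gfid(f: File) -> int:
    # Placement derived from the GFID: unaffected by renames.
    return bucket_for(f.gfid.bytes)


def rename(f: File, new_name: str) -> File:
    # A rename changes the name but keeps the GFID.
    return File(name=new_name, gfid=f.gfid)
```

With this, place_by_gfid() returns the same bucket before and after rename(), while place_by_name() depends only on the current name and so can point somewhere else after a rename.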
> Another manner of achieving the same when the GFID of the file is
> known (from a readdir) is to wind the lookup and read of size to the
> respective MDS and DS, where the lookup would be responded to once the
> MDS responds, and the DS response is cached for the subsequent
> open+read case. So on the wire we would have a fan out of 2 FOPs, but
> still satisfy the quick read requirements.
A tar-like workload doesn't have this problem, because we already know
the GFID from the readdir.
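A rough sketch of the fan-out suggested above, for the case where the GFID is already known. The in-memory MDS and DS dictionaries, the Client class and QUICK_READ_MAX are all hypothetical names for illustration; this is not how the client stack is actually structured.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy in-memory servers keyed by GFID (hypothetical stand-ins for MDS and DS).
MDS = {"gfid-1": {"size": 5, "mode": 0o644}}
DS = {"gfid-1": b"hello"}

QUICK_READ_MAX = 64 * 1024  # assumed upper bound on the speculative read size


def mds_lookup(gfid):
    return MDS[gfid]


def ds_read(gfid, size):
    return DS[gfid][:size]


class Client:
    def __init__(self):
        self.pool = ThreadPoolExecutor(max_workers=2)
        self.pending_reads = {}  # gfid -> Future holding the DS response

    def lookup(self, gfid):
        # Fan out: wind the read to the DS and the lookup to the MDS together.
        self.pending_reads[gfid] = self.pool.submit(ds_read, gfid, QUICK_READ_MAX)
        lookup_future = self.pool.submit(mds_lookup, gfid)
        # The lookup is answered as soon as the MDS responds; the DS
        # response stays cached for a later open+read.
        return lookup_future.result()

    def open_and_read(self, gfid):
        cached = self.pending_reads.pop(gfid, None)
        if cached is not None:
            return cached.result()  # served from the speculative read
        return ds_read(gfid, QUICK_READ_MAX)  # no cached response, go to the DS
```

Calling lookup("gfid-1") followed by open_and_read("gfid-1") sends two FOPs on the wire but only one of them is on the lookup's critical path.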
> I would assume the above resolves the problem posted, are there cases
> where we do not know the GFID of the file? i.e., no readdir performed
> and client knows the file name that it wants to operate on? Do we have
> traces of the webserver workload to see if it generates names on the
> fly or does a readdir prior to that?
The problem is with workloads that already know which files they need to
read, without doing a readdir, such as hyperlinks (webserver) and Swift
objects. These are the two I know of that will hit this problem, and they
can't be improved because metadata and data are not co-located. I have
been trying to think of a solution for the past few days. Nothing good is
coming up :-/
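To spell out the dependency, a toy model (the dictionaries and function names are made up for illustration): with only a name, the read on the DS cannot be sent until the MDS has returned the GFID, so the two steps are serial.

```python
# Hypothetical toy stores: the MDS resolves names, the DS holds data by GFID.
MDS_NAMES = {("root-gfid", "index.html"): "gfid-42"}
DS_DATA = {"gfid-42": b"<html></html>"}


def read_by_name(parent_gfid, name):
    """Name-only access: the DS request depends on the MDS reply."""
    steps = []
    gfid = MDS_NAMES[(parent_gfid, name)]  # step 1: lookup on the MDS
    steps.append("MDS lookup")
    data = DS_DATA[gfid]                   # step 2: read on the DS, needs the GFID
    steps.append("DS read")
    return data, steps


def read_by_gfid(gfid):
    """GFID already known (e.g. from readdir): both requests can go out together."""
    steps = ["MDS lookup + DS read (parallel)"]
    return DS_DATA[gfid], steps
```

read_by_name() always takes two dependent steps, while read_by_gfid() takes one, which is the gap the webserver and Swift cases run into.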