[Gluster-devel] regarding GF_CONTENT_KEY and dht2 - perf with small files
Shyam
srangana at redhat.com
Wed Feb 3 14:46:53 UTC 2016
On 02/03/2016 07:54 PM, Jeff Darcy wrote:
>> Problem is with workloads which know the files that need to be read
>> without readdir, like hyperlinks (webserver), swift objects etc. These
>> are two I know of which will have this problem, which can't be improved
>> because we don't have metadata, data co-located. I have been trying to
>> think of a solution for past few days. Nothing good is coming up :-/
>
> In those cases, caching (at the MDS) would certainly help a lot. Some
> variation of the compounding infrastructure under development for Samba
> etc. might also apply, since this really is a compound operation.
>
The above is certainly an option; I need to think it through a bit more
before I can respond sensibly.
Another one is to generate the GFID for a file with parGFID+basename as
input (which was something Pranith brought up a few mails back in this
chain). There was concern that we would have GFID clashes, but further
reasoning suggests that we would not. An example follows:
Good cases:
- /D1/File is created, with the top 2 bytes of the file's GFID as the
bucket (same as D1's bucket), and the rest of the GFID produced by some
UUID generation over pGFID (GFID of D1) + basename
- When this file is looked up by name, its GFID can be generated at the
client side as a hint, and the same fan-out of lookup to MDS and read to
DS can be initiated
* The READ data is valid only when the lookup agrees on the
same GFID for the file
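
A minimal sketch of the hint generation described above, in Python.
The mail only says "some UUID generation", so the use of a name-based
UUID (uuid5) here is an assumption, and hint_gfid, read_is_valid and
BUCKET_BYTES are hypothetical names, not DHT2 code:

```python
import uuid

# Per the mail, the top 2 bytes of a GFID select the bucket.
BUCKET_BYTES = 2


def hint_gfid(parent_gfid: uuid.UUID, basename: str) -> uuid.UUID:
    """Derive a GFID hint from the parent GFID and the basename."""
    # Assumption: uuid5 (name-based, SHA-1) stands in for
    # "some UUID generation of pGFID + basename".
    derived = uuid.uuid5(parent_gfid, basename)
    # Keep the parent's bucket bytes, take the rest from the derived UUID.
    raw = parent_gfid.bytes[:BUCKET_BYTES] + derived.bytes[BUCKET_BYTES:]
    return uuid.UUID(bytes=raw)


def read_is_valid(hint: uuid.UUID, looked_up: uuid.UUID) -> bool:
    """The prefetched READ is usable only if the lookup agrees on the GFID."""
    return hint == looked_up
```

Any client that knows the parent GFID and the name computes the same
hint, which is what lets the lookup to the MDS and the read to the DS
go out in parallel.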
Bad cases:
- On a rename, the GFID of the file does not change, so if /D1/File
was renamed to /D2/File1, a subsequent lookup would fail to
prefetch the read, as the GFID hint generated is now based on the GFID
of D2 and the new name File1
- If, after a rename, /D1/File is created again, the GFID
generated/requested by the client for this file would clash with the
already generated GFID, so the DHT server would decide to return a
new GFID that has no relation to the one generated by the hint, again
resulting in the hint failing
So with the above scheme, as long as files are not renamed the hint
serves its purpose to prefetch even with just the name and parGFID.
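
The good and bad cases can be walked through with a toy model. This is
only a sketch: ToyMDS and its methods are hypothetical, uuid5 is an
assumed stand-in for the UUID generation, and the choice to keep the
bucket bytes when the server picks a replacement GFID is also an
assumption:

```python
import uuid


def hint_gfid(parent, name):
    # Assumed generation: parent's 2 bucket bytes + name-based UUID.
    derived = uuid.uuid5(parent, name)
    return uuid.UUID(bytes=parent.bytes[:2] + derived.bytes[2:])


class ToyMDS:
    """Toy metadata server: maps (parent GFID, basename) to a file GFID."""

    def __init__(self):
        self.names = {}
        self.in_use = set()

    def create(self, parent, name, requested):
        gfid = requested
        # On a clash, hand out a GFID unrelated to the hint.
        while gfid in self.in_use:
            gfid = uuid.UUID(bytes=parent.bytes[:2] + uuid.uuid4().bytes[2:])
        self.in_use.add(gfid)
        self.names[(parent, name)] = gfid
        return gfid

    def rename(self, old_parent, old_name, new_parent, new_name):
        # The GFID is unchanged by a rename; only the name entry moves.
        self.names[(new_parent, new_name)] = self.names.pop((old_parent, old_name))

    def lookup(self, parent, name):
        return self.names[(parent, name)]


d1 = uuid.UUID("aabb0000-0000-4000-8000-000000000001")
d2 = uuid.UUID("ccdd0000-0000-4000-8000-000000000002")
mds = ToyMDS()

# Good case: the hint matches what the server stored.
first = mds.create(d1, "File", hint_gfid(d1, "File"))
hit = hint_gfid(d1, "File") == mds.lookup(d1, "File")

# Bad case 1: after a rename, the hint is computed from D2 + File1.
mds.rename(d1, "File", d2, "File1")
miss_after_rename = hint_gfid(d2, "File1") != mds.lookup(d2, "File1")

# Bad case 2: re-creating /D1/File clashes, so the server picks a new GFID.
second = mds.create(d1, "File", hint_gfid(d1, "File"))
miss_after_recreate = hint_gfid(d1, "File") != mds.lookup(d1, "File")
```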
One gotcha is a pattern I see with applications that create a tmp
file (sort of a swap file) and then rename it to the real file name
as needed. For all such applications the hints above would fail.
I believe Swift also uses a similar trick on the FS, renaming an
object once it is considered fully written. Another case would be
compile workloads. So overall the above scheme could work to
alleviate the problem in some cases, but may cause harm in others (where
the GFID hint is incorrect and so we end up sending a read without
reason).
The above could easily be prototyped with DHT2 to see its benefits, so
we will try that out at some point in the future.
Shyam