[Gluster-devel] regarding GF_CONTENT_KEY and dht2 - perf with small files

Wed Feb 3 14:46:53 UTC 2016

On 02/03/2016 07:54 PM, Jeff Darcy wrote:
>> The problem is with workloads that know which files need to be read
>> without readdir, like hyperlinks (webserver), Swift objects, etc. These
>> are the two I know of that will have this problem, which can't be improved
>> because we don't have metadata and data co-located. I have been trying to
>> think of a solution for the past few days. Nothing good is coming up :-/
>
> In those cases, caching (at the MDS) would certainly help a lot.  Some
> variation of the compounding infrastructure under development for Samba
> etc. might also apply, since this really is a compound operation.
>

The above is certainly an option; I need to process it a bit more to
respond sanely.

Another option is to generate the GFID for a file using the parent GFID
(parGFID) + basename as input (something Pranith brought up a few mails
back in this thread). There was concern that this would cause GFID
clashes, but further reasoning suggests that it would not be a problem.
An example follows.

Good cases:
- /D1/File is created with the top 2 bytes of the file's GFID as the
bucket (same as the D1 bucket), and the rest of the GFID derived by some
UUID generation over pGFID (the GFID of D1) + basename
- When this file is looked up by name, its GFID can be generated on the
client side as a hint, and the same fan-out (lookup to the MDS, read to
the DS) can be initiated
* The READ data is valid only if the lookup returns the same GFID for
the file
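The hint generation in the good case can be sketched as below. This is a minimal illustration, not the DHT2 code: the mail does not say which "UUID generation" is used, so a name-based UUIDv5 (SHA-1) with the parent GFID as namespace is assumed here, and hint_gfid / prefetched_read_is_valid are hypothetical names. The only properties that matter are that every client derives the same hint from (parGFID, basename) and that the top 2 bytes match the parent's bucket.

```python
import uuid

def hint_gfid(parent_gfid: uuid.UUID, basename: str) -> uuid.UUID:
    """Derive a deterministic GFID hint from the parent GFID and basename.

    Assumption: the unspecified "UUID generation" is modeled as UUIDv5
    with the parent GFID as the namespace.
    """
    derived = bytearray(uuid.uuid5(parent_gfid, basename).bytes)
    # Top 2 bytes are the bucket, copied from the parent so the file
    # lands in the same bucket as its directory.
    derived[0:2] = parent_gfid.bytes[0:2]
    return uuid.UUID(bytes=bytes(derived))

def prefetched_read_is_valid(hinted: uuid.UUID, looked_up: uuid.UUID) -> bool:
    # The speculative READ sent to the DS using the hinted GFID can only
    # be used if the MDS lookup returns that same GFID for the name.
    return hinted == looked_up

d1 = uuid.uuid4()  # GFID of /D1 (random, for the example)
g = hint_gfid(d1, "File")
print(g.bytes[:2] == d1.bytes[:2])  # -> True (same bucket as D1)
print(g == hint_gfid(d1, "File"))   # -> True (any client derives the same hint)
```

Because the hint is a pure function of the parent GFID and the name, the client can fire the lookup and the read in parallel and simply discard the read when the validity check fails.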

Bad cases:
- On a rename, the GFID of the file does not change, so if /D1/File
was renamed to /D2/File1, a subsequent lookup could fail to prefetch
the read, as the GFID hint is now generated from the GFID of D2 and
the new name File1
- If, after such a rename, /D1/File is created again, the GFID
generated/requested by the client for this file would clash with the
already assigned GFID, so the DHT server would return a new GFID that
has no relation to the hinted one, again causing the hint to fail
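Both bad cases can be reproduced with a toy name-to-GFID map. ToyMDS is a hypothetical in-memory stand-in for the MDS, not a DHT2 interface; hint derivation is again assumed to be UUIDv5(parent, name) with the parent's bucket bytes, and the assumption that the server keeps the parent's bucket when it picks a fresh GFID on a clash is ours, not the mail's.

```python
import uuid

def hint_gfid(parent_gfid, basename):
    # Hypothetical derivation: UUIDv5(parent, name) + parent's 2 bucket bytes.
    derived = bytearray(uuid.uuid5(parent_gfid, basename).bytes)
    derived[0:2] = parent_gfid.bytes[0:2]
    return uuid.UUID(bytes=bytes(derived))

class ToyMDS:
    """Hypothetical in-memory (parent, name) -> GFID map, not DHT2 code."""

    def __init__(self):
        self.names = {}       # (parent_gfid, basename) -> gfid
        self.in_use = set()   # every GFID currently allocated

    def create(self, parent, name, requested):
        gfid = requested
        while gfid in self.in_use:
            # Clash: pick a fresh GFID (same bucket, by assumption) that
            # has no relation to the client's hint.
            fresh = bytearray(uuid.uuid4().bytes)
            fresh[0:2] = parent.bytes[0:2]
            gfid = uuid.UUID(bytes=bytes(fresh))
        self.names[(parent, name)] = gfid
        self.in_use.add(gfid)
        return gfid

    def rename(self, src_parent, src_name, dst_parent, dst_name):
        # A rename moves the name; the GFID itself does not change.
        self.names[(dst_parent, dst_name)] = self.names.pop((src_parent, src_name))

    def lookup(self, parent, name):
        return self.names[(parent, name)]

d1, d2 = uuid.uuid4(), uuid.uuid4()
mds = ToyMDS()
orig = mds.create(d1, "File", hint_gfid(d1, "File"))   # good case: hint accepted
mds.rename(d1, "File", d2, "File1")
# Bad case 1: the hint for /D2/File1 comes from D2 + "File1" and misses.
print(mds.lookup(d2, "File1") == hint_gfid(d2, "File1"))  # -> False
# Bad case 2: re-creating /D1/File requests a GFID that is still in use.
again = mds.create(d1, "File", hint_gfid(d1, "File"))
print(again == hint_gfid(d1, "File"))                     # -> False
```

In both cases the lookup result no longer matches the hint, so a prefetched read would be wasted rather than incorrect.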

So with the above scheme, as long as files are not renamed, the hint
serves its purpose of prefetching with just the name and parGFID.

One gotcha: I see a pattern in applications that create a tmp file (a
sort of swap file) and then rename it to the real file name as needed.
For all such applications, the hints above would fail.
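The tmp-file-then-rename pattern can be checked by replaying a few operations in one directory and counting how many lookups would have had a correct hint. The operation format and hint_hits are hypothetical; the hint is again assumed to be UUIDv5(parent, name) with the parent's bucket bytes, and GFID clashes are ignored.

```python
import uuid

def hint_gfid(parent_gfid, basename):
    # Hypothetical derivation: UUIDv5(parent, name) + parent's 2 bucket bytes.
    derived = bytearray(uuid.uuid5(parent_gfid, basename).bytes)
    derived[0:2] = parent_gfid.bytes[0:2]
    return uuid.UUID(bytes=bytes(derived))

def hint_hits(ops):
    """Replay (op, name[, new_name]) tuples in one directory and return
    how many lookups would have had a correct GFID hint."""
    d = uuid.uuid4()
    names, hits = {}, 0
    for op, *args in ops:
        if op == "create":
            names[args[0]] = hint_gfid(d, args[0])  # clashes ignored here
        elif op == "rename":
            names[args[1]] = names.pop(args[0])     # GFID follows the file
        elif op == "lookup":
            hits += (names[args[0]] == hint_gfid(d, args[0]))
    return hits

direct = [("create", "File"), ("lookup", "File")]
swap = [("create", "File.tmp"), ("rename", "File.tmp", "File"), ("lookup", "File")]
print(hint_hits(direct), hint_hits(swap))  # -> 1 0
```

Even though the rename stays within the same directory, the file keeps the GFID derived from "File.tmp", so every lookup of the final name misses.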

I believe Swift also uses a similar trick on the FS, renaming an object
once it is considered fully written. Another case would be compile
workloads. So overall, the above scheme could alleviate the problem
somewhat, but may cause harm in other cases (where the GFID hint is
incorrect and we end up sending a read for no reason).

The above could easily be prototyped with DHT2 to see its benefits, so 
we will try that out at some point in the future.

Shyam