[Gluster-devel] regarding GF_CONTENT_KEY and dht2 - perf with small files

Shyam srangana at redhat.com
Thu Feb 4 06:04:04 UTC 2016


On 02/04/2016 09:38 AM, Vijay Bellur wrote:
> On 02/03/2016 11:34 AM, Venky Shankar wrote:
>> On Wed, Feb 03, 2016 at 09:24:06AM -0500, Jeff Darcy wrote:
>>>> The problem is with workloads that know which files need to be read
>>>> without a readdir, like hyperlinks (webserver), Swift objects, etc.
>>>> These are the two I know of that will have this problem, and it can't
>>>> be improved because we don't have metadata and data co-located. I have
>>>> been trying to think of a solution for the past few days. Nothing good
>>>> is coming up :-/
>>>
>>> In those cases, caching (at the MDS) would certainly help a lot.  Some
>>> variation of the compounding infrastructure under development for Samba
>>> etc. might also apply, since this really is a compound operation.

Compounding can help here, but without the cache the read still has to go 
to the DS; with such a compound operation, the MDS would reach out to the 
DS for the data rather than the client doing so. That is another 
possibility, depending on what we decide on as the cache mechanism.
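
To make the idea concrete, here is a rough sketch of what such a compound 
LOOKUP+READ could look like on the MDS side (all names are hypothetical; 
this is not dht2/MDS code, just the flow being described):

/* Rough sketch of a compound LOOKUP+READ handled at the MDS.  All names
 * here are made up for illustration; the stubs stand in for the real
 * MDS/DS interactions. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define SMALL_FILE_LIMIT (64 * 1024)     /* assumed small-file cutoff */

struct small_stat {
        uint64_t size;
        uint64_t mtime;
};

struct compound_reply {
        struct small_stat stat;
        bool              data_present;  /* content inlined in the reply? */
        char              data[SMALL_FILE_LIMIT];
};

/* Hypothetical stubs for the local metadata lookup and the content fetch. */
int mds_stat (const char *path, struct small_stat *st);
int mds_fetch_content (const char *path, char *buf, size_t size);

/* MDS side: serve the stat locally; pull the data from the DS (or from a
 * local content cache, if we add one) only when the file is small enough. */
int mds_handle_lookup_read (const char *path, struct compound_reply *rsp)
{
        if (mds_stat (path, &rsp->stat) != 0)
                return -1;

        rsp->data_present = false;
        if (rsp->stat.size <= SMALL_FILE_LIMIT &&
            mds_fetch_content (path, rsp->data, rsp->stat.size) == 0)
                rsp->data_present = true;

        /* Without a cache this is still one MDS->DS round trip, but it is
         * the MDS reaching out to the DS instead of the client issuing a
         * separate open/read. */
        return 0;
}

The point being that the client gets stat plus data in a single round trip 
to the MDS, and the MDS<->DS hop disappears once a content cache sits 
behind mds_fetch_content ().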

>>
>> When a client is done modifying a file, the MDS would refresh its size
>> and mtime attributes by fetching them from the DS. As part of this
>> refresh, the DS could additionally send back the content if the file
>> size falls in range, with the MDS persisting it and sending it back for
>> subsequent lookup calls as it does now. The content (on the MDS) can be
>> zapped once the file size crosses the defined limit.

Venky, when you say persisting, I assume you mean on disk, is that right?

If so, then the MDS storage size requirements would increase (based on the 
amount of file data that needs to be stored). As of now it is only inodes, 
and as we move to a DB, a record per inode. In this case we may end up 
with *fatter* MDS partitions. Any comments/thoughts on that?

As with memory, I would assume some form of eviction of data from the MDS 
as a possibility, to control the space utilization here.
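
As a strawman, the refresh path could look something like this 
(illustrative only; the cutoff, budget and eviction helper are assumptions, 
and the accounting is deliberately simplified):

/* Strawman only, not MDS code: stash small-file content on the MDS disk at
 * refresh time, zap it when the file outgrows the limit, and evict by
 * space utilization. */
#include <stdio.h>
#include <stddef.h>
#include <unistd.h>

#define CONTENT_LIMIT  (64 * 1024)            /* assumed small-file cutoff  */
#define CACHE_BUDGET   (10ULL << 30)          /* assumed 10 GiB of MDS disk */

static unsigned long long cache_bytes_used;   /* bytes of stashed content   */

/* Hypothetical eviction walk that also drops the evicted bytes from the
 * counter above. */
void mds_evict_cold_content (size_t needed);

/* Called when the MDS refreshes size/mtime from the DS after a client is
 * done modifying the file (the proposal above). */
void mds_refresh_content (const char *cache_path, const char *data,
                          size_t size)
{
        if (size > CONTENT_LIMIT) {
                unlink (cache_path);  /* crossed the limit: zap the stash   */
                return;
        }

        if (cache_bytes_used + size > CACHE_BUDGET)
                mds_evict_cold_content (size);

        FILE *fp = fopen (cache_path, "w");
        if (!fp)
                return;               /* the stash is strictly best effort  */
        fwrite (data, 1, size, fp);
        fclose (fp);
        cache_bytes_used += size;
}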

>>
>
> I like the idea. However, the memory implications of maintaining content
> in the MDS are something to watch out for. quick-read is interested in
> files of size 64k by default, and with a reasonable number of files in
> that range, we might end up consuming significant memory with this scheme.

Vijay, I think what Venky is stating is to stash the file on local storage 
and not in memory. If it were in memory, then brick process restarts would 
nuke the cache, and we would either need mechanisms to rebuild/warm the 
cache or just start caching afresh.

If we were caching in memory, then yes, the concern is valid, and one 
possibility is some form of LRU for the cache, to keep memory consumption 
in check.
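
For completeness, a back-of-envelope number and a minimal LRU sketch for 
the in-memory case (hypothetical structures, not brick-process code): a 
million files cached at the 64k quick-read cutoff is already ~61 GiB, so 
the eviction would be doing real work.

/* Minimal LRU for an in-memory content cache; purely illustrative. */
#include <stdlib.h>
#include <stddef.h>

struct cache_entry {
        char               *key;            /* e.g. a gfid string          */
        char               *data;
        size_t              size;
        struct cache_entry *prev, *next;
};

struct content_cache {
        struct cache_entry *head, *tail;    /* head = most recently used   */
        size_t              used, limit;    /* bytes                       */
};

static void lru_unlink (struct content_cache *c, struct cache_entry *e)
{
        if (e->prev) e->prev->next = e->next; else c->head = e->next;
        if (e->next) e->next->prev = e->prev; else c->tail = e->prev;
}

/* Insert new content, evicting least recently used entries until the
 * configured memory limit is respected again. */
void cache_put (struct content_cache *c, struct cache_entry *e)
{
        while (c->tail && c->used + e->size > c->limit) {
                struct cache_entry *victim = c->tail;

                lru_unlink (c, victim);
                c->used -= victim->size;
                free (victim->key);
                free (victim->data);
                free (victim);
        }
        e->prev = NULL;
        e->next = c->head;
        if (c->head) c->head->prev = e;
        c->head = e;
        if (!c->tail) c->tail = e;
        c->used += e->size;
}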

Overall I would steer away from memory for this use case and use the disk, 
as we do not know which files to cache (true in either case, but disk 
offers us more space, letting us possibly punt on that issue). For files 
where the cache is missing and the file is small enough, either perform an 
async read from the client (gaining some overlap time with the 
application, as sketched below) or just let it be, as we would get the 
open/read anyway; it would just slow things down.
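
The client-side async read could be as simple as a detached prefetch 
thread kicked off from the lookup callback (helper names are made up; the 
real client xlator would use its own stack/frame machinery rather than 
raw pthreads):

/* Client-side idea only: when a lookup returns a small file whose content
 * was not inlined, kick off a background read so the data transfer
 * overlaps with the application's own work. */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct prefetch_job {
        char   *path;
        size_t  size;
};

/* Hypothetical helpers for the actual wire read and the client cache. */
int  client_read_from_ds (const char *path, char *buf, size_t size);
void client_cache_store (const char *path, char *buf, size_t size);

static void *prefetch_worker (void *arg)
{
        struct prefetch_job *job = arg;
        char *buf = malloc (job->size);

        if (buf && client_read_from_ds (job->path, buf, job->size) == 0)
                client_cache_store (job->path, buf, job->size);
        else
                free (buf);

        free (job->path);
        free (job);
        return NULL;
}

/* Called from the lookup callback when size <= the small-file limit and
 * no inline content was returned by the MDS. */
void maybe_prefetch_small_file (const char *path, size_t size)
{
        struct prefetch_job *job = malloc (sizeof (*job));
        pthread_t tid;

        if (!job)
                return;
        job->path = strdup (path);
        job->size = size;
        if (!job->path || pthread_create (&tid, NULL, prefetch_worker, job)) {
                free (job->path);
                free (job);
                return;
        }
        pthread_detach (tid);
}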

>
> -Vijay
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel

