[Gluster-devel] md-cache improvements

Vijay Bellur vbellur at redhat.com
Tue Aug 16 02:39:40 UTC 2016


Hi Poornima, Dan -

Let us have a Hangout/BlueJeans session this week to discuss the planned 
md-cache improvements and proposed timelines, and to sort out any open 
questions.

Would 11:00 UTC on Wednesday work for everyone in the To: list?

Thanks,
Vijay



On 08/11/2016 01:04 AM, Poornima Gurusiddaiah wrote:
>
> My comments inline.
>
> Regards,
> Poornima
>
> ----- Original Message -----
>> From: "Dan Lambright" <dlambrig at redhat.com>
>> To: "Gluster Devel" <gluster-devel at gluster.org>
>> Sent: Wednesday, August 10, 2016 10:35:58 PM
>> Subject: [Gluster-devel] md-cache improvements
>>
>>
>> There have been recurring discussions within the gluster community about
>> building on the existing support for md-cache and upcalls to help
>> performance for small-file workloads. In certain cases, "lookup
>> amplification" dominates data transfers, i.e. the cumulative round-trip
>> time of multiple LOOKUPs from the client offsets the benefits of faster
>> backend storage.
>>
>> To tackle this problem, one suggestion is to utilize md-cache more
>> aggressively than is currently done, caching inodes on the client until
>> they are invalidated by the server.
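>>
>> As a toy model of the intended client-side behaviour (plain C,
>> illustrative names only, not md-cache internals): a cached entry is
>> served locally until a server upcall invalidates it, instead of
>> expiring on a short timer.
>>
>>     #include <stdbool.h>
>>     #include <sys/stat.h>
>>
>>     struct cached_inode {
>>             struct stat buf;    /* cached stat/iatt data         */
>>             bool        valid;  /* cleared by the upcall handler */
>>     };
>>
>>     /* Runs on the client when the server pushes an invalidation. */
>>     static void
>>     on_upcall_invalidate (struct cached_inode *ci)
>>     {
>>             ci->valid = false;
>>     }
>>
>>     /* LOOKUP path: no network round trip while the entry is valid. */
>>     static bool
>>     lookup_from_cache (struct cached_inode *ci, struct stat *out)
>>     {
>>             if (!ci->valid)
>>                     return false;   /* fall back to a real LOOKUP */
>>             *out = ci->buf;
>>             return true;
>>     }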
>>
>> Several gluster development engineers within the DHT, NFS, and Samba teams
>> have been involved with related efforts, which have been underway for some
>> time now. At this juncture, comments are requested from gluster developers.
>>
>> (1) .. help call out where additional upcalls would be needed to invalidate
>> stale client cache entries (in particular, need feedback from DHT/AFR
>> areas),
>>
>> (2) .. identify failure cases, when we cannot trust the contents of md-cache,
>> e.g. when an upcall may have been dropped by the network
>
> Yes, this needs to be handled.
> It can happen only when there is a one-way disconnect, where the server
> cannot reach the client and the notification fails. We can retry the
> notification until the cache expiry time, as sketched below.
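>
> A minimal sketch of such a retry loop, in plain C (the transport hook
> is hypothetical; the point is only that retries stop once the
> client-side entry would have expired anyway):
>
>     #include <stdbool.h>
>     #include <time.h>
>     #include <unistd.h>
>
>     extern bool send_upcall (int client_id);  /* assumed transport hook */
>
>     static void
>     notify_with_retry (int client_id, time_t cache_expiry)
>     {
>             while (time (NULL) < cache_expiry) {
>                     if (send_upcall (client_id))
>                             return;    /* client acknowledged */
>                     sleep (1);         /* back off, then retry */
>             }
>             /* Past expiry the client discards the entry on its own,
>              * so a lost notification can no longer leave it stale. */
>     }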
>
>>
>> (3) .. point out additional improvements which md-cache needs. For example,
>> it cannot be allowed to grow unbounded.
>
> This is being worked on and is targeted for 3.9.
>
>>
>> Dan
>>
>> ----- Original Message -----
>>> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>>>
>>> List of areas where we need invalidation notification:
>>> 1. Any changes to xattrs used by xlators to store metadata (like the dht
>>> layout xattr, afr xattrs, etc.).
>
> Currently, md-cache will negotiate (using IPC) with the brick a list of
> xattrs that it needs invalidation for. Other xlators can add the xattrs
> they are interested in to this IPC. But then these xlators need to manage
> their own caching and process the invalidation requests themselves, as
> md-cache sits above all cluster xlators.
> Reference: http://review.gluster.org/#/c/15002/
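>
> A minimal sketch of the registration half, using the stock libglusterfs
> dict API (the helper name and the presence-flag value are assumptions;
> dict_set_int32() is real API, and trusted.glusterfs.dht is dht's layout
> xattr):
>
>     #include <glusterfs/dict.h>
>
>     /* An xlator adds the xattrs it wants invalidation upcalls for to
>      * the dict that md-cache ships to the brick over IPC. */
>     static int
>     register_xattr_interest (dict_t *xattrs)
>     {
>             /* dht would register its layout xattr, afr its changelog
>              * xattrs, and so on; the value is just a presence flag. */
>             return dict_set_int32 (xattrs, "trusted.glusterfs.dht", 1);
>     }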
>
>>> 2. Scenarios where an individual xlator feels it needs a lookup. For
>>> example, failed directory creation on the non-hashed subvol in dht during
>>> mkdir: though dht reports the mkdir as successful, it would be better not
>>> to cache this inode, as a subsequent lookup will heal the directory and
>>> make things better.
>
> For this, these xlators can set an indicator in the dict of the fop cbk
> to prevent caching. This should be fairly simple to implement, as
> sketched below.
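>
> On both sides it could look roughly like this (the key name below is
> made up for illustration):
>
>     #include <glusterfs/dict.h>
>
>     /* Producer side, e.g. dht's mkdir cbk when the non-hashed subvol
>      * failed: flag the inode as not worth caching. */
>     static void
>     mark_dont_cache (dict_t *xdata)
>     {
>             dict_set_int32 (xdata, "glusterfs.mdc.dont-cache", 1);
>     }
>
>     /* Consumer side, md-cache's cbk: skip populating the cache. */
>     static int
>     should_cache (dict_t *xdata)
>     {
>             int32_t flag = 0;
>
>             if (xdata &&
>                 dict_get_int32 (xdata, "glusterfs.mdc.dont-cache",
>                                 &flag) == 0 && flag)
>                     return 0;
>             return 1;
>     }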
>
>>> 3. Removal of files
>
> When an unlink is issued from the mount point, the cache is invalidated.
>
>>> 4. writev on brick (to invalidate read cache on client)
>
> A writev on the brick from any other client will invalidate the metadata
> cache on all the other clients.
>
>>>
>>> Other questions:
>>> 5. Does md-cache have cache management, like LRU or an upper limit on
>>> cache size?
>
> Currently md-cache doesn't have any cache management; we will be targeting
> this for 3.9. See the sketch below.
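>
> For illustration, a bounded LRU could look like the following plain-C
> sketch (structure and names are hypothetical, not the planned
> implementation): entries move to the head on every hit, and the tail is
> evicted once the configured limit is reached.
>
>     #include <stddef.h>
>
>     struct mdc_entry {
>             struct mdc_entry *prev, *next;
>             int               subvol_id;   /* see point 6 below */
>             /* cached iatt/xattrs would live here */
>     };
>
>     struct mdc_lru {
>             struct mdc_entry *head, *tail;
>             size_t            count, limit;
>     };
>
>     static void
>     lru_unlink (struct mdc_lru *l, struct mdc_entry *e)
>     {
>             if (e->prev) e->prev->next = e->next; else l->head = e->next;
>             if (e->next) e->next->prev = e->prev; else l->tail = e->prev;
>             l->count--;
>     }
>
>     /* On a cache hit: move the entry to the most-recent end. */
>     static void
>     lru_touch (struct mdc_lru *l, struct mdc_entry *e)
>     {
>             lru_unlink (l, e);
>             e->prev = NULL;
>             e->next = l->head;
>             if (l->head) l->head->prev = e; else l->tail = e;
>             l->head = e;
>             l->count++;
>     }
>
>     /* Before inserting a new entry: evict the least-recently-used
>      * one if the cache is full; the caller frees the victim. */
>     static struct mdc_entry *
>     lru_evict_if_full (struct mdc_lru *l)
>     {
>             if (l->count < l->limit || !l->tail)
>                     return NULL;
>             struct mdc_entry *victim = l->tail;
>             lru_unlink (l, victim);
>             return victim;
>     }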
>
>>> 6. Network disconnects and invalidating cache. When a network disconnect
>>> happens, we need to invalidate the cache for inodes present on that brick,
>>> as we might be missing some notifications. The current approach of purging
>>> the cache of all inodes might not be optimal, as it might roll back the
>>> benefits of caching. Also, please note that network disconnects are not
>>> rare events.
>
> Network disconnects are handled to a minimal extent: any brick going down
> will cause the whole cache to be invalidated. Invalidating only the inodes
> that belong to that particular brick will need support from the underlying
> cluster xlators, as outlined below.
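>
> Reusing the LRU sketch from point 5, the finer-grained approach could
> flush just the affected brick's slice on a child-down event, assuming a
> (hypothetical) per-entry subvol_id tag filled in by the cluster xlators:
>
>     /* Drop only the entries resolved via the disconnected subvolume,
>      * keeping the rest of the cache warm. */
>     static void
>     invalidate_subvol (struct mdc_lru *l, int down_subvol)
>     {
>             struct mdc_entry *e = l->head, *next;
>
>             while (e) {
>                     next = e->next;
>                     if (e->subvol_id == down_subvol)
>                             lru_unlink (l, e);  /* caller frees */
>                     e = next;
>             }
>     }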
>
>>>
>>> regards,
>>> Raghavendra