[Gluster-devel] md-cache improvements

Niels de Vos ndevos at redhat.com
Thu Aug 18 13:32:34 UTC 2016


On Mon, Aug 15, 2016 at 10:39:40PM -0400, Vijay Bellur wrote:
> Hi Poornima, Dan -
> 
> Let us have a hangout/bluejeans session this week to discuss the planned
> md-cache improvements, proposed timelines and sort out open questions if
> any.
> 
> Would 11:00 UTC on Wednesday work for everyone in the To: list?

I'd appreciate it if someone could send out the meeting minutes. That
will make it easier to follow up, and we can then give better details on
the progress.

In any case, one of the points that Poornima mentioned was that upcall
events (when enabled) get cached in gfapi until the application handles
them. NFS-Ganesha is currently the only application interested in these
events. Other use-cases (like md-cache invalidation) would enable
upcalls too, and would then cause event caching even when it is not
needed.

This change should address that, and I'm waiting for feedback on it.
There should be a bug report about these unneeded and uncleared caches,
but I could not find one...

  gfapi: do not cache upcalls if the application is not interested
  http://review.gluster.org/15191
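
The idea behind that change, as a minimal sketch (the 'cache_upcalls'
flag and the helper/struct names here are hypothetical; see the review
above for the actual implementation):

  /* Only queue an upcall event when the application has registered
   * interest in consuming them (e.g. NFS-Ganesha polling for events).
   * Otherwise drop it, so the list cannot grow unbounded. */
  static void
  glfs_cbk_upcall_sketch (struct glfs *fs, struct upcall_entry *entry)
  {
          if (!fs->cache_upcalls) {
                  upcall_entry_free (entry); /* nobody will poll for it */
                  return;
          }

          pthread_mutex_lock (&fs->upcall_list_mutex);
          list_add_tail (&entry->upcall_list, &fs->upcall_list);
          pthread_mutex_unlock (&fs->upcall_list_mutex);
  }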

Thanks,
Niels


> 
> Thanks,
> Vijay
> 
> 
> 
> On 08/11/2016 01:04 AM, Poornima Gurusiddaiah wrote:
> > 
> > My comments inline.
> > 
> > Regards,
> > Poornima
> > 
> > ----- Original Message -----
> > > From: "Dan Lambright" <dlambrig at redhat.com>
> > > To: "Gluster Devel" <gluster-devel at gluster.org>
> > > Sent: Wednesday, August 10, 2016 10:35:58 PM
> > > Subject: [Gluster-devel] md-cache improvements
> > > 
> > > 
> > > There have been recurring discussions within the gluster community about
> > > building on the existing md-cache and upcall support to improve performance
> > > for small-file workloads. In certain cases, "lookup amplification" dominates
> > > data transfers, i.e. the cumulative round-trip times of multiple LOOKUPs
> > > from the client offset the benefits of faster backend storage.
> > > 
> > > To tackle this problem, one suggestion is to use md-cache to cache inodes
> > > on the client more aggressively than is currently done. The inodes would
> > > be cached until they are invalidated by the server.
> > > 
> > > Several gluster development engineers within the DHT, NFS, and Samba teams
> > > have been involved with related efforts, which have been underway for some
> > > time now. At this juncture, comments are requested from gluster developers.
> > > 
> > > (1) .. help call out where additional upcalls would be needed to invalidate
> > > stale client cache entries (in particular, need feedback from DHT/AFR
> > > areas),
> > > 
> > > (2) .. identify failure cases, when we cannot trust the contents of md-cache,
> > > e.g. when an upcall may have been dropped by the network
> > 
> > Yes, this needs to be handled.
> > It can happen only when there is a one-way disconnect, where the server
> > cannot reach the client and the notification fails. We can retry the
> > notification until the cache expiry time, as sketched below.
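> > 
> > A rough sketch of such a retry, bounded by the cache timeout (the helper
> > and constant names are made up for illustration):
> > 
> >   /* Keep retrying a failed upcall notification until the client's
> >    * cached entry would have timed out on its own anyway. */
> >   time_t deadline = time (NULL) + MDC_CACHE_TIMEOUT_SECS;
> > 
> >   while (notify_client (client, &event) != 0) {
> >           if (time (NULL) >= deadline)
> >                   break;  /* entry expired on the client, give up */
> >           sleep (RETRY_INTERVAL_SECS);
> >   }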
> > 
> > > 
> > > (3) .. point out additional improvements which md-cache needs. For example,
> > > it cannot be allowed to grow unbounded.
> > 
> > This is being worked on, and will be targeted for 3.9.
> > 
> > > 
> > > Dan
> > > 
> > > ----- Original Message -----
> > > > From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> > > > 
> > > > List of areas where we need invalidation notification:
> > > > 1. Any changes to xattrs used by xlators to store metadata (like dht layout
> > > > xattr, afr xattrs etc).
> > 
> > Currently, md-cache will negotiate (using IPC) with the brick a list of
> > xattrs that it needs invalidations for. Other xlators can add the xattrs
> > they are interested in to the same IPC. But then these xlators need to
> > manage their own caching and process the invalidation requests themselves,
> > as md-cache sits above all the cluster xlators.
> > reference: http://review.gluster.org/#/c/15002/
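> > 
> > Roughly, the negotiation looks like this (the IPC op code, callback and
> > xattr keys are illustrative; see the review above for the real patch):
> > 
> >   /* md-cache builds a dict of the xattrs it wants invalidations
> >    * for and sends it down as an IPC fop; other xlators could add
> >    * their own keys to the same dict on the way down. */
> >   dict_t *xattrs = dict_new ();
> >   int ret = 0;
> > 
> >   ret = dict_set_int32 (xattrs, "trusted.glusterfs.dht", 1);
> >   if (!ret)
> >           ret = dict_set_int32 (xattrs, "trusted.afr.*", 1);
> > 
> >   STACK_WIND (frame, mdc_ipc_cbk, FIRST_CHILD (this),
> >               FIRST_CHILD (this)->fops->ipc,
> >               GF_IPC_TARGET_UPCALL, xattrs);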
> > 
> > > > 2. Scenarios where individual xlator feels like it needs a lookup. For
> > > > example failed directory creation on non-hashed subvol in dht during mkdir.
> > > > Though dht succeeds mkdir, it would be better to not cache this inode as a
> > > > subsequent lookup will heal the directory and make things better.
> > 
> > For this, these xlators can set an indicator in the dict of the fop cbk
> > to prevent caching. This should be fairly simple to implement; see the
> > example below.
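> > 
> > For example (the key name here is made up; the mechanism is what matters):
> > 
> >   /* In the cluster xlator's cbk, mark the reply as not cacheable: */
> >   ret = dict_set_int32 (xdata, "glusterfs.dont-cache", 1);
> > 
> >   /* In md-cache's cbk, skip updating the cache for such replies: */
> >   if (dict_get (xdata, "glusterfs.dont-cache"))
> >           goto out;  /* do not cache this iatt */
> >   mdc_inode_iatt_set (this, inode, buf);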
> > 
> > > > 3. removing of files
> > 
> > When an unlink is issued from the mount point, the cache is invalidated.
> > 
> > > > 4. writev on brick (to invalidate read cache on client)
> > 
> > A writev on the brick from any other client will invalidate the metadata
> > cache on all the other clients.
> > 
> > > > 
> > > > Other questions:
> > > > 5. Does md-cache has cache management? like lru or an upper limit for
> > > > cache.
> > 
> > Currently md-cache doesn't have any cache management; we will be targeting
> > this for 3.9.
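> > 
> > A minimal sketch of what such a bound could look like (structure and
> > names are illustrative, not the planned implementation):
> > 
> >   struct mdc_cache {
> >           struct list_head lru;    /* hot at head, cold at tail */
> >           uint64_t         count;
> >           uint64_t         limit;  /* e.g. from a volume option */
> >   };
> > 
> >   /* On every cache fill, evict from the cold end once the
> >    * configured limit is crossed. */
> >   static void
> >   mdc_cache_insert (struct mdc_cache *cache, struct mdc_entry *entry)
> >   {
> >           list_add (&entry->list, &cache->lru);
> >           if (++cache->count > cache->limit) {
> >                   struct mdc_entry *cold;
> > 
> >                   cold = list_entry (cache->lru.prev,
> >                                      struct mdc_entry, list);
> >                   list_del (&cold->list);
> >                   mdc_entry_free (cold);
> >                   cache->count--;
> >           }
> >   }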
> > 
> > > > 6. Network disconnects and invalidating cache. When a network disconnect
> > > > happens we need to invalidate cache for inodes present on that brick as we
> > > > might be missing some notifications. Current approach of purging cache of
> > > > all inodes might not be optimal as it might rollback benefits of caching.
> > > > Also, please note that network disconnects are not rare events.
> > 
> > Network disconnects are handled only to a minimal extent: any brick going
> > down causes the whole cache to be invalidated. Invalidating only the inodes
> > that belong to that particular brick will need support from the underlying
> > cluster xlators.
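> > 
> > The current (coarse) handling is essentially this (the helper name is
> > illustrative):
> > 
> >   /* On any child-down event, flush the whole metadata cache,
> >    * since invalidations may have been missed while the brick
> >    * was unreachable. */
> >   int32_t
> >   mdc_notify (xlator_t *this, int32_t event, void *data, ...)
> >   {
> >           if (event == GF_EVENT_CHILD_DOWN)
> >                   mdc_invalidate_all (this);
> > 
> >           return default_notify (this, event, data);
> >   }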
> > 
> > > > 
> > > > regards,
> > > > Raghavendra