[Gluster-devel] md-cache improvements

Poornima Gurusiddaiah pgurusid at redhat.com
Thu Aug 11 05:04:48 UTC 2016


My comments inline.

Regards,
Poornima

----- Original Message -----
> From: "Dan Lambright" <dlambrig at redhat.com>
> To: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Wednesday, August 10, 2016 10:35:58 PM
> Subject: [Gluster-devel] md-cache improvements
> 
> 
> There have been recurring discussions within the gluster community to build
> on existing support for md-cache and upcalls to help performance for small
> file workloads. In certain cases, "lookup amplification" dominates data
> transfers, i.e. the cumulative round-trip time of multiple LOOKUPs from the
> client negates the benefit of faster backend storage.
> 
> To tackle this problem, one suggestion is to more aggressively utilize
> md-cache to cache inodes on the client than is currently done. The inodes
> would be cached until they are invalidated by the server.
> 
> Several gluster development engineers within the DHT, NFS, and Samba teams
> have been involved with related efforts, which have been underway for some
> time now. At this juncture, comments are requested from gluster developers.
> 
> (1) .. help call out where additional upcalls would be needed to invalidate
> stale client cache entries (in particular, need feedback from DHT/AFR
> areas),
> 
> (2) .. identify failure cases, when we cannot trust the contents of md-cache,
> e.g. when an upcall may have been dropped by the network

Yes, this needs to be handled.
It can happen only when there is a one-way disconnect, where the server cannot
reach the client and the notification fails. The server can retry the
notification until the cache expiry time.
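
A rough sketch of the retry idea, under assumptions: `send_upcall` and
`retry_until_expiry` are hypothetical names, and time is simulated rather
than taken from a real timer. It is not the upcall xlator's code. The point
is only that retrying makes sense up to the cache expiry time, since after
that the client revalidates on its own.

```python
def retry_until_expiry(send_upcall, now, expiry, interval):
    """Retry a failed invalidation until the client's cache entry expires.

    send_upcall: hypothetical callable, returns True once the client is reached.
    now, expiry: times in seconds; interval: delay between attempts.
    Returns (delivered, attempts).
    """
    attempts = 0
    while now < expiry:
        attempts += 1
        if send_upcall():
            return True, attempts
        now += interval  # simulated wait; a real server would use a timer
    # Past the expiry time the client revalidates anyway, so give up.
    return False, attempts


# Simulated one-way disconnect: the first two notifications are lost.
results = iter([False, False, True])
delivered, attempts = retry_until_expiry(
    lambda: next(results), now=0, expiry=10, interval=1
)
```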

> 
> (3) .. point out additional improvements which md-cache needs. For example,
> it cannot be allowed to grow unbounded.

This is being worked on and is targeted for 3.9.

> 
> Dan
> 
> ----- Original Message -----
> > From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> > 
> > List of areas where we need invalidation notification:
> > 1. Any changes to xattrs used by xlators to store metadata (like dht layout
> > xattr, afr xattrs etc).

Currently, md-cache negotiates with the brick (using ipc) a list of xattrs
for which it needs invalidation. Other xlators can add the xattrs they are
interested in to the ipc. These xlators then need to manage their own caching
and process the invalidation requests themselves, because md-cache sits above
all cluster xlators.
Reference: http://review.gluster.org/#/c/15002/
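
To illustrate the negotiation described above, here is a small model, not the
actual ipc code from the patch. The class `XattrInterest`, its methods, and
the xattr key names used in the example are assumptions for illustration
only. Each xlator registers the keys it cares about, the union is what would
be sent to the brick, and an incoming invalidation is routed to the xlators
that asked for that key.

```python
class XattrInterest:
    """Hypothetical model of the xattr list negotiated with the brick."""

    def __init__(self):
        self.by_xlator = {}  # xlator name -> set of xattr keys

    def register(self, xlator, keys):
        """Record that an xlator wants invalidations for these keys."""
        self.by_xlator.setdefault(xlator, set()).update(keys)

    def negotiated_list(self):
        """Union of all registered keys, i.e. what would go to the brick."""
        merged = set()
        for keys in self.by_xlator.values():
            merged |= keys
        return sorted(merged)

    def interested_xlators(self, changed_key):
        """Xlators that must process an invalidation for changed_key."""
        return sorted(
            name for name, keys in self.by_xlator.items() if changed_key in keys
        )


interest = XattrInterest()
interest.register("md-cache", ["security.selinux"])   # example key only
interest.register("dht", ["trusted.glusterfs.dht"])   # example key only
```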

> > 2. Scenarios where individual xlator feels like it needs a lookup. For
> > example failed directory creation on non-hashed subvol in dht during mkdir.
> > Though dht succeeds mkdir, it would be better to not cache this inode as a
> > subsequent lookup will heal the directory and make things better.

For this, these xlators can set an indicator in the dict of the fop cbk
telling md-cache not to cache the inode. This should be fairly simple to
implement.
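
A minimal sketch of how such an indicator could work, with plain Python dicts
standing in for the cache and the cbk dict. The key name
`example.dont-cache` and the function `lookup_cbk` are made up for this
example and do not correspond to an existing gluster key or callback.

```python
DONT_CACHE_KEY = "example.dont-cache"  # hypothetical key, not a real gluster key


def lookup_cbk(cache, gfid, iatt, xdata):
    """Cache the iatt unless a lower xlator flagged the reply as not cacheable.

    Returns True if the entry was cached, False if caching was skipped.
    """
    if xdata.get(DONT_CACHE_KEY):
        cache.pop(gfid, None)  # also drop any stale entry for this inode
        return False
    cache[gfid] = iatt
    return True


cache = {}
# Normal reply: the inode is cached.
cached_file = lookup_cbk(cache, "gfid-1", {"size": 10}, {})
# Reply flagged by a lower xlator (e.g. partial mkdir): not cached.
cached_dir = lookup_cbk(cache, "gfid-2", {"size": 0}, {DONT_CACHE_KEY: True})
```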

> > 3. removing of files

When an unlink is issued from the mount point, the cache is invalidated.

> > 4. writev on brick (to invalidate read cache on client)

A writev on the brick from one client invalidates the metadata cache on all
the other clients.
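
As a simplified picture of this behaviour (not the upcall xlator's data
structures), the brick can be thought of as remembering which clients have
accessed an inode and notifying everyone except the writer. `UpcallTable` and
its methods are hypothetical names.

```python
class UpcallTable:
    """Hypothetical per-brick record of which clients accessed which inode."""

    def __init__(self):
        self.clients_by_gfid = {}  # gfid -> set of client ids

    def record_access(self, gfid, client):
        self.clients_by_gfid.setdefault(gfid, set()).add(client)

    def on_writev(self, gfid, writer):
        """Return the clients whose metadata cache must be invalidated."""
        self.record_access(gfid, writer)
        return sorted(c for c in self.clients_by_gfid[gfid] if c != writer)


table = UpcallTable()
table.record_access("gfid-1", "client-A")
table.record_access("gfid-1", "client-B")
to_notify = table.on_writev("gfid-1", "client-C")
```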

> > 
> > Other questions:
> > 5. Does md-cache have cache management, like LRU or an upper limit for
> > the cache?

Currently md-cache doesn't have any cache management; we are targeting this
for 3.9.
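
For reference, the kind of policy asked about in the question (LRU with an
upper limit) can be sketched as below. This is a generic illustration, not
the design planned for 3.9; `BoundedCache` and the `limit` parameter are
assumptions.

```python
from collections import OrderedDict


class BoundedCache:
    """Generic LRU cache with an upper limit on the number of entries."""

    def __init__(self, limit):
        self.limit = limit
        self.entries = OrderedDict()  # oldest entry first

    def put(self, gfid, iatt):
        self.entries[gfid] = iatt
        self.entries.move_to_end(gfid)  # mark as most recently used
        while len(self.entries) > self.limit:
            self.entries.popitem(last=False)  # evict least recently used

    def get(self, gfid):
        if gfid not in self.entries:
            return None
        self.entries.move_to_end(gfid)
        return self.entries[gfid]


lru = BoundedCache(limit=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")      # "a" becomes most recently used
lru.put("c", 3)   # evicts "b"
```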

> > 6. Network disconnects and invalidating cache. When a network disconnect
> > happens we need to invalidate cache for inodes present on that brick as we
> > might be missing some notifications. Current approach of purging cache of
> > all inodes might not be optimal as it might rollback benefits of caching.
> > Also, please note that network disconnects are not rare events.

Network disconnects are handled only to a minimal extent: any brick going down
causes the whole cache to be invalidated. Invalidating only the inodes that
belong to that particular brick will need support from the underlying cluster
xlators.
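
The difference between the two approaches can be sketched as follows. The
`brick_of` mapping is exactly the piece md-cache does not have today and
would need from the cluster xlators; it and the function names are
hypothetical.

```python
def invalidate_all(cache):
    """Current behaviour: any brick going down purges the whole cache."""
    cache.clear()


def invalidate_brick(cache, brick_of, brick):
    """Selective purge: drop only inodes that live on the disconnected brick.

    brick_of: hypothetical gfid -> brick mapping supplied by cluster xlators.
    Returns the gfids that were invalidated.
    """
    stale = [gfid for gfid in cache if brick_of.get(gfid) == brick]
    for gfid in stale:
        del cache[gfid]
    return stale


cache = {"gfid-1": "iatt-1", "gfid-2": "iatt-2", "gfid-3": "iatt-3"}
brick_of = {"gfid-1": "brick-0", "gfid-2": "brick-1", "gfid-3": "brick-0"}
dropped = invalidate_brick(cache, brick_of, "brick-0")
```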

> > 
> > regards,
> > Raghavendra

