[Gluster-devel] md-cache improvements

Raghavendra G raghavendra at gluster.com
Fri Aug 12 04:59:25 UTC 2016


On Thu, Aug 11, 2016 at 9:31 AM, Raghavendra G <raghavendra at gluster.com>
wrote:

> A couple more areas to explore:
> 1. Purging the kernel dentry and/or page-cache too. Because of patch [1],
> an upcall notification can result in a call to inode_invalidate, which
> results in an "invalidate" notification to the fuse kernel module. While I
> am sure that this notification purges the kernel page-cache, I am not sure
> about dentries. I assume that if an inode is invalidated, the next access
> should result in a lookup (from the kernel to glusterfs). Nevertheless, we
> should look into the differences between entry_invalidation and
> inode_invalidation and harness them appropriately.
>
> 2. Granularity of invalidation. For example, we shouldn't purge the kernel
> page-cache because of a change in an xattr used by an xlator (e.g., the
> dht layout xattr). We have to make sure that [1] handles this. We need to
> add more granularity to invalidation (internal xattr invalidation, user
> xattr invalidation, entry invalidation in the kernel, page-cache
> invalidation in the kernel, attribute/stat invalidation in the kernel,
> etc.) and use these variants judiciously, while making sure other cached
> data remains present. A sketch of what such a scheme could look like
> follows this quote.
>
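To make the granularity idea concrete, here is a minimal sketch of a
per-upcall invalidation mask. All names are hypothetical illustrations,
not existing gluster code:

    /* Hypothetical invalidation mask -- illustrative names only, not
     * existing gluster code.  An upcall carries one or more of these
     * bits, so each cache layer drops only what actually went stale. */
    typedef enum {
            GF_INVAL_IATT           = 1 << 0, /* attribute/stat caches */
            GF_INVAL_ENTRY          = 1 << 1, /* dentry in the kernel */
            GF_INVAL_PAGE_CACHE     = 1 << 2, /* file data pages in the kernel */
            GF_INVAL_XATTR_USER     = 1 << 3, /* user xattrs in md-cache */
            GF_INVAL_XATTR_INTERNAL = 1 << 4, /* xlator-internal xattrs
                                               * (dht layout, afr pending) */
    } gf_inval_flags_t;

    /* e.g. a rebalance that only rewrites the dht layout xattr would
     * send GF_INVAL_XATTR_INTERNAL, leaving the kernel page-cache and
     * dentries untouched. */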

To stress the importance of this point: with tier there can be constant
migration of files, which can result in spurious (from the application's
perspective) invalidations, even though the application is not writing to
the files at all [2][3][4]. Also, even if the application is writing to a
file, there is no point in invalidating the dentry cache. We should explore
more ways to solve [2][3][4].
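The dentry/page-cache distinction maps onto two different kernel
notifications. Below is a minimal sketch using the libfuse 2.x low-level
API purely for illustration; glusterfs's fuse-bridge composes the
equivalent FUSE_NOTIFY_INVAL_* messages on /dev/fuse itself, so the
helper names here are illustrative:

    #include <string.h>
    #include <fuse_lowlevel.h>

    /* Drops cached attributes and data pages of one inode; off = 0,
     * len = 0 covers the whole file.  Dentries pointing at the inode
     * stay cached. */
    static int
    inval_inode (struct fuse_chan *ch, fuse_ino_t ino)
    {
            return fuse_lowlevel_notify_inval_inode (ch, ino, 0, 0);
    }

    /* Drops the single dentry <parent>/<name>, so the next path
     * resolution triggers a fresh LOOKUP from the kernel. */
    static int
    inval_entry (struct fuse_chan *ch, fuse_ino_t parent, const char *name)
    {
            return fuse_lowlevel_notify_inval_entry (ch, parent, name,
                                                     strlen (name));
    }

Sending only the inode notification leaves dentries cached, which is what
we want for plain writes; an entry notification alone leaves the file's
data pages intact.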

3. We have a long-standing issue of spurious termination of the fuse
invalidation thread. Since the thread is not re-spawned after termination,
we lose the ability to purge the kernel entry/attribute/page-cache from
that point on. This issue was touched upon during a discussion [5], though
we didn't solve the problem then for lack of bandwidth. Csaba has agreed to
work on this issue; a sketch of one possible approach follows the
references below.

[2] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c7
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c8
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c9
[5] http://review.gluster.org/#/c/13274/1/xlators/mount/fuse/src/fuse-bridge.c
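A minimal sketch of one possible fix, assuming a supervisor loop around
the existing worker; names are illustrative, not the actual fuse-bridge
code:

    #include <pthread.h>
    #include <unistd.h>

    extern void *invalidate_worker (void *arg); /* the existing loop */

    /* Re-create the invalidation worker whenever it exits, instead of
     * silently losing kernel cache invalidation for the life of the
     * mount. */
    static void *
    invalidate_supervisor (void *arg)
    {
            pthread_t tid;

            for (;;) {
                    if (pthread_create (&tid, NULL, invalidate_worker, arg)) {
                            sleep (1); /* transient failure; retry */
                            continue;
                    }
                    pthread_join (tid, NULL); /* returns only if the worker dies */
                    /* log the event here, then loop to re-spawn */
            }
            return NULL;
    }

A rate limit or respawn counter could be added; the essential point is
that the death of the worker must not be terminal.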


>
> [1] http://review.gluster.org/12951
>
>
> On Wed, Aug 10, 2016 at 10:35 PM, Dan Lambright <dlambrig at redhat.com>
> wrote:
>
>>
>> There have been recurring discussions within the gluster community about
>> building on the existing support for md-cache and upcalls to improve
>> performance for small-file workloads. In certain cases "lookup
>> amplification" dominates data transfers, i.e. the cumulative round-trip
>> times of multiple LOOKUPs from the client offset the benefits of faster
>> backend storage.
>>
>> To tackle this problem, one suggestion is to utilize md-cache more
>> aggressively than is currently done to cache inodes on the client. The
>> inodes would be cached until they are invalidated by the server.
>>
>> Several gluster development engineers within the DHT, NFS, and Samba
>> teams have been involved with related efforts, which have been underway
>> for some time now. At this juncture, comments are requested from gluster
>> developers to:
>>
>> (1) .. help call out where additional upcalls would be needed to
>> invalidate stale client cache entries (in particular, feedback is needed
>> from the DHT/AFR areas),
>>
>> (2) .. identify failure cases where we cannot trust the contents of
>> md-cache, e.g. when an upcall may have been dropped by the network,
>>
>> (3) .. point out additional improvements which md-cache needs. For
>> example, it cannot be allowed to grow unbounded.
>>
>> Dan
>>
>> ----- Original Message -----
>> > From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>> >
>> > List of areas where we need invalidation notification:
>> > 1. Any changes to xattrs used by xlators to store metadata (like the
>> > dht layout xattr, afr xattrs, etc.).
>> > 2. Scenarios where an individual xlator feels it needs a lookup. For
>> > example, a failed directory creation on the non-hashed subvol in dht
>> > during mkdir: though dht succeeds the mkdir, it would be better not to
>> > cache this inode, as a subsequent lookup will heal the directory and
>> > make things better.
>> > 3. Removal of files.
>> > 4. writev on the brick (to invalidate the read cache on the client).
>> >
>> > Other questions:
>> > 5. Does md-cache have cache management, like an LRU policy or an upper
>> > limit on cache size?
>> > 6. Network disconnects and invalidating the cache. When a network
>> > disconnect happens, we need to invalidate the cache for inodes present
>> > on that brick, as we might have missed some notifications. The current
>> > approach of purging the cache of all inodes might not be optimal, as it
>> > might roll back the benefits of caching. Also, please note that network
>> > disconnects are not rare events.
>> >
>> > regards,
>> > Raghavendra
>
>
>
> --
> Raghavendra G
>



-- 
Raghavendra G