[Gluster-devel] [RFC] inode table locking contention reduction experiment

Changwei Ge chge at linux.alibaba.com
Mon Nov 4 05:26:36 UTC 2019



> On 2019/11/4 12:39 PM, Amar Tumballi wrote:
> Thanks for this, github works for review right now :-)
> 
> I am occupied till Wednesday, but will review them this week. At a 
> glance, the changes look good to me.
> 
> A few tests that can be run for validation are:
> 
> tests/bugs/shard/bug-1696136-lru-limit-equals-deletion-rate.t
> 
> tests/features/fuse-lru-limit.t
> 
> tests/bugs/shard/shard-inode-refcount-test.t
> 
> 
> Ideally, run the full regression with `./run-tests.sh -c`.

Sure, I will run the entire regression.
Actually, there are still some small problems with this large patch. I am 
still working on improving it, including adding some debug/trace methods.

Is there an existing trace/dump method for the Glusterfs FUSE client? I 
know there is something like that for the brick process, but I can't find 
any for the client.

I am planning to join the Glusterfs video conference on the 12th of this 
month to discuss my idea.

Thanks,
Changwei

> 
> 
> Regards,
> 
> Amar
> 
> 
> On Mon, Nov 4, 2019 at 9:21 AM Changwei Ge <chge at linux.alibaba.com> wrote:
> 
>     Hi Amar,
> 
>     On 2019/10/31 6:30 PM, Amar Tumballi wrote:
>      >
>      >
>      > On Wed, Oct 30, 2019 at 4:32 PM Xavi Hernandez
>      > <jahernan at redhat.com> wrote:
>      >
>      >     Hi Changwei,
>      >
>      >     On Tue, Oct 29, 2019 at 7:56 AM Changwei Ge
>      >     <chge at linux.alibaba.com> wrote:
>      >
>      >         Hi,
>      >
>      >         I have recently been working on reducing inode_[un]ref()
>      >         locking contention by getting rid of the inode table lock,
>      >         using only a per-inode lock to protect the inode REF count.
>      >         I have already discussed this over a couple of rounds with
>      >         several Glusterfs developers via email and Gerrit, and I
>      >         have a basic understanding of the major logic involved.
>      >
>      >         Currently, an inode's REF can reach ZERO and the inode can
>      >         then be reused by increasing it back to ONE. This is, IMO,
>      >         why we have to do so much work on the inode table during
>      >         REF/UNREF. It makes inode_[un]ref() and searches of the
>      >         inode table and dentries (aliases) hard to run concurrently.
>      >
>      >         So my question is: in what cases, and how, can we find an
>      >         inode whose REF is ZERO?
>      >
>      >         As Glusterfs stores its inode memory address in kernel/fuse,
>      >         can we conclude that only fuse_ino_to_inode() can bring back
>      >         a REF=0 inode?
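>      >
>      >         To make the contention concrete, here is a simplified
>      >         sketch of the REF 0 -> 1 "revival" path as I understand
>      >         it (not the exact libglusterfs code; the list handling
>      >         here is illustrative):
>      >
>      >             /* Sketch against libglusterfs' inode.h types.
>      >              * Reviving an inode from REF=0 may move it from the
>      >              * lru list back to the active list, so every
>      >              * ref/unref must serialize on table->lock. */
>      >             inode_t *
>      >             inode_ref (inode_t *inode)
>      >             {
>      >                     inode_table_t *table = inode->table;
>      >
>      >                     pthread_mutex_lock (&table->lock);
>      >                     {
>      >                             if (inode->ref == 0) {
>      >                                     /* revival: lru -> active */
>      >                                     list_move (&inode->list,
>      >                                                &table->active);
>      >                                     table->lru_size--;
>      >                                     table->active_size++;
>      >                             }
>      >                             inode->ref++;
>      >                     }
>      >                     pthread_mutex_unlock (&table->lock);
>      >
>      >                     return inode;
>      >             }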
>      >
>      >
>      > Xavi's answer below provides some insights. At the same time,
>      > assuming (for now) that only fuse_ino_to_inode() can bring an inode
>      > back from the ref=0 state is a good start.
>      >
>      >
>      >     Yes, when an inode gets refs = 0, it means that gluster code is
>      >     not using it anywhere, so it cannot be referenced again unless
>      >     the kernel sends new requests on the same inode. Once refs=0 and
>      >     nlookup=0, the inode can be destroyed.
>      >
>      >     Inode code is quite complex right now and I haven't had time to
>      >     investigate this further, but I think we could simplify inode
>      >     management significantly (especially unref) if we added a
>      >     reference when nlookup becomes > 0 and removed that reference
>      >     when nlookup becomes 0 again. Maybe with this approach we could
>      >     avoid the inode table lock in many cases. However, we need to
>      >     make sure we correctly handle the invalidation logic to keep the
>      >     inode table size under control.
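>      >
>      >     Something along these lines (just a sketch of the idea, not
>      >     tested; it assumes atomic helpers in the style of
>      >     GF_ATOMIC_INC()/GF_ATOMIC_SUB() that return the new value,
>      >     and simplified inode_lookup()/inode_forget() signatures):
>      >
>      >         /* Pin the inode with one extra ref while the kernel
>      >          * still knows about it (nlookup > 0).  refs can then
>      >          * only drop to 0 after the kernel has forgotten the
>      >          * inode, so unref never races with a revival. */
>      >         void
>      >         inode_lookup (inode_t *inode)
>      >         {
>      >                 if (GF_ATOMIC_INC (inode->nlookup) == 1)
>      >                         inode_ref (inode);   /* first lookup */
>      >         }
>      >
>      >         void
>      >         inode_forget (inode_t *inode, uint64_t nlookup)
>      >         {
>      >                 if (GF_ATOMIC_SUB (inode->nlookup, nlookup) == 0)
>      >                         inode_unref (inode); /* last forget */
>      >         }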
>      >
>      >
>      > My suggestion is: don't wait for a complete solution before posting
>      > the patch. Let us have a chance to look at work-in-progress patches
>      > so we can discuss the code itself. It would help us reach better
>      > solutions sooner.
> 
>     Agree.
> 
>     I have almost implemented my draft design for this experiment.
>     The immature code has been pushed to my personal Glusterfs repo[1].
> 
>     It is a single large patch right now; I will split it into smaller
>     patches for review convenience when I push it to Gerrit. If you
>     would prefer that I push it to Gerrit for early review and
>     discussion, I can do that :-). But I am still doing some debugging.
> 
>     My work includes:
> 
>     1. Move logic unrelated to inode refing/unrefing out of
>     `__inode_[un]ref()`, thereby reducing their arguments.
>     2. Add a dedicated ‘ref_lock’ to the inode to keep ref/unref atomic
>     (see the sketch after this list).
>     3. As `inode_table::active_size` is only used for debugging
>     purposes, convert it to an atomic variable.
>     4. Factor out inode pruning.
>     5. To let inode search and grep run concurrently, first take an
>     RDLOCK and upgrade it to a WRLOCK only when necessary.
>     6. Take the inode table lock for inode ref/unref only when the
>     inode has to move between table lists.
> 
>     etc...
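> 
>     To give a taste of items 2 and 6, the ref fast path in my draft
>     looks roughly like this (simplified; inode_table_activate() is an
>     illustrative helper, not a function in current master):
> 
>         /* inode->ref_lock alone protects the refcount; table->lock
>          * is taken only on the slow path, when the inode must move
>          * between the active/lru lists. */
>         inode_t *
>         inode_ref (inode_t *inode)
>         {
>                 gf_boolean_t revive = _gf_false;
> 
>                 pthread_spin_lock (&inode->ref_lock);
>                 {
>                         if (inode->ref++ == 0)
>                                 revive = _gf_true;
>                 }
>                 pthread_spin_unlock (&inode->ref_lock);
> 
>                 if (revive)
>                         /* slow path: lru -> active under table->lock */
>                         inode_table_activate (inode->table, inode);
> 
>                 return inode;
>         }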
> 
>     Any comments, ideas, and suggestions are welcome.
> 
>     Thanks,
>     Changwei
> 
>     [1]:
>     https://github.com/changweige/glusterfs/commit/d7226d2458281212af19ec8c2ca3d8c8caae1330
> 
>      >
>      >     Regards,
>      >
>      >     Xavi
>      >
>      >
>      >
>      >         Thanks,
>      >         Changwei
> 

