[Gluster-devel] [RFC] inode table locking contention reduction experiment
Changwei Ge
chge at linux.alibaba.com
Mon Nov 4 05:26:36 UTC 2019
On 2019/11/4 12:39 PM, Amar Tumballi wrote:
> Thanks for this, github works for review right now :-)
>
> I am occupied until Wednesday and will review them this week. At a
> glance, the changes look good to me.
>
> A few tests that can be run for validation are:
>
> tests/bugs/shard/bug-1696136-lru-limit-equals-deletion-rate.t
>
> tests/features/fuse-lru-limit.t
>
> tests/bugs/shard/shard-inode-refcount-test.t
>
>
> Ideally, run the full regression with `./run-tests.sh -c`
Sure, I will run the entire regression.
Actually, there are still some small problems with this large patch. I am
still working on improving it, including adding some debug/trace methods.
Is there an existing trace/dump method for the Glusterfs fuse client? I
know there is something like that for the brick process, but I can't find
one for the client.
I am planning to join the Glusterfs video conference on the 12th of this
month to discuss my idea.
Thanks,
Changwei
>
>
> Regards,
>
> Amar
>
>
> On Mon, Nov 4, 2019 at 9:21 AM Changwei Ge <chge at linux.alibaba.com> wrote:
>
> Hi Amar,
>
> On 2019/10/31 6:30 PM, Amar Tumballi wrote:
> >
> >
> > On Wed, Oct 30, 2019 at 4:32 PM Xavi Hernandez <jahernan at redhat.com> wrote:
> >
> > Hi Changwei,
> >
> > On Tue, Oct 29, 2019 at 7:56 AM Changwei Ge <chge at linux.alibaba.com> wrote:
> >
> > Hi,
> >
> > I have recently been working on reducing inode_[un]ref() locking
> > contention by getting rid of the inode table lock and using only
> > the inode lock to protect the inode REF. I have already discussed
> > this for a couple of rounds with several Glusterfs developers via
> > email and Gerrit, and I basically understand the major logic
> > involved.
> >
> > Currently, an inode's REF can drop to ZERO and the inode can then
> > be reused by increasing REF back to ONE. IMO this is why we have to
> > do so much inode table work on REF/UNREF. It makes it hard for
> > inode [un]ref() and inode table and dentry (alias) searches to run
> > concurrently.
> >
> > So my question is: in what cases, and how, can we find an inode
> > whose REF is ZERO?
> >
> > As Glusterfs stores its inode memory address in kernel/fuse, can we
> > conclude that only fuse_ino_to_inode() can bring back a REF=0 inode?
> >
> >
> > Xavi's answer below provides some insights. At the same time,
> > assuming (for now) that only fuse_ino_to_inode() can bring an inode
> > back from the ref=0 state is a good start.
> >
> >
> > Yes, when an inode gets refs = 0, it means that gluster code is not
> > using it anywhere, so it cannot be referenced again unless the kernel
> > sends new requests on the same inode. Once refs=0 and nlookup=0, the
> > inode can be destroyed.
> >
> > Inode code is quite complex right now and I haven't had time to
> > investigate this further, but I think we could simplify inode
> > management significantly (especially unref) if we add a reference
> > when nlookup becomes > 0, and remove a reference when
> > nlookup becomes 0 again. Maybe with this approach we could avoid
> > the inode table lock in many cases. However, we need to make sure we
> > correctly handle invalidation logic to keep inode table size under
> > control.
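
The idea in the paragraph above (nlookup > 0 itself holds one reference) can be sketched roughly as follows. This is only an illustration under stated assumptions: `nl_inode` and every `nl_*` function are hypothetical names made up for this sketch, not the real GlusterFS `inode_t` or its API, and the invalidation/LRU handling mentioned above is left out entirely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified inode; NOT the real GlusterFS inode_t. */
struct nl_inode {
    pthread_mutex_t lock; /* per-inode lock, no table lock involved */
    uint64_t ref;         /* refs held by gluster code, plus one while nlookup > 0 */
    uint64_t nlookup;     /* lookups the kernel has not forgotten yet */
    bool destroyed;       /* stands in for actually freeing the inode */
};

void nl_inode_init(struct nl_inode *in)
{
    pthread_mutex_init(&in->lock, NULL);
    in->ref = 0;
    in->nlookup = 0;
    in->destroyed = false;
}

/* Caller must hold in->lock. */
static void nl_unref_locked(struct nl_inode *in)
{
    assert(in->ref > 0);
    if (--in->ref == 0)
        in->destroyed = true; /* ref == 0 implies nlookup == 0 here */
}

void nl_inode_ref(struct nl_inode *in)
{
    pthread_mutex_lock(&in->lock);
    in->ref++;
    pthread_mutex_unlock(&in->lock);
}

void nl_inode_unref(struct nl_inode *in)
{
    pthread_mutex_lock(&in->lock);
    nl_unref_locked(in);
    pthread_mutex_unlock(&in->lock);
}

/* Called when a lookup reply hands the inode to the kernel. */
void nl_inode_lookup(struct nl_inode *in, uint64_t count)
{
    pthread_mutex_lock(&in->lock);
    if (in->nlookup == 0 && count > 0)
        in->ref++; /* nlookup goes 0 -> >0: take the extra reference */
    in->nlookup += count;
    pthread_mutex_unlock(&in->lock);
}

/* Called when the kernel forgets `count` lookups. */
void nl_inode_forget(struct nl_inode *in, uint64_t count)
{
    pthread_mutex_lock(&in->lock);
    assert(count <= in->nlookup);
    bool was_known = in->nlookup > 0;
    in->nlookup -= count;
    if (was_known && in->nlookup == 0)
        nl_unref_locked(in); /* nlookup back to 0: drop the extra reference */
    pthread_mutex_unlock(&in->lock);
}
```

With this accounting, an inode the kernel still knows about never sits at ref = 0, so a ref = 0 inode never needs to be brought back to ref = 1.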
> >
> >
> > My suggestion is: don't wait for a complete solution before posting
> > the patch. Give us a chance to look at work-in-progress patches, so
> > we can discuss the code itself. That would help us reach better
> > solutions sooner.
>
> Agree.
>
> I have almost finished implementing my draft design for this experiment.
> The immature code has been pushed to my personal Glusterfs repo[1].
>
> For now it is a single large patch; I will split it into smaller patches
> when I push it to Gerrit, to make review easier. If you prefer that I
> push it to Gerrit now for early review and discussion, I can do that :-).
> But I am still debugging it.
>
> My work includes:
>
> 1. Move logic unrelated to inode ref/unref out of `__inode_[un]ref()`,
> which also reduces their arguments.
> 2. Add a dedicated `ref_lock` to the inode to keep ref/unref atomic.
> 3. As `inode_table::active_size` is only used for debugging, convert
> it to an atomic variable.
> 4. Factor out inode pruning.
> 5. To let inode search and grep run concurrently, first take RDLOCK
> and convert it to WRLOCK only if necessary.
> 6. Inode table lock is not necessary for inode ref/unref unless we have
> to move it between table lists.
>
> etc...
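
Items 2, 3 and 6 above can be illustrated together with a minimal sketch. It assumes a per-inode `ref_lock`, a table lock taken only on the 0 <-> 1 transitions where the inode changes lists, and C11 atomics standing in for whatever atomic primitive the patch actually uses. All `rl_*` names are hypothetical, and list membership is reduced to two counters.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical table: list membership is reduced to two counters. */
struct rl_table {
    pthread_mutex_t lock;             /* protects list membership only */
    uint64_t lru_size;                /* protected by lock */
    atomic_uint_fast64_t active_size; /* debug counter, readable without lock */
};

struct rl_inode {
    struct rl_table *table;
    pthread_mutex_t ref_lock; /* protects ref */
    uint64_t ref;
};

void rl_table_init(struct rl_table *t)
{
    pthread_mutex_init(&t->lock, NULL);
    t->lru_size = 0;
    atomic_init(&t->active_size, 0);
}

void rl_inode_init(struct rl_inode *in, struct rl_table *t)
{
    in->table = t;
    pthread_mutex_init(&in->ref_lock, NULL);
    in->ref = 0;
    pthread_mutex_lock(&t->lock);
    t->lru_size++; /* a ref == 0 inode starts on the lru list */
    pthread_mutex_unlock(&t->lock);
}

void rl_inode_ref(struct rl_inode *in)
{
    struct rl_table *t = in->table;

    /* Fast path: already active, so no list move and no table lock. */
    pthread_mutex_lock(&in->ref_lock);
    if (in->ref > 0) {
        in->ref++;
        pthread_mutex_unlock(&in->ref_lock);
        return;
    }
    pthread_mutex_unlock(&in->ref_lock);

    /* Slow path: lock order is table lock, then ref_lock; re-check ref. */
    pthread_mutex_lock(&t->lock);
    pthread_mutex_lock(&in->ref_lock);
    if (in->ref++ == 0) { /* move lru -> active */
        t->lru_size--;
        atomic_fetch_add(&t->active_size, 1);
    }
    pthread_mutex_unlock(&in->ref_lock);
    pthread_mutex_unlock(&t->lock);
}

void rl_inode_unref(struct rl_inode *in)
{
    struct rl_table *t = in->table;

    /* Fast path: not the last reference, so the inode stays active. */
    pthread_mutex_lock(&in->ref_lock);
    assert(in->ref > 0);
    if (in->ref > 1) {
        in->ref--;
        pthread_mutex_unlock(&in->ref_lock);
        return;
    }
    pthread_mutex_unlock(&in->ref_lock);

    /* Slow path: possibly the last reference; re-check under both locks. */
    pthread_mutex_lock(&t->lock);
    pthread_mutex_lock(&in->ref_lock);
    assert(in->ref > 0);
    if (--in->ref == 0) { /* move active -> lru */
        atomic_fetch_sub(&t->active_size, 1);
        t->lru_size++;
    }
    pthread_mutex_unlock(&in->ref_lock);
    pthread_mutex_unlock(&t->lock);
}
```

The re-check in both slow paths matters because `ref` can change between dropping `ref_lock` and taking the table lock.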
>
> Any comments, ideas, or suggestions are welcome.
>
> Thanks,
> Changwei
>
> [1]:
> https://github.com/changweige/glusterfs/commit/d7226d2458281212af19ec8c2ca3d8c8caae1330
>
> >
> > Regards,
> >
> > Xavi
> >
> >
> >
> > Thanks,
> > Changwei