<div dir="ltr">Thanks for this, github works for review right now :-)<div><br></div><div>I am occupied till Wednesday, and will review them by this week. A glance on the changes looks good to me.</div><div><br></div><div>Few tests which can run for validations are :</div><div><br></div><div><pre class="gmail-console-output" style="box-sizing:border-box;white-space:pre-wrap;margin-top:0px;margin-bottom:0px;color:rgb(51,51,51);font-size:14px">tests/bugs/shard/bug-1696136-lru-limit-equals-deletion-rate.t</pre><pre class="gmail-console-output" style="box-sizing:border-box;white-space:pre-wrap;margin-top:0px;margin-bottom:0px;color:rgb(51,51,51);font-size:14px"><pre class="gmail-console-output" style="box-sizing:border-box;white-space:pre-wrap;margin-top:0px;margin-bottom:0px">tests/features/fuse-lru-limit.t</pre><pre class="gmail-console-output" style="box-sizing:border-box;white-space:pre-wrap;margin-top:0px;margin-bottom:0px">tests/bugs/shard/shard-inode-refcount-test.t</pre><pre class="gmail-console-output" style="box-sizing:border-box;white-space:pre-wrap;margin-top:0px;margin-bottom:0px"><br></pre><pre class="gmail-console-output" style="box-sizing:border-box;white-space:pre-wrap;margin-top:0px;margin-bottom:0px">Ideal is to run the full regression with `./run-tests.sh -c` </pre><pre class="gmail-console-output" style="box-sizing:border-box;white-space:pre-wrap;margin-top:0px;margin-bottom:0px"><br></pre><pre class="gmail-console-output" style="box-sizing:border-box;white-space:pre-wrap;margin-top:0px;margin-bottom:0px">Regards,</pre><pre class="gmail-console-output" style="box-sizing:border-box;white-space:pre-wrap;margin-top:0px;margin-bottom:0px">Amar</pre></pre></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 4, 2019 at 9:21 AM Changwei Ge <<a href="mailto:chge@linux.alibaba.com">chge@linux.alibaba.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Amar,<br>
<br>
On 2019/10/31 6:30 下午, Amar Tumballi wrote:<br>
> <br>
> <br>
> On Wed, Oct 30, 2019 at 4:32 PM Xavi Hernandez <<a href="mailto:jahernan@redhat.com" target="_blank">jahernan@redhat.com</a> <br>
> <mailto:<a href="mailto:jahernan@redhat.com" target="_blank">jahernan@redhat.com</a>>> wrote:<br>
> <br>
> Hi Changwei,<br>
> <br>
> On Tue, Oct 29, 2019 at 7:56 AM Changwei Ge <<a href="mailto:chge@linux.alibaba.com" target="_blank">chge@linux.alibaba.com</a><br>
> <mailto:<a href="mailto:chge@linux.alibaba.com" target="_blank">chge@linux.alibaba.com</a>>> wrote:<br>
> <br>
> Hi,<br>
> <br>
> I am recently working on reducing inode_[un]ref() locking<br>
> contention by<br>
> getting rid of inode table lock. Just use inode lock to protect<br>
> inode<br>
> REF. I have already discussed a couple rounds with several<br>
> Glusterfs<br>
> developers via emails and Gerrit and basically get understood on<br>
> major<br>
> logic around.<br>
> <br>
> Currently, inode REF can be ZERO and be reused by increasing it<br>
> to ONE.<br>
> This is IMO why we have to burden so much work for inode table when<br>
> REF/UNREF. It makes inode [un]ref() and inode table and<br>
> dentries(alias)<br>
> searching hard to run concurrently.<br>
> <br>
> So my question is in what cases, how can we find a inode whose<br>
> REF is ZERO?<br>
> <br>
> As Glusterfs store its inode memory address into kernel/fuse,<br>
> can we<br>
> conclude that only fuse_ino_to_inode() can bring back a REF=0 inode?<br>
> <br>
> <br>
> Xavi's answer below provides some insights. and same time, assuming that <br>
> only fuse_ino_to_inode() can bring back inode from ref=0 state (for <br>
> now), is a good start.<br>
> <br>
> <br>
> Yes, when an inode gets refs = 0, it means that gluster code is not<br>
> using it anywhere, so it cannot be referenced again unless kernel<br>
> sends new requests on the same inode. Once refs=0 and nlookup=0, the<br>
> inode can be destroyed.<br>
> <br>
> Inode code is quite complex right now and I haven't had time to<br>
> investigate this further, but I think we could simplify inode<br>
> management significantly (specially unref) if we add a reference<br>
> when nlookup becomes > 0, and remove a reference when<br>
> nlookup becomes 0 again. Maybe with this approach we could avoid<br>
> inode table lock in many cases. However we need to make sure we<br>
> correctly handle invalidation logic to keep inode table size under<br>
> control.<br>
> <br>
> <br>
> My suggestion is, don't wait for a complete solution for posting the <br>
> patch. Let us get a chance to have a look at WorkInProgress patches, so <br>
> we can have discussions on code itself. It would help to reach better <br>
> solutions sooner.<br>
<br>
Agree.<br>
<br>
I have almost implemented my draft design for this experiment.<br>
The immature code has been pushed to my personal Glusterfs repo[1].<br>
<br>
Now it's a single large patch, I will split it to patches when I decide <br>
to push it to Gerrit for review convenience. If you prefer to push it to <br>
Gerrit for a early review and discussion, I can do that :-). But I am <br>
still doing some debug stuff.<br>
<br>
My work includes:<br>
<br>
1. Move inode refing and unrefing logic unrelated logic out from <br>
`__inode_[un]ref()` hence to reduce their arguments.<br>
2. Add a specific ‘ref_lock’ to inode to keep ref/unref atomicity.<br>
3. As `inode_table::active_size` is only used for debug purpose, convert <br>
it to atomic variable.<br>
4. Factor out pruning inode.<br>
5. In order to run inode search and grep run concurrently, firstly use <br>
RDLOCK and then convert it WRLOCK if necessary.<br>
6. Inode table lock is not necessary for inode ref/unref unless we have <br>
to move it between table lists.<br>
<br>
etc...<br>
<br>
Any comments, ideas, suggestions are kindly welcomed.<br>
<br>
Thanks,<br>
Changwei<br>
<br>
[1]:<br>
<a href="https://github.com/changweige/glusterfs/commit/d7226d2458281212af19ec8c2ca3d8c8caae1330" rel="noreferrer" target="_blank">https://github.com/changweige/glusterfs/commit/d7226d2458281212af19ec8c2ca3d8c8caae1330</a><br>
<br>
> <br>
> Regards,<br>
> <br>
> Xavi<br>
> <br>
> <br>
> <br>
> Thanks,<br>
> Changwei<br>
> _______________________________________________<br>
> <br>
> Community Meeting Calendar:<br>
> <br>
> APAC Schedule -<br>
> Every 2nd and 4th Tuesday at 11:30 AM IST<br>
> Bridge: <a href="https://bluejeans.com/118564314" rel="noreferrer" target="_blank">https://bluejeans.com/118564314</a><br>
> <br>
> NA/EMEA Schedule -<br>
> Every 1st and 3rd Tuesday at 01:00 PM EDT<br>
> Bridge: <a href="https://bluejeans.com/118564314" rel="noreferrer" target="_blank">https://bluejeans.com/118564314</a><br>
> <br>
> Gluster-devel mailing list<br>
> <a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a> <mailto:<a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a>><br>
> <a href="https://lists.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-devel</a><br>
> <br>
</blockquote></div>