[Gluster-devel] 'mv' of ./tests/bugs/posix/bug-1113960.t causes 100% CPU

Raghavendra Gowdappa rgowdapp at redhat.com
Wed May 18 04:37:06 UTC 2016



----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: "Nithya Balachandran" <nbalacha at redhat.com>
> Cc: gluster-devel at gluster.org
> Sent: Tuesday, May 17, 2016 3:06:25 PM
> Subject: Re: [Gluster-devel] 'mv' of ./tests/bugs/posix/bug-1113960.t causes 100% CPU
> 
> 
> 
> ----- Original Message -----
> > From: "Nithya Balachandran" <nbalacha at redhat.com>
> > To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> > Cc: gluster-devel at gluster.org
> > Sent: Tuesday, May 17, 2016 2:55:35 PM
> > Subject: Re: [Gluster-devel] 'mv' of ./tests/bugs/posix/bug-1113960.t
> > causes 100% CPU
> > 
> > On Tue, May 17, 2016 at 2:52 PM, Raghavendra Gowdappa <rgowdapp at redhat.com>
> > wrote:
> > 
> > >
> > >
> > > ----- Original Message -----
> > > > From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> > > > To: "Nithya Balachandran" <nbalacha at redhat.com>
> > > > Cc: gluster-devel at gluster.org
> > > > Sent: Tuesday, May 17, 2016 2:40:32 PM
> > > > Subject: Re: [Gluster-devel] 'mv' of ./tests/bugs/posix/bug-1113960.t
> > > causes 100% CPU
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > From: "Nithya Balachandran" <nbalacha at redhat.com>
> > > > > To: "Niels de Vos" <ndevos at redhat.com>
> > > > > Cc: gluster-devel at gluster.org
> > > > > Sent: Tuesday, May 17, 2016 2:25:20 PM
> > > > > Subject: Re: [Gluster-devel] 'mv' of ./tests/bugs/posix/bug-1113960.t
> > > > > causes 100% CPU
> > > > >
> > > > > Hi,
> > > > >
> > > > > I have looked into this on another system earlier and this is what I
> > > > > have so far:
> > > > >
> > > > > 1. The test involves moving and renaming directories and files within
> > > > > those dirs.
> > > > > 2. A rename dir operation failed on one subvol. So we have 3 subvols
> > > > > where the directory has the new name and one where it has the old name.
> > > > > 3. Some operation - perhaps a revalidate - has added a dentry with the
> > > > > old name to the inode. So there are now 2 dentries for the same inode
> > > > > for a directory.
> > > >
> > > > I think the stale dentry is caused by a racing lookup and rename.
> > >
> > > Please refer to bz 1335373 [1] for more details.
> > >
> > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1335373
> > 
> > 
> > 
> > I have posted a "hacky" patch at [2]
> > 
> > [2] http://review.gluster.org/#/c/14373/
> > 
> > 
> > 
> > >
> > >
> > > > Apart from that, I don't know of any other reasons for stale dentries in
> > > > the inode table.

Though the stale dentry resulting from the above race is permanent (until the process is restarted), there are other cases where we can end up with a temporarily stale dentry. This can happen in client-side inode tables (fuse, gfapi etc.) when another client modifies a dentry (rename, rmdir etc.) and the current client has not yet been _updated_ about the change. For example, client1 might do a lookup on dst after a rename from client2, thereby having both src and dst in its inode table. So, the "stale" dentry "src" stays in client1's inode table until client1 does a lookup on "src" (which fails, and dentry "src" is purged).
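
To make this concrete, here is a minimal, self-contained sketch in plain C. It is only an illustration: struct sim_inode, sim_link() and sim_unlink() are made-up names for this example, not the libglusterfs inode-table API. It shows how client1's table can end up holding two dentries for one directory inode, and how a later failed lookup on "src" purges the stale one.

#include <stdio.h>
#include <string.h>

#define MAX_DENTRIES 8

/* Simplified stand-in for an inode with the names (dentries) linking it. */
struct sim_inode {
        int         gfid;                 /* identity of the directory      */
        const char *names[MAX_DENTRIES];  /* dentries currently linking it  */
        int         nr_names;
};

/* Link a (name -> inode) dentry, the way a successful lookup would. */
static void
sim_link (struct sim_inode *inode, const char *name)
{
        int i;

        for (i = 0; i < inode->nr_names; i++)
                if (strcmp (inode->names[i], name) == 0)
                        return;           /* already linked */

        if (inode->nr_names < MAX_DENTRIES)
                inode->names[inode->nr_names++] = name;
}

/* Purge a dentry, the way a failed lookup/revalidate would. */
static void
sim_unlink (struct sim_inode *inode, const char *name)
{
        int i;

        for (i = 0; i < inode->nr_names; i++) {
                if (strcmp (inode->names[i], name) == 0) {
                        inode->names[i] = inode->names[--inode->nr_names];
                        return;
                }
        }
}

int
main (void)
{
        struct sim_inode dir = { .gfid = 42, .nr_names = 0 };
        int i;

        /* client1 originally knows the directory as "src". */
        sim_link (&dir, "src");

        /* client2 renames src -> dst on the bricks; client1 is not told.
         * client1 then looks up "dst", which resolves to the same gfid,
         * so a second dentry gets linked to the same inode. */
        sim_link (&dir, "dst");

        printf ("dentries after lookup of dst: %d\n", dir.nr_names);  /* 2 */

        /* Only when client1 looks up "src" again does the lookup fail
         * (ENOENT on the bricks), at which point "src" is purged. */
        sim_unlink (&dir, "src");

        for (i = 0; i < dir.nr_names; i++)
                printf ("remaining dentry: %s\n", dir.names[i]);      /* dst */

        return 0;
}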

This temporary staleness caused by modifications from other clients can be fixed if we remove old dentries when linking a new dentry for a directory (since a directory cannot have hardlinks, it can have only one valid dentry). However, this can be done only once we have some form of synchronization, like DFS, to avoid parallel operations; only with serialization can we be sure that the latest dentry being linked is always the correct one.
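
As a rough sketch of that idea, reusing the toy sim_inode structures from the example above (again hypothetical names, not the real __inode_link() in libglusterfs), the directory case could look like:

/* Hypothetical directory-aware link: a directory can have only one valid
 * name, so drop every existing dentry before linking the new one. This is
 * only safe when DFS-style serialization guarantees that the dentry being
 * linked last is the authoritative one. */
static void
sim_link_dir (struct sim_inode *dir, const char *name)
{
        dir->nr_names = 0;       /* purge all old (possibly stale) dentries */
        sim_link (dir, name);    /* link the one dentry that should remain  */
}

Without that serialization this is unsafe: a racing lookup that still resolves the old name could immediately re-link the stale dentry, which is why the change depends on the serializer.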

> > > > "Dentry fop serializer" (DFS) [1], aims to solve these kind of races.
> > > >
> > > > [1] http://review.gluster.org/14286
> > > >
> > > > > 4. Renaming a file inside that directory calls inode_link, which ends up
> > > > > traversing the dentry list for each entry all the way up to the root in
> > > > > the __foreach_ancestor_dentry function. If there are multiple deep
> > > > > directories with the same problem in the path, this takes a very long
> > > > > time (hours) because of the number of times the function is called.
> > > > >
> > > > > I do not know why the rename dir failed. However, is the following a
> > > > > correct/acceptable fix for the traversal issue?
> > > > >
> > > > > 1. A directory should never have more than one dentry.
> > > > > 2. __foreach_ancestor_dentry uses the dentry list of the parent inode.
> > > > > The parent inode will always be a directory.
> > > > > 3. Can we just take the first dentry in the list for the cycle check, as
> > > > > we are really only comparing inodes? In the scenarios I have tried, all
> > > > > the dentries in the dentry_list always have the same inode. This would
> > > > > prevent the hang. If there is more than one dentry for a directory, flag
> > > > > an error somehow.
> > > > > 4. Is there any chance that a dentry in the list can have a different
> > > > > inode? If yes, that is a different problem and 3 does not apply.
> > > > >
> > > > > It would work like this:
> > > > >
> > > > >
> > > > > last_parent_inode = NULL;
> > > > >
> > > > > list_for_each_entry (each, &parent->dentry_list, inode_list) {
> > > > >         // Since we are only using each->parent for the check, skip this
> > > > >         // dentry if we have already checked that parent
> > > > >         if (each->parent != last_parent_inode) {
> > > > >                 ret = __foreach_ancestor_dentry (each, per_dentry_fn, data);
> > > > >                 if (ret)
> > > > >                         goto out;
> > > > >         }
> > > > >         last_parent_inode = each->parent;
> 
> Some of the issues with this approach:
> 1. If dentries have different parents (e.g., a rename across directories racing
> with a lookup), this won't help.
> 2. Even if the dentries have the same parent, this assumes those dentries are
> adjacent in the list and that only two such dentries exist.
>
> Point 2 is not much of an issue, as most of the cases we've seen have only two
> dentries, but the assumption behind point 1 (that all the dentries share the
> same parent) is weak.
> 
> > > > > }
> > > > >
> > > > >
> > > > > This would prevent the hang but leads to other issues which would
> > > exist in
> > > > > the current code anyway - mainly, which dentry is the correct one and
> > > how
> > > > > do
> > > > > we recover?
> > > > >
> > > > >
> > > > > Regards,
> > > > > Nithya
> > > > >
> > > > > On Wed, May 11, 2016 at 7:09 PM, Niels de Vos <ndevos at redhat.com>
> > > > > wrote:
> > > > >
> > > > >
> > > > > Could someone look into this busy loop?
> > > > > https://paste.fedoraproject.org/365207/29732171/raw/
> > > > >
> > > > > This was happening in a regression-test burn-in run, occupying a Jenkins
> > > > > slave for 2+ days:
> > > > > https://build.gluster.org/job/regression-test-burn-in/936/
> > > > > (run with commit f0ade919006b2581ae192f997a8ae5bacc2892af from
> > > > > master)
> > > > >
> > > > > A coredump of the mount process is available from here:
> > > > > http://slave20.cloud.gluster.org/archived_builds/crash.tar.gz
> > > > >
> > > > > Thanks misc for reporting and gathering the debugging info.
> > > > > Niels
> > > > >
> > > >
> > >
> > 
> 

