[Gluster-devel] 'mv' of ./tests/bugs/posix/bug-1113960.t causes 100% CPU

Nithya Balachandran nbalacha at redhat.com
Tue May 17 09:25:35 UTC 2016


On Tue, May 17, 2016 at 2:52 PM, Raghavendra Gowdappa <rgowdapp at redhat.com>
wrote:

>
>
> ----- Original Message -----
> > From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> > To: "Nithya Balachandran" <nbalacha at redhat.com>
> > Cc: gluster-devel at gluster.org
> > Sent: Tuesday, May 17, 2016 2:40:32 PM
> > Subject: Re: [Gluster-devel] 'mv' of ./tests/bugs/posix/bug-1113960.t
> causes 100% CPU
> >
> >
> >
> > ----- Original Message -----
> > > From: "Nithya Balachandran" <nbalacha at redhat.com>
> > > To: "Niels de Vos" <ndevos at redhat.com>
> > > Cc: gluster-devel at gluster.org
> > > Sent: Tuesday, May 17, 2016 2:25:20 PM
> > > Subject: Re: [Gluster-devel] 'mv' of ./tests/bugs/posix/bug-1113960.t
> > > causes 100% CPU
> > >
> > > Hi,
> > >
> > > I have looked into this on another system earlier and this is what I
> have
> > > so
> > > far:
> > >
> > > 1. The test involves moving and renaming directories and files within
> those
> > > dirs.
> > > 2. A rename dir operation failed on one subvol. So we have 3 subvols
> where
> > > the directory has the new name and one where it has the old name.
> > > 3. Some operation - perhaps a revalidate - has added a dentry with the
> old
> > > name to the inode . So there are now 2 dentries for the same inode for
> a
> > > directory.
> >
> > I think the stale dentry is caused by a racing lookup and rename.
>
> Please refer to bz 1335373 [1] for more details.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1335373



I have posted a "hacky" patch at [2]

[2] http://review.gluster.org/#/c/14373/



>
>
> > Apart from
> > that, I don't know of any other reasons for stale dentries in inode
> table.
> > "Dentry fop serializer" (DFS) [1], aims to solve these kind of races.
> >
> > [1] http://review.gluster.org/14286
> >
> > > 4. Renaming a file inside that directory calls an inode_link which end
> up
> > > traversing the dentry list for each entry all the way up to the root
> in the
> > > __foreach_ancestor_dentry function. If there are multiple deep
> directories
> > > with the same problem in the path, this takes a very long time (hours)
> > > because of the number of times the function is called.
> > >
> > > I do not know why the rename dir failed. However, is the following a
> > > correct/acceptable fix for the traversal issue?
> > >
> > > 1. A directory should never have more than one dentry
> > > 2. __foreach_ancestor_dentry uses the dentry list of the parent inode.
> > > Parent
> > > inode will always be a directory.
> > > 3. Can we just take the first dentry in the list for the cycle check
> as we
> > > are really only comparing inodes? In the scenarios I have tried, all
> the
> > > dentries in the dentry_list always have the same inode. This would
> prevent
> > > the hang. If there is more than one dentry for a directory, flag an
> error
> > > somehow.
> > > 4. Is there any chance that a dentry in the list can have a different
> > > inode?
> > > If yes, that is a different problem and 3 does not apply.
> > >
> > > It would work like this:
> > >
> > >
> > > last_parent_inode = NULL;
> > >
> > > list_for_each_entry (each, &parent->dentry_list, inode_list) {
> > >
> > > //Since we are only using the each->parent to check, stop if we have
> > > already
> > > checked it
> > >
> > > if(each->parent != last_parent_inode) {
> > > ret = __foreach_ancestor_dentry (each, per_dentry_fn, data);
> > > if (ret)
> > > goto out;
> > > }
> > > last_parent_inode = each->parent;
> > > }
> > >
> > >
> > > This would prevent the hang but leads to other issues which would
> exist in
> > > the current code anyway - mainly, which dentry is the correct one and
> how
> > > do
> > > we recover?
> > >
> > >
> > > Regards,
> > > Nithya
> > >
> > > On Wed, May 11, 2016 at 7:09 PM, Niels de Vos < ndevos at redhat.com >
> wrote:
> > >
> > >
> > > Could someone look into this busy loop?
> > > https://paste.fedoraproject.org/365207/29732171/raw/
> > >
> > > This was happening in a regression-test burn-in run, occupying a
> Jenkins
> > > slave for 2+ days:
> > > https://build.gluster.org/job/regression-test-burn-in/936/
> > > (run with commit f0ade919006b2581ae192f997a8ae5bacc2892af from master)
> > >
> > > A coredump of the mount process is available from here:
> > > http://slave20.cloud.gluster.org/archived_builds/crash.tar.gz
> > >
> > > Thanks misc for reporting and gathering the debugging info.
> > > Niels
> > >
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel at gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > >
> > >
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel at gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.gluster.org/pipermail/gluster-devel/attachments/20160517/65d5dedc/attachment-0001.html>


More information about the Gluster-devel mailing list