[Gluster-devel] spurious failures in tests/encryption/crypt.t

Pranith Kumar Karampuri pkarampu at redhat.com
Wed May 21 11:57:47 UTC 2014



----- Original Message -----
> From: "Anand Avati" <avati at gluster.org>
> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> Cc: "Edward Shishkin" <edward at redhat.com>, "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Wednesday, May 21, 2014 12:36:22 PM
> Subject: Re: [Gluster-devel] spurious failures in tests/encryption/crypt.t
> 
> On Tue, May 20, 2014 at 10:54 PM, Pranith Kumar Karampuri <
> pkarampu at redhat.com> wrote:
> 
> >
> >
> > ----- Original Message -----
> > > From: "Anand Avati" <avati at gluster.org>
> > > To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> > > Cc: "Edward Shishkin" <edward at redhat.com>, "Gluster Devel" <gluster-devel at gluster.org>
> > > Sent: Wednesday, May 21, 2014 10:53:54 AM
> > > Subject: Re: [Gluster-devel] spurious failures in tests/encryption/crypt.t
> > >
> > > There are a few suspicious things going on here..
> > >
> > > On Tue, May 20, 2014 at 10:07 PM, Pranith Kumar Karampuri <
> > > pkarampu at redhat.com> wrote:
> > >
> > > >
> > > > > > hi,
> > > > > > >      crypt.t is failing regression builds once in a while, and most
> > > > > > > of the time it is because of failures just after the remount in
> > > > > > > the script.
> > > > > >
> > > > > > TEST rm -f $M0/testfile-symlink
> > > > > > TEST rm -f $M0/testfile-link
> > > > > >
> > > > > > Both of these are failing with ENOTCONN. I got a chance to look at
> > > > > > the logs. According to the brick logs, this is what I see:
> > > > > > [2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open]
> > > > > > 0-patchy-posix: open on /d/backends/patchy1/testfile-symlink:
> > > > > > Transport endpoint is not connected
> > > >
> > >
> > > > posix_open() happening on a symlink? This should NEVER happen. glusterfs
> > > > itself should NEVER EVER be triggering symlink resolution on the server.
> > > > In this case, for whatever reason, an open() is attempted on a symlink,
> > > > and it is getting followed back onto gluster's own mount point (the test
> > > > case is creating an absolute link).
> > >
> > > So first find out: who is triggering fop->open() on a symlink. Fix the
> > > caller.

http://review.gluster.org/7824

> > >
> > > > Next: add a check in posix_open() to fail with ELOOP or EINVAL if the
> > > > inode is a symlink.

http://review.gluster.org/7823
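
For illustration only, here is a minimal sketch of the kind of check being
suggested. It is not the code from the review above. The helper names
open_errno_for_mode() and check_open_allowed() are hypothetical, and the
choice of ELOOP for symlinks and EINVAL for every other non-regular type
(including directories) is an assumption based on the suggestion in this
thread.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not taken from the GlusterFS tree: decide whether
 * an open() should be allowed for a file with the given mode.
 * Returns 0 for regular files, ELOOP for symlinks, and EINVAL for every
 * other file type (directories, devices, FIFOs, sockets).
 */
static int
open_errno_for_mode (mode_t mode)
{
        if (S_ISREG (mode))
                return 0;
        if (S_ISLNK (mode))
                return ELOOP;
        return EINVAL;
}

/*
 * Hypothetical wrapper: lstat() the backend path, which does not follow a
 * trailing symlink, then apply the check above.
 * Returns 0 if the open may proceed, otherwise a positive errno value.
 */
static int
check_open_allowed (const char *real_path)
{
        struct stat st;

        if (lstat (real_path, &st) != 0)
                return errno;
        return open_errno_for_mode (st.st_mode);
}
```

A caller would run check_open_allowed() on the backend path before calling
open() and fail the fop with the returned errno if it is non-zero. Note that
a separate lstat() followed by open() leaves a small window in which the
path could change.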

> >
> > I think I understood what you are saying. An open call for a symlink on the
> > fuse mount led to another open call for the target on the same fuse mount.
> 
> 
> It's not that simple. The client VFS is intelligent enough to resolve
> symlinks and send open() only on non-symlinks. And the test case script was
> doing an obvious unlink() (TEST rm -f <filename>), so it was not initiated
> by an open() attempt in the first place. My guess is that some xlator
> (probably crypt?) is doing an open() on an inode and that is going through
> unchecked in posix. It is a bug in both the caller and posix, but the
> onus/responsibility is on posix to disallow open() on anything but regular
> files (even open() on character or block devices should not happen in
> posix).
> 
> 
> 
> > Which led to a deadlock :). Is that why we disallow opens on symlinks in
> > gluster?
> >
> 
> That's not just why open on symlink is disallowed in gluster, it is a more
> generic problem of following symlinks in general inside gluster. Symlink
> resolution must strictly happen only in the outermost VFS. Following
> symlinks inside the filesystem is not only an invalid operation, but can
> lead to all kinds of deadlocks, security holes (what if you opened a
> symlink which points to /etc/passwd, should it show the contents of the
> client machine's /etc/passwd or the server? Now what if you wrote to the
> file through the symlink? etc. you get the idea..) and
> wrong/weird/dangerous behaviors. This is not just related to following
> symlinks, even open()ing special devices.. e.g if you create a char device
> file with major/minor number of an audio device and wrote pcm data into it,
> should it play music on the client machine or in the server machine? etc.
> The summary is, following symlinks or opening non-regular files are
> VFS/client operations and are invalid operations in a filesystem context.
> 
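
As a sketch of how a server-side open could refuse both symlinks and other
non-regular files without a separate stat-then-open step, the following uses
the standard O_NOFOLLOW flag and then checks the type with fstat(). The
helper name open_regular_nofollow() is hypothetical, the errno choices are
assumptions, and the caller is assumed not to pass O_CREAT.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open real_path only if it is a regular file.
 * Assumes flags does not contain O_CREAT.
 * Returns an fd >= 0 on success, or -1 with errno set.
 */
static int
open_regular_nofollow (const char *real_path, int flags)
{
        struct stat st;
        int fd;
        int saved_errno;

        /* O_NOFOLLOW makes open() fail (ELOOP on Linux) when the last path
         * component is a symlink; O_NONBLOCK avoids blocking on a FIFO. */
        fd = open (real_path, flags | O_NOFOLLOW | O_NONBLOCK);
        if (fd < 0)
                return -1;

        if (fstat (fd, &st) != 0) {
                saved_errno = errno;
                close (fd);
                errno = saved_errno;
                return -1;
        }

        /* Anything other than a regular file is rejected. */
        if (!S_ISREG (st.st_mode)) {
                close (fd);
                errno = EINVAL;
                return -1;
        }

        return fd;
}
```

One caveat with this variant: a device node is still opened briefly before
it is rejected, which can have side effects of its own, so a type check
before the open may still be wanted in addition.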

Now only one question remains. How could it not hang every time?

Pranith

