[Gluster-users] [Gluster-devel] [posix-compliance] unlink and access to file through open fd

Raghavendra G raghavendra at gluster.com
Fri Sep 4 10:32:13 UTC 2015


On Fri, Sep 4, 2015 at 2:35 PM, Prashanth Pai <ppai at redhat.com> wrote:

>
> ----- Original Message -----
> > From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> > To: gluster-devel at gluster.org
> > Cc: gluster-users at gluster.org
> > Sent: Friday, September 4, 2015 12:43:09 PM
> > Subject: [Gluster-devel] [posix-compliance] unlink and access to file
> > through open fd
> >
> > All,
> >
> > POSIX allows access to a file through open fds even after the name
> > associated with the file has been deleted. While this works in glusterfs
> > for most cases, there are some corner cases where we fail.
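> >
> > As a minimal illustration of that guarantee (plain Linux, nothing
> > gluster-specific assumed; the path is a placeholder):
> >
> > #include <fcntl.h>
> > #include <stdio.h>
> > #include <unistd.h>
> >
> > int main(void) {
> >     int fd = open("/tmp/demo", O_RDWR | O_CREAT | O_TRUNC, 0644);
> >     if (fd < 0) { perror("open"); return 1; }
> >     unlink("/tmp/demo");            /* the name is gone ...           */
> >     if (write(fd, "data", 4) != 4)  /* ... but I/O on the fd succeeds */
> >         perror("write");
> >     close(fd);                      /* inode is reclaimed only here   */
> >     return 0;
> > }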
> >
> > 1. Reboot of brick:
> > ===================
> >
> > With a reboot of the brick, the fd is lost. unlink would have deleted
> > both the gfid and path links to the file, so we would lose the file. As a
> > solution, perhaps we should create a hardlink to the file (say in
> > .glusterfs) which gets deleted only when the last fd is closed?
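> >
> > A rough brick-side sketch of that idea (the helper names and the hidden
> > link path are hypothetical, not existing gluster code):
> >
> > #include <fcntl.h>
> > #include <unistd.h>
> >
> > /* Pin the inode with a hidden hardlink for as long as fds are open. */
> > int brick_open(const char *path, const char *hidden_link) {
> >     int fd = open(path, O_RDWR);
> >     if (fd >= 0)
> >         link(path, hidden_link); /* survives unlink of name and gfid link */
> >     return fd;
> > }
> >
> > void brick_last_close(int fd, const char *hidden_link) {
> >     unlink(hidden_link);         /* drop the pin only on the last close */
> >     close(fd);
> > }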
> >
> > 2. Graph switch:
> > =================
> >
> > The issue is captured in bz 1259995 [1]. Pasting the content from bz
> > verbatim:
> > Consider the following sequence of operations:
> > 1. fd = open ("/mnt/glusterfs/file");
> > 2. unlink ("/mnt/glusterfs/file");
> > 3. Do a graph-switch, let's say by adding a new brick to the volume.
> > 4. Migration of the fd to the new graph fails. This is because, as part
> > of migration, we do a lookup and an open. But the lookup fails since the
> > file is already deleted, so migration fails and the fd is marked bad.
> >
> > In fact, this test case is already present in our regression tests,
> > though the test only checks that the fd is marked bad. The expectation
> > behind filing this bug is that migration should succeed. This is possible
> > since there is already an fd opened on the brick through the old graph,
> > which can be duplicated using the dup() syscall.
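> >
> > A minimal sketch of that approach (only dup() semantics are assumed;
> > the helper name is hypothetical):
> >
> > #include <unistd.h>
> >
> > /* Reuse the old graph's open fd instead of doing lookup + open:
> >    dup() returns a new fd sharing the same open file description,
> >    which stays valid even though the file name is gone. */
> > int migrate_fd(int old_graph_fd) {
> >     return dup(old_graph_fd);
> > }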
> >
> > Of course, the solution outlined here doesn't cover the case where the
> > file is not present on the brick at all. For example, a new brick was
> > added to a replica set and that new brick doesn't contain the file. Now,
> > since the file is deleted, how does replica heal that file to the new
> > brick?
> >
> > But at least this can be solved for those cases where the file was
> > present on a brick and an fd was already opened.
> >
> > 3. Open-behind and unlink from a different client:
> > ==================================================
> >
> > While open-behind handles an unlink from the same client through which
> > the open was performed, if the unlink and open are done from two
> > different clients, the file is lost. I cannot think of any good solution
> > for this.
>
> We *may* have hit this once earlier, when we had multiple instances of
> the object-expirer daemon deleting a huge number of objects (files).
> This was only observed at scale - deleting a million objects. Our
> user-space application flow was roughly as follows:
>
> fd = open(...)
> fstat(fd, &s)
> fgetxattr(fd, ...)
>
> In our case, open() and fstat() succeeded but fgetxattr() failed with
> ENOENT (many times with ESTALE too), probably because some other client
> had already done an unlink() on the file name. Is this behavior normal?
>

It's possible (though perhaps not normal, since we are being non-POSIX
compliant here :)).
1. The open might've been serviced by open-behind (i.e., faked).
2. The fstat might've been served from md-cache (if it had hit open-behind,
an actual open would've been done before the fstat completed).
3. If the fgetxattr hits open-behind and the file has already been deleted
from some other client, it will fail with ESTALE (not ENOENT, since the
open is done on the gfid, and if the gfid cannot be looked up, the
server-resolver sends out ESTALE). The sketch after this list reproduces
the flow from the client side.
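
A minimal client-side sketch of the reported flow (plain Linux xattr API;
the mount path and xattr name below are placeholders):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/xattr.h>
#include <unistd.h>

int main(void) {
    int fd = open("/mnt/glusterfs/object", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0)      /* may be answered from md-cache */
        perror("fstat");

    char buf[256];
    /* If another client has unlinked the file and open-behind had faked
       the open, this is where ESTALE (or ENOENT) surfaces. */
    if (fgetxattr(fd, "user.some.metadata", buf, sizeof(buf)) < 0)
        fprintf(stderr, "fgetxattr: %s\n", strerror(errno));

    close(fd);
    return 0;
}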


> @Thiago: Remember this one?
> http://paste.openstack.org/show/357414/
> https://gist.github.com/thiagodasilva/491e405a3385f0e85cc9
>
> >
> > I wanted to know whether these problems are real enough to channel our
> > efforts into fixing them. Comments are welcome, in terms of solutions or
> > other possible scenarios which can lead to this issue.
> >
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1259995
> >
> > regards,
> > Raghavendra.



-- 
Raghavendra G