[Gluster-users] [Gluster-devel] [posix-compliance] unlink and access to file through open fd

Prashanth Pai ppai at redhat.com
Fri Sep 4 09:05:36 UTC 2015


----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: gluster-devel at gluster.org
> Cc: gluster-users at gluster.org
> Sent: Friday, September 4, 2015 12:43:09 PM
> Subject: [Gluster-devel] [posix-compliance] unlink and access to file through open fd
> 
> All,
> 
> POSIX allows access to a file through open fds even after the name associated
> with the file is deleted. While this works in glusterfs in most cases, there
> are some corner cases where we fail.
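
For reference, the baseline behaviour on a local POSIX filesystem can be checked with a few lines of Python. This is only a minimal sketch of the semantics described above, not a glusterfs test; the helper name read_after_unlink is made up for illustration.

```python
import os
import tempfile


def read_after_unlink(payload):
    """Write payload to a temp file, unlink its name, then read it back via the fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                    # the name is gone ...
        assert not os.path.exists(path)
        nlink = os.fstat(fd).st_nlink      # ... typically 0 links on a local filesystem
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))   # but the data is still reachable via the fd
        return data, nlink
    finally:
        os.close(fd)                       # the last close releases the inode
```

Calling read_after_unlink(b"hello") should hand back the same bytes even though the path no longer exists.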
> 
> 1. Reboot of brick:
> ===================
> 
> When a brick reboots, the fd is lost. unlink would have deleted both the gfid
> and path links to the file, so we would lose the file. As a solution, perhaps
> we should create a hardlink to the file (say, in .glusterfs) that gets deleted
> only when the last fd is closed?
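
A rough sketch of the hardlink idea, assuming an in-memory count of open fds per inode and a keep directory on the same filesystem as the file. OpenFileKeeper, its methods and the ".keep" directory are hypothetical names for illustration and are not part of glusterfs. After a real reboot the in-memory counts would be lost, so leftover links would need separate cleanup, which this sketch does not attempt.

```python
import os
import tempfile


class OpenFileKeeper:
    """Hypothetical helper: keeps an extra hardlink for files unlinked while open."""

    def __init__(self, keep_dir):
        self.keep_dir = keep_dir      # stand-in for a directory such as .glusterfs
        self.open_count = {}          # inode number -> number of open fds
        os.makedirs(keep_dir, exist_ok=True)

    def _keep_path(self, ino):
        return os.path.join(self.keep_dir, str(ino))

    def open(self, path, flags=os.O_RDONLY):
        fd = os.open(path, flags)
        ino = os.fstat(fd).st_ino
        self.open_count[ino] = self.open_count.get(ino, 0) + 1
        return fd

    def unlink(self, path):
        ino = os.stat(path).st_ino
        keep = self._keep_path(ino)
        if self.open_count.get(ino, 0) > 0 and not os.path.exists(keep):
            os.link(path, keep)       # the extra link survives the unlink below
        os.unlink(path)

    def close(self, fd):
        ino = os.fstat(fd).st_ino
        os.close(fd)
        self.open_count[ino] -= 1
        if self.open_count[ino] == 0:
            del self.open_count[ino]
            keep = self._keep_path(ino)
            if os.path.exists(keep):
                os.unlink(keep)       # last fd closed: drop the extra link


def demo():
    with tempfile.TemporaryDirectory() as top:
        path = os.path.join(top, "file")
        with open(path, "wb") as f:
            f.write(b"data")
        keeper = OpenFileKeeper(os.path.join(top, ".keep"))
        fd = keeper.open(path)
        kept = keeper._keep_path(os.fstat(fd).st_ino)
        keeper.unlink(path)
        while_open = os.path.exists(kept)    # link exists while the fd is open
        keeper.close(fd)
        after_close = os.path.exists(kept)   # and is removed on the last close
        return while_open, after_close
```

With the extra link in place, a restarted process could reopen the file by that link instead of losing it.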
> 
> 2. Graph switch:
> =================
> 
> The issue is captured in bz 1259995 [1]. Pasting the content from the bz
> verbatim:
> Consider the following sequence of operations:
> 1. fd = open ("/mnt/glusterfs/file");
> 2. unlink ("/mnt/glusterfs/file");
> 3. Do a graph-switch, let's say by adding a new brick to the volume.
> 4. Migration of the fd to the new graph fails. This is because, as part of
> migration, we do a lookup and an open. But the lookup fails as the file is
> already deleted, and hence migration fails and the fd is marked bad.
> 
> In fact, this test case is already present in our regression tests, though
> the test only checks whether the fd is marked as bad. But the expectation in
> filing this bug is that migration should succeed. This is possible since
> there is an fd opened on the brick through the old graph, and hence it can be
> duplicated using the dup syscall.
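
The dup fallback can be illustrated locally as below. This is a simplified sketch on a single host, assuming both fds live in the same process; migrate_fd is a hypothetical name and not the glusterfs migration code. Note that a dup'ed fd shares its file offset with the original, and that reopening by path could pick up a different file if the name has been reused.

```python
import errno
import os
import tempfile


def migrate_fd(old_fd, path):
    """Return a new fd for the same open file: reopen by path, else fall back to dup."""
    try:
        return os.open(path, os.O_RDONLY)   # lookup + open, the step that fails today
    except OSError as exc:
        if exc.errno != errno.ENOENT:
            raise
        return os.dup(old_fd)               # name is gone; duplicate the existing fd


def demo():
    fd, path = tempfile.mkstemp()
    os.write(fd, b"payload")
    os.unlink(path)                         # file is deleted while still open
    new_fd = migrate_fd(fd, path)
    os.close(fd)                            # the old fd can now go away
    data = os.pread(new_fd, 7, 0)           # the file is still readable via the new fd
    os.close(new_fd)
    return data
```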
> 
> Of course, the solution outlined here doesn't cover the case where the file
> is not present on the brick at all. For example, a new brick was added to a
> replica set and that new brick doesn't contain the file. Now, since the file
> is deleted, how does replica heal that file to another brick, etc.?
> 
> But at least this can be solved for those cases where the file was present on
> a brick and the fd was already opened.
> 
> 3. Open-behind and unlink from a different client:
> ==================================================
> 
> While open-behind handles unlink from the same client (through which the open
> was performed), if the unlink and the open are done from two different
> clients, the file is lost. I cannot think of any good solution for this.

We *may* have hit this once earlier, when we had multiple instances of the object-expirer daemon deleting a huge number of objects (files).
This was only observed at scale, while deleting a million objects. Our user-space application flow was roughly as follows:

fd = open(...)
s = stat(fd)
fgetxattr(fd, ....)

In our case, open() and stat() succeeded but fgetxattr() failed with ENOENT (and many times with ESTALE too), probably because some other client
had already done an unlink() on the file name. Is this behavior normal?
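
One possible way for an application to cope in the meantime is sketched below, assuming it is acceptable to treat ENOENT/ESTALE on any of the three calls as "already deleted by someone else". The function name and the "user.hypothetical" attribute are made up for illustration, and os.getxattr is Linux-only.

```python
import errno
import os
import tempfile

GONE = (errno.ENOENT, errno.ESTALE)


def read_object_metadata(path, xattr_name):
    """Return (stat_result, xattr_value), or None if the file vanished under us."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        if exc.errno in GONE:
            return None
        raise
    try:
        st = os.fstat(fd)
        value = os.getxattr(fd, xattr_name)   # Linux-only; accepts an open fd
        return st, value
    except OSError as exc:
        if exc.errno in GONE:
            return None                       # deleted by another client mid-flight
        raise                                 # anything else is a real error
    finally:
        os.close(fd)


def demo_missing():
    with tempfile.TemporaryDirectory() as top:
        missing = os.path.join(top, "missing")
        return read_object_metadata(missing, "user.hypothetical")
```

This only hides the symptom from the caller; it does not answer whether fgetxattr() on an open fd should be failing in the first place.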

@Thiago: Remember this one?
http://paste.openstack.org/show/357414/
https://gist.github.com/thiagodasilva/491e405a3385f0e85cc9

> 
> I wanted to know whether these problems are real enough to channel our
> efforts to fix these issues. Comments are welcome in terms of solutions or
> other possible scenarios which can lead to this issue.
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1259995
> 
> regards,
> Raghavendra.

