[Gluster-devel] handling open fds and graph switches

Tue Aug 6 08:22:40 UTC 2013

Hi,

As of now, there is a problem when the following sequence of operations
is performed on a file:

open () => unlink () => graph change (not a reconfigure) => fop on the
opened fd (e.g. a write)

In this sequence, the fop performed on the fd after the graph switch
fails with EBADFD, which should not happen. This is because when the
file is unlinked (assuming it has no other hard links), the gfid handle
in the .glusterfs directory of the brick is removed. When a graph
change happens, all fds have to be migrated to the new graph. Before
that, a nameless lookup is sent on the gfid to build the inode in the
new graph, and that lookup is resolved through the gfid handle. Since
the gfid handle was removed when the unlink was received, the nameless
lookup fails. The fd migration to the new graph therefore fails, and so
do the fops on the fd.
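
To make the failure concrete, here is a minimal sketch using plain POSIX
calls on a local filesystem rather than GlusterFS code. It assumes the
gfid handle behaves like an extra hard link to the file; the function
name, the "file" and "handle" names and the /tmp location are made up
for the illustration. Once both links are gone, a path-based lookup on
the handle fails with ENOENT even though the already-open fd can still
be written to.

```c
#define _XOPEN_SOURCE 700   /* for mkdtemp() under strict -std= modes */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo: "handle" stands in for the gfid handle (assumed
 * here to be a hard link to the file).
 * Returns 0 if, after both links are removed, stat() on the handle
 * fails with ENOENT while the open fd is still writable; -1 otherwise.
 */
int lookup_by_handle_after_unlink(void)
{
    char dir[] = "/tmp/gfid-demo-XXXXXX";
    char name[64], handle[64];
    struct stat st;
    int fd, ret = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(name, sizeof(name), "%s/file", dir);
    snprintf(handle, sizeof(handle), "%s/handle", dir);

    fd = open(name, O_CREAT | O_RDWR, 0600);
    if (fd >= 0) {
        int linked = (link(name, handle) == 0);

        unlink(name);    /* the application's unlink */
        unlink(handle);  /* the handle is removed along with it */

        if (linked &&
            stat(handle, &st) == -1 && errno == ENOENT && /* lookup fails */
            write(fd, "x", 1) == 1)                       /* fd still works */
            ret = 0;
        close(fd);
    }
    rmdir(dir);
    return ret;
}
```

Calling lookup_by_handle_after_unlink() from a small main() and checking
for a 0 return is enough to try it out.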

A patch has been sent to handle this
(http://review.gluster.org/#/c/5428/). With it, the gfid handle is
removed only when the last reference to the file goes away: on unlink,
it also checks whether there are any open fds on the inode. If there
are, the gfid handle is not removed at that point; it is removed when
the release on that fd is received. But that approach might leak gfid
handles. If the last entry is unlinked while there are open fds, the
gfid handle is left in place, and if glusterfsd then crashes before the
release arrives, the gfid handle for that file is leaked.
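
The deferred-removal idea and its leak can be sketched as a tiny
in-memory model. This is not the code from the patch; the struct and all
function names are hypothetical, and a "crash" is modelled simply as
losing the in-memory fd count while the on-disk handle stays.

```c
#include <stdbool.h>

/* Hypothetical model of the brick-side state for one file. */
struct inode_model {
    int  nlink;           /* remaining named links to the file        */
    int  open_fds;        /* fds the brick process currently tracks   */
    bool handle_present;  /* whether the gfid handle exists on disk   */
};

/* Drop the handle only when there are no links and no open fds. */
static void drop_handle_if_unreferenced(struct inode_model *in)
{
    if (in->nlink == 0 && in->open_fds == 0)
        in->handle_present = false;
}

void model_unlink(struct inode_model *in)
{
    if (in->nlink > 0)
        in->nlink--;
    drop_handle_if_unreferenced(in);
}

void model_release(struct inode_model *in)
{
    if (in->open_fds > 0)
        in->open_fds--;
    drop_handle_if_unreferenced(in);
}

/* A crash loses the in-memory fd state; no release will ever arrive. */
void model_crash(struct inode_model *in)
{
    in->open_fds = 0;
}

/* A handle with no links and no fds behind it is leaked. */
bool handle_leaked(const struct inode_model *in)
{
    return in->nlink == 0 && in->open_fds == 0 && in->handle_present;
}
```

In this model, unlink followed by release removes the handle, while
unlink followed by a crash leaves handle_leaked() true.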

Another approach might be to make posix_lookup do a stat on one of the
fds present on the inode when it has to build an inode handle (which
happens as part of the nameless lookup). The nameless lookup then
succeeds and the new inode is looked up in the new graph for the
client. But after that, there are 2 more issues.
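
Before going into the two issues, the stat-through-an-open-fd idea
itself can be illustrated with plain POSIX calls. This is only a sketch
on a local filesystem, not posix_lookup code; the function name and the
/tmp location are assumptions made for the example. It shows that
fstat() on an fd that is already open keeps working, and still reports
the same inode, after the last name of the file is gone.

```c
#define _XOPEN_SOURCE 700   /* for mkstemp() under strict -std= modes */
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo. Returns 0 if fstat() on the open fd succeeds
 * after unlink() and reports the same inode number as before;
 * -1 otherwise.
 */
int stat_via_open_fd(void)
{
    char path[] = "/tmp/fd-stat-XXXXXX";
    struct stat before, after;
    int ret = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;

    if (fstat(fd, &before) == 0 &&
        unlink(path) == 0 &&             /* last name is removed       */
        fstat(fd, &after) == 0 &&        /* stat through the fd works  */
        after.st_ino == before.st_ino)   /* and it is the same inode   */
        ret = 0;
    else
        unlink(path);                    /* best-effort cleanup        */

    close(fd);
    return ret;
}
```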

1) After the nameless lookup completes successfully, the file has to be
opened in the new graph, so a syncop_open is sent on the new graph for
the gfid. In posix_open, the posix xlator again tries to open the file
using the gfid handle. Since the gfid handle has been removed, the open
fails and the file is not opened, so the fd migration fails again. We
could search the list of fds on the inode, find the fd that the fuse
client is trying to migrate, and return that fd. But finding the right
fd is hard. (What if a fuse client has opened 2 fds with the same
flags?)

2) Another problem is open-behind. After the nameless lookup, the fuse
xlator sends syncop_open to migrate the fds. Once the syncop_open is
complete and the fds are migrated, a PARENT_DOWN event is sent on the
old graph and the client xlator sends release on all the fds. (If the
previous syncop_open was successful, it is safe to send release from
the old graph, as the fd has been migrated to the new graph and has a
corresponding fd on the brick.) But within syncop_open, open-behind
might have returned success to fuse without actually winding the open
call to the xlators below it. Fuse then sees success for the open and
sends PARENT_DOWN to the old graph, which sends release on the fd. So
even though the fd is still present from the application's point of
view, there is no mechanism left to access the file, as both the fds
and the gfid handle have already been removed.
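
Both issues can be sketched with a small in-memory model. None of this
is GlusterFS code: the structs and function names are hypothetical. For
issue 2, the model assumes issue 1 has been solved in some way, i.e.
that the brick can serve an open from an existing fd when the handle is
gone, so that only the open-behind ordering is shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Issue 1: matching an fd by open flags alone can be ambiguous. */
struct open_fd {
    int flags;      /* flags the fd was opened with */
};

/* Count the fds on the inode whose flags match the requested ones. */
size_t count_matching_fds(const struct open_fd *fds, size_t n, int flags)
{
    size_t i, matches = 0;

    for (i = 0; i < n; i++)
        if (fds[i].flags == flags)
            matches++;
    return matches; /* more than 1 means the right fd cannot be picked */
}

/* Issue 2: ordering of the real open versus release on the old graph. */
struct brick_model {
    bool handle_present;  /* gfid handle on disk           */
    int  open_fds;        /* fds currently open on the brick */
};

/* Assumption: open works via the handle or via an existing fd. */
static bool brick_open(struct brick_model *b)
{
    if (!b->handle_present && b->open_fds == 0)
        return false;     /* nothing left to reach the file through */
    b->open_fds++;
    return true;
}

static void brick_release(struct brick_model *b)
{
    if (b->open_fds > 0)
        b->open_fds--;
}

/* Returns true if the migrated fd ends up backed by a brick fd. */
bool migrate_fd(struct brick_model *b, bool open_behind)
{
    bool opened = false;

    if (!open_behind) {
        opened = brick_open(b);   /* open is wound during migration */
        if (!opened)
            return false;
    }
    brick_release(b);             /* PARENT_DOWN: old graph releases */
    if (open_behind)
        opened = brick_open(b);   /* deferred open, wound too late   */
    return opened;
}
```

With the handle already removed and one fd open from the old graph,
migrate_fd() succeeds when the open is wound immediately and fails when
open-behind defers it past the release.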

Please provide feedback on the above issues.

Regards,
Raghavendra Bhat