Chaloulos, Klearchos (Nokia - GR/Athens) klearchos.chaloulos at nokia.com
Thu Jun 23 12:24:28 UTC 2016


We use GlusterFS version 3.6.9 as a shared storage solution. The Linux kernel version is 4.1.20. Our setup consists of replica 2 volumes, with no distribute. We have seen that directory listing operations occasionally fail with the "Stale file handle" error (the client log shows it on OPENDIR). Below are the client logs:

[2016-06-15 09:29:59.717521] W [fuse-bridge.c:1001:fuse_fd_cbk] 0-glusterfs-fuse: 598: OPENDIR() /folder1/folder2/folder3/folder4/folder5 => -1 (Stale file handle)
[2016-06-15 09:29:59.717851] W [defaults.c:2177:default_releasedir] (--> /usr/lib64/glusterfs/libglusterfs.so.0(_gf_log_callingfn+0x218)[0x7f28346e9bdd] (--> /usr/lib64/glusterfs/libglusterfs.so.0(default_releasedir+0x44)[0x7f28347035d4] (--> /usr/lib64/glusterfs/libglusterfs.so.0(+0x5c6a1)[0x7f28347236a1] (--> /usr/lib64/glusterfs/libglusterfs.so.0(fd_unref+0x9d)[0x7f28347238c3] (--> /usr/lib64/glusterfs/glusterfs/3.6.9/xlator/protocol/client.so(client_local_wipe+0x56)[0x7f282bdd2549] ))))) 0-fuse: xlator does not implement releasedir_cbk

The issue is temporary. We wrote a script that runs ls on the directory in a loop, and in one case the error persisted for 30 seconds. During those 30 seconds, ls printed the following:
ls: cannot open directory '/folder1/folder2/folder3/folder4/folder5 ': Stale file handle
After 30 seconds, the error disappeared and the directory contents were listed normally.
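
For anyone who wants to measure how long such a window lasts, the polling approach can be sketched roughly as follows. This is a minimal Python sketch, not the exact script we ran: the function names probe and monitor, the one-second default interval, and the injectable lister/clock/sleep parameters are all assumptions made for illustration. It only relies on os.listdir raising OSError with errno.ESTALE when the mount returns "Stale file handle".

```python
import errno
import os
import time


def probe(path, lister=os.listdir):
    """Try to list path; return None on success, else the OSError errno."""
    try:
        lister(path)
        return None
    except OSError as exc:
        return exc.errno


def monitor(path, interval=1.0, duration=60.0, lister=os.listdir,
            clock=time.monotonic, sleep=time.sleep):
    """Poll path for `duration` seconds and return (start, end) pairs.

    Each pair marks a window during which listing failed with ESTALE.
    Times come from `clock`, so they are relative, not wall-clock.
    The lister/clock/sleep parameters are hypothetical hooks that make
    the loop easy to exercise without a real GlusterFS mount.
    """
    windows = []
    stale_since = None
    start = clock()
    while True:
        now = clock()
        if now - start >= duration:
            break
        err = probe(path, lister)
        if err == errno.ESTALE:
            # Remember when the first failure of this window was seen.
            if stale_since is None:
                stale_since = now
        elif stale_since is not None:
            # Listing no longer reports ESTALE: close the window.
            windows.append((stale_since, now))
            stale_since = None
        sleep(interval)
    if stale_since is not None:
        # Still stale when the monitoring period ended.
        windows.append((stale_since, clock()))
    return windows
```

Calling monitor("/folder1/folder2/folder3/folder4/folder5", interval=1.0, duration=3600.0) on the FUSE mount would return the observed stale windows for one hour of polling. Any error other than ESTALE (for example ENOENT) simply ends the current window, which is a simplification.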

Do you think this is related to the bugs below?
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1041109
[2] https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/3.0_Update_4_Release_Notes/chap-Known_Issues.html

Are the bugs above still applicable in the 3.6.9 release? In [2] there is a suggested workaround, to use "gluster volume set VOLNAME quick-read off". Do you think it will fix the stale file handle issue? Won't this cause a decrease in performance?

More generally, what can cause the "Stale file handle" error? According to
[3] http://www.cyberciti.biz/tips/nfs-stale-file-handle-error-and-solution.html
it occurs when one client holds an active handle (an open file descriptor?) to a file or directory that is then deleted by another client or directly on the server. In our case, however, the issue is temporary, so deletion by another client does not look like the cause.

Best regards,


