[Gluster-users] Gluster (3.6.3) NFS READDIR failing intermittently from Finder on Mac OS X (10.10 and 10.11)

Brett Randall brett.randall at gmail.com
Thu Mar 10 07:18:44 UTC 2016

Hi all


I have a problem which is doing my head in.


We are running Gluster 3.6.3 with the in-built NFS server, across 8 servers.
We share our volume out with SMB, AFP and Gluster's NFS server.


In most cases, NFS works fine. Everything is visible and accessible from the
terminal. But from Finder on our Macs, we are having a consistent problem.


Firstly, we are mounting the share from the command line:


$ mount -t nfs -o rw,intr,nolock,tcp ./glusvol


We then open Finder and traverse to the folder in question (about 7 levels
deep). I see about 20-30 items, but I know there are 100+ items in there.
This is the case on multiple folders. If I open a terminal, go to that
folder, and create a new empty file, the folder refreshes in Finder and I
can see everything. However, unmount and remount and everything is gone
again (although sometimes it displays all files for a few seconds before
most of them disappear). I've repeated this on three different Macs of
varying origin and OS version.


I've started Wireshark on my Mac and monitored what is happening. It appears
that there is an initial NFS READDIR Call to the NFS server with cookie set
to 0. The READDIR Reply contains the filename of every file in the folder.
Then there is another READDIR call with cookie set to 4096, which happens to
be the last cookie listed in the previous reply. Curiously, the reply to
this call lists all the files that I *cannot* see in Finder, but doesn't
include the ones I can see. Then there are a whole lot of LOOKUP Calls while
it looks at all the files that I *can* see. Then it stops at the 24th file,
the last file I can see in Finder. It then issues another READDIR Call with
a Cookie of 680. The Reply is "NFS3ERR_BAD_COOKIE". Looking through the
previous replies, the only time that cookie was issued was in the FIRST
reply. And again, the file in question with that cookie number is the LAST
file that I can see in Finder.
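
To make the sequence above easier to follow, here is a minimal Python sketch
that classifies each READDIR call by where its cookie came from. Only the call
cookies 0, 4096 and 680 come from my capture; the other cookie values and the
names Exchange and classify_calls are made up for illustration, and this is not
how Finder or Gluster actually tracks cookies.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    call_cookie: int          # cookie sent in the READDIR call
    reply_cookies: list[int]  # per-entry cookies in the reply (empty on error)


def classify_calls(exchanges):
    """Label each READDIR call by the origin of the cookie it sends."""
    labels = []
    issued_by = {}       # cookie -> index of the first reply that issued it
    last_cookie = None   # last cookie of the most recent non-empty reply
    for i, ex in enumerate(exchanges):
        if ex.call_cookie == 0:
            labels.append("starts a new listing")
        elif ex.call_cookie == last_cookie:
            labels.append("continues from the end of the previous reply")
        elif ex.call_cookie in issued_by:
            labels.append(
                f"resumes mid-listing with a cookie from reply {issued_by[ex.call_cookie]}"
            )
        else:
            labels.append("uses a cookie never seen in any reply")
        for cookie in ex.reply_cookies:
            issued_by.setdefault(cookie, i)
        if ex.reply_cookies:
            last_cookie = ex.reply_cookies[-1]
    return labels


# Hypothetical trace: only the call cookies 0, 4096 and 680 are from the capture.
trace = [
    Exchange(0, [512, 680, 2048, 4096]),   # first reply, ends at cookie 4096
    Exchange(4096, [4608, 5120]),          # second reply
    Exchange(680, []),                     # server answered NFS3ERR_BAD_COOKIE
]

for ex, label in zip(trace, classify_calls(trace)):
    print(ex.call_cookie, "->", label)
```

The third call is the odd one: it goes back to a cookie from the middle of the
first reply rather than continuing from the most recent one, and that is the
call the server rejects.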


Surely, Finder cannot be THIS broken? I can see all files in that folder
fine when I mount via AFP or SMB but not via NFS. But it all works fine from
Terminal. We're experimenting with updating Gluster to 3.7.8 and moving to
NFS Ganesha in the hope that moving to NFSv4 fixes it, but does anyone have
any idea what's happening? I'm happy to send the .pcapng file to someone if
it's helpful. I also have a .pcapng of when we create a file in the folder
and Finder refreshes to show everything in there. The only interesting thing
that I noticed in that file is that the cookie number at the end of the
READDIR is much larger than anything I was seeing in the failed listings
(17179869176). Just in case that was part of it, I tried forcing 32-bit
inode numbers in the Gluster NFS options (the closest thing I could find to
what I understand to be NFS's 32-bit cookie size restriction). I didn't
expect it to help, and it made no difference.
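
For reference, a quick check of that large cookie in Python. It does not fit
in 32 bits, although as far as I can tell NFSv3 defines its READDIR cookie
(cookie3 in RFC 1813) as a 64-bit unsigned value, so the number should be legal
on the wire. The variable names are just for illustration.

```python
cookie = 17179869176           # last cookie in the working READDIR reply

fits_32 = cookie <= 0xFFFFFFFF  # would it fit in an unsigned 32-bit field?
fits_64 = cookie <= 0xFFFFFFFFFFFFFFFF
high = cookie >> 32             # upper 32 bits
low = cookie & 0xFFFFFFFF       # lower 32 bits

print("bits needed:", cookie.bit_length())
print("fits in 32 bits:", fits_32)
print("fits in 64 bits:", fits_64)
print("high word:", high, "low word:", low)
print("equals 2**34 - 8:", cookie == 2**34 - 8)
```

So a client or server that truncated this cookie to 32 bits would end up with a
different value from the one that was issued, which is the kind of thing I was
trying to rule out.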


Thanks in advance.



