[Gluster-devel] What functionality is expected from persistent NFS-client tracking?
J. Bruce Fields
bfields at fieldses.org
Thu Jan 31 20:19:28 UTC 2013
On Thu, Jan 31, 2013 at 10:20:27AM +0100, Niels de Vos wrote:
> On Wed, Jan 30, 2013 at 03:09:38PM -0500, J. Bruce Fields wrote:
> > On Wed, Jan 30, 2013 at 02:31:09PM -0500, bfields wrote:
> > > On Fri, Jan 25, 2013 at 03:23:28PM +0100, Niels de Vos wrote:
> > > > Hi all,
> > > >
> > > > the last few days I have been looking into making the tracking of
> > > > NFS-clients more persistent. As it is today, the NFS-clients are kept in
> > > > a list in memory on the NFS-server. When the NFS-server restarts, the
> > > > list is recreated from scratch and does not contain the NFS-clients that
> > > > still have the export mounted (Bug 904065).
> > > >
> > > > NFSv3 depends on the MOUNT protocol. When an NFS-client mounts an
> > > > export, the MOUNT protocol is used to get the initial file-handle. With
> > > > this handle, the NFS-service can be contacted. The actual services
> > > > providing the MOUNT and NFSv3 protocol can be separate (Linux kernel
> > > > NFSd) or implemented closely together (Gluster NFS-server).
> > > >
> > > > Now, when the Linux kernel NFS-server is used, the NFS-clients are saved
> > > > by the rpc.mountd process (which handles the MOUNT protocol) in
> > > > /var/lib/nfs/rmtab. This file is modified on mounting and unmounting.
> > > > Implementing a persistent cache similar to this is pretty
> > > > straightforward and is available for testing and review in .
> > > >
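The rmtab-style approach described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the patch under review. The tab-separated file format and the names `load_clients`, `save_clients`, `on_mount` and `on_unmount` are hypothetical. The sketch rewrites the whole list on every MOUNT/UMNT and uses a temporary file plus rename, so a crash never leaves the list half-written.

```python
import os
import tempfile

# Hypothetical on-disk format: one "<client>\t<export>" pair per line.
# A tab separator is used because IPv6 client addresses contain ':'.


def load_clients(path):
    """Return the set of (client, export) pairs recorded in the cache file."""
    entries = set()
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if line:
                    client, _, export = line.partition("\t")
                    entries.add((client, export))
    except FileNotFoundError:
        pass  # no cache yet (e.g. first start): begin with an empty list
    return entries


def save_clients(path, entries):
    """Replace the cache file atomically so a crash never leaves it half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".nfs-clients.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for client, export in sorted(entries):
                f.write(f"{client}\t{export}\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # rename is atomic within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise


def on_mount(path, client, export):
    """Hypothetical hook, called after a successful MOUNT request."""
    entries = load_clients(path)
    entries.add((client, export))
    save_clients(path, entries)


def on_unmount(path, client, export):
    """Hypothetical hook, called after an UMNT request."""
    entries = load_clients(path)
    entries.discard((client, export))
    save_clients(path, entries)
```

Because the list lives in a file rather than only in memory, a restarted server can call `load_clients()` and still know which clients have the export mounted.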
> > > > There are, however, some use-cases that may require different
> > > > handling. When an NFS-client mounts an export, the MOUNT
> > > > protocol is handled by a specific server. After getting the initial
> > > > file-handle for the export, any Gluster NFS-server can be used to talk
> > > > NFSv3 and do I/O. When the NFS-clients are kept only on the NFS-server
> > > > that handled the initial MOUNT request, and due to fail-over (think CTDB
> > > > and similar here) another NFS-server is used, the persistent cache of
> > > > 'connected' NFS-clients is inaccurate.
> > > >
> > > > The easiest way I can think of to remedy this issue is to place the
> > > > persistent NFS-client cache on a GlusterFS volume. When CTDB is used,
> > > > the locking-file is placed on shared storage as well, so the same
> > This is the statd data? That's the more important thing to get right.
> Uhm, no. The locking-file I meant is for CTDB itself (I think). From my
> understanding the statd/NFS-locking is done through the GlusterFS-client
> (the NFS-server is a client, just like a FUSE-mount). For all I know the
> statd/NFS-locking is working as it should.
Oh, OK. Looking at the code in xlators/nfs/server/src/nlm4.c.... Looks
like it's probably just using the same statd as the kernel server--the
one installed as a part of nfs-utils, which by default puts its state in
So if you want failover to work, then the contents of
/var/lib/nfs/statd/ have to be made available to the server that takes over.
Anyway, agreed that putting that (and the nfs client list) on some
shared storage makes the most sense.
> > > > volume can be used for the NFS-client cache. Providing an option
> > > > to set the volume/path of the NFS-client cache would be needed for
> > > > this. I guess that this could result in a chicken-and-egg
> > > > situation (NFS-server is started, but no volume mounted yet)?
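With the client list on a GlusterFS volume shared by all NFS-servers, any server may handle a MOUNT or UMNT, so concurrent read-modify-write updates must be serialised or one server can silently drop another's entry. A minimal Python sketch, assuming the shared volume honours POSIX advisory (`fcntl`) locks across servers. The `.lock` side file, the tab-separated format and `update_shared_clients` are hypothetical, not existing Gluster options or code.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive POSIX (fcntl) lock on lock_path while the block runs."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks while another holder has it
        yield
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)


def update_shared_clients(cache_path, client, export, mounted):
    """Add (MOUNT) or remove (UMNT) one entry in the shared client list.

    The whole read-modify-write runs under the lock, so updates made at the
    same time by different NFS-servers are not lost. Returns the new set.
    """
    with exclusive_lock(cache_path + ".lock"):
        entries = set()
        try:
            with open(cache_path, encoding="utf-8") as f:
                for line in f:
                    line = line.rstrip("\n")
                    if line:
                        c, _, e = line.partition("\t")
                        entries.add((c, e))
        except FileNotFoundError:
            pass  # first writer creates the list
        if mounted:
            entries.add((client, export))
        else:
            entries.discard((client, export))
        # Write a temporary file next to the list, then rename it into place.
        directory = os.path.dirname(os.path.abspath(cache_path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                for c, e in sorted(entries):
                    f.write(f"{c}\t{e}\n")
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, cache_path)
        except BaseException:
            os.unlink(tmp)
            raise
        return entries
```

A separate lock file is used so the lock survives the rename that replaces the list itself.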
> > I don't think there should be any problem here: the exported filesystems
> > need to be available before the server starts anyway. (Otherwise the
> > only response the server could give to operations on filehandles would
> > be ESTALE.)
> Well, the NFS-server dynamically gets exports (GlusterFS volumes) added
> when these are started or newly created. There is no hard requirement
> that a specific volume is available for the NFS-server to place a shared
> file with a list of NFS-clients.
I'm not sure what you mean by "there is no hard requirement ...".
Surely it's a requirement that an NFS server have available at startup,
at a minimum:
- all exported volumes
- whichever volume contains /var/lib/nfs/statd/, if that's on shared
  storage
otherwise reboot recovery won't work. (And failover definitely won't
work.)
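The startup requirement above amounts to a readiness gate: do not serve until every required resource is present, rather than answering with ESTALE or skipping reboot recovery. A hedged Python sketch; `wait_for_paths` and `may_start_serving` are hypothetical names, and the default check assumes each requirement is a local directory. Gluster NFS exports are volumes in its own graph rather than local paths, so a real server would pass a different predicate via `check`.

```python
import os
import time


def wait_for_paths(required, timeout=30.0, interval=0.5, check=os.path.isdir):
    """Poll until every entry in `required` passes `check`.

    Returns an empty list when everything is available, otherwise the
    entries that were still missing when `timeout` seconds had passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        missing = [p for p in required if not check(p)]
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(interval)


def may_start_serving(exports, statd_dir, **kwargs):
    """True only if all exports and the statd state directory are present.

    Starting without the exports means every filehandle gets ESTALE; starting
    without the statd directory means reboot recovery cannot work.
    """
    return not wait_for_paths(list(exports) + [statd_dir], **kwargs)
```

A non-empty result from `wait_for_paths` tells the server which resource it is still waiting for, which is more useful in a log message than a bare failure.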
> Probably easily solved by making the
> path to the file configurable and only accessing it when needed (and not
> at startup of the NFS-server).
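The alternative proposed here, a configurable path that is opened only when a MOUNT or UMNT needs it, could look roughly like this. Again a hypothetical Python sketch: `LazyClientList` and its tab-separated format are made up for illustration. It only defers the problem. If the volume is still missing at the first MOUNT, the error surfaces then instead of at startup.

```python
import os
import tempfile


class LazyClientList:
    """NFS-client list at a configurable path.

    Nothing is read or written when the object is created (i.e. at server
    startup); the file is first opened when a MOUNT or UMNT needs it.
    """

    def __init__(self, path):
        self.path = path       # hypothetical admin-configurable location
        self._entries = None   # None means "not loaded yet"

    def _load(self):
        # First use: the volume holding self.path must be mounted by now.
        if self._entries is None:
            self._entries = set()
            try:
                with open(self.path, encoding="utf-8") as f:
                    for line in f:
                        line = line.rstrip("\n")
                        if line:
                            client, _, export = line.partition("\t")
                            self._entries.add((client, export))
            except FileNotFoundError:
                pass  # no list written yet
        return self._entries

    def update(self, client, export, mounted):
        """Record a MOUNT (mounted=True) or an UMNT (mounted=False)."""
        entries = self._load()
        if mounted:
            entries.add((client, export))
        else:
            entries.discard((client, export))
        # Write a temporary file in the same directory, then rename it over
        # the old list so readers never see a partial file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for c, e in sorted(entries):
                f.write(f"{c}\t{e}\n")
        os.replace(tmp, self.path)

    def clients(self):
        """Return a copy of the current (client, export) pairs."""
        return set(self._load())
```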