[Gluster-devel] What functionality is expected from persistent NFS-client tracking?
Niels de Vos
ndevos at redhat.com
Thu Jan 31 09:20:27 UTC 2013
On Wed, Jan 30, 2013 at 03:09:38PM -0500, J. Bruce Fields wrote:
> On Wed, Jan 30, 2013 at 02:31:09PM -0500, bfields wrote:
> > On Fri, Jan 25, 2013 at 03:23:28PM +0100, Niels de Vos wrote:
> > > Hi all,
> > >
> > > the last few days I have been looking into making the tracking of
> > > NFS-clients more persistent. As it is today, the NFS-clients are kept in
> > > a list in memory on the NFS-server. When the NFS-server restarts, the
> > > list is recreated from scratch and does not contain the NFS-clients that
> > > still have the export mounted (Bug 904065).
> > >
> > > NFSv3 depends on the MOUNT protocol. When an NFS-client mounts an
> > > export, the MOUNT protocol is used to get the initial file-handle. With
> > > this handle, the NFS-service can be contacted. The actual services
> > > providing the MOUNT and NFSv3 protocols can be separate (Linux kernel
> > > NFSd) or implemented closely together (Gluster NFS-server).
> > >
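(Side note to make the split between the two protocols more concrete:
MOUNT and NFS are separate RPC programs and can run on different ports;
the hostname and port numbers below are only examples and may differ on
a real setup.)

    $ rpcinfo -p nfs-server.example.com | egrep 'nfs|mountd'
        100003    3   tcp   2049  nfs
        100005    3   tcp  38465  mountd
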
> > > Now, when the Linux kernel NFS-server is used, the NFS-clients are saved
> > > by the rpc.mountd process (which handles the MOUNT protocol) in
> > > /var/lib/nfs/rmtab. This file is modified on mounting and unmounting.
> > > Implementing a persistent cache similar to this is pretty straightforward
> > > and is available for testing and review in [1].
> > >
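(As a reminder of the format: rpc.mountd keeps one line per NFS-client
and export in rmtab, roughly client:export:refcount with a hexadecimal
reference count. The cache proposed in [1] could use a similar
plain-text, line-based layout; the addresses and export below are only
examples.)

    192.168.122.50:/srv/export:0x00000001
    192.168.122.51:/srv/export:0x00000002
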
> > > There are however some use-cases that may require some different
> > > handling. When an NFS-client starts to mount an export, the MOUNT
> > > protocol is handled on a specific server. After getting the initial
> > > file-handle for the export, any Gluster NFS-server can be used to talk
> > > NFSv3 and do I/O. When the NFS-clients are kept only on the NFS-server
> > > that handled the initial MOUNT request, and due to fail-over (think CTDB
> > > and similar here) another NFS-server is used, the persistent cache of
> > > 'connected' NFS-clients is inaccurate.
> > >
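(To illustrate the fail-over case; the virtual IP, hostnames and volume
name are made up:)

    # 192.0.2.100 is a CTDB-managed virtual IP, currently on server1
    client$ mount -t nfs -o vers=3 192.0.2.100:/volume /mnt
    # after fail-over the virtual IP moves to server2; all further NFSv3
    # requests arrive there, but server2 never saw the MOUNT request and
    # has no record of this NFS-client
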
> > > The easiest way I can think of to remedy this issue, is to place the
> > > persistent NFS-client cache on a GlusterFS volume. When CTDB is used,
> > > the locking-file is placed on shared storage as well, so the same
>
> This is the statd data? That's the more important thing to get right.
Uhm, no. The locking-file I meant is for CTDB itself (I think). From my
understanding, the statd/NFS-locking is done through the GlusterFS-client
(the NFS-server is a client, just like a FUSE-mount). As far as I know,
the statd/NFS-locking is working as it should.
> > > volume can be used for the NFS-client cache. Providing an option
> > > to set the volume/path of the NFS-client cache would be needed for
> > > this. I guess that this could result in a chicken-and-egg
> > > situation (NFS-server is started, but no volume mounted yet)?
>
> I don't think there should be any problem here: the exported filesystems
> need to be available before the server starts anyway. (Otherwise the
> only response the server could give to operations on filehandles would
> be ESTALE.)
Well, the NFS-server dynamically gets exports (GlusterFS volumes) added
when these are started or newly created. There is no hard requirement
that a specific volume is available for the NFS-server to place a shared
file with the list of NFS-clients on. This is probably easily solved by
making the path to the file configurable and only accessing it when
needed (and not at startup of the NFS-server).
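Roughly what I have in mind, although the option name and path are not
final and only a sketch:

    # keep the list of NFS-clients on shared storage, e.g. a GlusterFS
    # volume that all Gluster NFS-servers can reach
    gluster volume set <VOLNAME> nfs.mount-rmtab /shared/nfs/rmtab
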
>
> --b.
>
> > >
> > > Any ideas or recommendations are welcome. The patch in [1] is not final
> > > yet and I'd like some feedback before I proceed any further.
> >
> > My only comment is that this doesn't need to be perfect or even all that
> > good.
> >
> > The list can already get out of sync in other ways: clients can just
> > fail to unmount, for example.
> >
> > I don't think it's used by anything other than showmount.
>
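For completeness, that list is mostly what ends up in the output of
showmount, along the lines of (hostname and clients are just examples):

    $ showmount -a nfs-server.example.com
    All mount points on nfs-server.example.com:
    192.168.122.50:/volume
    192.168.122.51:/volume
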
--
Niels de Vos
Sr. Software Maintenance Engineer
Support Engineering Group
Red Hat Global Support Services