[Gluster-users] Passing noforget option to glusterfs native client mounts

Anirban Ghoshal chalcogen_eg_oxygen at yahoo.com
Tue Dec 24 16:21:23 UTC 2013


Hi, and thanks a lot, Anand!

I was initially searching for a good answer to why the glusterfs site lists knfsd as NOT compatible with glusterfs. So now I know. :)

Funnily enough, we didn't have a problem with failover during our testing. We passed constant fsid's (fsid=xxx) while exporting our mounts, and the NFS clients haven't reported any of the file handles as stale while we migrated the NFS service from one server to the other. Not sure why this works. Do nodeid's and generation numbers remain invariant across storage servers in glusterfs-3.4.0?
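
For illustration, the exports look roughly like this on both servers (the path and the fsid value are placeholders; the point is only that the fsid is fixed and identical on both machines):

    # /etc/exports, identical on both servers
    /mnt/testvol   *(rw,sync,no_subtree_check,fsid=101)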


We, for our part, have a pretty small amount of data in our filesystems (compared, that is, with the petabyte-sized volumes glusterfs commonly manages). Our total volume size would be somewhere around 4 GB, and some 50,000 files is all they contain. Each server has around 16 GB of RAM, so memory is not at a premium for this project... 

That said, if the GlusterFS NFS server does maintain identical file handles across all its servers and does not alter them upon failover, then in the long run it might be prudent to switch to GlusterFS NFS as the cleaner solution... 


Thanks again!
Anirban




On Tuesday, 24 December 2013 1:58 PM, Anand Avati <avati at gluster.org> wrote:
 
Hi,
Allowing the noforget option to FUSE will not help for your cause. Gluster presents the address of the inode_t as the nodeid to FUSE. In turn, FUSE creates a file handle using this nodeid for knfsd to export to the NFS client. When knfsd fails over to another server, FUSE will decode the handle encoded by the other NFS server and try to use the nodeid of the other server - which will obviously not work, as a virtual address from the glusterfs process on the other server is not valid here.

Short version: the file handle generated through FUSE is not durable. The "noforget" option in FUSE is a hack to avoid ESTALE messages caused by dcache pruning. If you have enough inodes in your volume, your system will go OOM at some point. "noforget" is NOT a solution for providing NFS failover to a different server.

For reasons such as these, we ended up implementing our own NFS server, in which we encode the file handle using the GFID (which is durable across reboots and server failovers). I would strongly recommend NOT using knfsd with any FUSE-based filesystem (not just glusterfs) for serious production use, and it will just not work if you are designing for NFS high availability/fail-over.
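
For context: the GFID is a per-file UUID that gluster stores on each brick as an extended attribute (trusted.gfid), which is what makes it durable. On a brick it can be inspected with something along these lines (the brick path is just an example):

    getfattr -n trusted.gfid -e hex /bricks/brick1/path/to/some/file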

Thanks,
Avati



On Sat, Dec 21, 2013 at 8:52 PM, Anirban Ghoshal <chalcogen_eg_oxygen at yahoo.com> wrote:

If somebody has an idea on how this could be done, could you please help out? I am still stuck on this, apparently...
>
>Thanks,
>Anirban
>
>
>
>
>On Thursday, 19 December 2013 1:40 AM, Chalcogen <chalcogen_eg_oxygen at yahoo.com> wrote:
> 
>P.s. I think I need to clarify this:
>
>I am only reading from the mounts, and not modifying anything on the server, so the commonest causes of stale file handles do not apply.
>
>Anirban
>
>
>On Thursday 19 December 2013 01:16 AM, Chalcogen wrote:
>
>Hi everybody,
>
>A few months back I joined a project where people want to replace their legacy fuse-based (twin-server) replicated file-system with GlusterFS. They also have high-availability NFS server code built around the kernel NFSD (the nfs-kernel-server, I mean) that they wish to retain. The reason they wish to retain the kernel NFS server and not use the NFS server that comes with GlusterFS is mainly that there's a bit of code that allows NFS IPs to be migrated from one host server to the other in case one happens to go down, and tweaks to the export server configuration allow the file handles to remain identical on the new host server.
>
>The solution was to mount gluster volumes using the mount.glusterfs native client program and then export the directories over the kernel NFS server. This seems to work most of the time, but on rare occasions 'stale file handle' is reported on certain clients, which really puts a damper on the 'high-availability' thing. After suitably instrumenting the nfsd/fuse code in the kernel, it seems that decoding of the file handle fails on the server because the inode record corresponding to the nodeid in the handle cannot be looked up. Combining this with the fact that a second attempt by the client to execute a lookup on the same file passes, one might suspect that the problem is identical to what many people attempting to export fuse mounts over the kernel's NFS server are facing; viz., fuse 'forgets' the inode records, thereby causing ilookup5() to fail. Miklos and other fuse developers/hackers would point towards '-o noforget' while mounting their fuse file-systems.
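>
>The setup is roughly the following (the server name, volume name and mount path are placeholders):
>
>    # on each NFS server, mount the volume with the native client
>    mount -t glusterfs server1:/testvol /mnt/testvol
>
>    # /etc/exports: re-export the FUSE mount over knfsd, with an arbitrary but fixed fsid
>    /mnt/testvol   *(rw,sync,no_subtree_check,fsid=101)
>
>    exportfs -ra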
>
>I tried passing '-o noforget' to mount.glusterfs, but it does not seem to recognize it. Could somebody help me out with the correct syntax to pass noforget to gluster volumes? Or, failing that, something we could pass to glusterfs that would instruct fuse to allocate a bigger cache for our inodes?
>
>Additionally, should you think that something else might be behind our problems, please do let me know.
>
>Here's my configuration:
>
>Linux kernel version: 2.6.34.12
>GlusterFS version: 3.4.0
>nfs.disable option for volumes: OFF on all volumes
>
>Thanks a lot for your time!
>Anirban
>
>P.s. I found quite a few pages on the web that admonish users that GlusterFS is not compatible with the kernel NFS server, but they do not really give much detail. Is this one of the reasons for saying so?
>
>
>

