[Gluster-devel] using GlusterFS to build an NFSv4.1 pNFS server

Niels de Vos ndevos at redhat.com
Tue Jun 2 23:11:44 UTC 2015


On Tue, Jun 02, 2015 at 06:18:54PM -0400, Rick Macklem wrote:
> Jiffin Tony Thottan wrote:
> > 
> > Hi Rick,
> > 
> > There is already support for pNFS in gluster volumes using
> > nfs-ganesha:
> > http://gluster.readthedocs.org/en/latest/Features/mount_gluster_volume_using_pnfs/
> > It supports the normal FILE_LAYOUT architecture.
> Yes, I am aware of this (although I'll admit I noticed it in the docs after I
> posted the email).
> 
> Just fyi, if I wanted to set up a (near) production NFSv4.1/pNFS server, this would be
> fine, but that's not me;-)
> I'm interested in extending the NFSv4.1 server I've already written to do
> pNFS. Why? Well, mostly because it interests me. (I've never been paid any $$
> to do any of the FreeBSD NFS work I've done, so I pretty much do it as a hobby.)
> If the result never works or never performs well enough to be useful for
> production environments then...oh well, it was an interesting experiment.

Definitely sounds interesting! I don't have much to do with FreeBSD, but
I'm certainly happy to help on the Gluster side if you have any
questions.

> If it ever is useful for (near) production environments, I suspect it would be
> users that have set up a FreeBSD NFS server and it is outgrowing what a single
> server can handle. In other words, they would come from the FreeBSD NFS server
> side and not the GlusterFS side.
> 
> > Other comments are inline
> > 
> > On 02/06/15 05:18, Rick Macklem wrote:
> > > Hi,
> > >
> > > Btw, I do most of the FreeBSD NFSv4 work.
> > > I am interested in trying to use GlusterFS
> > > to build a FreeBSD NFSv4.1 pNFS server.
> > > My hope is that, by directing the NFSv4.1 client
> > > to the host where the file resides, the client will
> > > be able to do I/O on it efficiently via the NFSv3
> > > server. (The new layout type called flex files allows
> > > an NFSv3 server to be a storage/data server for pNFS.)
> > 
> > It would be good to use gluster-nfs as a data server (it is more
> > tightly coupled with the bricks).
> > CCing Anand, who has a better idea about the flex file layout architecture.
> > 
> Flex file is pretty straightforward. It simply allows the NFSv3 server
> to be what they call a storage server. All that it does is use a "fake"
> uid/gid that is allowed rw/ro access to the file. (This implies that
> the client is responsible for deciding if a user is allowed access to
> the file. Not a big deal for AUTH_SYS, since the server "trusts" the
> client's choice of uid/gid anyhow.)
> --> As such, the NFSv3 server needs to have a small change applied to
>     it to allow access via this "fake" uid/gid.

This sounds simple enough to do. File a feature request and describe how
you can use this. Patches are welcome too, of course, but we can likely
code something up quickly.

    https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&component=nfs
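
To make the request concrete, here is a minimal sketch of the kind of check
described above: the storage server accepts I/O from a "fake" (synthetic)
uid/gid that has been granted rw or ro access. This is not Gluster/NFS code.
The uid/gid values, the table and the function name are all hypothetical, and
a real change would live in the NFSv3 server's access checks.

```python
from typing import Dict, FrozenSet, Tuple

READ, WRITE = "read", "write"

# Hypothetical table of synthetic credentials. The numeric ids are made up
# for illustration; a real server would take them from its configuration.
SYNTHETIC_CREDS: Dict[Tuple[int, int], FrozenSet[str]] = {
    (65001, 65001): frozenset({READ, WRITE}),  # "fake" id with rw access
    (65002, 65002): frozenset({READ}),         # "fake" id with ro access
}


def synthetic_access_allowed(uid: int, gid: int, op: str,
                             table: Dict[Tuple[int, int], FrozenSet[str]] = SYNTHETIC_CREDS) -> bool:
    """Return True if (uid, gid) is a known synthetic credential allowed to do op.

    Any other credential gets False here, meaning the server would fall back
    to its normal permission checks.
    """
    return op in table.get((uid, gid), frozenset())
```

With AUTH_SYS the server already trusts the uid/gid the client sends, so the
decision whether a real user may open the file stays with the NFSv4.1 side.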

> Basically, the NFSv4.1 server needs to know what the NFSv3 server's
> host IP address is and what FH to use for the file on it. (I do see
> the code in the NFS xlator for generating an FH, but haven't looked
> much yet.) As noted below in the original post.

The FH in Gluster/NFS is based on the volume-id and the GFID. Both are
UUIDs. The volume-id is a unique identifier for the volume, and the GFID
is like a volume-wide inode-nr (volumes consist of multiple bricks
with their own filesystems, and a storage server can host multiple bricks).
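
As a rough illustration only: a handle built from those two UUIDs could be
packed and unpacked as below. The actual on-wire layout of the Gluster/NFS
file handle is not reproduced here; the plain concatenation and the function
names are assumptions made for the sketch.

```python
import uuid
from typing import Tuple

UUID_LEN = 16  # a UUID is 128 bits


def pack_handle(volume_id: uuid.UUID, gfid: uuid.UUID) -> bytes:
    """Concatenate volume-id and GFID into one opaque 32-byte handle.

    Assumed layout for illustration, not the real Gluster/NFS format.
    """
    return volume_id.bytes + gfid.bytes


def unpack_handle(fh: bytes) -> Tuple[uuid.UUID, uuid.UUID]:
    """Split a handle produced by pack_handle back into (volume_id, gfid)."""
    if len(fh) != 2 * UUID_LEN:
        raise ValueError("unexpected handle length: %d" % len(fh))
    return uuid.UUID(bytes=fh[:UUID_LEN]), uuid.UUID(bytes=fh[UUID_LEN:])
```

Two UUIDs take 32 bytes, which fits within the 64-byte limit of an NFSv3
file handle. Note that nothing in such a handle names a brick or a host.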

There is no way to know from the FH alone which brick should handle it.
Looking for the GFID on all the bricks that participate in the volume is
a rather expensive operation (many LOOKUPs). You will always need to find
the location of the file with a request through FUSE.

> > > To do this, I need to be able to "poke" the
> > > glusterfs server and get the following information:
> > > - The NFSv3 file handle and the IP address for
> > >    the host(s) the file lives on.
> > >    --> Using this, I am planning on creating a layout
> > >        that tells the NFSv4.1 client to use NFSv3 to
> > >        do I/O on the file. (What NFSv4.1 calls a storage
> > >        server, although some RFCs might call it a data
> > >        server.)
> > > - I hope to use the fuse interface for the NFSv4.1 metadata
> > >    server.
> > 
> > I don't know how feasible it is to implement a metadata server using
> > a fuse interface.
> > 
> I guess I'll find out;-). The FreeBSD NFSv4.1 server is kernel based
> and exports any local file system that has a VFS/VOP interface. So,
> hopefully FUSE won't provide too many surprises.
> I am curious to see how well it performs.

I have no idea how FreeBSD handles FUSE, but I'm sure you won't have an
issue with figuring that out. You should be able to get the details
about the location of the file through GETXATTR calls. In NFS-Ganesha,
these two functions parse the output:
 - get_pathinfo_host
 - glfs_get_ds_addr

    These can be found here:
    https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/FSAL/FSAL_GLUSTER/mds.c#L482
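
A minimal sketch of that kind of parsing, assuming the location is read from
the trusted.glusterfs.pathinfo virtual xattr on a FUSE mount (for example with
getfattr -n trusted.glusterfs.pathinfo <file>) and that each brick shows up
as a <POSIX(brick-root):host:path> entry. The sample string, host names and
function name are made up, and this is not how the Ganesha functions are
written. Fetching the xattr is left out because the call differs per OS.

```python
import re
from typing import List, Tuple

# One <POSIX(brick-root):host:path-on-brick> entry in the pathinfo value.
# Assumes the host part contains no ':' (so no bare IPv6 addresses).
POSIX_ENTRY = re.compile(
    r"<POSIX\((?P<brick_root>[^)]*)\):(?P<host>[^:>]+):(?P<path>[^>]+)>"
)


def brick_locations(pathinfo: str) -> List[Tuple[str, str]]:
    """Return (host, path-on-brick) for every POSIX entry in a pathinfo value."""
    return [(m["host"], m["path"]) for m in POSIX_ENTRY.finditer(pathinfo)]


# Made-up sample for a file on a two-way replicated volume.
sample = (
    "(<DISTRIBUTE:vol-dht> (<REPLICATE:vol-replicate-0> "
    "<POSIX(/bricks/b1):server1.example.com:/bricks/b1/dir/file> "
    "<POSIX(/bricks/b2):server2.example.com:/bricks/b2/dir/file>))"
)

for host, path in brick_locations(sample):
    print(host, path)
```

The host names found this way would still need to be resolved to the IP
addresses that go into the layout handed to the NFSv4.1 client.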


> > > If anyone can point me to the area in the GlusterFS sources
> > > that I should look at to do this and/or suggest a mechanism
> > > for getting the above information out of the GlusterFS server,
> > > please let me know.
> > >
> > > Also, any comments w.r.t. the above plan are welcome.
> > 
> > In my opinion, a hybrid approach would be better. Use the current
> > metadata server implemented in ganesha (support for flex files has
> > already been added in ganesha); some tweaks might be needed in the
> > write, read and commit APIs of gluster-nfs. In this implementation,
> > we should keep the metadata server away from the trusted storage
> > pool (T.S.P.), i.e. a dedicated server is required for the M.D.S.
> > 
> I think I answered this above. Also, I doubt ganesha-nfs is ported to
> FreeBSD.
> 
> Thanks for your comments, rick
> ps: Given ganesha-nfs etc, I'll understand if GlusterFS isn't interested
>     in this. Any patches that I'll generate are a long way off anyhow.

Our path forward for a more recent version and current feature set for
NFS is based on NFS-Ganesha. But, there are many users of Gluster/NFS
(NFSv3 only) that would not like to see our NFS-server disappear. If
Gluster/NFS can help you with providing a FreeBSD pNFS server, we would
surely have some interest. It will not be at the top of our priorities,
but we should try to assist you where we can.

Thanks for sharing your ideas, please keep us informed and let us know
where you hit issues related to Gluster.

Niels


> > > Thanks in advance for any information, rick
> > > ps: I haven't written any code yet, but I think the above
> > >      might be feasible.
> > 
> > You are most welcome to contribute code :).
> > 
> > If you face any issues implementing the current pNFS server for
> > gluster volumes, please feel free to ask.
> > 
> > Regards,
> > Jiffin
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel at gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > 
