[Gluster-devel] using GlusterFS to build an NFSv4.1 pNFS server

Wed Jun 3 13:27:16 UTC 2015

On 03/06/15 04:41, Niels de Vos wrote:
> On Tue, Jun 02, 2015 at 06:18:54PM -0400, Rick Macklem wrote:
>> Jiffin Tony Thottan wrote:
>>> Hi Rick,
>>>
>>> There is already support for pNFS in gluster volumes using
>>> nfs-ganesha :
>>> http://gluster.readthedocs.org/en/latest/Features/mount_gluster_volume_using_pnfs/
>>> It supports normal FILE_LAYOUT architecture.
>> Yes, I am aware of this (although I'll admit I noticed it in the docs after I
>> posted the email).
>>
>> Just fyi, if I wanted to set up a (near) production NFSv4.1/pNFS server, this would be
>> fine, but that's not me;-)
>> I'm interested in extending the NFSv4.1 server I've already written to do
>> pNFS. Why? Well, mostly because it interests me. (I've never been paid any $$
>> to do any of the FreeBSD NFS work I've done, so I pretty much do it as a hobby.)
>>
>> If the result never works or never performs well enough to be useful for
>> production environments then...oh well, it was an interesting experiment.
> Definitely sounds interesting! I don't have much to do with FreeBSD, but
> I'm certainly happy to help on the Gluster side if you have any
> questions.

+1. I can also help you with pNFS-related queries.

>> If it ever is useful for (near) production environments, I suspect it would be
>> users that have set up a FreeBSD NFS server and it is outgrowing what a single
>> server can handle. In other words, they would come from the FreeBSD NFS server
>> side and not the GlusterFS side.
>>> Other comments are inline
>>>
>>> On 02/06/15 05:18, Rick Macklem wrote:
>>>> Hi,
>>>>
>>>> Btw, I do most of the FreeBSD NFSv4 work.
>>>> I am interested in trying to use GlusterFS
>>>> to build a FreeBSD NFSv4.1 pNFS server.
>>>> My hope is that, by directing the NFSv4.1 client
>>>> to the host where the file resides, the client will
>>>> be able to do I/O on it efficiently via the NFSv3
>>>> server. (The new layout type called flex files allows
>>>> an NFSv3 server to be a storage/data server for pNFS.)
>>> It would be good to use gluster-nfs as a data server (which is more
>>> tightly coupled with bricks).
>>> CCing Anand, who has a better idea about the flex file layout architecture.
>>>
>> Flex file is pretty straightforward. It simply allows the NFSv3 server
>> to be what they call a storage server. All that it does is use a "fake"
>> uid/gid that is allowed rw/ro access to the file. (This implies that
>> the client is responsible for deciding if a user is allowed access to
>> the file. Not a big deal for AUTH_SYS, since the server "trusts" the
>> client's choice of uid/gid anyhow.)
>> --> As such, the NFSv3 server needs to have a small change applied to
>>      it to allow access via this "fake" uid/gid.
> This sounds simple enough to do. File a feature request and describe how
> you can use this. Patches are welcome too, of course, but we can likely
> code something up quickly.
>
>      https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&component=nfs
>
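To make the "fake" uid/gid idea concrete, here is a minimal sketch of the check the NFSv3 server would need. It is not Gluster/NFS code: the `SyntheticCred` type, the `cred_table` mapping, and the numeric ids are all hypothetical, and how the metadata server registers the credential with the storage server is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticCred:
    """Hypothetical per-file credential that the MDS puts in a layout."""
    uid: int
    gid: int
    writable: bool  # True for a rw layout, False for a ro layout


def storage_server_allows(cred_table, fh, uid, gid, want_write):
    """Decide whether an AUTH_SYS request may touch the file behind fh.

    The real user's identity is not checked here; as described above,
    the client is trusted to have done that before using the fake ids.
    """
    cred = cred_table.get(fh)
    if cred is None:
        return False  # no layout was ever issued for this file handle
    if (uid, gid) != (cred.uid, cred.gid):
        return False  # caller is not using the synthetic ids
    return cred.writable or not want_write
```

With a read-only credential registered for a handle, reads with the fake ids pass, while writes or requests with any other uid/gid are refused.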
>> Basically, the NFSv4.1 server needs to know what the NFSv3 server's
>> host IP address is and what FH to use for the file on it. (I do see
>> the code in the NFS xlator for generating an FH, but haven't looked
>> much yet.) As noted below in the original post.
> The FH in Gluster/NFS is based on the volume-id and the GFID. Both are
> UUIDs. The volume-id is a unique identifier for the volume, and the GFID
> is like a volume-wide inode-nr (volumes consist of multiple bricks
> with their own filesystems, a storage server can host multiple bricks).

It is not required to create the FH in the MDS (it might not be consistent
with the FH on another gluster-nfs server). Instead, create a ds_wire (for
me it was a combination of the GFID and the IP of the server); the handle
will then be created at each data server, based on the ds_wire, for the I/Os.
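As a rough illustration of a GFID-plus-IP ds_wire, the sketch below packs the two into a fixed-size blob and unpacks it again. The 20-byte layout (16-byte GFID, then a 4-byte IPv4 address) and the function names are assumptions made for this example; they are not the structure used in nfs-ganesha.

```python
import ipaddress
import struct
import uuid

# Hypothetical wire layout: 16-byte GFID followed by a 4-byte IPv4 address,
# in network byte order.
_DS_WIRE = struct.Struct("!16s4s")


def pack_ds_wire(gfid: uuid.UUID, ds_addr: str) -> bytes:
    """Build the opaque blob the MDS would hand to the client."""
    return _DS_WIRE.pack(gfid.bytes, ipaddress.IPv4Address(ds_addr).packed)


def unpack_ds_wire(wire: bytes):
    """Recover (gfid, address) on the data server before opening the file."""
    gfid_bytes, addr_bytes = _DS_WIRE.unpack(wire)
    return uuid.UUID(bytes=gfid_bytes), str(ipaddress.IPv4Address(addr_bytes))
```

The data server only needs the GFID to build its own local handle, so nothing server-specific has to be generated on the MDS.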

> There is no way to know which brick should handle a FH. Looking for the
> GFID on all the bricks that participate in the volume is a rather
> expensive operation (many LOOKUPs). You will always need to find the
> location of the file with a request through FUSE.
>
>>>> To do this, I need to be able to "poke" the
>>>> glusterfs server and get the following information:
>>>> - The NFSv3 file handle and the IP address for
>>>>     the host(s) the file lives on.
>>>>     --> Using this, I am planning on creating a layout
>>>>         that tells the NFSv4.1 client to use NFSv3 to
>>>>         do I/O on the file. (What NFSv4.1 calls a storage
>>>>         server, although some RFCs might call it a data
>>>>         server.)
>>>> - I hope to use the fuse interface for the NFSv4.1 metadata
>>>>     server.
>>> I don't know how feasible it is to implement a metadata server
>>> using a FUSE interface.
>>>
>> I guess I'll find out;-). The FreeBSD NFSv4.1 server is kernel based
>> and exports any local file system that has a VFS/VOP interface. So,
>> hopefully FUSE won't provide too many surprises.
>> I am curious to see how well it performs.
> I have no idea how FreeBSD handles FUSE, but I'm sure you won't have an
> issue with figuring that out. You should be able to get the details
> about the location of the file through GETXATTR calls. In NFS-Ganesha,
> these two functions parse the output:
>   - get_pathinfo_host
>   - glfs_get_ds_addr
>
>      These can be found here:
>      https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/FSAL/FSAL_GLUSTER/mds.c#L482
>
>
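For reference, a small sketch of the GETXATTR approach from a FUSE mount. The xattr name `trusted.glusterfs.pathinfo` is the one Gluster exposes; the sample value format in the comment and the regular expression are my assumptions about typical output, and `os.getxattr` is only available on Linux, so on FreeBSD the equivalent extattr call would be needed.

```python
import os
import re

PATHINFO_XATTR = "trusted.glusterfs.pathinfo"

# Assumed entry format, e.g.:
#   (<DISTRIBUTE:vol-dht> <POSIX(/bricks/b1):server1:/bricks/b1/dir/file>)
_POSIX_ENTRY = re.compile(
    r"<POSIX\((?P<brick>[^)]*)\):(?P<host>[^:>]+):(?P<path>[^>]+)>"
)


def parse_pathinfo(value: str):
    """Return a list of (host, path-on-brick) pairs found in the value."""
    return [(m["host"], m["path"]) for m in _POSIX_ENTRY.finditer(value)]


def locate_file(path_on_fuse_mount: str):
    """Ask the Gluster FUSE mount where the file lives (Linux only)."""
    raw = os.getxattr(path_on_fuse_mount, PATHINFO_XATTR)
    return parse_pathinfo(raw.decode().rstrip("\x00"))
```

A replicated file would yield one pair per replica, so the MDS could offer any of them (or all of them as mirrors) in the layout.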
>>>> If anyone can point me to the area in the GlusterFS sources
>>>> that I should look at to do this and/or suggest a mechanism
>>>> for getting the above information out of the GlusterFS server,
>>>> please let me know.
>>>>
>>>> Also, any comments w.r.t. the above plan are welcome.
>>> In my opinion, a hybrid approach would be better. Use the current
>>> metadata server implemented in ganesha (support for flex files is
>>> already added in ganesha); it might need some tweaks in the write,
>>> read, and commit APIs of gluster-nfs. In this implementation, we
>>> should keep the metadata server out of the trusted storage pool
>>> (T.S.P), i.e. a dedicated server is required for the M.D.S.
>>>
>> I think I answered this above. Also, I doubt ganesha-nfs is ported to
>> FreeBSD.

I am not sure about this; maybe folks from the nfs-ganesha community can
help you with that. You can either send a mail to the nfs-ganesha-devel
list or ping them on IRC at #ganesha on freenode.

>> Thanks for your comments, rick
>> ps: Given ganesha-nfs etc, I'll understand if GlusterFS isn't interested
>>      in this. Any patches that I'll generate are a long way off anyhow.
> Our path forward for a more recent version and current feature set for
> NFS is based on NFS-Ganesha. But, there are many users of Gluster/NFS
> (NFSv3 only) that would not like to see our NFS-server disappear. If
> Gluster/NFS can help you with providing a FreeBSD pNFS server, we would
> surely have some interest. It will not be on the top of our planning,
> but we should try to assist you where we can.
>
> Thanks for sharing your ideas, please keep us informed and let us know
> where you hit issues related to Gluster.
>
> Niels
>
>
>>>> Thanks in advance for any information, rick
>>>> ps: I haven't written any code yet, but I think the above
>>>>       might be feasible.
>>> You are most welcome to do the coding part :).
>>>
>>> If you face any issues with the current pNFS server for gluster
>>> volumes, please feel free to ask.
>>>
>>> Regards,
>>> Jiffin
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-devel