[Gluster-devel] libgfapi zero copy write - application in samba, nfs-ganesha

Raghavendra G raghavendra at gluster.com
Fri Sep 30 04:09:38 UTC 2016

On Thu, Sep 29, 2016 at 11:11 AM, Raghavendra G <raghavendra at gluster.com> wrote:

> On Wed, Sep 28, 2016 at 7:37 PM, Shyam <srangana at redhat.com> wrote:
>> On 09/27/2016 04:02 AM, Poornima Gurusiddaiah wrote:
>>> W.r.t. Samba consuming this, it requires a great deal of code change in
>>> Samba. Currently Samba has no concept of getting a buffer from the
>>> underlying file system; the filesystem comes into the picture only at the
>>> last layer (the gluster plugin), where system calls are replaced by
>>> libgfapi calls. Hence, this is not readily consumable by Samba, and I
>>> think the same will be the case with NFS-Ganesha; I will let the Ganesha
>>> folks comment on that.
>> This is exactly my reservation about the nature of the change [2] that is
>> done in this patch. We expect all consumers to use *our* buffer management
>> system, which may not be possible all the time.
>> Of the consumers that I know of, other than what Sachin stated as an
>> advantage for CommVault, none can use the gluster buffers at the moment
>> (Ganesha, Samba, QEMU). (I would like to understand how CommVault can use
>> gluster buffers in this situation without copying data into them, just
>> for clarity.)
> +Jeff Cody, for comments on QEMU
>> This is the reason I posted the comments at [1], stating we should copy
>> out the buffer, when Gluster needs it preserved, but use application
>> provided buffers as long as we can.
> My concerns here are:
> * We are just moving the copy from gfapi layer to write-behind. Though I
> am not sure what percentage of writes that hit write-behind are
> "written-back", I would assume it to be a significant percentage (otherwise
> there is no benefit in having write-behind). However, we can try this
> approach and get some perf data before we make a decision.
> * Buffer management. All gluster code uses iobuf/iobrefs to manage the
> buffers of relatively large size. With the approach suggested above, I see
> two concerns:
>     a. write-behind has to differentiate between iobufs that need copying
> (write calls through the gfapi layer) and iobufs that can just be ref'ed
> (writes from fuse etc.) when "writing-back" the write. This adds more
> complexity.
>     b. For the case where write-behind chooses to not "write-back" the
> write, we need a way of encapsulating the application buffer into
> iobuf/iobref. This might need changes in iobuf infra.
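
To make (a) and (b) concrete, here is a rough sketch with made-up names.
struct demo_buf is only a stand-in for an iobuf; it is not the real
structure, and none of these functions exist in gluster. It assumes a single
"borrowed" flag is enough to tell application memory from gluster-owned
memory, which is exactly the extra state that concern (a) is about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an iobuf; NOT the real gluster structure. */
struct demo_buf {
    char  *data;
    size_t len;
    int    refcount;
    bool   borrowed;   /* true: data points at application memory */
};

/* Gluster-owned buffer holding a private copy of src. */
struct demo_buf *demo_buf_new(const char *src, size_t len)
{
    struct demo_buf *b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    b->data = malloc(len ? len : 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    memcpy(b->data, src, len);
    b->len = len;
    b->refcount = 1;
    b->borrowed = false;
    return b;
}

/* Concern (b): encapsulate application memory without copying it. */
struct demo_buf *demo_buf_wrap(char *app_mem, size_t len)
{
    struct demo_buf *b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    b->data = app_mem;
    b->len = len;
    b->refcount = 1;
    b->borrowed = true;
    return b;
}

/* Concern (a): what a write-behind-like layer would call before it
 * acknowledges a write early. Owned buffers are just ref'ed; borrowed
 * ones must be copied, since the application may reuse its memory. */
struct demo_buf *demo_buf_retain(struct demo_buf *b)
{
    assert(b != NULL && b->refcount > 0);
    if (!b->borrowed) {
        b->refcount++;
        return b;
    }
    return demo_buf_new(b->data, b->len);
}

/* Drop one reference; application memory is never freed here. */
void demo_buf_unref(struct demo_buf *b)
{
    if (!b || --b->refcount > 0)
        return;
    if (!b->borrowed)
        free(b->data);
    free(b);
}
```

The branch in demo_buf_retain() is the added complexity: every layer that
holds a buffer past the return of the write would need the same check.
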
>> I do see the advantages of zero-copy, but not when the gluster API is
>> managing the buffers; that just makes it more tedious for applications to
>> use this scheme, IMHO.
Another point we can consider here is gfapi (and gluster internal xlator
stack) providing both behaviors as mentioned below:
1. Making Glusterfs xlator stack use application buffers.
2. Forcing applications to use only gluster managed buffers if they want
zero copy.

Let the applications choose which interface to use, based on their
use-cases (there is a trade-off in terms of performance, code changes,
legacy applications which are resistant to change, etc.).
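
A minimal sketch of what offering both behaviors could look like, under
stated assumptions: every name here (demo_file, demo_write, demo_buf_get,
demo_write_zc) is hypothetical and is not existing gfapi, and the
"write_behind" flag is a crude stand-in for any layer that acknowledges a
write early and therefore has to keep the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; none of this is existing gfapi. */
struct demo_file {
    int      write_behind;  /* nonzero: stack acks early and keeps data */
    void    *pending;       /* data held until the next flush */
    size_t   pending_len;
    size_t   bytes_sent;    /* stands in for the network send */
    unsigned copies;        /* copies the library had to make */
};

/* "Send" whatever is held and release it. */
void demo_flush(struct demo_file *f)
{
    assert(f != NULL);
    f->bytes_sent += f->pending_len;
    free(f->pending);
    f->pending = NULL;
    f->pending_len = 0;
}

/* Behavior 1: the stack uses the application's buffer in place and
 * copies only when it must hold the data after returning. */
int demo_write(struct demo_file *f, const void *app_buf, size_t len)
{
    assert(f != NULL);
    if (!f->write_behind) {
        f->bytes_sent += len;       /* used directly, no copy */
        return 0;
    }
    void *copy = malloc(len ? len : 1);
    if (!copy)
        return -1;
    memcpy(copy, app_buf, len);
    f->copies++;
    demo_flush(f);
    f->pending = copy;
    f->pending_len = len;
    return 0;
}

/* Behavior 2: the application asks the library for a buffer ... */
void *demo_buf_get(size_t len)
{
    return malloc(len ? len : 1);
}

/* ... fills it, and hands ownership back; the library never copies. */
int demo_write_zc(struct demo_file *f, void *lib_buf, size_t len)
{
    assert(f != NULL);
    if (!f->write_behind) {
        f->bytes_sent += len;
        free(lib_buf);
        return 0;
    }
    demo_flush(f);
    f->pending = lib_buf;
    f->pending_len = len;
    return 0;
}
```

A legacy application keeps calling demo_write() unchanged and pays for a
copy only when the stack holds the data; an application willing to change
its buffer handling uses demo_buf_get()/demo_write_zc() and never pays it.
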

>> Could we think through, and rule out if necessary, the idea of using the
>> application-passed buffers as is? One caveat here seems to be when using
>> RDMA (we need the memory registered, if I am not wrong), as that would
>> involve a copy to RDMA buffers when using application-passed buffers.
> Actually, RDMA is not a problem in the current implementation (ruling out
> suggestions by others to use pre-registered iobufs for managing io-cache
> etc.). This is because, in the current implementation, the responsibility
> for registering the memory region lies with transport/rdma. In other words,
> transport/rdma doesn't expect pre-registered buffers.
> What are the other pitfalls?
>> [1] http://www.gluster.org/pipermail/gluster-devel/2016-August/050622.html
>> [2] http://review.gluster.org/#/c/14784/
>>> Regards,
>>> Poornima
> --
> Raghavendra G

Raghavendra G