[Gluster-devel] zero-copy readv

Anand Avati anand.avati at gmail.com
Tue Jan 15 15:33:37 UTC 2013


On Tue, Jan 15, 2013 at 4:29 AM, Raghavendra Gowdappa
<rgowdapp at redhat.com> wrote:

>
>
> ----- Original Message -----
> > From: "Anand Avati" <aavati at redhat.com>
> > To: "Amar Tumballi" <atumball at redhat.com>
> > Cc: bharata at linux.vnet.ibm.com, gluster-devel at nongnu.org,
> > "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> > Sent: Thursday, January 10, 2013 12:20:09 PM
> > Subject: Re: [Gluster-devel] zero-copy readv
> >
> > On 01/09/2013 10:37 PM, Amar Tumballi wrote:
> > >
> > >>
> > >> - On the read side things are a little more complicated. In
> > >> rpc-transport/socket, there is a call to iobuf_get() to create a new
> > >> iobuf for reading in the readv reply data from the server. We will
> > >> need framework changes so that, if the readv request (of the xid for
> > >> which the readv reply is being handled) happened to be a "direct"
> > >> (i.e., zero-copy) variant, the "special iobuf around user's memory"
> > >> gets picked up and read() from the socket is performed directly into
> > >> the user's memory. Equivalent changes will have to be made in RDMA
> > >> (Raghavendra on CC can help). Since the goal is to avoid a memory
> > >> copy, this data will bypass io-cache (purging pre-cached data of
> > >> those regions along the way).
> > >>
> > >
> > > On the read side too, our client protocol is designed to handle
> > > 0-copy already, i.e., if the fop comes with an iobuf/iobref, then the
> > > same buffer is used for copying the received data from the network
> > > (client_submit_request() is designed to handle this). [1]
> > >
> > > We made all these changes to make RDMA 0-copy possible, so even the
> > > RDMA transport should already be 0-copy friendly.
> > >
> > > That's my understanding.
> > >
> > > Regards,
> > > Amar
> > >
> > > [1] - recent patches to handle RPC read-ahead may involve a small
> > > data copy from the header to the data buffer, but surely not a
> > > large one.
> > >
> >
> > Amar - note that the current infrastructure present for 0-copy RDMA
> > might not be sufficient for GFAPI's 0-copy. A glfs_readv() request
> > from the app can come as a vector of memory pointers (and not a
> > contiguous iobuf) and therefore requires storing an iovec/count as
> > well. This might also mean we need to exercise the scatter-gather
> > aspects of the verbs API.
>
> If we pass user-supplied vectors as write chunks to the server, it will
> do RDMA writes to the memory regions pointed to by those vectors. So I
> think no major changes are required in RDMA either.


I wasn't sure whether the client-side interface between protocol/client
and rpc-transport/rdma was doing everything right, even though the rdma
transport itself had the capability. I guess that is probably what you
meant by "If we pass user supplied vectors...".

Avati
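For illustration only, here is a minimal sketch of the client-side socket read path discussed in this thread. It is written in plain C against POSIX readv(2) and is not GlusterFS's actual rpc-transport/socket code. A hypothetical per-xid table (pending_req, find_req) records whether an outstanding readv was a "direct" request that carried the caller's iovec/count. When the reply arrives, any payload bytes already pulled in along with the RPC header (the read-ahead case in footnote [1]) are copied first. The remainder is then read from the socket straight into the caller's memory, with no intermediate buffer. All names, and the MAX_VEC cap, are assumptions. iobuf/iobref, client_submit_request(), and the RDMA write-chunk path are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_VEC 16 /* hypothetical cap on the caller's vector length */

/* Hypothetical record of an outstanding request, keyed by xid.
 * vec == NULL means an ordinary (non-direct) request. */
struct pending_req {
    uint32_t xid;
    const struct iovec *vec; /* caller's memory for a "direct" readv */
    int count;
};

static const struct pending_req *find_req(const struct pending_req *tab,
                                          int n, uint32_t xid)
{
    for (int i = 0; i < n; i++)
        if (tab[i].xid == xid)
            return &tab[i];
    return NULL;
}

/* Place len payload bytes into vec[0..count): first the pre_len bytes that
 * arrived with the header (read-ahead), then the rest straight from fd via
 * readv(). Returns len on success, -1 on error, EOF or a too-small vector. */
static ssize_t scatter_read(int fd, const struct iovec *vec, int count,
                            const char *pre, size_t pre_len, size_t len)
{
    struct iovec rest[MAX_VEC];
    int n = 0, idx = 0;
    size_t want = len;

    assert(pre_len == 0 || pre != NULL);
    if (count > MAX_VEC || pre_len > len)
        return -1;

    for (int i = 0; i < count && want > 0; i++) {
        char *base = vec[i].iov_base;
        size_t cap = vec[i].iov_len < want ? vec[i].iov_len : want;
        size_t from_pre = pre_len < cap ? pre_len : cap;

        if (from_pre > 0) { /* the small read-ahead copy */
            memcpy(base, pre, from_pre);
            pre += from_pre;
            pre_len -= from_pre;
        }
        if (cap > from_pre) { /* region still to be filled from the socket */
            rest[n].iov_base = base + from_pre;
            rest[n].iov_len = cap - from_pre;
            n++;
        }
        want -= cap;
    }
    if (want > 0)
        return -1;

    while (idx < n) { /* readv() may return short: advance and retry */
        ssize_t r = readv(fd, &rest[idx], n - idx);
        if (r <= 0)
            return -1;
        size_t got = (size_t)r;
        while (idx < n && got >= rest[idx].iov_len)
            got -= rest[idx++].iov_len;
        if (idx < n) {
            rest[idx].iov_base = (char *)rest[idx].iov_base + got;
            rest[idx].iov_len -= got;
        }
    }
    return (ssize_t)len;
}

/* Demo: the reply for xid 7 carries "HELLOWORLD"; "HEL" came with the header. */
static int demo_direct_read(char out1[4], char out2[6])
{
    int sv[2];
    ssize_t r = -1;
    struct iovec user[2] = { { .iov_base = out1, .iov_len = 4 },
                             { .iov_base = out2, .iov_len = 6 } };
    struct pending_req tab[1] = { { .xid = 7, .vec = user, .count = 2 } };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (write(sv[1], "LOWORLD", 7) == 7) {
        const struct pending_req *req = find_req(tab, 1, 7);
        if (req != NULL && req->vec != NULL)
            r = scatter_read(sv[0], req->vec, req->count, "HEL", 3, 10);
        /* else: ordinary path would allocate a fresh buffer (not shown) */
    }
    close(sv[0]);
    close(sv[1]);
    return r == 10 ? 0 : -1;
}
```

The short-read loop matters because readv() on a stream socket may return fewer bytes than requested. The caller's vector is therefore advanced in place rather than falling back to a bounce buffer.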