[Gluster-devel] zero-copy readv
rgowdapp at redhat.com
Tue Jan 15 12:29:36 UTC 2013
----- Original Message -----
> From: "Anand Avati" <aavati at redhat.com>
> To: "Amar Tumballi" <atumball at redhat.com>
> Cc: bharata at linux.vnet.ibm.com, gluster-devel at nongnu.org, "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> Sent: Thursday, January 10, 2013 12:20:09 PM
> Subject: Re: [Gluster-devel] zero-copy readv
> On 01/09/2013 10:37 PM, Amar Tumballi wrote:
> >> - On the read side things are a little more complicated. In
> >> rpc-transport/socket, there is a call to iobuf_get() to create a new
> >> iobuf for reading in the readv reply data from the server. We will need
> >> a framework change where, if the readv request (of the xid for which
> >> the readv reply is being handled) happened to be a "direct" variant
> >> (i.e., zero-copy), then the "special iobuf around user's memory" gets
> >> picked up and read() from the socket is performed directly into the
> >> user's memory. Similar, equivalent changes will have to be done in RDMA
> >> (Raghavendra on CC can help). Since the goal is to avoid memory copy,
> >> this data will be bypassing io-cache (and purging pre-cached data of
> >> those regions along the way).
> > On the read side too, our client protocol is designed to handle 0-copy
> > already, i.e., if the fop comes with an iobuf/iobref, then the same
> > buffer is used for copying the received data from the network
> > (client_submit_request() is designed to handle this).
> > We made all these changes to make RDMA 0-copy a possibility, so even
> > the RDMA transport should already be 0-copy friendly.
> > That's my understanding.
> > Regards,
> > Amar
> >  - recent patches to handle RPC read-ahead may involve a small data
> > copy from header to data buffer, but surely not a very large one.
> Amar - note that the current infrastructure present for 0-copy RDMA
> might not be sufficient for GFAPI's 0-copy. A glfs_readv() request from
> the app can come as a vector of memory pointers (and not a contiguous
> iobuf) and therefore require storing an iovec/count as well. This
> also means we need to exercise the scatter-gather aspects of the verbs
If we pass the user-supplied vectors as write chunks to the server, it will do RDMA writes to the memory regions pointed to by those vectors. So I think no major changes are required in rdma either.
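
For illustration only, here is a minimal sketch of the same scatter-gather idea on the socket side: reading reply data straight into a caller-supplied iovec with POSIX readv(), with no intermediate buffer. This is not GlusterFS code. The helper name read_into_user_vector() is hypothetical, and the sketch assumes a blocking stream socket and that the caller does not mind its iovec array being modified as segments fill up.

```c
#include <errno.h>
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of GlusterFS): fill the caller's buffers
 * described by vec[0..count-1] directly from fd, without copying through
 * an intermediate buffer. Stops when every segment is full or the peer
 * closes the connection.
 *
 * Note: the iovec array is modified in place to track partial reads.
 * Returns the total number of bytes read, or -1 on error.
 */
static ssize_t read_into_user_vector(int fd, struct iovec *vec, int count)
{
    ssize_t total = 0;

    while (count > 0) {
        ssize_t n = readv(fd, vec, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        if (n == 0)
            break;              /* end of stream (or nothing left to fill) */

        total += n;

        /* Skip over the segments that were filled completely. */
        while (count > 0 && (size_t)n >= vec->iov_len) {
            n -= (ssize_t)vec->iov_len;
            vec++;
            count--;
        }

        /* Adjust a partially filled segment so the next readv() resumes
         * exactly where this one stopped. */
        if (count > 0) {
            vec->iov_base = (char *)vec->iov_base + n;
            vec->iov_len -= (size_t)n;
        }
    }

    return total;
}
```

A caller would build the iovec array from the application's own buffers and pass it in unchanged. The RDMA case is analogous in spirit: each vector entry would be described to the server as its own write chunk instead of being passed to readv().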