[Gluster-devel] gfapi zero copy write enhancement

Thu Aug 25 06:46:20 UTC 2016

Hi,

On 08/25/2016 12:58 AM, Shyam wrote:
> Hi,
>
> I was attempting to review this [1] change, and for a long time I 
> wanted to understand why we need this and what is the manner in which 
> we need to achieve the same. As there is a lack of understanding on my 
> part I am starting with some questions.
>
> 1) In writev FOP what is the role/use of the iobref parameter?
> - I do not see the posix xlator using this
> - The payload is carried in vector, rather than in the iobref
> - Across the code, other than protocol/client that (sort of) 
> serializes this, I do not see it having any use
>
> So what am I missing?
>
> 2) Coming to the current change, what prevents us from doing this as [2]?
> - in short just pass in the buffer received as a part of iovec
>
> [2] is not necessarily clean, just a hack, that assumes there is an 
> iovcount of 1 always, and I just tested this with a write call, and 
> not a writev call. (just stating before we start a code review of the 
> same ;) )
>
> 3) From discussions in [3] and [4] I understand that this is to 
> eliminate copy when working with RDMA. Further, Avati's response to 
> the thought, discusses why we need to leave the memory management of 
> read/write buffers to the applications than use/reuse gluster buffers.
>
> So, in the long term if this is for RDMA, what is the change in 
> justification for the manner in which we are asking applications to 
> use gluster buffers, than doing it the other way?
>
> 4) Why should applications not reuse buffers? and instead ask for 
> fresh/new buffers for each write call?
The reason is that the buffer might still be ref'ed by a translator such
as io-cache or write-behind.

Discussion in patch:
------------------------------------------------------------------------------
<< IMPORTANT: Buffer should not be reused across the zero copy write 
operation. Is this still valid, given that the application allocates and 
frees the buffer?

Yes, this is still valid. If the application tries to reuse the buffer, 
it might see a hang, because the buffer might still be ref'ed by a 
translator such as io-cache or write-behind.
------------------------------------------------------------------------------
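
To make the ref'ing concern concrete, here is a minimal sketch in C. It is
a toy model, not gluster code: toy_buf, toy_write_behind and every function
below are hypothetical names made up for illustration, and they are not the
real iobuf/iobref or gfapi APIs. It also only shows the general hazard (a
layer that still holds a ref sees whatever the application writes into the
buffer later), not the hang mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted buffer (assumption: NOT gluster's iobuf). */
struct toy_buf {
    int refcount;
    size_t len;
    char data[64];
};

struct toy_buf *toy_buf_new(void)
{
    struct toy_buf *b = calloc(1, sizeof(*b));
    if (b)
        b->refcount = 1; /* the caller holds the first reference */
    return b;
}

struct toy_buf *toy_buf_ref(struct toy_buf *b)
{
    b->refcount++;
    return b;
}

void toy_buf_unref(struct toy_buf *b)
{
    assert(b->refcount > 0);
    if (--b->refcount == 0)
        free(b);
}

/* Stand-in for a write-behind style layer: it accepts the write at
 * once but keeps a reference so it can flush the data later. */
struct toy_write_behind {
    struct toy_buf *pending;
};

void toy_wb_write(struct toy_write_behind *wb, struct toy_buf *b)
{
    wb->pending = toy_buf_ref(b);
}

/* Copy the pending data to 'out' (the "disk") and drop the reference. */
void toy_wb_flush(struct toy_write_behind *wb, char *out, size_t outlen)
{
    struct toy_buf *b = wb->pending;
    if (!b)
        return;
    memcpy(out, b->data, b->len < outlen ? b->len : outlen);
    wb->pending = NULL;
    toy_buf_unref(b);
}

/* Returns 1 if the first write reached the "disk" intact, 0 if it was
 * clobbered, -1 on allocation failure. With reuse != 0 the application
 * fills the same buffer again before the flush has happened. */
int toy_first_write_intact(int reuse)
{
    struct toy_write_behind wb = { NULL };
    char disk[64] = { 0 };
    struct toy_buf *first, *second;
    int intact;

    first = toy_buf_new();
    if (!first)
        return -1;
    memcpy(first->data, "first", 5);
    first->len = 5;
    toy_wb_write(&wb, first); /* write returns, wb still holds a ref */

    second = reuse ? toy_buf_ref(first) : toy_buf_new();
    if (!second) {
        toy_wb_flush(&wb, disk, sizeof(disk));
        toy_buf_unref(first);
        return -1;
    }
    memcpy(second->data, "OTHER", 5); /* application prepares next write */
    second->len = 5;

    toy_wb_flush(&wb, disk, sizeof(disk));
    intact = memcmp(disk, "first", 5) == 0;

    toy_buf_unref(second);
    toy_buf_unref(first);
    return intact;
}
```

In this toy model, toy_first_write_intact(0) (a fresh buffer per write)
returns 1, while toy_first_write_intact(1) (reusing the buffer) returns 0,
because the layer still holding the ref flushes the overwritten contents.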

>
> 5) What is the performance gain noticed with this approach? As the 
> thread that (re)started this is [5]. In Commvault Simpana,
> - What were the perf gains due to this approach?
> - How does the application use the write buffers?
>   - Meaning, the buffer that is received from Gluster is used to 
> populate data from the network? I am curious as to how this 
> application uses these buffers, and where does data get copied into 
> these buffers from.
>
(Slightly off-topic to this question.)
Sometimes the performance gain is not in terms of read/write rates, but 
in terms of freed-up CPU.
Just to give an example:
    With copy:    CPU occupancy is 70%
    Without copy: CPU occupancy is 40%

But Sachin can share the actual results.

> Eliminating a copy in glfs_write seems trivial (if my hack and answers 
> to the first question are as per my assumptions), I am wondering what 
> we are attempting here, or what I am missing.
From what I understand, there is a layer of separation between 
libgfapi and gluster.

Gluster is free to handle the buffer in whatever way it likes (read: the 
different translators), and hence allocation and freeing should happen in 
Gluster. Otherwise, if the application needs to have control over the 
buffer, there is a copy involved (at the gluster layer).

Thanks,
Saravana