[Gluster-devel] Reduce memcpy in glfs read and write

Raghavendra Gowdappa rgowdapp at redhat.com
Wed Jun 22 08:47:42 UTC 2016


Nice to see you :). Welcome again :).

----- Original Message -----
> From: "Mohammed Rafi K C" <rkavunga at redhat.com>
> To: "Sachin Pandit" <spandit at commvault.com>, "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> Cc: gluster-devel at gluster.org, "Ankireddypalle Reddy" <areddy at commvault.com>
> Sent: Wednesday, June 22, 2016 12:26:03 PM
> Subject: Re: [Gluster-devel] Reduce memcpy in glfs read and write
> 
> Hi Sachin,
> 
> Good to see you again in Gluster-devel.
> 
> I had implemented the APIs I mentioned as a POC. In fact, I couldn't push
> them into master.
> 
> Regarding your questions, my comments are inline.
> 
> On 06/22/2016 05:50 AM, Sachin Pandit wrote:
> 
> Hey Pranith, I am good, I hope you are doing good too.
> 
> Please find the comments inline.
> 
> 
> 
> From: Pranith Kumar Karampuri [ mailto:pkarampu at redhat.com ]
> Sent: Tuesday, June 21, 2016 5:58 AM
> To: Sachin Pandit <spandit at commvault.com>
> Cc: gluster-devel at gluster.org
> Subject: Re: [Gluster-devel] Reduce memcpy in glfs read and write
> 
> Hey!!
> 
> 
> Hope you are doing well. I took a look at the bt. So when the flush comes,
> write-behind has to flush all the pending writes down. I see the following
> frame hung in iobref_unref:
> Thread 7 (Thread 0x7fa601a30700 (LWP 16218)):
> #0 0x00007fa60cc55225 in pthread_spin_lock () from /lib64/libpthread.so.0
> <<---- Does it always hang there?
> 
> ---------------------------------
> 
> >>It does always hang here.
> 
> ---------------------------------
> #1 0x00007fa60e1f373e in iobref_unref (iobref=0x19dc7e0) at iobuf.c:907
> #2 0x00007fa60e246fb2 in args_wipe (args=0x19e70ec) at default-args.c:1593
> #3 0x00007fa60e1ea534 in call_stub_wipe_args (stub=0x19e709c) at call-stub.c:2466
> #4 0x00007fa60e1ea5de in call_stub_destroy (stub=0x19e709c) at call-stub.c:2482
> 
> 
> Is this on top of the master branch? It seems like we missed an unlock of the
> spin-lock, or the iobref has a junk value which makes it look like it is in a
> locked state (maybe a double free?). Do you have any extra patches in your
> repo which make changes to iobuf?
> 
> ----------------------------------
> 
> >> I have implemented a method to reduce memcpy in libgfapi (my patch is on
> >> top of the master branch) by making use of a buffer from the iobuf pool and
> >> passing that buffer to the application. However, I have not made any changes
> >> to the iobuf core itself. I don't think a double free is happening anywhere
> >> in the code (I did check this using logs).
> 
> 
> 
> Method that I have implemented:
> 
> 1) Application asks for a buffer of specific size, and the buffer is
> allocated from the iobuf pool.
> 
> 2) Buffer is passed on to application, and the application writes the data
> into that buffer.
> 
> 3) Buffer with data in it is passed from application to libgfapi and the
> underlying translators (no memcpy in glfs_write)
> 
> 
> 
> I have a couple of questions and observations:
> 
> 
> 
> Observations:
> 
> ------------------
> 
> 1) If I get a fresh buffer for every write, I don't see any problem. All the
> writes go through.
> 
> 2) If I try to reuse the same buffer for consecutive writes, I see a hang in
> flush.
> 
> 
> 
> Question 1: Is it fine if I reuse the buffer for consecutive writes?
> 
> The answer is no, if io-cache and write-behind are enabled. In the case of
> io-cache, it takes a ref on the buffer to store it in the cache, which means
> io-cache is still using the buffer.
> 
> In the case of write-behind, if it decides to aggregate multiple write
> requests, it will likewise take a ref on the iobuf and lie to the application
> that the write has already completed.

That's correct. You cannot use the same iobuf for more than one write call. What you can do is:

1. The application requests an iobuf from gfapi. gfapi gets an iobuf (and probably does an iobuf_ref).
2. The application does the write using pub_glfs_writev, with the iov populated with memory belonging to the iobuf. But the current pub_glfs_writev doesn't take an iobuf as an argument, so you need to change it to take an iobuf/iobref as an argument, so that the application doesn't free up the write payload while it is still being cached in write-behind. In other words, the application should pass down the iobuf along with the data to do buffer management (just like every translator passes down the iobuf/iobref during STACK_WIND/STACK_UNWIND).
3. Once pub_glfs_writev completes, the application should do an iobuf_unref (or instruct gfapi to do so).

If we do the above three things for every write, I don't see any problem. A rough sketch of this flow is below.

Similar things (though not exactly the _same_ things) should be done for the readv codepath too.
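Something like the following, as a rough sketch of the per-write flow seen from the application side. Note that glfs_get_buffer(), glfs_writev_iobuf() and glfs_put_buffer() below are names I am assuming for the new gfapi calls your patch would add; they are not existing APIs in master:

#include <string.h>
#include <sys/uio.h>
#include <glusterfs/api/glfs.h>   /* glfs_t, glfs_fd_t */
#include <glusterfs/iobuf.h>      /* struct iobuf, struct iobref, iobuf_ptr() */

static ssize_t
write_one_chunk (glfs_t *fs, glfs_fd_t *glfd, const char *data, size_t size)
{
        struct iobref *iobref = NULL;
        struct iobuf  *iob    = NULL;
        struct iovec   iov    = {0, };
        ssize_t        ret    = -1;

        /* 1. gfapi hands out a buffer from the iobuf pool and keeps a ref
         *    on it on behalf of the application (hypothetical API) */
        iob = glfs_get_buffer (fs, size, &iobref);
        if (!iob)
                return -1;

        /* 2. fill the buffer in place (shown as a copy here; ideally the
         *    application reads/generates its data straight into
         *    iobuf_ptr (iob)) and wind the write down along with the
         *    iobref, so write-behind and io-cache can take their own refs
         *    instead of having the buffer freed or reused under them */
        memcpy (iobuf_ptr (iob), data, size);
        iov.iov_base = iobuf_ptr (iob);
        iov.iov_len  = size;
        ret = glfs_writev_iobuf (glfd, &iov, 1, iobref, 0);

        /* 3. drop the application's ref; the iobuf goes back to the pool
         *    only after write-behind/io-cache drop theirs */
        glfs_put_buffer (fs, iob, iobref);

        /* the same iobuf must not be reused; call glfs_get_buffer () again
         * for the next write */
        return ret;
}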

> Question 2: Is it always ensured that the data has been written to the file
> when I get a response from syncop_writev?
> 
> As I explained, write-behind may prevent this.
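Right. A successful writev only means write-behind has accepted the data. If the application needs to know the data has actually reached the bricks, it has to either disable write-behind on that volume or follow the write with an explicit fsync. A minimal sketch with the public gfapi call (your syncop-based path would use syncop_fsync similarly):

/* after the writev returns, flush anything write-behind is still holding
 * for this fd before assuming the data is on the bricks */
if (glfs_fsync (glfd) < 0) {
        /* a deferred write failure surfaces here rather than at
         * close/flush time */
        ret = -1;
}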
> 
> Thanks,
> 
> Sachin Pandit.
> 
> 
> 
> 
> ----------------------------------
> 
> 
> 
> 
> On Tue, Jun 21, 2016 at 4:07 AM, Sachin Pandit < spandit at commvault.com >
> wrote:
> 
> 
> 
> 
> Hi all,
> 
> 
> 
> I bid adieu to you all with the hope of crossing paths again, and that time
> has come rather quickly. It feels great to work on GlusterFS again.
> 
> 
> 
> Currently we are trying to write data backed up by Commvault Simpana to a
> glusterfs volume (a disperse volume). To improve performance, I have
> implemented the proposal put forward by Rafi K C [1]. I have some questions
> regarding libgfapi and the iobuf pool.
> 
> 
> 
> To reduce an extra level of copy in glfs read and write, I have implemented a
> few APIs to request a buffer (similar to the ones described in [1]) from the
> iobuf pool which the application can write data into. With this
> implementation, when I try to reuse the buffer for consecutive writes, I see
> a hang in syncop_flush of glfs_close (a BT of the hang can be found in [2]).
> I wanted to know whether reusing the buffer is recommended. If not, do we
> need to request a buffer for each write?
> 
> 
> 
> Setup: Distributed-Disperse (4 * (2+1)). Bricks scattered over 3 nodes.
> 
> 
> 
> [1] http://www.gluster.org/pipermail/gluster-devel/2015-February/043966.html
> 
> [2] Attached file - bt.txt
> 
> 
> 
> Thanks & Regards,
> 
> Sachin Pandit.
> 
> 
> 
> 
> 
> 
> 
> 
> --
> 
> 
> Pranith
