[Gluster-devel] Reduce memcpy in glfs read and write

Sachin Pandit spandit at commvault.com
Thu Jun 23 20:07:55 UTC 2016


I have submitted the code upstream. Could you please review it?
http://review.gluster.org/#/c/14784/

Thank you,
Sachin Pandit.

-----Original Message-----
From: Sachin Pandit 
Sent: Thursday, June 23, 2016 9:00 AM
To: 'Raghavendra Gowdappa' <rgowdapp at redhat.com>; Mohammed Rafi K C <rkavunga at redhat.com>; Pranith Kumar Karampuri <pkarampu at redhat.com>
Cc: gluster-devel at gluster.org; Ankireddypalle Reddy <areddy at commvault.com>
Subject: RE: [Gluster-devel] Reduce memcpy in glfs read and write

Thank you Raghavendra, Pranith and Rafi.

Raghavendra, I have implemented a mechanism similar to the one you suggested (I hope I did). The only difference is that I have not changed the existing pub_glfs_writev; instead, I introduced a new code path that does a similar thing using the iobuf pool. As you said, the changes work fine. I was stuck at the part where I tried to reuse the buffer. After looking at Rafi's comment and yours, it is now clear to me that I cannot reuse the buffer. I will send the patch to the master branch very shortly.

Thanks,
Sachin Pandit.

-----Original Message-----
From: Raghavendra Gowdappa [mailto:rgowdapp at redhat.com]
Sent: Wednesday, June 22, 2016 1:48 AM
To: Mohammed Rafi K C <rkavunga at redhat.com>
Cc: Sachin Pandit <spandit at commvault.com>; Pranith Kumar Karampuri <pkarampu at redhat.com>; gluster-devel at gluster.org; Ankireddypalle Reddy <areddy at commvault.com>
Subject: Re: [Gluster-devel] Reduce memcpy in glfs read and write

Nice to see you :). Welcome again :).

----- Original Message -----
> From: "Mohammed Rafi K C" <rkavunga at redhat.com>
> To: "Sachin Pandit" <spandit at commvault.com>, "Pranith Kumar Karampuri" 
> <pkarampu at redhat.com>
> Cc: gluster-devel at gluster.org, "Ankireddypalle Reddy" 
> <areddy at commvault.com>
> Sent: Wednesday, June 22, 2016 12:26:03 PM
> Subject: Re: [Gluster-devel] Reduce memcpy in glfs read and write
> 
> Hi Sachin,
> 
> Good to see you again in Gluster-devel.
> 
> I had implemented the APIs I mentioned for a POC. In fact, I couldn't
> push them into master.
> 
> Regarding your questions, my comments are inline.
> 
> On 06/22/2016 05:50 AM, Sachin Pandit wrote:
> 
> Hey Pranith, I am good; I hope you are doing well too.
> 
> Please find the comments inline.
> 
> From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
> Sent: Tuesday, June 21, 2016 5:58 AM
> To: Sachin Pandit <spandit at commvault.com>
> Cc: gluster-devel at gluster.org
> Subject: Re: [Gluster-devel] Reduce memcpy in glfs read and write
> 
> Hey!!
> 
> 
> Hope you are doing well. I took a look at the backtrace. When the flush
> comes, write-behind has to flush all of the pending writes down. I see
> the following frame hung in iobref_unref:
> Thread 7 (Thread 0x7fa601a30700 (LWP 16218)):
> #0  0x00007fa60cc55225 in pthread_spin_lock () from /lib64/libpthread.so.0
> <<---- Does it always hang there?
> 
> ---------------------------------
> >> It does always hang here.
> ---------------------------------
> 
> #1  0x00007fa60e1f373e in iobref_unref (iobref=0x19dc7e0) at iobuf.c:907
> #2  0x00007fa60e246fb2 in args_wipe (args=0x19e70ec) at default-args.c:1593
> #3  0x00007fa60e1ea534 in call_stub_wipe_args (stub=0x19e709c) at call-stub.c:2466
> #4  0x00007fa60e1ea5de in call_stub_destroy (stub=0x19e709c) at call-stub.c:2482
> 
> 
> Is this on top of the master branch? It seems like we missed an unlock
> of the spin-lock, or the iobref has a junk value which gives the
> impression that it is in a locked state (maybe a double free?). Do you
> have any extra patches in your repo which make changes in iobuf?
> 
> ----------------------------------
> 
> >> I have implemented a method to reduce memcpy in libgfapi (my patch
> >> is on top of the master branch) by making use of a buffer from the
> >> iobuf pool and passing that buffer to the application. However, I
> >> have not made any changes to the core iobuf code. I don't think a
> >> double free is happening anywhere in the code (I checked this using
> >> logs).
> 
> Method that I have implemented:
> 
> 1) The application asks for a buffer of a specific size, and the buffer
> is allocated from the iobuf pool.
> 
> 2) The buffer is passed on to the application, and the application
> writes its data into that buffer.
> 
> 3) The buffer, with the data in it, is passed from the application to
> libgfapi and the underlying translators (no memcpy in glfs_write).
> 
> I have a couple of questions and observations:
> 
> Observations:
> 
> ------------------
> 
> 1) If I get a fresh buffer for every write, I don't see any problem.
> All the writes go through.
> 
> 2) If I try to reuse the same buffer for consecutive writes, I see a
> hang in flush.
> 
> Question 1: Is it fine if I reuse the buffer for consecutive writes?
> 
> The answer is no, if io-cache and write-behind are enabled. In the case
> of io-cache, it takes a ref on the buffer to store it in the cache,
> which means io-cache is still using the buffer.
> 
> In the case of write-behind, if it decides to aggregate multiple write
> requests, it will also take a ref on the iobuf and will lie to the
> application that the write has completed.

That's correct. You cannot use the same iobuf for more than one write call. What you can do is:

1. The application requests an iobuf from gfapi; gfapi gets an iobuf from the pool (probably doing an iobuf_ref).
2. The application does the write using pub_glfs_writev, with the iov populated with memory belonging to that iobuf. However, the current pub_glfs_writev doesn't take an iobuf as an argument, so you need to change it to take an iobuf/iobref argument so that the application doesn't free up the write payload while it is still being cached in write-behind. In other words, the application should pass the iobuf down along with the data so that buffer management works (just like every translator passes down the iobuf/iobref during STACK_WIND/STACK_UNWIND).
3. Once pub_glfs_writev completes, the application should do an iobuf_unref (or instruct gfapi to do so).

If we are doing the above three things for every write, I don't see any problem.

Similar things (though not exactly the _same_ things) should be done for the readv codepath too.
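
To make the lifecycle concrete, here is a rough application-side sketch of that per-write pattern. Note that glfs_get_buffer(), glfs_zerocopy_writev() and glfs_release_buffer() below are only placeholder names standing in for the new APIs such a patch would introduce; they are not part of the existing gfapi. The sketch assumes the extended writev carries the backing iobuf/iobref down the stack as described above.

#include <unistd.h>
#include <sys/uio.h>
#include <glusterfs/api/glfs.h>

/* Hypothetical new APIs (placeholders, not existing gfapi calls): */
void    *glfs_get_buffer(struct glfs *fs, size_t size);       /* hand out iobuf-backed memory */
ssize_t  glfs_zerocopy_writev(struct glfs_fd *fd, const struct iovec *iov,
                              int iovcnt, int flags);         /* writev that carries the iobuf down */
void     glfs_release_buffer(struct glfs *fs, void *buf);     /* drop the application's ref */

static ssize_t write_one_chunk(struct glfs *fs, struct glfs_fd *fd,
                               int src_fd, size_t len)
{
        /* 1. Request a fresh iobuf-backed buffer for this write only. */
        void *buf = glfs_get_buffer(fs, len);
        if (!buf)
                return -1;

        /* 2. Produce the payload directly in that buffer (no extra copy
         *    inside gfapi) and issue the write; the backing iobuf travels
         *    with the request, so write-behind/io-cache can keep refs. */
        ssize_t got = read(src_fd, buf, len);
        if (got <= 0) {
                glfs_release_buffer(fs, buf);
                return got;
        }

        struct iovec iov = { .iov_base = buf, .iov_len = (size_t)got };
        ssize_t ret = glfs_zerocopy_writev(fd, &iov, 1, 0);

        /* 3. Drop our ref once the call returns; do NOT reuse this buffer
         *    for the next write -- write-behind or io-cache may still hold
         *    refs on the underlying iobuf. */
        glfs_release_buffer(fs, buf);
        return ret;
}

Each write therefore gets its own iobuf, and the unref after completion leaves any refs taken by write-behind or io-cache intact.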

> 
> Question 2: Is it always ensured that the data has been written to the
> file when I get a response from syncop_writev?
> 
> As I explained, write-behind may prevent this.
> 
> Thanks,
> 
> Sachin Pandit.
> 
> ----------------------------------
> 
> On Tue, Jun 21, 2016 at 4:07 AM, Sachin Pandit <spandit at commvault.com> wrote:
> 
> Hi all,
> 
> 
> 
> I bid adieu to you all with the hope of crossing paths again, and the
> time has come rather quickly. It feels great to work on GlusterFS again.
> 
> Currently we are trying to write data backed up by Commvault Simpana
> to a glusterfs volume (disperse volume). To improve performance, I have
> implemented the proposal put forward by Rafi K C [1]. I have some
> questions regarding libgfapi and the iobuf pool.
> 
> To reduce an extra level of copy in glfs read and write, I have
> implemented a few APIs to request a buffer (similar to the one
> described in [1]) from the iobuf pool which the application can use to
> write its data into. With this implementation, when I try to reuse the
> buffer for consecutive writes, I see a hang in syncop_flush during
> glfs_close (the backtrace of the hang can be found in [2]). I wanted to
> know whether reusing the buffer is recommended. If not, do we need to
> request a new buffer for each write?
> 
> Setup: Distributed-Disperse (4 x (2+1)). Bricks scattered over 3 nodes.
> 
> [1] http://www.gluster.org/pipermail/gluster-devel/2015-February/043966.html
> 
> [2] Attached file - bt.txt
> 
> Thanks & Regards,
> 
> Sachin Pandit.
> 
> --
> 
> 
> Pranith




