[Gluster-devel] Reply: Re: Gluster AFR volume write performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev

Thu Jan 28 13:40:30 UTC 2016

Sorry for the late reply.

Pranith Kumar Karampuri <pkarampu at redhat.com> wrote on 2016/01/25 17:48:06:

> From: Pranith Kumar Karampuri <pkarampu at redhat.com>
> To: li.ping288 at zte.com.cn, 
> Cc: li.yi79 at zte.com.cn, zhou.shigang37 at zte.com.cn, 
> Liu.Jianjun3 at zte.com.cn, yang.bin18 at zte.com.cn
> Date: 2016/01/25 17:48
> Subject: Re: Reply: Re: [Gluster-devel] Gluster AFR volume write 
> performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND
> in afr_writev
> 
> 

> On 01/25/2016 03:09 PM, li.ping288 at zte.com.cn wrote:
> Hi Pranith, 
> 
> I'd be glad to have the chance to contribute to open source. 
> This is my first time delivering a patch for GlusterFS, so I'm not 
> very familiar with the code review and submission procedures. 
> 
> I'll try to do it ASAP. By the way, are there any guidelines for this 
> work?
> 
> http://www.gluster.org/community/documentation/index.php/Simplified_dev_workflow
> may be helpful. Feel free to ask about any doubts you may have.
> 
> How do you guys use glusterfs?
> 
> Pranith

Thanks for the helpful tips.  We currently use glusterfs to build the shared 
storage for distributed cluster nodes.

Here are the solutions I have been considering over the past few days:

1. Revert the AFR GLUSTERFS_WRITE_IS_APPEND modification.  This 
   optimization only helps appending write fops, but most writes are 
   not appends.  Hence I think it is not worth optimizing a 
   low-probability case at the cost of a performance drop for the vast 
   majority of AFR writes. 
2. Make the fixed GLUSTERFS_WRITE_IS_APPEND dictionary setting in 
   afr_writev dynamic, i.e. add a new configurable option 
   "write_is_append", just like the existing "ensure-durability" option 
   for AFR.  It could be turned on when AFR write performance is not the 
   main concern and off when performance is required.
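
To make proposal 2 a little more concrete, here is a rough sketch of the
decision it implies. It is only an illustration and not GlusterFS code:
struct afr_write_opts and should_request_append_check are hypothetical
names, and tying the flag to ensure-durability as well is an assumption
based on Pranith's remark, quoted further down, that the flag makes no
sense when ensure-durability is off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative option set; NOT the real AFR private structure. */
struct afr_write_opts {
    bool ensure_durability; /* existing AFR option mentioned in this thread */
    bool write_is_append;   /* hypothetical new option from proposal 2 */
};

/* Decide whether a write request should carry the append-check flag.
 * The flag is requested only when both options are on. */
static bool should_request_append_check(const struct afr_write_opts *opts)
{
    assert(opts != NULL);
    return opts->ensure_durability && opts->write_is_append;
}
```

With such a check, a volume tuned for throughput would turn
write_is_append off and the write path would never ask the brick for the
append test.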

I have been trying to find a way in posix_writev to predict an 
appending write in advance, and then decide whether to take the lock 
and hold it for as short a time as possible, but I have had no success.
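
For reference, the append test itself can be sketched with plain POSIX
calls as below. This is a simplified stand-in, not the posix_writev code:
write_is_append and temp_file_of_size are hypothetical helpers, and
fstat() is assumed here in place of whatever the brick uses to obtain the
pre-op size. The result depends on the file size at the moment of the
check, so another writer can invalidate it before the write happens
unless both steps run under the same lock, which is what makes predicting
it in advance difficult.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return true if a write at `offset` would land at the current end of
 * file, or if the descriptor was opened with O_APPEND. Mirrors the
 * condition quoted in this thread:
 *   preop.ia_size == offset || (fd->flags & O_APPEND) */
static bool write_is_append(int fd, int open_flags, off_t offset)
{
    struct stat pre;

    assert(fd >= 0);
    if (fstat(fd, &pre) != 0)
        return false; /* treat a failed stat as "not an append" */
    return pre.st_size == offset || (open_flags & O_APPEND) != 0;
}

/* Helper for trying the check: an unlinked temp file of `size` bytes.
 * Returns the open descriptor, or -1 on error. */
static int temp_file_of_size(off_t size)
{
    char path[] = "/tmp/append_check_XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path); /* the file disappears once fd is closed */
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For a 10-byte file, a write at offset 10 counts as an append, a write at
offset 4 does not, and any offset counts once O_APPEND is among the open
flags.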

Any other good ideas would be appreciated.

Ping.Li

> 
> Thanks & Best Regards. 
> 
> Pranith Kumar Karampuri <pkarampu at redhat.com> wrote on 2016/01/23 14:01:36:
> 
> > From: Pranith Kumar Karampuri <pkarampu at redhat.com> 
> > To: li.ping288 at zte.com.cn, gluster-devel at gluster.org, 
> > Cc: li.yi79 at zte.com.cn, Liu.Jianjun3 at zte.com.cn, 
> > zhou.shigang37 at zte.com.cn, yang.bin18 at zte.com.cn 
> > Date: 2016/01/23 14:02 
> > Subject: Re: Reply: Re: [Gluster-devel] Gluster AFR volume write 
> > performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND
> > in afr_writev 
> > 
> > 
> 
> > On 01/22/2016 07:14 AM, li.ping288 at zte.com.cn wrote: 
> > Hi Pranith, I appreciate your reply. 
> > 
> > Pranith Kumar Karampuri <pkarampu at redhat.com> wrote on 2016/01/20 18:51:19:
> > 
> > > From:  Pranith Kumar Karampuri <pkarampu at redhat.com> 
> > > To:  li.ping288 at zte.com.cn, gluster-devel at gluster.org, 
> > > Date:  2016/01/20 18:51 
> > > Subject: Re: [Gluster-devel] Gluster AFR volume write performance has 
> > > been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev 
> > > 
> > > Sorry for the delay in response.
> > 
> > > On 01/15/2016 02:34 PM, li.ping288 at zte.com.cn wrote: 
> > > Setting GLUSTERFS_WRITE_IS_APPEND in the afr_writev function on the 
> > > glusterfs client side causes posix_writev on the server side to 
> > > handle IO write fops serially instead of in parallel. 
> > > 
> > > i.e. multiple io-worker threads carrying out IO write fops are 
> > > blocked in posix_writev and execute the final write fop 
> > > (pwrite/pwritev in the __posix_writev function) ONE AFTER ANOTHER. 
> > > 
> > > For example: 
> > > 
> > > thread1: iot_worker -> ...  -> posix_writev()   | 
> > > thread2: iot_worker -> ...  -> posix_writev()   | 
> > > thread3: iot_worker -> ...  -> posix_writev()   -> __posix_writev() 
> > > thread4: iot_worker -> ...  -> posix_writev()   | 
> > > 
> > > There are 4 iot_worker threads doing 128KB IO write fops as 
> > > above, but only one can execute the __posix_writev function at a 
> > > time and the others have to wait. 
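
The serialization described above can be reproduced outside GlusterFS
with a small pthread sketch. Everything in it is a stand-in chosen for
illustration (struct file_ctx, worker, run_serialized_writers, the temp
file path); it is not the brick code. Four workers each write one 128KB
block at a distinct offset while holding a shared mutex, and a counter
records how many of them were ever inside the locked region at once.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE (128 * 1024) /* 128KB, the block size used in this thread */
#define NUM_WORKERS 4

/* Hypothetical stand-in for the per-file state shared by all workers. */
struct file_ctx {
    pthread_mutex_t lock;
    int fd;
    int in_flight;     /* workers currently inside the locked region */
    int max_in_flight; /* highest in_flight value ever observed */
};

struct job {
    struct file_ctx *ctx;
    off_t offset;
    ssize_t written;
};

/* One worker: take the lock, write one block, release the lock. */
static void *worker(void *arg)
{
    static const char buf[BLOCK_SIZE]; /* zero-filled, read-only payload */
    struct job *job = arg;
    struct file_ctx *ctx;

    assert(job != NULL);
    ctx = job->ctx;

    pthread_mutex_lock(&ctx->lock);
    ctx->in_flight++;
    if (ctx->in_flight > ctx->max_in_flight)
        ctx->max_in_flight = ctx->in_flight;

    job->written = pwrite(ctx->fd, buf, sizeof(buf), job->offset);

    ctx->in_flight--;
    pthread_mutex_unlock(&ctx->lock);
    return NULL;
}

/* Run NUM_WORKERS writers against one unlinked temp file.
 * Returns the peak number of concurrent writers, or -1 on error. */
static int run_serialized_writers(void)
{
    char path[] = "/tmp/serial_write_XXXXXX";
    struct file_ctx ctx = { .in_flight = 0, .max_in_flight = 0 };
    struct job jobs[NUM_WORKERS];
    pthread_t tids[NUM_WORKERS];
    int i, started = 0, ok = 1;

    ctx.fd = mkstemp(path);
    if (ctx.fd < 0)
        return -1;
    unlink(path);
    pthread_mutex_init(&ctx.lock, NULL);

    for (i = 0; i < NUM_WORKERS; i++) {
        jobs[i].ctx = &ctx;
        jobs[i].offset = (off_t)i * BLOCK_SIZE;
        jobs[i].written = -1;
        if (pthread_create(&tids[i], NULL, worker, &jobs[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (jobs[i].written != BLOCK_SIZE)
            ok = 0;
    }
    pthread_mutex_destroy(&ctx.lock);
    close(ctx.fd);
    return (ok && started == NUM_WORKERS) ? ctx.max_in_flight : -1;
}
```

Because the pwrite() call sits inside the locked region, the peak is
always one writer regardless of the number of threads, so adding
iot_worker threads cannot raise throughput on a single file.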
> > > 
> > > However, if the AFR volume is configured with storage.linux-aio on 
> > > (it is off by default), the iot_worker will use posix_aio_writev 
> > > instead of posix_writev to write data. 
> > > The posix_aio_writev function is not affected by 
> > > GLUSTERFS_WRITE_IS_APPEND, and the AFR volume write performance goes up. 
> > > I think this is a bug :-(. 
> > 
> > Yeah, I agree with you. I suppose GLUSTERFS_WRITE_IS_APPEND is
> > misused in afr_writev. 
> > I checked the original intent of the GLUSTERFS_WRITE_IS_APPEND change 
> > on the review website: 
> > http://review.gluster.org/#/c/5501/ 
> > 
> > The initial purpose seems to be to avoid an unnecessary fsync() in the 
> > afr_changelog_post_op_safe function if the write position 
> > was at the end of the file, as detected by 
> > (preop.ia_size == offset || (fd->flags & O_APPEND)) in posix_writev. 
> > 
> > Compared with the AFR write performance loss, I think 
> > it costs too much. 
> > 
> > I suggest making the GLUSTERFS_WRITE_IS_APPEND setting configurable, 
> > just like ensure-durability in AFR. 
> > 
> > You are right, it doesn't make sense to put this option in the 
> > dictionary if ensure-durability is off. 
> > http://review.gluster.org/13285
> > addresses this. Do you want to try this out?
> > Thanks for doing most of the work :-). Do let me know if you want to
> > raise a bug for this. Or I can take that up if you don't have time.
> > 
> > Pranith 
> > 
> > > 
> > > So, my question is whether an AFR volume can work fine with the 
> > > storage.linux-aio configuration, which bypasses the 
> > > GLUSTERFS_WRITE_IS_APPEND setting in afr_writev, 
> > > and why glusterfs keeps posix_aio_writev different from posix_writev? 
> > > 
> > > Any replies to clear up my confusion would be appreciated; thanks 
> > > in advance.
> > > What is the workload you have? Workloads with multiple writers on 
> > > the same file? 
> > 
> > I tested the AFR gluster volume with fio like this: 
> > fio --filename=/mnt/afr/20G.dat --direct=1 --rw=write --bs=128k --size=20G --numjobs=8 --runtime=60 --group_reporting --name=afr_test --iodepth=1 --ioengine=libaio
> > 
> > The GlusterFS bricks are two IBM X3550 M3 machines. 
> > 
> > The local disk direct write performance with a 128KB IO request block 
> > size is about 18MB/s with a single thread and 80MB/s with 8 threads. 
> > 
> > If GLUSTERFS_WRITE_IS_APPEND is set, the AFR gluster volume 
> > write performance is 18MB/s, the same as with a single thread; 
> > if not, the performance is about 75MB/s.
> > (network bandwidth is sufficient) 
> > 
> > > 
> > > Pranith 
> > > 
> > > 
> > > --------------------------------------------------------
> > > ZTE Information Security Notice: The information contained in this 
> > > mail (and any attachment transmitted herewith) is privileged and 
> > > confidential and is intended for the exclusive use of the addressee
> > > (s).  If you are not an intended recipient, any disclosure, 
> > > reproduction, distribution or other dissemination or use of the 
> > > information contained is strictly prohibited.  If you have received 
> > > this mail in error, please delete it and notify us immediately.
> > > 
> > 
> > > 
> > > 
> > 
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel at gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-devel 
> > 