[Gluster-devel] Re: Gluster AFR volume write performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev

li.ping288 at zte.com.cn li.ping288 at zte.com.cn
Fri Jan 22 01:44:53 UTC 2016


Hi Pranith, thank you for your reply.

Pranith Kumar Karampuri <pkarampu at redhat.com> wrote on 2016/01/20 18:51:19:

> From:  Pranith Kumar Karampuri <pkarampu at redhat.com>
> To:  li.ping288 at zte.com.cn, gluster-devel at gluster.org, 
> Date:  2016/01/20 18:51
> Subject: Re: [Gluster-devel] Gluster AFR volume write performance has 
> been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev
> 
> Sorry for the delay in response.

> On 01/15/2016 02:34 PM, li.ping288 at zte.com.cn wrote:
> The GLUSTERFS_WRITE_IS_APPEND setting in the afr_writev function on the 
> glusterfs client side makes posix_writev on the server side handle 
> IO write fops serially instead of in parallel. 
> 
> i.e. multiple io-worker threads carrying out IO write fops are 
> blocked in posix_writev and execute the final write fop pwrite/pwritev in 
> the __posix_writev function ONE AFTER ANOTHER. 
> 
> For example: 
> 
> thread1: iot_worker -> ...  -> posix_writev()   | 
> thread2: iot_worker -> ...  -> posix_writev()   | 
> thread3: iot_worker -> ...  -> posix_writev()   -> __posix_writev() 
> thread4: iot_worker -> ...  -> posix_writev()   | 
> 
> There are 4 iot_worker threads carrying out 128KB IO write fops as 
> above, but only one can execute the __posix_writev function at a time 
> and the others have to wait. 
> 
> However, if the afr volume is configured with storage.linux-aio 
> (which is off by default), the iot_worker threads will use posix_aio_writev 
> instead of posix_writev to write data. 
> The posix_aio_writev function is not affected by 
> GLUSTERFS_WRITE_IS_APPEND, and the AFR volume write performance goes up. 

> I think this is a bug :-(.

Yeah, I agree with you. I suspect GLUSTERFS_WRITE_IS_APPEND is misused in 
afr_writev.
I checked the original intent of the GLUSTERFS_WRITE_IS_APPEND change on the 
review website:
http://review.gluster.org/#/c/5501/ 

The initial purpose seems to be to avoid an unnecessary fsync() in the 
afr_changelog_post_op_safe function when the write lands at the current 
end of the file, which posix_writev detects with 
(preop.ia_size == offset || (fd->flags & O_APPEND)).
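
(Spelled out as a sketch of my reading of that change's intent, with invented
names rather than the real afr_changelog_post_op_safe code: when the brick
reports back that the write really was an append, the post-op fsync can be
skipped; otherwise it is kept.)

    #include <stdbool.h>
    #include <unistd.h>

    /* Illustrative model only: should the changelog post-op fsync the fd
     * before clearing the pending markers on the bricks? */
    static void
    changelog_post_op_model (int fd, bool write_was_append,
                             bool ensure_durability)
    {
            /* The optimisation in review 5501, as I understand it: a write
             * confirmed to be an append is the case treated as safe to skip,
             * so the extra fsync is only issued for writes that modified
             * existing regions of the file. */
            if (ensure_durability && !write_was_append)
                    fsync (fd);

            /* ... clear the pending changelog xattrs here ... */
    }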

Weighed against the AFR write performance loss, I think this optimization 
costs too much. 

I suggest making the GLUSTERFS_WRITE_IS_APPEND setting configurable, 
just as ensure-durability is in afr.
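
(For comparison, ensure-durability is already a normal volume option today, and
a write-is-append knob could be exposed the same way. The first command below
uses the existing option name as I remember it; the second is only a
hypothetical name to show the idea and does not exist:)

    gluster volume set <VOLNAME> cluster.ensure-durability off
    gluster volume set <VOLNAME> cluster.write-is-append off      # hypothetical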

> 
> So, my question is whether an AFR volume can work correctly with the 
> storage.linux-aio configuration, which bypasses the 
> GLUSTERFS_WRITE_IS_APPEND setting in afr_writev, 
> and why glusterfs keeps posix_aio_writev different from posix_writev? 
> 
> Any replies to clear up my confusion would be appreciated, and thanks in 
> advance.
> What is the workload you have? Multiple writers on the same file?

I tested the afr gluster volume with fio like this:
fio --filename=/mnt/afr/20G.dat --direct=1 --rw=write --bs=128k --size=20G \
    --numjobs=8 --runtime=60 --group_reporting --name=afr_test \
    --iodepth=1 --ioengine=libaio

The GlusterFS bricks are two IBM x3550 M3 servers. 

The local disk direct-write performance with a 128KB IO request size is 
about 18MB/s with a single thread and 80MB/s with 8 threads.

With GLUSTERFS_WRITE_IS_APPEND in effect, the afr gluster volume write 
performance is 18MB/s, the same as the single-thread local figure; without 
it, the performance is nearly 75MB/s (network bandwidth is not the 
bottleneck).

> 
> Pranith
> 
> 

--------------------------------------------------------
ZTE Information Security Notice: The information contained in this mail (and any attachment transmitted herewith) is privileged and confidential and is intended for the exclusive use of the addressee(s).  If you are not an intended recipient, any disclosure, reproduction, distribution or other dissemination or use of the information contained is strictly prohibited.  If you have received this mail in error, please delete it and notify us immediately.