[Gluster-users] Unreasonably poor performance of replicated volumes
Anastasia Belyaeva
anastasia.blv at gmail.com
Fri Apr 13 17:58:14 UTC 2018
Thanks a lot for your reply!
You guessed it right though - mailing lists, various blogs, documentation,
videos and even source code at this point. Changing some of the options
does make performance slightly better, but nothing particularly
groundbreaking.
So, if I understand you correctly, no one has yet managed to get acceptable
performance (relative to underlying hardware capabilities) with smaller
block sizes? Is there an explanation for this?
2018-04-13 1:57 GMT+03:00 Vlad Kopylov <vladkopy at gmail.com>:
> Guess you went through the user lists and already tried something like this
> http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html
> I have the same exact setup, and below is as far as it got after months of
> trial and error.
> We all have roughly the same setup and the same issue with this - you can
> find posts like yours on a daily basis.
>
> On Wed, Apr 11, 2018 at 3:03 PM, Anastasia Belyaeva <
> anastasia.blv at gmail.com> wrote:
>
>> Hello everybody!
>>
>> I have 3 gluster servers (*gluster 3.12.6, Centos 7.2*; those are
>> actually virtual machines located on 3 separate physical XenServer7.1
>> servers)
>>
>> They are all connected via an InfiniBand network. iperf3 shows around *23
>> Gbit/s* of network bandwidth between any two of them.
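>>
>> For reference, the bandwidth numbers come from a plain iperf3 run between
>> pairs of nodes, roughly like this (hostname and duration are just examples):
>>
>> # on one node (server side)
>> iperf3 -s
>> # on another node (client side)
>> iperf3 -c fsnode4.ibnet -t 30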
>>
>> Each server has 3 HDDs put into a *stripe 3* thin pool (LVM2) with a
>> logical volume created on top of it, formatted with *xfs* (a rough sketch
>> of the brick layout follows the throughput numbers below). Gluster top
>> reports the following throughput:
>>
>>> root at fsnode2 ~ $ gluster volume top r3vol write-perf bs 4096 count 524288 list-cnt 0
>>> Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
>>> Throughput *631.82 MBps *time 3.3989 secs
>>> Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
>>> Throughput *566.96 MBps *time 3.7877 secs
>>> Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
>>> Throughput *546.65 MBps *time 3.9285 secs
>>
>>
>>> root at fsnode2 ~ $ gluster volume top r2vol write-perf bs 4096 count 524288 list-cnt 0
>>> Brick: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
>>> Throughput *539.60 MBps *time 3.9798 secs
>>> Brick: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
>>> Throughput *580.07 MBps *time 3.7021 secs
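>>
>> In case it matters, the bricks were laid out roughly like this (device
>> names, pool/LV names and sizes below are illustrative, not the exact ones):
>>
>> # illustrative devices - 3 HDDs per server
>> pvcreate /dev/sdb /dev/sdc /dev/sdd
>> vgcreate vg_brick /dev/sdb /dev/sdc /dev/sdd
>> # thin pool striped across the 3 HDDs
>> lvcreate --type thin-pool --stripes 3 -L 2T --thinpool tp_brick vg_brick
>> # thin LV for the brick, formatted with xfs
>> lvcreate --thin -V 2T -n lv_brick vg_brick/tp_brick
>> mkfs.xfs -i size=512 /dev/vg_brick/lv_brick
>> mount /dev/vg_brick/lv_brick /data/glusterfs/r3vol/brick1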
>>
>>
>> And two *pure replicated ('replica 2' and 'replica 3')* volumes. The
>> 'replica 2' volume is for testing purposes only.
>>
>>> Volume Name: r2vol
>>> Type: Replicate
>>> Volume ID: 4748d0c0-6bef-40d5-b1ec-d30e10cfddd9
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
>>> Brick2: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
>>> Options Reconfigured:
>>> nfs.disable: on
>>>
>>
>>
>>> Volume Name: r3vol
>>> Type: Replicate
>>> Volume ID: b0f64c28-57e1-4b9d-946b-26ed6b499f29
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
>>> Brick2: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
>>> Brick3: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
>>> Options Reconfigured:
>>> nfs.disable: on
>>
>>
>>
>> The *client* is also gluster 3.12.6, a CentOS 7.3 virtual machine, with a *FUSE mount*:
>>
>>
>>> root at centos7u3-nogdesktop2 ~ $ mount |grep gluster
>>> gluster-host.ibnet:/r2vol on /mnt/gluster/r2 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>>> gluster-host.ibnet:/r3vol on /mnt/gluster/r3 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
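>>
>> The volumes are mounted with the stock glusterfs mount helper, i.e.
>> something like:
>>
>> mount -t glusterfs gluster-host.ibnet:/r3vol /mnt/gluster/r3
>> mount -t glusterfs gluster-host.ibnet:/r2vol /mnt/gluster/r2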
>>
>>
>>
>> *The problem* is that there is a significant performance loss with
>> smaller block sizes. For example:
>>
>> *4K block size*
>> [replica 3 volume]
>> root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero
>> of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
>> 262144+0 records in
>> 262144+0 records out
>> 1073741824 bytes (1.1 GB) copied, 11.2207 s, *95.7 MB/s*
>>
>> [replica 2 volume]
>> root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero
>> of=/mnt/gluster/r2/file$RANDOM bs=4096 count=262144
>> 262144+0 records in
>> 262144+0 records out
>> 1073741824 bytes (1.1 GB) copied, 12.0149 s, *89.4 MB/s*
>>
>> *512K block size*
>> [replica 3 volume]
>> root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero
>> of=/mnt/gluster/r3/file$RANDOM bs=512K count=2048
>> 2048+0 records in
>> 2048+0 records out
>> 1073741824 bytes (1.1 GB) copied, 5.27207 s, *204 MB/s*
>>
>> [replica 2 volume]
>> root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero
>> of=/mnt/gluster/r2/file$RANDOM bs=512K count=2048
>> 2048+0 records in
>> 2048+0 records out
>> 1073741824 bytes (1.1 GB) copied, 4.22321 s, *254 MB/s*
>>
>> With a bigger block size it's still not where I expect it to be, but at
>> least it starts to make some sense.
>>
>> I've been trying to solve this for a very long time with no luck.
>> I've already tried both kernel tuning (different 'tuned' profiles and the
>> ones recommended in the "Linux Kernel Tuning" section) and tweaking gluster
>> volume options, including write-behind/flush-behind/write-behind-window-size.
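>>
>> The write-behind related changes I tried looked roughly like this (the
>> window-size value varied between attempts; 1MB here is just an example):
>>
>> gluster volume set r3vol performance.write-behind on
>> gluster volume set r3vol performance.flush-behind on
>> gluster volume set r3vol performance.write-behind-window-size 1MB
>>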
>> Those write-behind options, to my surprise, didn't make any difference.
>> At first I thought it was a buffering issue, but it turns out writes do get
>> buffered, just not very efficiently (at least that's what it looks like in
>> the *gluster profile output* below):
>>
>> root at fsnode2 ~ $ gluster volume profile r3vol info clear
>>> ...
>>> Cleared stats.
>>
>>
>>> root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
>>> 262144+0 records in
>>> 262144+0 records out
>>> 1073741824 bytes (1.1 GB) copied, 10.9743 s, 97.8 MB/s
>>
>>
>>
>>> root at fsnode2 ~ $ gluster volume profile r3vol info
>>> Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
>>> -------------------------------------------------------
>>> Cumulative Stats:
>>>    Block Size:        4096b+        8192b+       16384b+       32768b+       65536b+      131072b+
>>>  No. of Reads:             0             0             0             0             0             0
>>> No. of Writes:          1576          4173         19605          7777          1847           657
>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>       0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
>>>       0.00      18.00 us      18.00 us      18.00 us              1      STATFS
>>>       0.00      20.50 us      11.00 us      30.00 us              2       FLUSH
>>>       0.00      22.50 us      17.00 us      28.00 us              2    FINODELK
>>>       0.01      76.50 us      65.00 us      88.00 us              2    FXATTROP
>>>       0.01     177.00 us     177.00 us     177.00 us              1      CREATE
>>>       0.02      56.14 us      23.00 us     128.00 us              7      LOOKUP
>>>       0.02     259.00 us      20.00 us     498.00 us              2     ENTRYLK
>>>      99.94      59.23 us      17.00 us   10914.00 us          35635       WRITE
>>>     Duration: 38 seconds
>>>    Data Read: 0 bytes
>>> Data Written: 1073741824 bytes
>>> Interval 0 Stats:
>>>    Block Size:        4096b+        8192b+       16384b+       32768b+       65536b+      131072b+
>>>  No. of Reads:             0             0             0             0             0             0
>>> No. of Writes:          1576          4173         19605          7777          1847           657
>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>       0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
>>>       0.00      18.00 us      18.00 us      18.00 us              1      STATFS
>>>       0.00      20.50 us      11.00 us      30.00 us              2       FLUSH
>>>       0.00      22.50 us      17.00 us      28.00 us              2    FINODELK
>>>       0.01      76.50 us      65.00 us      88.00 us              2    FXATTROP
>>>       0.01     177.00 us     177.00 us     177.00 us              1      CREATE
>>>       0.02      56.14 us      23.00 us     128.00 us              7      LOOKUP
>>>       0.02     259.00 us      20.00 us     498.00 us              2     ENTRYLK
>>>      99.94      59.23 us      17.00 us   10914.00 us          35635       WRITE
>>>     Duration: 38 seconds
>>>    Data Read: 0 bytes
>>> Data Written: 1073741824 bytes
>>> Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
>>> -------------------------------------------------------
>>> Cumulative Stats:
>>>    Block Size:        4096b+        8192b+       16384b+       32768b+       65536b+      131072b+
>>>  No. of Reads:             0             0             0             0             0             0
>>> No. of Writes:          1576          4173         19605          7777          1847           657
>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>       0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
>>>       0.00      33.00 us      33.00 us      33.00 us              1      STATFS
>>>       0.00      22.50 us      13.00 us      32.00 us              2     ENTRYLK
>>>       0.00      32.00 us      26.00 us      38.00 us              2       FLUSH
>>>       0.01      47.50 us      16.00 us      79.00 us              2    FINODELK
>>>       0.01     157.00 us     157.00 us     157.00 us              1      CREATE
>>>       0.01      92.00 us      70.00 us     114.00 us              2    FXATTROP
>>>       0.03      72.57 us      39.00 us     121.00 us              7      LOOKUP
>>>      99.94      47.97 us      15.00 us    1598.00 us          35635       WRITE
>>>     Duration: 38 seconds
>>>    Data Read: 0 bytes
>>> Data Written: 1073741824 bytes
>>> Interval 0 Stats:
>>>    Block Size:        4096b+        8192b+       16384b+       32768b+       65536b+      131072b+
>>>  No. of Reads:             0             0             0             0             0             0
>>> No. of Writes:          1576          4173         19605          7777          1847           657
>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>       0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
>>>       0.00      33.00 us      33.00 us      33.00 us              1      STATFS
>>>       0.00      22.50 us      13.00 us      32.00 us              2     ENTRYLK
>>>       0.00      32.00 us      26.00 us      38.00 us              2       FLUSH
>>>       0.01      47.50 us      16.00 us      79.00 us              2    FINODELK
>>>       0.01     157.00 us     157.00 us     157.00 us              1      CREATE
>>>       0.01      92.00 us      70.00 us     114.00 us              2    FXATTROP
>>>       0.03      72.57 us      39.00 us     121.00 us              7      LOOKUP
>>>      99.94      47.97 us      15.00 us    1598.00 us          35635       WRITE
>>>     Duration: 38 seconds
>>>    Data Read: 0 bytes
>>> Data Written: 1073741824 bytes
>>> Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
>>> -------------------------------------------------------
>>> Cumulative Stats:
>>>    Block Size:        4096b+        8192b+       16384b+       32768b+       65536b+      131072b+
>>>  No. of Reads:             0             0             0             0             0             0
>>> No. of Writes:          1576          4173         19605          7777          1847           657
>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>       0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
>>>       0.00      58.00 us      58.00 us      58.00 us              1      STATFS
>>>       0.00      38.00 us      38.00 us      38.00 us              2     ENTRYLK
>>>       0.01      59.00 us      32.00 us      86.00 us              2       FLUSH
>>>       0.01      81.00 us      33.00 us     129.00 us              2    FINODELK
>>>       0.01      91.50 us      73.00 us     110.00 us              2    FXATTROP
>>>       0.01     239.00 us     239.00 us     239.00 us              1      CREATE
>>>       0.04     103.14 us      63.00 us     210.00 us              7      LOOKUP
>>>      99.92      52.99 us      16.00 us   11289.00 us          35635       WRITE
>>>     Duration: 38 seconds
>>>    Data Read: 0 bytes
>>> Data Written: 1073741824 bytes
>>> Interval 0 Stats:
>>>    Block Size:        4096b+        8192b+       16384b+       32768b+       65536b+      131072b+
>>>  No. of Reads:             0             0             0             0             0             0
>>> No. of Writes:          1576          4173         19605          7777          1847           657
>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>       0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
>>>       0.00      58.00 us      58.00 us      58.00 us              1      STATFS
>>>       0.00      38.00 us      38.00 us      38.00 us              2     ENTRYLK
>>>       0.01      59.00 us      32.00 us      86.00 us              2       FLUSH
>>>       0.01      81.00 us      33.00 us     129.00 us              2    FINODELK
>>>       0.01      91.50 us      73.00 us     110.00 us              2    FXATTROP
>>>       0.01     239.00 us     239.00 us     239.00 us              1      CREATE
>>>       0.04     103.14 us      63.00 us     210.00 us              7      LOOKUP
>>>      99.92      52.99 us      16.00 us   11289.00 us          35635       WRITE
>>>     Duration: 38 seconds
>>>    Data Read: 0 bytes
>>> Data Written: 1073741824 bytes
>>
>>
>>
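>> If I'm doing the math right, the 262144 4KB writes from dd ended up as
>> 35635 WRITE fops on each brick, i.e. roughly 7 application writes merged
>> into one ~30KB brick write on average - so write-behind does coalesce, but
>> stays well below the 128KB requests visible in the top bucket:
>>
>> # avg bytes per brick WRITE, and app writes per brick WRITE
>> root at fsnode2 ~ $ echo $((1073741824 / 35635)) $((262144 / 35635))
>> 30131 7
>>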
>> At this point I've officially run out of ideas about where to look next, so
>> any help, suggestions or pointers are highly appreciated!
>>
>> --
>> Best regards,
>> Anastasia Belyaeva
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
--
Best regards,
Anastasia Belyaeva
Best regards,
Anastasia Belyaeva