[Gluster-users] Unreasonably poor performance of replicated volumes

Vlad Kopylov vladkopy at gmail.com
Thu Apr 12 22:57:03 UTC 2018


I guess you have already gone through the user list archives and tried something like this:
http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html
I have the exact same setup, and that is as far as I got after months of
trial and error.
Many of us have much the same setup and the same issue with it; you can find
posts like yours on the list almost daily.

On Wed, Apr 11, 2018 at 3:03 PM, Anastasia Belyaeva <anastasia.blv at gmail.com> wrote:

> Hello everybody!
>
> I have 3 gluster servers (*gluster 3.12.6, Centos 7.2*; those are
> actually virtual machines located on 3 separate physical XenServer7.1
> servers)
>
> They are all connected via an InfiniBand network. Iperf3 shows around *23
> Gbit/s network bandwidth* between each pair of them.
>
> Each server has 3 HDDs put into a *stripe 3* thin pool (LVM2) with a logical
> volume created on top of it, formatted with *xfs*. Gluster top reports
> the following throughput:
>
> root at fsnode2 ~ $ gluster volume top r3vol write-perf bs 4096 count 524288 list-cnt 0
>> Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
>> Throughput *631.82 MBps* time 3.3989 secs
>> Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
>> Throughput *566.96 MBps* time 3.7877 secs
>> Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
>> Throughput *546.65 MBps* time 3.9285 secs
>
>
> root at fsnode2 ~ $ gluster volume top r2vol write-perf bs 4096 count 524288 list-cnt 0
>> Brick: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
>> Throughput *539.60 MBps* time 3.9798 secs
>> Brick: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
>> Throughput *580.07 MBps* time 3.7021 secs
>
>
> There are two *pure replicated* volumes ('replica 2' and 'replica 3'). The
> 'replica 2' volume is for testing purposes only.
>
>> Volume Name: r2vol
>> Type: Replicate
>> Volume ID: 4748d0c0-6bef-40d5-b1ec-d30e10cfddd9
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
>> Brick2: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
>> Options Reconfigured:
>> nfs.disable: on
>>
>
>
>> Volume Name: r3vol
>> Type: Replicate
>> Volume ID: b0f64c28-57e1-4b9d-946b-26ed6b499f29
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
>> Brick2: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
>> Brick3: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
>> Options Reconfigured:
>> nfs.disable: on
>
>
>
> The *client* is also gluster 3.12.6, a CentOS 7.3 virtual machine, using a *FUSE mount*:
>
>> root at centos7u3-nogdesktop2 ~ $ mount |grep gluster
>> gluster-host.ibnet:/r2vol on /mnt/gluster/r2 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>> gluster-host.ibnet:/r3vol on /mnt/gluster/r3 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>
>
>
> *The problem* is that there is a significant performance loss with
> smaller block sizes. For example:
>
> *4K block size*
> [replica 3 volume]
> root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
> 262144+0 records in
> 262144+0 records out
> 1073741824 bytes (1.1 GB) copied, 11.2207 s, *95.7 MB/s*
>
> [replica 2 volume]
> root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r2/file$RANDOM bs=4096 count=262144
> 262144+0 records in
> 262144+0 records out
> 1073741824 bytes (1.1 GB) copied, 12.0149 s, *89.4 MB/s*
>
> *512K block size*
> [replica 3 volume]
> root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=512K count=2048
> 2048+0 records in
> 2048+0 records out
> 1073741824 bytes (1.1 GB) copied, 5.27207 s, *204 MB/s*
>
> [replica 2 volume]
> root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r2/file$RANDOM bs=512K count=2048
> 2048+0 records in
> 2048+0 records out
> 1073741824 bytes (1.1 GB) copied, 4.22321 s, *254 MB/s*
>
> With a bigger block size it's still not where I expect it to be, but at
> least it starts to make some sense.
>
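For reference, the dd figures quoted above can be turned into per-call numbers. This is only a rough sketch: the byte count and timings are copied from the runs above, the helper names are made up for illustration, and it assumes each dd block corresponds to one write() call on the FUSE mount.

```python
# Figures copied from the dd runs quoted above.
TOTAL_BYTES = 1073741824  # every run wrote exactly 1 GiB

# (volume, label) -> (dd block size in bytes, elapsed seconds)
RUNS = {
    ("r3", "4K"): (4096, 11.2207),
    ("r2", "4K"): (4096, 12.0149),
    ("r3", "512K"): (512 * 1024, 5.27207),
    ("r2", "512K"): (512 * 1024, 4.22321),
}


def throughput_mb_s(total_bytes, seconds):
    """Throughput in decimal megabytes per second, the unit dd prints."""
    return total_bytes / seconds / 1e6


def us_per_call(total_bytes, block_size, seconds):
    """Average wall-clock microseconds per block written.

    Assumption: one dd block equals one write() call on the mount.
    """
    calls = total_bytes / block_size
    return seconds / calls * 1e6


for (volume, label), (block_size, seconds) in RUNS.items():
    print(
        f"{volume} bs={label}: "
        f"{throughput_mb_s(TOTAL_BYTES, seconds):.1f} MB/s, "
        f"{us_per_call(TOTAL_BYTES, block_size, seconds):.1f} us per call"
    )
```

On these numbers a 4K write on r3 costs roughly 43 microseconds of wall-clock time at the client, which would be consistent with per-call overhead, rather than raw bandwidth, limiting the small-block runs.
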
> I've been trying to solve this for a very long time with no luck.
> I've already tried both kernel tuning (different 'tuned' profiles and the
> ones recommended in the "Linux Kernel Tuning" section) and tweaking gluster
> volume options, including write-behind/flush-behind/write-behind-window-size.
> The latter, to my surprise, didn't make any difference. At first I
> thought it was a buffering issue, but it turns out it does buffer writes,
> just not very efficiently (at least that is what it looks like in the *gluster
> profile output*):
>
> root at fsnode2 ~ $ gluster volume profile r3vol info clear
>> ...
>> Cleared stats.
>
>
> root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
>> 262144+0 records in
>> 262144+0 records out
>> 1073741824 bytes (1.1 GB) copied, 10.9743 s, 97.8 MB/s
>
>
>
>> root at fsnode2 ~ $ gluster volume profile r3vol info
>> Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
>> -------------------------------------------------------
>> Cumulative Stats:
>>    Block Size:               4096b+                8192b+               16384b+
>>  No. of Reads:                    0                     0                     0
>> No. of Writes:                 1576                  4173                 19605
>>    Block Size:              32768b+               65536b+              131072b+
>>  No. of Reads:                    0                     0                     0
>> No. of Writes:                 7777                  1847                   657
>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>  ---------   -----------   -----------   -----------   ------------        ----
>>       0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
>>       0.00      18.00 us      18.00 us      18.00 us              1      STATFS
>>       0.00      20.50 us      11.00 us      30.00 us              2       FLUSH
>>       0.00      22.50 us      17.00 us      28.00 us              2    FINODELK
>>       0.01      76.50 us      65.00 us      88.00 us              2    FXATTROP
>>       0.01     177.00 us     177.00 us     177.00 us              1      CREATE
>>       0.02      56.14 us      23.00 us     128.00 us              7      LOOKUP
>>       0.02     259.00 us      20.00 us     498.00 us              2     ENTRYLK
>>      99.94      59.23 us      17.00 us   10914.00 us          35635       WRITE
>>     Duration: 38 seconds
>>    Data Read: 0 bytes
>> Data Written: 1073741824 bytes
>> Interval 0 Stats:
>>    Block Size:               4096b+                8192b+               16384b+
>>  No. of Reads:                    0                     0                     0
>> No. of Writes:                 1576                  4173                 19605
>>    Block Size:              32768b+               65536b+              131072b+
>>  No. of Reads:                    0                     0                     0
>> No. of Writes:                 7777                  1847                   657
>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>  ---------   -----------   -----------   -----------   ------------        ----
>>       0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
>>       0.00      18.00 us      18.00 us      18.00 us              1      STATFS
>>       0.00      20.50 us      11.00 us      30.00 us              2       FLUSH
>>       0.00      22.50 us      17.00 us      28.00 us              2    FINODELK
>>       0.01      76.50 us      65.00 us      88.00 us              2    FXATTROP
>>       0.01     177.00 us     177.00 us     177.00 us              1      CREATE
>>       0.02      56.14 us      23.00 us     128.00 us              7      LOOKUP
>>       0.02     259.00 us      20.00 us     498.00 us              2     ENTRYLK
>>      99.94      59.23 us      17.00 us   10914.00 us          35635       WRITE
>>     Duration: 38 seconds
>>    Data Read: 0 bytes
>> Data Written: 1073741824 bytes
>> Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
>> -------------------------------------------------------
>> Cumulative Stats:
>>    Block Size:               4096b+                8192b+               16384b+
>>  No. of Reads:                    0                     0                     0
>> No. of Writes:                 1576                  4173                 19605
>>    Block Size:              32768b+               65536b+              131072b+
>>  No. of Reads:                    0                     0                     0
>> No. of Writes:                 7777                  1847                   657
>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>  ---------   -----------   -----------   -----------   ------------        ----
>>       0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
>>       0.00      33.00 us      33.00 us      33.00 us              1      STATFS
>>       0.00      22.50 us      13.00 us      32.00 us              2     ENTRYLK
>>       0.00      32.00 us      26.00 us      38.00 us              2       FLUSH
>>       0.01      47.50 us      16.00 us      79.00 us              2    FINODELK
>>       0.01     157.00 us     157.00 us     157.00 us              1      CREATE
>>       0.01      92.00 us      70.00 us     114.00 us              2    FXATTROP
>>       0.03      72.57 us      39.00 us     121.00 us              7      LOOKUP
>>      99.94      47.97 us      15.00 us    1598.00 us          35635       WRITE
>>     Duration: 38 seconds
>>    Data Read: 0 bytes
>> Data Written: 1073741824 bytes
>> Interval 0 Stats:
>>    Block Size:               4096b+                8192b+               16384b+
>>  No. of Reads:                    0                     0                     0
>> No. of Writes:                 1576                  4173                 19605
>>    Block Size:              32768b+               65536b+              131072b+
>>  No. of Reads:                    0                     0                     0
>> No. of Writes:                 7777                  1847                   657
>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>  ---------   -----------   -----------   -----------   ------------        ----
>>       0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
>>       0.00      33.00 us      33.00 us      33.00 us              1      STATFS
>>       0.00      22.50 us      13.00 us      32.00 us              2     ENTRYLK
>>       0.00      32.00 us      26.00 us      38.00 us              2       FLUSH
>>       0.01      47.50 us      16.00 us      79.00 us              2    FINODELK
>>       0.01     157.00 us     157.00 us     157.00 us              1      CREATE
>>       0.01      92.00 us      70.00 us     114.00 us              2    FXATTROP
>>       0.03      72.57 us      39.00 us     121.00 us              7      LOOKUP
>>      99.94      47.97 us      15.00 us    1598.00 us          35635       WRITE
>>     Duration: 38 seconds
>>    Data Read: 0 bytes
>> Data Written: 1073741824 bytes
>> Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
>> -------------------------------------------------------
>> Cumulative Stats:
>>    Block Size:               4096b+                8192b+               16384b+
>>  No. of Reads:                    0                     0                     0
>> No. of Writes:                 1576                  4173                 19605
>>    Block Size:              32768b+               65536b+              131072b+
>>  No. of Reads:                    0                     0                     0
>> No. of Writes:                 7777                  1847                   657
>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>  ---------   -----------   -----------   -----------   ------------        ----
>>       0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
>>       0.00      58.00 us      58.00 us      58.00 us              1      STATFS
>>       0.00      38.00 us      38.00 us      38.00 us              2     ENTRYLK
>>       0.01      59.00 us      32.00 us      86.00 us              2       FLUSH
>>       0.01      81.00 us      33.00 us     129.00 us              2    FINODELK
>>       0.01      91.50 us      73.00 us     110.00 us              2    FXATTROP
>>       0.01     239.00 us     239.00 us     239.00 us              1      CREATE
>>       0.04     103.14 us      63.00 us     210.00 us              7      LOOKUP
>>      99.92      52.99 us      16.00 us   11289.00 us          35635       WRITE
>>     Duration: 38 seconds
>>    Data Read: 0 bytes
>> Data Written: 1073741824 bytes
>> Interval 0 Stats:
>>    Block Size:               4096b+                8192b+               16384b+
>>  No. of Reads:                    0                     0                     0
>> No. of Writes:                 1576                  4173                 19605
>>    Block Size:              32768b+               65536b+              131072b+
>>  No. of Reads:                    0                     0                     0
>> No. of Writes:                 7777                  1847                   657
>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>  ---------   -----------   -----------   -----------   ------------        ----
>>       0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
>>       0.00      58.00 us      58.00 us      58.00 us              1      STATFS
>>       0.00      38.00 us      38.00 us      38.00 us              2     ENTRYLK
>>       0.01      59.00 us      32.00 us      86.00 us              2       FLUSH
>>       0.01      81.00 us      33.00 us     129.00 us              2    FINODELK
>>       0.01      91.50 us      73.00 us     110.00 us              2    FXATTROP
>>       0.01     239.00 us     239.00 us     239.00 us              1      CREATE
>>       0.04     103.14 us      63.00 us     210.00 us              7      LOOKUP
>>      99.92      52.99 us      16.00 us   11289.00 us          35635       WRITE
>>     Duration: 38 seconds
>>    Data Read: 0 bytes
>> Data Written: 1073741824 bytes
>
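A quick sanity check on the profile output above, as a sketch rather than a diagnosis. All figures are copied from the quoted output; the variable names are illustrative. It also assumes that Avg-latency multiplied by No. of calls roughly approximates the total time a brick spent in the WRITE fop, which ignores any concurrency inside the brick.

```python
# Figures copied from the quoted `gluster volume profile r3vol info` output.
TOTAL_BYTES = 1073741824   # "Data Written" per brick
WRITE_CALLS = 35635        # "No. of calls" for the WRITE fop
DD_SECONDS = 10.9743       # wall-clock time of the profiled dd run

# Block-size histogram bucket (lower bound in bytes) -> number of writes.
WRITE_HISTOGRAM = {
    4096: 1576,
    8192: 4173,
    16384: 19605,
    32768: 7777,
    65536: 1847,
    131072: 657,
}

# Average WRITE latency per brick, in microseconds.
AVG_WRITE_LATENCY_US = {"fsnode2": 59.23, "fsnode6": 47.97, "fsnode4": 52.99}

# The histogram should account for every WRITE call.
histogram_total = sum(WRITE_HISTOGRAM.values())

# Average size of a write as it arrives at a brick.
avg_write_bytes = TOTAL_BYTES / WRITE_CALLS

# Assumption: calls * average latency approximates time spent in WRITE.
brick_write_seconds = {
    brick: WRITE_CALLS * latency_us / 1e6
    for brick, latency_us in AVG_WRITE_LATENCY_US.items()
}

print(f"histogram total: {histogram_total} (WRITE calls: {WRITE_CALLS})")
print(f"average write size at the brick: {avg_write_bytes / 1024:.1f} KiB")
for brick, seconds in brick_write_seconds.items():
    share = seconds / DD_SECONDS
    print(f"{brick}: ~{seconds:.2f} s in WRITE ({share:.0%} of the dd run)")
```

Read this way, the 262144 client-side 4K writes reach each brick as 35635 writes averaging about 29 KiB, so the writes are being aggregated somewhere on the client side, and each brick accounts for only about 2 seconds of WRITE time out of an 11 second run. That hints the remaining time is spent on the client or in transit, though the profile alone cannot confirm it.
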
>
>
> At this point I've officially run out of ideas about where to look next, so any
> help, suggestions or pointers are highly appreciated!
>
> --
> Best regards,
> Anastasia Belyaeva