[Gluster-users] Unreasonably poor performance of replicated volumes

Anastasia Belyaeva anastasia.blv at gmail.com
Wed Apr 11 19:03:46 UTC 2018


Hello everybody!

I have 3 gluster servers (*gluster 3.12.6, CentOS 7.2*; these are actually
virtual machines located on 3 separate physical XenServer 7.1 hosts).

They are all connected via an InfiniBand network. iperf3 shows around *23
Gbit/s of network bandwidth* between each pair of them.

Each server has 3 HDDs combined into a *3-way striped thin pool (LVM2)* with a
logical volume created on top of it, formatted with *xfs*. Gluster top reports
the following throughput:

root@fsnode2 ~ $ gluster volume top r3vol write-perf bs 4096 count 524288 list-cnt 0
> Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
> Throughput *631.82 MBps* time 3.3989 secs
> Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
> Throughput *566.96 MBps* time 3.7877 secs
> Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
> Throughput *546.65 MBps* time 3.9285 secs


root@fsnode2 ~ $ gluster volume top r2vol write-perf bs 4096 count 524288 list-cnt 0
> Brick: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
> Throughput *539.60 MBps* time 3.9798 secs
> Brick: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
> Throughput *580.07 MBps* time 3.7021 secs


There are also two *pure replicated volumes ('replica 2' and 'replica 3')*.
The 'replica 2' volume is for testing purposes only.

> Volume Name: r2vol
> Type: Replicate
> Volume ID: 4748d0c0-6bef-40d5-b1ec-d30e10cfddd9
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
> Brick2: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
> Options Reconfigured:
> nfs.disable: on
>


> Volume Name: r3vol
> Type: Replicate
> Volume ID: b0f64c28-57e1-4b9d-946b-26ed6b499f29
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
> Brick2: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
> Brick3: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
> Options Reconfigured:
> nfs.disable: on



The *client* is also gluster 3.12.6, a CentOS 7.3 virtual machine, using a *FUSE mount*:

> root@centos7u3-nogdesktop2 ~ $ mount |grep gluster
> gluster-host.ibnet:/r2vol on /mnt/gluster/r2 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> gluster-host.ibnet:/r3vol on /mnt/gluster/r3 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)



*The problem* is that there is a significant performance loss with smaller
block sizes. For example:

*4K block size*
[replica 3 volume]
root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 11.2207 s, *95.7 MB/s*

[replica 2 volume]
root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r2/file$RANDOM bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 12.0149 s, *89.4 MB/s*

*512K block size*
[replica 3 volume]
root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=512K count=2048
2048+0 records in
2048+0 records out
1073741824 bytes (1.1 GB) copied, 5.27207 s, *204 MB/s*

[replica 2 volume]
root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r2/file$RANDOM bs=512K count=2048
2048+0 records in
2048+0 records out
1073741824 bytes (1.1 GB) copied, 4.22321 s, *254 MB/s*

With a bigger block size it's still not where I expect it to be, but at least
it starts to make some sense.
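
As a rough cross-check of the replica 3 figures above, the dd results can be
converted into an average cost per write() call. The sketch below is in Python
(not otherwise used in this thread); the dictionary layout and function names
are made up for illustration, and only the numbers come from the dd output. It
assumes dd issues exactly one write() per block.

```python
# Figures copied from the two dd runs against the replica 3 volume.
# The structure and names here are illustrative, not part of any tool.
runs = {
    "4K": {"block_size": 4096, "count": 262144, "seconds": 11.2207},
    "512K": {"block_size": 512 * 1024, "count": 2048, "seconds": 5.27207},
}


def per_write_us(run):
    """Average wall-clock time per write() call, in microseconds."""
    return run["seconds"] / run["count"] * 1e6


def throughput_mb_s(run):
    """Throughput in MB/s (10^6 bytes), the unit dd reports."""
    return run["block_size"] * run["count"] / run["seconds"] / 1e6


for name, run in runs.items():
    print(
        f"{name}: {per_write_us(run):.1f} us per write(), "
        f"{throughput_mb_s(run):.1f} MB/s"
    )
```

This works out to roughly 43 us per 4 KiB write versus about 2.6 ms per
512 KiB write. That suggests (but does not prove) that a fixed per-request
cost, rather than network or disk bandwidth, dominates at small block sizes.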

I've been trying to solve this for a very long time with no luck.
I've already tried both kernel tuning (different 'tuned' profiles and the
ones recommended in the "Linux Kernel Tuning" section) and tweaking gluster
volume options, including
write-behind/flush-behind/write-behind-window-size.
The latter, to my surprise, didn't make any difference. At first I
thought it was a buffering issue, but it turns out it does buffer writes,
just not very efficiently (at least that is what it looks like in the *gluster
profile output*):

root@fsnode2 ~ $ gluster volume profile r3vol info clear
> ...
> Cleared stats.


root@centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
> 262144+0 records in
> 262144+0 records out
> 1073741824 bytes (1.1 GB) copied, 10.9743 s, 97.8 MB/s



> root@fsnode2 ~ $ gluster volume profile r3vol info
> Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
> -------------------------------------------------------
> Cumulative Stats:
>    Block Size:               4096b+                8192b+               16384b+
>  No. of Reads:                    0                     0                     0
> No. of Writes:                 1576                  4173                 19605
>    Block Size:              32768b+               65536b+              131072b+
>  No. of Reads:                    0                     0                     0
> No. of Writes:                 7777                  1847                   657
>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls        Fop
>  ---------   -----------   -----------   -----------   ------------       ----
>       0.00       0.00 us       0.00 us       0.00 us              1    RELEASE
>       0.00      18.00 us      18.00 us      18.00 us              1     STATFS
>       0.00      20.50 us      11.00 us      30.00 us              2      FLUSH
>       0.00      22.50 us      17.00 us      28.00 us              2   FINODELK
>       0.01      76.50 us      65.00 us      88.00 us              2   FXATTROP
>       0.01     177.00 us     177.00 us     177.00 us              1     CREATE
>       0.02      56.14 us      23.00 us     128.00 us              7     LOOKUP
>       0.02     259.00 us      20.00 us     498.00 us              2    ENTRYLK
>      99.94      59.23 us      17.00 us   10914.00 us          35635      WRITE
>     Duration: 38 seconds
>    Data Read: 0 bytes
> Data Written: 1073741824 bytes
> Interval 0 Stats:
>    Block Size:               4096b+                8192b+               16384b+
>  No. of Reads:                    0                     0                     0
> No. of Writes:                 1576                  4173                 19605
>    Block Size:              32768b+               65536b+              131072b+
>  No. of Reads:                    0                     0                     0
> No. of Writes:                 7777                  1847                   657
>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls        Fop
>  ---------   -----------   -----------   -----------   ------------       ----
>       0.00       0.00 us       0.00 us       0.00 us              1    RELEASE
>       0.00      18.00 us      18.00 us      18.00 us              1     STATFS
>       0.00      20.50 us      11.00 us      30.00 us              2      FLUSH
>       0.00      22.50 us      17.00 us      28.00 us              2   FINODELK
>       0.01      76.50 us      65.00 us      88.00 us              2   FXATTROP
>       0.01     177.00 us     177.00 us     177.00 us              1     CREATE
>       0.02      56.14 us      23.00 us     128.00 us              7     LOOKUP
>       0.02     259.00 us      20.00 us     498.00 us              2    ENTRYLK
>      99.94      59.23 us      17.00 us   10914.00 us          35635      WRITE
>     Duration: 38 seconds
>    Data Read: 0 bytes
> Data Written: 1073741824 bytes
> Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
> -------------------------------------------------------
> Cumulative Stats:
>    Block Size:               4096b+                8192b+               16384b+
>  No. of Reads:                    0                     0                     0
> No. of Writes:                 1576                  4173                 19605
>    Block Size:              32768b+               65536b+              131072b+
>  No. of Reads:                    0                     0                     0
> No. of Writes:                 7777                  1847                   657
>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls        Fop
>  ---------   -----------   -----------   -----------   ------------       ----
>       0.00       0.00 us       0.00 us       0.00 us              1    RELEASE
>       0.00      33.00 us      33.00 us      33.00 us              1     STATFS
>       0.00      22.50 us      13.00 us      32.00 us              2    ENTRYLK
>       0.00      32.00 us      26.00 us      38.00 us              2      FLUSH
>       0.01      47.50 us      16.00 us      79.00 us              2   FINODELK
>       0.01     157.00 us     157.00 us     157.00 us              1     CREATE
>       0.01      92.00 us      70.00 us     114.00 us              2   FXATTROP
>       0.03      72.57 us      39.00 us     121.00 us              7     LOOKUP
>      99.94      47.97 us      15.00 us    1598.00 us          35635      WRITE
>     Duration: 38 seconds
>    Data Read: 0 bytes
> Data Written: 1073741824 bytes
> Interval 0 Stats:
>    Block Size:               4096b+                8192b+               16384b+
>  No. of Reads:                    0                     0                     0
> No. of Writes:                 1576                  4173                 19605
>    Block Size:              32768b+               65536b+              131072b+
>  No. of Reads:                    0                     0                     0
> No. of Writes:                 7777                  1847                   657
>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls        Fop
>  ---------   -----------   -----------   -----------   ------------       ----
>       0.00       0.00 us       0.00 us       0.00 us              1    RELEASE
>       0.00      33.00 us      33.00 us      33.00 us              1     STATFS
>       0.00      22.50 us      13.00 us      32.00 us              2    ENTRYLK
>       0.00      32.00 us      26.00 us      38.00 us              2      FLUSH
>       0.01      47.50 us      16.00 us      79.00 us              2   FINODELK
>       0.01     157.00 us     157.00 us     157.00 us              1     CREATE
>       0.01      92.00 us      70.00 us     114.00 us              2   FXATTROP
>       0.03      72.57 us      39.00 us     121.00 us              7     LOOKUP
>      99.94      47.97 us      15.00 us    1598.00 us          35635      WRITE
>     Duration: 38 seconds
>    Data Read: 0 bytes
> Data Written: 1073741824 bytes
> Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
> -------------------------------------------------------
> Cumulative Stats:
>    Block Size:               4096b+                8192b+               16384b+
>  No. of Reads:                    0                     0                     0
> No. of Writes:                 1576                  4173                 19605
>    Block Size:              32768b+               65536b+              131072b+
>  No. of Reads:                    0                     0                     0
> No. of Writes:                 7777                  1847                   657
>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls        Fop
>  ---------   -----------   -----------   -----------   ------------       ----
>       0.00       0.00 us       0.00 us       0.00 us              1    RELEASE
>       0.00      58.00 us      58.00 us      58.00 us              1     STATFS
>       0.00      38.00 us      38.00 us      38.00 us              2    ENTRYLK
>       0.01      59.00 us      32.00 us      86.00 us              2      FLUSH
>       0.01      81.00 us      33.00 us     129.00 us              2   FINODELK
>       0.01      91.50 us      73.00 us     110.00 us              2   FXATTROP
>       0.01     239.00 us     239.00 us     239.00 us              1     CREATE
>       0.04     103.14 us      63.00 us     210.00 us              7     LOOKUP
>      99.92      52.99 us      16.00 us   11289.00 us          35635      WRITE
>     Duration: 38 seconds
>    Data Read: 0 bytes
> Data Written: 1073741824 bytes
> Interval 0 Stats:
>    Block Size:               4096b+                8192b+               16384b+
>  No. of Reads:                    0                     0                     0
> No. of Writes:                 1576                  4173                 19605
>    Block Size:              32768b+               65536b+              131072b+
>  No. of Reads:                    0                     0                     0
> No. of Writes:                 7777                  1847                   657
>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls        Fop
>  ---------   -----------   -----------   -----------   ------------       ----
>       0.00       0.00 us       0.00 us       0.00 us              1    RELEASE
>       0.00      58.00 us      58.00 us      58.00 us              1     STATFS
>       0.00      38.00 us      38.00 us      38.00 us              2    ENTRYLK
>       0.01      59.00 us      32.00 us      86.00 us              2      FLUSH
>       0.01      81.00 us      33.00 us     129.00 us              2   FINODELK
>       0.01      91.50 us      73.00 us     110.00 us              2   FXATTROP
>       0.01     239.00 us     239.00 us     239.00 us              1     CREATE
>       0.04     103.14 us      63.00 us     210.00 us              7     LOOKUP
>      99.92      52.99 us      16.00 us   11289.00 us          35635      WRITE
>     Duration: 38 seconds
>    Data Read: 0 bytes
> Data Written: 1073741824 bytes
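
To put the profile numbers in perspective, here is a small Python sketch that
derives the average size of the WRITE calls reaching a brick and the total time
the brick reports spending in them. All figures are copied from the fsnode2
brick output and the dd run above; the variable names are illustrative. It
assumes the WRITE calls are served strictly one after another, which may not
hold, so the summed latency is only a rough estimate.

```python
# Figures copied from the fsnode2 brick profile and the bs=4096 dd run.
bytes_written = 1073741824      # "Data Written"
write_calls = 35635             # "No. of calls" for WRITE
avg_write_latency_us = 59.23    # "Avg-latency" for WRITE on fsnode2
client_writes = 262144          # dd count (one 4 KiB write() each, assumed)
dd_elapsed_s = 10.9743          # wall-clock time reported by dd

# Write-size histogram from the profile ("No. of Writes" per bucket).
histogram = {
    4096: 1576, 8192: 4173, 16384: 19605,
    32768: 7777, 65536: 1847, 131072: 657,
}

# Sanity check: the buckets should add up to the WRITE call count.
assert sum(histogram.values()) == write_calls

avg_write_size = bytes_written / write_calls           # bytes per brick WRITE
coalescing = client_writes / write_calls               # client writes per WRITE
brick_write_time_s = write_calls * avg_write_latency_us / 1e6
share_of_elapsed = brick_write_time_s / dd_elapsed_s   # serial-execution estimate

print(f"average brick WRITE size: {avg_write_size / 1024:.1f} KiB")
print(f"client writes per brick WRITE: {coalescing:.1f}")
print(f"summed brick WRITE latency: {brick_write_time_s:.2f} s "
      f"({share_of_elapsed:.0%} of the dd run)")
```

Under these assumptions each brick WRITE carries about 29 KiB on average
(roughly 7 client writes coalesced into one), and the brick-side WRITE latency
adds up to about 2.1 s of the 11 s dd run. That would leave most of the
elapsed time outside the bricks (client side or network round trips), though
the profile alone cannot say where.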



At this point I've officially run out of ideas about where to look next. So any
help, suggestions or pointers are highly appreciated!

-- 
Best regards,
Anastasia Belyaeva