[Bugs] [Bug 1375125] New: arbiter volume write performance is bad.

bugzilla at redhat.com bugzilla at redhat.com
Mon Sep 12 08:33:23 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1375125

            Bug ID: 1375125
           Summary: arbiter volume write performance is bad.
           Product: GlusterFS
           Version: 3.8.3
         Component: arbiter
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: max.raba at comsysto.com
                CC: bugs at gluster.org, ravishankar at redhat.com



Hello,

unfortunately we have an issue with a volume configured with replica 2
and arbiter 1. Without the arbiter the volume performs quite well when mounted
with FUSE. After adding the arbiter, the write performance of the mount drops
massively. KVM, which uses libgfapi, seems to have no issue with that.

The example was generated on a VM test infrastructure with 4 hosts.
The bug is also present in the production environment, which consists of 4
physical nodes, each containing a 4.4 TB RAID10 array of 10K HDDs.

Here is the info about the configuration. 

[root at gluster-test-0 ~]# cat /etc/redhat-release 
CentOS Linux release 7.2.1511 (Core) 


[root at gluster-test-0 ~]# gluster --version
glusterfs 3.8.3 built on Aug 22 2016 12:58:57
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General
Public License.

[root at gluster-test-0 ~]# gluster peer status
Number of Peers: 3

Hostname: gluster-test-1
Uuid: 4b0a3a86-99d1-4ea7-b929-24b723c822fd
State: Peer in Cluster (Connected)

Hostname: gluster-test-2
Uuid: 194edbe6-e08f-4b15-b8e8-7edf6d804882
State: Peer in Cluster (Connected)

Hostname: gluster-test-3
Uuid: 669aa583-9b65-41a8-9a3d-6c8d97325599
State: Peer in Cluster (Connected)

[root at gluster-test-0 storage]# gluster volume info
Volume Name: storage
Type: Distributed-Replicate
Volume ID: a234f5e4-eefa-40f8-9c95-f204738cb31e
Status: Stopped
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gluster-test-0:/data/brick/brick2
Brick2: gluster-test-1:/data/brick/brick2
Brick3: gluster-test-2:/data/brick/brick2
Brick4: gluster-test-3:/data/brick/brick2
Options Reconfigured:
nfs.transport-type: tcp
config.transport: tcp
cluster.self-heal-daemon: on
performance.io-thread-count: 64
storage.owner-gid: 107
storage.owner-uid: 107
cluster.server-quorum-type: server
server.allow-insecure: on
cluster.quorum-type: auto
network.remote-dio: disable
performance.stat-prefetch: on
performance.io-cache: on
performance.read-ahead: on
performance.quick-read: off
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 64
auth.allow: 192.168.1.*
cluster.data-self-heal-algorithm: full
network.ping-timeout: 42
performance.readdir-ahead: on
nfs.disable: on
performance.cache-size: 512MB
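
For reference, here are options I have not tried yet but that are sometimes
suggested for write latency on replicated volumes (option names as listed by
`gluster volume set help`; whether any of them help with an arbiter I do not
know):

```shell
# Untested sketch: candidate tunables for replicated-volume write latency.
# eager-lock batches the per-write locking that replication adds;
# client-io-threads parallelises the FUSE client's I/O path.
gluster volume set storage cluster.eager-lock on
gluster volume set storage performance.client-io-threads on
gluster volume set storage performance.write-behind on
```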

[root at gluster-test-0 storage]# gluster volume status
Status of volume: storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster-test-0:/data/brick/brick2     49153     0          Y       13044
Brick gluster-test-1:/data/brick/brick2     49153     0          Y       5085 
Brick gluster-test-2:/data/brick/arbiter    49156     0          Y       19480
Brick gluster-test-2:/data/brick/brick2     49153     0          Y       18897
Brick gluster-test-3:/data/brick/brick2     49154     0          Y       5187 
Brick gluster-test-0:/data/brick/arbiter    49156     0          Y       13486
Self-heal Daemon on localhost               N/A       N/A        Y       13506
Self-heal Daemon on gluster-test-2.node.emn
osrz.loyaltypartner.com                     N/A       N/A        Y       19500
Self-heal Daemon on gluster-test-3.node.emn
osrz.loyaltypartner.com                     N/A       N/A        Y       5435 
Self-heal Daemon on gluster-test-1.node.emn
osrz.loyaltypartner.com                     N/A       N/A        Y       5569 

Task Status of Volume storage
------------------------------------------------------------------------------
There are no active volume tasks


This is the mount:

[root at gluster-test-0 storage]# mount
...
gluster-test-0:storage on /srv/storage type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

When I check performance with dd on the volume I get:


On FUSE mount:
[root at gluster-test-0 storage]# dd if=/dev/zero of=testfile count=1 bs=10M 
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 0.173773 s, 60.3 MB/s

On libgfapi:
[root at storager2a1 ~]# dd if=/dev/zero of=testfile count=10 bs=10M oflag=direct
10+0 records in
10+0 records out
104857600 bytes (105 MB) copied, 0.752445 s, 139 MB/s
Now I add the arbiter bricks:

[root at gluster-test-0 storage]# gluster volume add-brick storage replica 3
arbiter 1 gluster-test-2:/data/brick/arbiter gluster-test-0:/data/brick/arbiter
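
A sanity check worth running before benchmarking, to rule out heal traffic
skewing the numbers:

```shell
# Confirm the volume now lists the arbiter bricks and that no heals
# are pending; a heal running in the background would skew dd results.
gluster volume info storage
gluster volume heal storage info
```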

FUSE mount:
[root at gluster-test-0 storage]# dd if=/dev/zero of=testfile count=1 bs=10M 
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 13.076 s, 802 kB/s

On VM via libgfapi:
[root at storager2a1 ~]# dd if=/dev/zero of=testfile count=10 bs=10M oflag=direct
10+0 records in
10+0 records out
104857600 bytes (105 MB) copied, 0.868624 s, 121 MB/s
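
A caveat on my own numbers: the FUSE runs above omit oflag=direct while the
libgfapi runs use it, so they are not strictly comparable. A sketch of a
like-for-like test (the target directory is an argument; the temporary-dir
default is only so the script runs anywhere — in practice I would pass the
FUSE mount point):

```shell
# Like-for-like write test: 10 x 10 MB, with the data forced to stable
# storage before dd reports its timing (conv=fdatasync), so page-cache
# writes do not inflate the FUSE result.
TARGET="${1:-$(mktemp -d)}"   # pass the FUSE mount point, e.g. /srv/storage
dd if=/dev/zero of="$TARGET/ddtest" bs=10M count=10 conv=fdatasync
stat -c %s "$TARGET/ddtest"   # prints the size of the written file in bytes
```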

To check that it still performs well without the arbiter, I remove it again:


[root at gluster-test-0 storage]# gluster volume remove-brick storage replica 2
gluster-test-2:/data/brick/arbiter gluster-test-0:/data/brick/arbiter force

On FUSE mount:
[root at gluster-test-0 storage]# dd if=/dev/zero of=testfile count=1 bs=10M 
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 0.150458 s, 69.7 MB/s

On VM with libgfapi:
[root at storager2a1 ~]# dd if=/dev/zero of=testfile count=10 bs=10M oflag=direct
10+0 records in
10+0 records out
104857600 bytes (105 MB) copied, 0.618805 s, 169 MB/s

Can you please give me a hint as to how we could increase the write
performance on the FUSE mount?

Thanks
Max

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

