[Bugs] [Bug 1378867] New: Poor smallfile read performance on Arbiter volume compared to Replica 3 volume
bugzilla at redhat.com
Fri Sep 23 12:16:29 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1378867
Bug ID: 1378867
Summary: Poor smallfile read performance on Arbiter volume
compared to Replica 3 volume
Product: Red Hat Gluster Storage
Version: 3.2
Component: arbiter
Assignee: ravishankar at redhat.com
Reporter: ravishankar at redhat.com
QA Contact: ksandha at redhat.com
CC: bugs at gluster.org, mpillai at redhat.com,
pkarampu at redhat.com, psuriset at redhat.com,
ravishankar at redhat.com, rcyriac at redhat.com,
rhs-bugs at redhat.com, rsussman at redhat.com,
shberry at redhat.com, storage-qa-internal at redhat.com
Depends On: 1377193
Blocks: 1378684
+++ This bug was initially created as a clone of Bug #1377193 +++
Description of problem:
The expectation was that smallfile read performance on an Arbiter volume would
match Replica 3 smallfile read performance.
The observation is that Arbiter volume read performance is only 30% of Replica 3
read performance.
Version-Release number of selected component (if applicable):
glusterfs-cli-3.8.2-1.el7.x86_64
glusterfs-3.8.2-1.el7.x86_64
glusterfs-api-3.8.2-1.el7.x86_64
glusterfs-libs-3.8.2-1.el7.x86_64
glusterfs-fuse-3.8.2-1.el7.x86_64
glusterfs-client-xlators-3.8.2-1.el7.x86_64
glusterfs-server-3.8.2-1.el7.x86_64
How reproducible:
Every time.
gluster v info (Replica 3 volume)
Volume Name: rep3
Type: Distributed-Replicate
Volume ID: e7a5d84d-31da-40a8-85d0-2b94b95c3b28
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 172.17.40.13:/bricks/b/g
Brick2: 172.17.40.14:/bricks/b/g
Brick3: 172.17.40.15:/bricks/b/g
Brick4: 172.17.40.16:/bricks/b/g
Brick5: 172.17.40.22:/bricks/b/g
Brick6: 172.17.40.24:/bricks/b/g
Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.readdir-ahead: on
gluster v info (Arbiter Volume)
Volume Name: arb
Type: Distributed-Replicate
Volume ID: e7a5d84d-31da-40a8-85d0-2b94b95c3b28
Status: Started
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: 172.17.40.13:/bricks/b01/g
Brick2: 172.17.40.14:/bricks/b01/g
Brick3: 172.17.40.15:/bricks/b02/g (arbiter)
Brick4: 172.17.40.15:/bricks/b01/g
Brick5: 172.17.40.16:/bricks/b01/g
Brick6: 172.17.40.22:/bricks/b02/g (arbiter)
Brick7: 172.17.40.22:/bricks/b01/g
Brick8: 172.17.40.24:/bricks/b01/g
Brick9: 172.17.40.13:/bricks/b02/g (arbiter)
Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.readdir-ahead: on
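For reference, a minimal sketch of how two such volumes could be created with
the stock gluster CLI, following the brick layout shown above (the exact
create commands are not recorded in this bug, so treat this as illustrative):

# Replica 3 volume, 2 x 3 = 6 bricks
gluster volume create rep3 replica 3 \
    172.17.40.13:/bricks/b/g 172.17.40.14:/bricks/b/g 172.17.40.15:/bricks/b/g \
    172.17.40.16:/bricks/b/g 172.17.40.22:/bricks/b/g 172.17.40.24:/bricks/b/g
gluster volume start rep3

# Arbiter volume, 3 x (2 + 1) = 9 bricks; every third brick in each
# triplet becomes the arbiter
gluster volume create arb replica 3 arbiter 1 \
    172.17.40.13:/bricks/b01/g 172.17.40.14:/bricks/b01/g 172.17.40.15:/bricks/b02/g \
    172.17.40.15:/bricks/b01/g 172.17.40.16:/bricks/b01/g 172.17.40.22:/bricks/b02/g \
    172.17.40.22:/bricks/b01/g 172.17.40.24:/bricks/b01/g 172.17.40.13:/bricks/b02/g
gluster volume start arb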
Steps to Reproduce:
For both the Replica 3 volume and the Arbiter volume, do the following (a
consolidated sketch of one run follows the list):
1. Create the files. Drop caches on the server and client side, then create
   the smallfile files with:
   /root/smallfile/smallfile_cli.py --top /mnt/glusterfs --host-set clientfile
   --threads 4 --file-size 256 --files 6554 --record-size 32 --fsync Y
   --operation create
2. Read the files. Again drop caches on the server and client side, then read
   the smallfiles with:
   /root/smallfile/smallfile_cli.py --top /mnt/glusterfs --host-set clientfile
   --threads 4 --file-size 256 --files 6554 --record-size 32 --operation read
3. Compare the read performance of the Replica 3 and Arbiter volumes.
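A minimal end-to-end sketch of one run, assuming a FUSE mount at
/mnt/glusterfs and passwordless root SSH to the hosts listed in 'clientfile'
(the cache-drop method is an assumption; the bug does not record the exact
command used):

# Drop kernel caches on every client and server before each phase
for h in $(cat clientfile) 172.17.40.{13,14,15,16,22,24}; do
    ssh "$h" 'sync; echo 3 > /proc/sys/vm/drop_caches'
done

# Create phase: 4 threads x 6554 files of 256 KiB each, fsync'd
/root/smallfile/smallfile_cli.py --top /mnt/glusterfs --host-set clientfile \
    --threads 4 --file-size 256 --files 6554 --record-size 32 \
    --fsync Y --operation create

# Drop caches again, then run the read phase
for h in $(cat clientfile) 172.17.40.{13,14,15,16,22,24}; do
    ssh "$h" 'sync; echo 3 > /proc/sys/vm/drop_caches'
done
/root/smallfile/smallfile_cli.py --top /mnt/glusterfs --host-set clientfile \
    --threads 4 --file-size 256 --files 6554 --record-size 32 \
    --operation read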
Actual results:
Arbiter read performance is 30% of replica 3 read performance for smallfile
workload.
Expected results:
Smallfile read performance of Arbiter volume and Replica 3 volume should
ideally be same.
--Shekhar
--- Additional comment from Ravishankar N on 2016-09-19 03:31:50 EDT ---
Note to self: workload used: https://github.com/bengland2/smallfile
--- Additional comment from Shekhar Berry on 2016-09-19 04:07:56 EDT ---
Smallfile Performance numbers:
Create Performance for 256KiB file size
---------------------------------------
Replica 2 Volume : 407 files/sec/server
Arbiter Volume : 317 files/sec/server
Replica 3 Volume : 306 files/sec/server
Read Performance for 256KiB file size
-------------------------------------
Replica 2 Volume : 380 files/sec/server
Arbiter Volume : 132 files/sec/server
Replica 3 Volume : 329 files/sec/server
--Shekhar
--- Additional comment from Ravishankar N on 2016-09-22 05:55:55 EDT ---
I was able to get similar results in my testing, where the 'files/sec' was
almost half for a 1x (2+1) setup when compared to a 1x3 setup for a 256KB write
size. A summary of the cumulative brick profile info from one such run is given
below for some FOPs:
Replica 3 vol
-------------
No. of calls:
            Brick1    Brick2    Brick3
Lookup      28,544    28,545    28,552
Read        17,695    17,507    17,228
FSTAT       17,714    17,535    17,247
Inodelk          8         8         8
Arbiter vol
-----------
No. of calls:
            Brick1    Brick2    Arbiter brick
Lookup      56,241    56,246    56,245
Read        34,920    17,508         -
FSTAT       34,995    17,533         -
Inodelk     52,442    52,442    52,442
I see that the sum total of reads across all bricks is similar for both the
replica and arbiter setups. On the arbiter volume, zero reads are served from
the arbiter brick, so the read load is spread between the first two bricks;
likewise for FSTAT.
The problem seems to be in the number of lookups: for the arbiter volume the
count is double that of replica 3, and I'm guessing this is what is slowing
things down. I also see a lot of Inodelks on the arbiter volume, which is
unexpected because the I/O was a read operation. I need to figure out why these
two things are happening.
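(For anyone reproducing these counts: the per-brick FOP totals above come from
gluster's volume profiling. A sketch with the stock CLI, assuming the arbiter
volume is named 'arb' as in the setup above:)

gluster volume profile arb start             # begin collecting per-brick FOP stats
# ... run the smallfile read workload ...
gluster volume profile arb info cumulative   # dump cumulative call counts per brick
gluster volume profile arb stop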
--- Additional comment from Ravishankar N on 2016-09-23 01:43:42 EDT ---
Pranith suggested that the extra lookups and inodelks could be due to spurious
heals triggered for some reason. Indeed, disabling client-side heals brings the
read performance numbers into proximity with replica 3. On debugging, it was
found that the lookups were triggering metadata heals due to a mismatching
count in the dict, as explained in the patch (BZ 1378684).
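A sketch of the workaround tested above, using the stock AFR client-side heal
options (the option names are standard gluster options, not quoted from the
bug):

# Turn off client-side (mount-triggered) self-heals; the self-heal daemon
# continues to heal in the background
gluster volume set arb cluster.data-self-heal off
gluster volume set arb cluster.metadata-self-heal off
gluster volume set arb cluster.entry-self-heal off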
Here are the profile numbers with the fix on arbiter vol:
No. of calls:
            Brick1    Brick2    Arbiter brick
Lookup      28,805    28,809    28,817
Read        34,920    17,507         -
FSTAT       34,991    17,547         -
Inodelk          8         8         8
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1377193
[Bug 1377193] Poor smallfile read performance on Arbiter volume compared to
Replica 3 volume
https://bugzilla.redhat.com/show_bug.cgi?id=1378684
[Bug 1378684] Poor smallfile read performance on Arbiter volume compared to
Replica 3 volume