[Bugs] [Bug 1419825] New: Sequential and Random Writes are off target by 12% and 22% respectively on EC backed volumes over FUSE

bugzilla at redhat.com bugzilla at redhat.com
Tue Feb 7 07:57:39 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1419825

            Bug ID: 1419825
           Summary: Sequential and Random Writes are off target by 12% and
                    22% respectively on EC backed volumes over FUSE
           Product: GlusterFS
           Version: 3.10
         Component: disperse
          Keywords: Performance, Regression
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: xhernandez at datalab.es
                CC: amukherj at redhat.com, asoman at redhat.com,
                    aspandey at redhat.com, bturner at redhat.com,
                    bugs at gluster.org, rcyriac at redhat.com,
                    rhinduja at redhat.com, rhs-bugs at redhat.com,
                    storage-qa-internal at redhat.com
        Depends On: 1409191
            Blocks: 1408639, 1415160



+++ This bug was initially created as a clone of Bug #1409191 +++

Please provide a public description of the problem.

--- Additional comment from Ashish Pandey on 2017-01-09 08:33:43 CET ---

Description of problem:
------------------------

Testbed: 12 x (4+2) dispersed volume, 6 servers, 6 workload-generating clients.

Baseline: 3.1.3 with io-threads enabled.

3.2 testing was done with io-threads enabled and md-cache parameters set.
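
For reference, a comparable volume could be assembled roughly as below (a
sketch only; host names, brick paths and the mount point are hypothetical, and
the io-threads/md-cache tunables mentioned above are omitted). A 12 x (4+2)
layout needs 72 bricks, 12 per server, grouped so that each 4+2 subvolume
spans all 6 servers:

BRICKS=""
for sub in $(seq 1 12); do
    for srv in $(seq 1 6); do
        BRICKS="$BRICKS server${srv}:/bricks/testvol/brick${sub}"
    done
done
gluster volume create testvol disperse 6 redundancy 2 $BRICKS
gluster volume start testvol
mkdir -p /mnt/testvol
mount -t glusterfs server1:/testvol /mnt/testvol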

It looks like we have regressed with 3.2 on large-file sequential and random writes:

******************
Sequential Writes
******************

3.1.3 : 2838601.16 kB/sec
3.2   : 2506687.55 kB/sec

Regression: ~12%


******************
Random Writes
******************

3.1.3 : 617384.17 kB/sec
3.2   : 480226.17 kB/sec

Regression: ~22%
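
(For reference, these percentages follow directly from the throughput numbers
above: (2838601.16 - 2506687.55) / 2838601.16 ~= 11.7%, and
(617384.17 - 480226.17) / 617384.17 ~= 22.2%.)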

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

glusterfs-3.8.4-10.el7rhgs.x86_64


How reproducible:
------------------

100%

Actual results:
----------------

Regressions on sequential and random large file writes.

Expected results:
-----------------

The acceptable regression threshold is within +/-10%.

--- Additional comment from Worker Ant on 2017-01-11 18:33:30 CET ---

REVIEW: http://review.gluster.org/16377 (cluster/ec: Do not start heal on good
file while IO is going on) posted (#1) for review on master by Ashish Pandey
(aspandey at redhat.com)

--- Additional comment from Ashish Pandey on 2017-01-11 18:48:44 CET ---

I just missed mentioning this information.

Possible RCA:

After patch http://review.gluster.org/#/c/13733/, before writing to a file we
set the dirty flag and remove it only at the end of the write. This creates an
index entry in .glusterfs/indices/xattrop/ which stays there throughout the
write fop. Every 60 seconds shd wakes up, scans this entry and starts a heal;
the heal in turn takes a lot of locks to find and heal the file.

This raises the INODELK fop count and could be a possible culprit.
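
This can be seen from the brick side while a write is in progress (a sketch;
the brick path is taken from the profile output below, a1 is the test file
written in the next step, and the trusted.ec.* xattr names are the usual EC
keys, so adjust for your setup):

ls /brick/gluster/testvol-6/.glusterfs/indices/xattrop/
getfattr -d -m . -e hex /brick/gluster/testvol-6/a1

While the dirty flag is set, the xattrop directory holds an entry for the
file's gfid, which is what shd picks up on its 60-second scan.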

I disabled shd and wrote a file:

time dd if=/dev/urandom of=a1 count=1024 bs=1M conv=fdatasync
The profile shows only 4 INODELK calls.
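
The numbers below were collected with the standard volume profile commands,
roughly as follows (a sketch; the volume name testvol is inferred from the
brick paths, and 'gluster volume heal ... disable' is one way to stop shd for
the volume):

gluster volume heal testvol disable
gluster volume profile testvol start
# run the dd above from a FUSE mount of the volume
gluster volume profile testvol info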

Brick: apandey:/brick/gluster/testvol-6
---------------------------------------
Cumulative Stats:
   Block Size:              32768b+               65536b+ 
 No. of Reads:                    0                     0 
No. of Writes:                 8188                     2 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
      0.00      47.00 us      47.00 us      47.00 us              1      STATFS
      0.00      49.50 us      46.00 us      53.00 us              2       FLUSH
      0.00      38.00 us      26.00 us      52.00 us              4     INODELK
      0.00      92.50 us      85.00 us     100.00 us              2     XATTROP
      0.00     305.00 us     305.00 us     305.00 us              1      CREATE
      0.00     138.00 us      32.00 us     395.00 us              4    FXATTROP
      0.00     164.14 us     119.00 us     212.00 us              7      LOOKUP
      0.92      72.73 us      43.00 us    8431.00 us           8190       WRITE
     99.08 64142355.00 us 64142355.00 us 64142355.00 us              1       FSYNC


With shd enabled it is around 54:


Brick: apandey:/brick/gluster/testvol-1
---------------------------------------
Cumulative Stats:
   Block Size:              32768b+               65536b+ 
 No. of Reads:                    0                     0 
No. of Writes:                 8190                     1 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              7     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             21  RELEASEDIR
      0.00      30.00 us      30.00 us      30.00 us              1      STATFS
      0.00       5.76 us       2.00 us       9.00 us             21     OPENDIR
      0.00      64.50 us      30.00 us      99.00 us              2       FLUSH
      0.00      23.17 us      20.00 us      27.00 us              6       FSTAT
      0.00      95.50 us      89.00 us     102.00 us              2     XATTROP
      0.00     272.00 us     272.00 us     272.00 us              1      CREATE
      0.00      61.67 us      42.00 us      85.00 us              6        OPEN
      0.00      98.94 us      31.00 us     428.00 us             16    FXATTROP
      0.00      79.92 us      22.00 us     190.00 us             38      LOOKUP
      0.12    2379.48 us    1376.00 us    4600.00 us             42     READDIR
      0.74      74.70 us      42.00 us   49556.00 us           8191       WRITE
     10.29  163490.19 us      19.00 us 1405941.00 us             52     INODELK
     19.02  320668.04 us      26.00 us 15705174.00 us             49    GETXATTR
     69.83 57700430.00 us 57700430.00 us 57700430.00 us              1       FSYNC

--- Additional comment from Worker Ant on 2017-01-16 09:25:05 CET ---

REVIEW: http://review.gluster.org/16377 (cluster/ec: Do not start heal on good
file while IO is going on) posted (#2) for review on master by Ashish Pandey
(aspandey at redhat.com)

--- Additional comment from Worker Ant on 2017-01-19 13:56:35 CET ---

REVIEW: http://review.gluster.org/16377 (cluster/ec: Do not start heal on good
file while IO is going on) posted (#3) for review on master by Ashish Pandey
(aspandey at redhat.com)

--- Additional comment from Worker Ant on 2017-01-20 13:29:39 CET ---

COMMIT: http://review.gluster.org/16377 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com) 
------
commit 578e9b5b5b45245ed044bab066533411e2141db6
Author: Ashish Pandey <aspandey at redhat.com>
Date:   Wed Jan 11 17:19:30 2017 +0530

    cluster/ec: Do not start heal on good file while IO is going on

    Problem:
    Write on a file has been slowed down significantly after
    http://review.gluster.org/#/c/13733/

    RC: When an update fop starts on a file, it sets the dirty flag at
    the start and removes it at the end, which creates an index entry
    in indices/xattrop. During IO, SHD scans this, finds the index and
    starts a heal even if all the fragments are healthy and up to date.
    This heal takes inodelk for the different types of heal. If the IO
    runs for a long time, this happens every 60 seconds. Due to this
    extra, unnecessary locking, IO gets slowed down.

    Solution:
    Before starting any type of heal, check whether the file needs heal at all.

    Change-Id: Ib9519a43e7e4b2565d3f3153f9ca0fb92174fe51
    BUG: 1409191
    Signed-off-by: Ashish Pandey <aspandey at redhat.com>
    Reviewed-on: http://review.gluster.org/16377
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
    Reviewed-by: Xavier Hernandez <xhernandez at datalab.es>
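
The check added by the patch lives inside ec's heal path, but the idea can be
illustrated from the brick side (a sketch with hypothetical host/brick/file
names, not the code from the patch): roughly speaking, a file only needs heal
if its trusted.ec.version differs across the bricks of its subvolume or
trusted.ec.dirty is non-zero.

# Compare the EC xattrs of one file across the six bricks of one 4+2 subvolume.
for b in server{1..6}:/bricks/testvol/brick1; do
    host=${b%%:*}; path=${b#*:}
    ssh "$host" getfattr -n trusted.ec.version -e hex --absolute-names "$path/a1"
    ssh "$host" getfattr -n trusted.ec.dirty   -e hex --absolute-names "$path/a1"
done
# If the versions match everywhere and dirty is zero, the file is healthy and
# shd can skip the heal (and all the inodelk traffic it would generate).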

--- Additional comment from Worker Ant on 2017-01-20 13:59:00 CET ---

REVIEW: http://review.gluster.org/16444 (cluster/ec: Do not start heal on good
file while IO is going on) posted (#1) for review on release-3.9 by Ashish
Pandey (aspandey at redhat.com)


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1408639
[Bug 1408639] [Perf] : Sequential  Writes are off target by 12% on EC
backed volumes over FUSE
https://bugzilla.redhat.com/show_bug.cgi?id=1409191
[Bug 1409191] Sequential and Random Writes are off target by 12% and 22%
respectively on EC backed volumes over FUSE
https://bugzilla.redhat.com/show_bug.cgi?id=1415160
[Bug 1415160] Sequential and Random Writes are off target by 12% and 22%
respectively on EC backed volumes over FUSE