[Bugs] [Bug 1316327] New: Upgrade from 3.7.6 to 3.7.8 causes massive drop in write performance. Fresh install of 3.7.8 also has low write performance

bugzilla at redhat.com bugzilla at redhat.com
Thu Mar 10 00:26:00 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1316327

            Bug ID: 1316327
           Summary: Upgrade from 3.7.6 to 3.7.8 causes massive drop in
                    write performance.  Fresh install of 3.7.8 also has
                    low write performance
           Product: GlusterFS
           Version: mainline
         Component: fuse
          Keywords: Triaged
          Severity: low
          Assignee: bugs at gluster.org
          Reporter: vbellur at redhat.com
                CC: bugs at gluster.org, dfrobins at yahoo.com,
                    hostingnuggets at gmail.com, jeremy at rosengren.org,
                    mjc at avtechpulse.com, oleksandr at natalenko.name,
                    pgurusid at redhat.com, ravishankar at redhat.com,
                    rcyriac at redhat.com, rgowdapp at redhat.com,
                    sankarshan at redhat.com, skoduri at redhat.com,
                    tavis at eventbase.com, vpvainio at iki.fi,
                    xhernandez at datalab.es
        Depends On: 1309462
            Blocks: 1309567 (glusterfs-3.7.9)



+++ This bug was initially created as a clone of Bug #1309462 +++

Description of problem:
We have several clusters running in a simple configuration: 3 servers with 1
brick each in Replicate mode (1x3).
After upgrading from 3.7.6 to 3.7.8 (which fixed many memory leaks, thanks!)
our write performance dropped to almost nothing.  Where we would get
60-100 MB/s we are now getting 1-4 MB/s.

This seems to happen when using the Gluster FUSE filesystem; if I mount the
volume as NFS it seems to work correctly.  Unfortunately we have experienced
stability issues using NFS in our environment, so I cannot use this as a
workaround.

Version-Release number of selected component (if applicable):
3.7.8

How reproducible:
I have created, from scratch, two separate three-node systems (yay for
automation!) and installed/created the gluster volume.

I also have three other clusters that were upgraded (Softlayer, Online-Tech
and Azure) which are exhibiting the same problem.


Steps to Reproduce:
1. Provision and deploy three servers and create a gluster volume with "gluster
volume create VOLUME_NAME replica 3 transport tcp $NODES" (a fuller sketch
follows)
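
For reference, a minimal end-to-end sketch of that setup (hostnames, brick
paths and the mount point are placeholders):

# on one server: create and start a 1x3 replica volume across three nodes
gluster volume create testvol replica 3 transport tcp \
    srv1:/bricks/b1 srv2:/bricks/b1 srv3:/bricks/b1
gluster volume start testvol
# on a client: mount over FUSE and measure sequential write throughput
mount -t glusterfs srv1:/testvol /mnt/testvol
dd if=/dev/zero of=/mnt/testvol/bwtest bs=1M count=64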

Actual results:
Incredibly poor, unexplained write performance (read performance is okay)

Expected results:
Reasonable write performance

Additional info:
Let me know if you would like any additional information from my environment.

--- Additional comment from Tavis Paquette on 2016-02-17 16:48:49 EST ---

In the #gluster Freenode IRC channel, hagarth suggested disabling
write-behind:

gluster volume set VOLUME performance.write-behind Off


This seemed to bring the write performance of the volume back to normal levels
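
For what it's worth, whether the option actually stuck can be checked from the
volume info output:

# the reconfigured option should be listed under "Options Reconfigured"
gluster volume info VOLUME | grep write-behind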

--- Additional comment from Oleksandr Natalenko on 2016-02-18 06:44:43 EST ---

I can confirm the issue with a 3.7.8 client working with a replicated volume.
However, if the volume is not replicated, the issue does not arise.

Testing a freshly created pure distributed volume shows 60–90 MB/s. Adding a
brick for replica 2 with add-brick lowers throughput to 1–6 MB/s. Removing the
replica brick with remove-brick brings throughput back to 60–90 MB/s.
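
A minimal sketch of that sequence (hostnames and brick paths are
placeholders):

# pure distribute volume: ~60-90 MB/s
gluster volume create distvol transport tcp srv1:/bricks/b1
gluster volume start distvol
# convert to a 1x2 replica: throughput drops to ~1-6 MB/s
gluster volume add-brick distvol replica 2 srv2:/bricks/b1
# remove the replica brick again: back to ~60-90 MB/s
gluster volume remove-brick distvol replica 1 srv2:/bricks/b1 force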

--- Additional comment from Oleksandr Natalenko on 2016-02-22 05:00:48 EST ---

Testing environment: 3.7.6 server, 3.7.6 client, 3.7.8 client.
Benchmark: dd if=/dev/zero of=bwtest bs=1M count=64

=== replica 2, performance.write-behind on, storage.linux-aio off ===

3.7.6 client: 56.5 MB/s
3.7.8 client: 54.4 MB/s

=== replica 2, performance.write-behind on, storage.linux-aio on ===

3.7.6 client: 57.3 MB/s
3.7.8 client: 6.7 MB/s

=== replica 2, performance.write-behind off, storage.linux-aio on ===

3.7.6 client: 27.1 MB/s
3.7.8 client: 27.5 MB/s

=== replica 2, performance.write-behind off, storage.linux-aio off ===

3.7.6 client: 40.3 MB/s
3.7.8 client: 41.5 MB/s
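
As a caveat, plain dd partly measures the client page cache; a variant that
forces the data out before dd reports throughput would rule caching out:

# fdatasync once at the end, so the reported rate includes the flush
dd if=/dev/zero of=bwtest bs=1M count=64 conv=fdatasync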

--- Additional comment from Oleksandr Natalenko on 2016-02-22 10:28:23 EST ---

Also, while copying files with Midnight Commander, the 3.7.8 client is always
slow regardless of which options are set/unset.

--- Additional comment from David on 2016-02-22 14:10:11 EST ---

I am also seeing a severe performance hit with 3.7.8.  See my email to the
users and devel mailing lists below.  Note that setting
"performance.write-behind off" did not change my results:

The 3.7.8 FUSE client is significantly slower than 3.7.6.  Is this related to
some of the fixes that were done to correct memory leaks?  Is there anything
that I can do to recover the performance of 3.7.6?

My testing involved creating a "bigfile" that is 20GB.  I then installed the
3.6.6 FUSE client and tested copying the bigfile from one gluster machine to
another.  The test was repeated twice to make sure caching wasn't affecting
performance.

Using CentOS 7.1:
FUSE 3.6.6 took 47 seconds and 38 seconds.
FUSE 3.7.6 took 43 seconds and 34 seconds.
FUSE 3.7.8 took 205 seconds and 224 seconds.

I repeated the test on another machine that is running CentOS 6.7, and the
results were even worse: 98 seconds for FUSE 3.6.6 versus 575 seconds for FUSE
3.7.8.
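
A minimal sketch of one such repeated run, dropping the page cache between
passes so the second copy is not served from memory (the paths are
placeholders):

sync; echo 3 > /proc/sys/vm/drop_caches   # as root: flush, then drop caches
time cp /mnt/gfsbackup/bigfile /mnt/gfsbackup/bigfile.copy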

My server setup is:

Volume Name: gfsbackup
Type: Distribute
Volume ID: 29b8fae9-dfbf-4fa4-9837-8059a310669a
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: ffib01bkp:/data/brick01/gfsbackup
Brick2: ffib01bkp:/data/brick02/gfsbackup
Options Reconfigured:
performance.readdir-ahead: on
cluster.rebal-throttle: aggressive
diagnostics.client-log-level: WARNING
diagnostics.brick-log-level: WARNING
changelog.changelog: off
client.event-threads: 8
server.event-threads: 8

--- Additional comment from Soumya Koduri on 2016-02-22 21:24:16 EST ---

@David,

Could you check if turning off performance.write-behind improves the
write-performance?

--- Additional comment from David on 2016-02-22 22:28:25 EST ---

(In reply to Soumya Koduri from comment #6)
> @David,
> 
> Could you check if turning off performance.write-behind improves the
> write-performance?

It didn't have any effect. Let me know if you want me to try anything else.

David

--- Additional comment from Soumya Koduri on 2016-02-23 01:12:30 EST ---

CCing Ravishankar & Poornima, who have been actively looking at it.

--- Additional comment from Ravishankar N on 2016-02-23 07:37:34 EST ---

So it looks like there are different parts to the perf degradation between
3.7.6 and 3.7.8:

1. http://review.gluster.org/12953 went in for 3.7.7. With this fuse patch,
before every writev(), fuse sends a getxattr for 'security.capability' to the
bricks. That is an extra FOP for every writev, which was not there in 3.7.6,
where fuse returned the call with ENODATA without winding it to the bricks.

2. Now, when the brick returns ENODATA for the getxattr, AFR does an inode
refresh, which again triggers lookups and also seems to trigger data self-heal
(need to figure out why). It also goes ahead and winds the getxattr to the
other bricks of the replica (all of which fail with ENODATA). All these are
extra FOPS which add to the latency.

Potential code fixes needed:
a) We need to figure out whether 12953 can be reverted/modified, or whether we
explicitly want to wind the getxattr for security.capability. Poornima will
work with Michael on that.

b) In AFR, if getxattr fails with ENODATA, we do not need to wind it to the
other replica bricks (which would anyway fail with ENODATA).

c) Currently, before winding the getxattr to other bricks, any triggered
self-heal will have to be completed. This is anyway going to be fixed with
http://review.gluster.org/#/c/13207/, where all client self-heals will be run
in the background instead of blocking the FOPS.

--- Additional comment from Ravishankar N on 2016-02-23 07:40:36 EST ---

While there is no workaround in 3.7.8 for not winding the getxattr, disabling
client-side heals (data self-heal in particular) seems to improve things a bit
in our setup:

# gluster volume set <VOLNAME> data-self-heal off

Tavis, Oleksandr, David - could you check whether the above command improves
the performance?

--- Additional comment from Oleksandr Natalenko on 2016-02-23 08:59:24 EST ---

data-self-heal on: 1 MB/s
data-self-heal off: 11 MB/s

--- Additional comment from Tavis Paquette on 2016-02-23 12:48:58 EST ---

"data-self-heal off" doesn't have any effect on throughput for me

--- Additional comment from Oleksandr Natalenko on 2016-02-23 12:50:02 EST ---

Tavis, have you restarted and remounted the volume?

--- Additional comment from David on 2016-02-23 12:53:24 EST ---

data-self-heal did not change the results for me.  Both runs took a little
over 3 minutes when using 3.7.8 and 30-45 seconds when using 3.7.6.

David

--- Additional comment from Tavis Paquette on 2016-02-23 16:01:05 EST ---

I un-mounted, stopped, disabled self-heal, started and remounted.
No effect on throughput for my test cluster.
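
For reference, that sequence would be roughly the following, assuming a FUSE
mount at /mnt/vol (all names are placeholders):

umount /mnt/vol
gluster volume stop VOLNAME
gluster volume set VOLNAME data-self-heal off
gluster volume start VOLNAME
mount -t glusterfs server:/VOLNAME /mnt/vol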

--- Additional comment from David on 2016-02-23 16:24:10 EST ---

(In reply to Tavis Paquette from comment #15)
> I un-mounted, stopped, disabled self-heal, started and remounted
> No effect on throughput for my test cluster

Same here.  The unmount/stop/disable/start/remount sequence had no effect for
my test case.

--- Additional comment from Poornima G on 2016-02-24 06:09:57 EST ---

If the performance didn't improve with the suggested workaround, could you
provide us with the profile info of the volume for the I/O workload you are
testing? The volume profile can be taken using the following commands (a
complete sketch follows the list):

1. gluster vol profile <vol-name> start
2. IO from the mount point
/* server profile info */
3. gluster vol profile <vol-name> info
/* client profile info */
4. setfattr -n trusted.io-stats-dump -v /tmp/io-stats-pre.txt /your/mountpoint
5. gluster vol profile <vol-name> stop
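
Putting those steps together, a sketch of one complete capture, assuming the
volume is named myvol and is FUSE-mounted at /mnt/vol:

gluster volume profile myvol start
dd if=/dev/zero of=/mnt/vol/bwtest bs=1M count=64        # the I/O workload
gluster volume profile myvol info > server-profile.txt   # per-brick fops/latency
setfattr -n trusted.io-stats-dump -v /tmp/io-stats-pre.txt /mnt/vol  # client dump
gluster volume profile myvol stop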

--- Additional comment from Tavis Paquette on 2016-02-24 12:42:13 EST ---

I have the output from the profile; however, it contains private information
that I would not like released publicly.

Can I mail it to you privately at pgurusid at redhat.com?

--- Additional comment from Poornima G on 2016-02-25 00:20:28 EST ---

Sure, you could mail it to me. You could also remove private information like
the hostnames, brick names, file names etc. from the output, as the
information that we need would be:
1. In server profile:
   The table of fops and latency for each brick
   Which bricks are the replicas

2. In client profile:
   The table of fops and latency

3. The exact workload, as in how many file creates/writes of what block sizes,
etc.

--- Additional comment from Tavis Paquette on 2016-02-26 13:15:49 EST ---

Sent

--- Additional comment from Vijay Bellur on 2016-02-29 00:58:27 EST ---

REVIEW: http://review.gluster.org/13540 (fuse: forbid access to
security.selinux and security.capability xattr       if not mounted with
'selinux') posted (#1) for review on master by Poornima G (pgurusid at redhat.com)

--- Additional comment from Oleksandr Natalenko on 2016-02-29 03:09:12 EST ---

@Vijay Bellur

13540 definitely makes things better for me, yielding up to 50 MB/s.

--- Additional comment from Poornima G on 2016-02-29 07:59:54 EST ---

Oleksandr Natalenko, oh, good to hear that, thank you.

Tavis Paquette, I looked at the profile info; it doesn't seem to be the case
we fixed. In the case we fixed, the INODELK fop was consuming 99% of the time
on the brick side. That doesn't seem to be the case in the profile info you
sent. I see a lot of GETXATTR calls, but these shouldn't be reducing the perf
as drastically as you have mentioned.

Also, was this with write-behind off or on?

So we could try 2 things:
1. If you generate the profile info as mentioned earlier with the 3.7.6
client, for the same workload, we will be able to compare the fops and
latency.

2. http://review.gluster.org/13540 reduces the number of getxattrs that will
be called. This will increase the perf, though maybe not by multiple folds. If
you have a test system you could try this; this patch will in any case be part
of the 3.7.9 release.

--- Additional comment from Vijay Bellur on 2016-03-01 00:46:44 EST ---

REVIEW: http://review.gluster.org/13540 (fuse: forbid access to
security.selinux and security.capability xattr       if not mounted with
'selinux') posted (#2) for review on master by Poornima G (pgurusid at redhat.com)

--- Additional comment from Tavis Paquette on 2016-03-01 00:53:46 EST ---

write-behind was enabled in the scenario where I was experiencing low write
performance.

Turning write-behind off seemed to bring my write throughput back to 3.7.6
levels.
The profile was generated on a cluster that had write-behind disabled.

I'll try to find time to generate a profile with 3.7.6 in the same
configuration, as well as 3.7.8 with write-behind enabled and then 3.7.8 with
write-behind disabled.

It may take me a few days though; I'll follow up as soon as I can.

--- Additional comment from Vijay Bellur on 2016-03-01 06:20:40 EST ---

REVIEW: http://review.gluster.org/13540 (fuse: forbid access to
security.selinux and security.capability xattr       if not mounted with
'selinux') posted (#3) for review on master by Poornima G (pgurusid at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-01 08:25:00 EST ---

REVIEW: http://review.gluster.org/13567 (distribute/tests: Use a different
mount instead of reusing a Mount.) posted (#1) for review on master by
Raghavendra G (rgowdapp at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-03 11:55:18 EST ---

REVIEW: http://review.gluster.org/13540 (fuse: Add a new mount option
capability) posted (#4) for review on master by Poornima G
(pgurusid at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-03 13:00:17 EST ---

REVIEW: http://review.gluster.org/13595 (afr: misc performance improvements)
posted (#1) for review on master by Ravishankar N (ravishankar at redhat.com)

--- Additional comment from Ravishankar N on 2016-03-04 00:11:55 EST ---

Note: 13595 aims to address problem #2 in comment#5

--- Additional comment from Ravishankar N on 2016-03-04 00:13:06 EST ---

(In reply to Ravishankar N from comment #30)
> Note: 13595 aims to address problem #2 in comment#5

Ugh. I meant comment#9 :-/

--- Additional comment from Vijay Bellur on 2016-03-07 10:27:33 EST ---

REVIEW: http://review.gluster.org/13595 (afr: misc performance improvements)
posted (#2) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-07 10:48:30 EST ---

REVIEW: http://review.gluster.org/13626 (fuse: Add a new mount option
capability) posted (#1) for review on release-3.7 by Poornima G
(pgurusid at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-08 00:57:33 EST ---

COMMIT: http://review.gluster.org/13540 committed in master by Raghavendra G
(rgowdapp at redhat.com) 
------
commit 5b5f03d2665687ab717f123da1266bcd3a83da0f
Author: Poornima G <pgurusid at redhat.com>
Date:   Fri Feb 26 06:42:14 2016 -0500

    fuse: Add a new mount option capability

    Originally all security.* xattrs were forbidden if selinux was disabled,
    which was causing Samba's acl_xattr module to not work, as it would
    store the NTACL in security.NTACL. To fix this,
    http://review.gluster.org/#/c/12826/ was sent, which forbade only
    security.selinux. This opened up a getxattr call on security.capability
    before every write fop and others.

    Capabilities can be used without selinux, hence if selinux is disabled,
    security.capability cannot be forbidden. Hence adding a new mount
    option called capability.

    Only when "--capability" or "--selinux" mount option is used,
    security.capability is sent to the brick, else it is forbidden.

    Change-Id: I77f60e0fb541deaa416159e45c78dd2ae653105e
    BUG: 1309462
    Signed-off-by: Poornima G <pgurusid at redhat.com>
    Reviewed-on: http://review.gluster.org/13540
    Smoke: Gluster Build System <jenkins at build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Raghavendra G <rgowdapp at redhat.com>
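
With this change, the client forwards security.capability to the bricks only
when one of those mount options is given. A hedged sketch of both forms (the
"-o capability" spelling for the mount helper is an assumption based on the
option name above):

# pass the option through the mount helper (assumed spelling)
mount -t glusterfs -o capability server:/VOLNAME /mnt/vol
# or pass it to the client binary directly
glusterfs --capability --volfile-server=server --volfile-id=VOLNAME /mnt/vol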

--- Additional comment from Vijay Bellur on 2016-03-08 06:16:16 EST ---

REVIEW: http://review.gluster.org/13644 (afr: misc performance improvements)
posted (#1) for review on release-3.7 by Ravishankar N (ravishankar at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-08 09:13:52 EST ---

COMMIT: http://review.gluster.org/13595 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com) 
------
commit d1d364634dce0c3dcfe9c2efc883c21af0494d0d
Author: Ravishankar N <ravishankar at redhat.com>
Date:   Thu Mar 3 23:17:17 2016 +0530

    afr: misc performance improvements

    1. In afr_getxattr_cbk, consider the errno value before blindly
    launching an inode refresh and a subsequent retry on other children.

    2. We want to accuse small files only when we know for sure that there is
    no IO happening on that inode. Otherwise, the ia_sizes obtained in the
    post-inode-refresh replies may mismatch due to a race between
    inode-refresh and ongoing writes, causing spurious heal launches.

    Change-Id: Ife180f4fa5e584808c1077aacdc2423897675d33
    BUG: 1309462
    Signed-off-by: Ravishankar N <ravishankar at redhat.com>
    Reviewed-on: http://review.gluster.org/13595
    Smoke: Gluster Build System <jenkins at build.gluster.com>
    Tested-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Krutika Dhananjay <kdhananj at redhat.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>

--- Additional comment from Vijay Bellur on 2016-03-08 21:16:38 EST ---

COMMIT: http://review.gluster.org/13644 committed in release-3.7 by Vijay
Bellur (vbellur at redhat.com) 
------
commit e9fa7aeb1a32e22ff0749d67995e689028ca5a19
Author: Ravishankar N <ravishankar at redhat.com>
Date:   Tue Mar 8 16:43:12 2016 +0530

    afr: misc performance improvements

    Backport of http://review.gluster.org/#/c/13595/
    1. In afr_getxattr_cbk, consider the errno value before blindly
    launching an inode refresh and a subsequent retry on other children.

    2. We want to accuse small files only when we know for sure that there is
    no IO happening on that inode. Otherwise, the ia_sizes obtained in the
    post-inode-refresh replies may mismatch due to a race between
    inode-refresh and ongoing writes, causing spurious heal launches.

    Change-Id: I9858485d1061db67353ccf99c59530731649c847
    BUG: 1309462
    Signed-off-by: Ravishankar N <ravishankar at redhat.com>
    Reviewed-on: http://review.gluster.org/13644
    Smoke: Gluster Build System <jenkins at build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
    Reviewed-by: Krutika Dhananjay <kdhananj at redhat.com>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.com>

--- Additional comment from Vijay Bellur on 2016-03-09 03:58:56 EST ---

REVIEW: http://review.gluster.org/13626 (fuse: Add a new mount option
capability) posted (#2) for review on release-3.7 by Poornima G
(pgurusid at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-09 04:06:33 EST ---

REVIEW: http://review.gluster.org/13626 (fuse: Add a new mount option
capability) posted (#3) for review on release-3.7 by Poornima G
(pgurusid at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-09 04:23:19 EST ---

REVIEW: http://review.gluster.org/13653 (fuse: Address the review comments in
the backport) posted (#1) for review on master by Poornima G
(pgurusid at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-09 05:53:57 EST ---

REVIEW: http://review.gluster.org/13626 (fuse: Add a new mount option
capability) posted (#4) for review on release-3.7 by Poornima G
(pgurusid at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-09 07:05:59 EST ---

REVIEW: http://review.gluster.org/13626 (fuse: Add a new mount option
capability) posted (#5) for review on release-3.7 by Vijay Bellur
(vbellur at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-09 15:03:26 EST ---

REVIEW: http://review.gluster.org/13626 (fuse: Add a new mount option
capability) posted (#6) for review on release-3.7 by Vijay Bellur
(vbellur at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-09 15:09:08 EST ---

REVIEW: http://review.gluster.org/13653 (fuse: Address the review comments in
the backport) posted (#2) for review on master by Vijay Bellur
(vbellur at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-09 15:32:43 EST ---

REVIEW: http://review.gluster.org/13653 (fuse: Address the review comments in
the backport) posted (#3) for review on master by Vijay Bellur
(vbellur at redhat.com)


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1309462
[Bug 1309462] Upgrade from 3.7.6 to 3.7.8 causes massive drop in write
performance.  Fresh install of 3.7.8 also has low write performance
https://bugzilla.redhat.com/show_bug.cgi?id=1309567
[Bug 1309567] Tracker for glusterfs-3.7.9