[Bugs] [Bug 1460899] New: [Negative Lookup]: negative lookup features doesn't seem to work on restart of volume

bugzilla at redhat.com bugzilla at redhat.com
Tue Jun 13 05:13:39 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1460899

            Bug ID: 1460899
           Summary: [Negative Lookup]: negative lookup features doesn't
                    seem to work on restart of volume
           Product: GlusterFS
           Version: 3.11
         Component: gluster-smb
          Assignee: bugs at gluster.org
          Reporter: pgurusid at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    nchilaka at redhat.com, rhinduja at redhat.com,
                    rhs-smb at redhat.com, sbhaloth at redhat.com
        Depends On: 1450080, 1458539



+++ This bug was initially created as a clone of Bug #1458539 +++

+++ This bug was initially created as a clone of Bug #1450080 +++

Description of problem:
========================
It seems that if we restart a volume, the negative lookup caching feature
stops working.
I did a lookup of a non-existent file with nl caching enabled. The first
lookup generally takes about 7-13 LOOKUP FOPs, depending on the dht
hashed/unhashed brick.
On a second lookup, the count generally dropped to about 1 LOOKUP FOP on
each brick.

I then restarted the volume and reran the above checks, and found that even
on repetitive negative lookups, the LOOKUP FOP counts are the same as
without nl-cache enabled.

I checked the trusted volfile and didn't note any discrepancies; the
nl-cache translators were loaded.


Version-Release number of selected component (if applicable):
======
[root at dhcp35-192 negative]# rpm -qa|grep gluster
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
python-gluster-3.8.4-24.el7rhgs.noarch
glusterfs-libs-3.8.4-24.el7rhgs.x86_64
glusterfs-fuse-3.8.4-24.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-24.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-24.el7rhgs.x86_64
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.8.4-24.el7rhgs.x86_64
glusterfs-api-3.8.4-24.el7rhgs.x86_64
glusterfs-cli-3.8.4-24.el7rhgs.x86_64
glusterfs-devel-3.8.4-24.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-24.el7rhgs.x86_64
glusterfs-rdma-3.8.4-24.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-24.el7rhgs.x86_64
glusterfs-server-3.8.4-24.el7rhgs.x86_64
glusterfs-events-3.8.4-24.el7rhgs.x86_64
[root at dhcp35-192 negative]# rpm -qa|grep samba
samba-common-4.6.3-0.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
samba-common-tools-4.6.3-0.el7rhgs.x86_64
samba-common-libs-4.6.3-0.el7rhgs.x86_64
samba-4.6.3-0.el7rhgs.x86_64
samba-client-4.6.3-0.el7rhgs.x86_64
samba-libs-4.6.3-0.el7rhgs.x86_64
samba-client-libs-4.6.3-0.el7rhgs.x86_64
[root at dhcp35-192 negative]# 


How reproducible:
===============
Consistent on my setup.

Steps to Reproduce:
1. Had a 3-node cluster with Samba set up using
https://mojo.redhat.com/docs/DOC-1027985 (no ctdb setup)
2. Created a 2x2 volume as below ===> vol bricks are part of the n2 and n3
nodes, while n1 was the Samba server node
[root at dhcp35-192 ~]# gluster v info

Volume Name: negative
Type: Distributed-Replicate
Volume ID: 48c809a8-8a82-4806-aaf0-5ea0bbf423cb
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: :/rhs/brick1/negative
Brick2: :/rhs/brick1/negative
Brick3: :/rhs/brick2/negative
Brick4: :/rhs/brick2/negative
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.cache-invalidation-timeout: 3000
features.cache-invalidation: on
performance.nl-cache: on
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
storage.batch-fsync-delay-usec: 0

3. Started volume profiling on the volume
4. Mounted using CIFS on a RHEL client
5. Did some amount of testing wrt nl caching (for about half a day)
6. Created a directory dir4
7. Did a negative lookup by issuing ls x2;
it says the file does not exist, as expected,
and the profiling showed about 7-13 FOPs
8. Again did an ls x2 to see the nl cache working;
FOPs reduced to about 1 per brick

9. Restarted the volume
10. Repeated steps 7 and 8 again and again, but the LOOKUP FOP count is not
reducing; a measurement sketch follows this list
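
For reference, the measurement in steps 7-8 can be driven as below (the
mount path is illustrative; 'gluster v profile <vol> info incremental'
prints the per-interval FOP counters since the previous call, as in the
outputs further down):

    ls /mnt/negative/dir4/nonexistentfile        # negative lookup (ENOENT)
    gluster v profile negative info incremental  # LOOKUP count this interval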



before restart


first negative lookup
[root at dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 16 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      9.69      60.00 us      60.00 us      60.00 us              1    GETXATTR
     90.31     111.80 us      82.00 us     143.00 us              5      LOOKUP

    Duration: 4 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 16 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
     12.04     124.00 us     124.00 us     124.00 us              1    GETXATTR
     87.96     113.25 us      63.00 us     252.00 us              8      LOOKUP

    Duration: 4 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick2/negative
----------------------------------------
Interval 16 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      9.87      70.00 us      70.00 us      70.00 us              1    GETXATTR
     90.13     127.80 us     100.00 us     152.00 us              5      LOOKUP

    Duration: 5 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 16 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
     12.18     147.00 us     147.00 us     147.00 us              1    GETXATTR
     87.82     132.50 us      79.00 us     299.00 us              8      LOOKUP

    Duration: 5 seconds
   Data Read: 0 bytes
Data Written: 0 bytes



second negative lookup 
[root at dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 17 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     336.00 us     336.00 us     336.00 us              1      LOOKUP

    Duration: 6 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick2/negative
----------------------------------------
Interval 17 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     338.00 us     338.00 us     338.00 us              1      LOOKUP

    Duration: 6 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick2/negative
----------------------------------------
Interval 17 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     144.00 us     144.00 us     144.00 us              1      LOOKUP

    Duration: 6 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 17 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     263.00 us     263.00 us     263.00 us              1      LOOKUP

    Duration: 6 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

[root at dhcp35-192 ~]# 



#############################
on restart

first negative lookup
[root at dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     121.29 us      77.00 us     251.00 us              7      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     116.08 us      53.00 us     446.00 us             13      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     125.92 us      69.00 us     477.00 us             13      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     113.86 us      74.00 us     249.00 us              7      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes


second negative lookup
[root at dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     121.29 us      77.00 us     251.00 us              7      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     116.08 us      53.00 us     446.00 us             13      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     125.92 us      69.00 us     477.00 us             13      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     113.86 us      74.00 us     249.00 us              7      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes






Workaround:
========
Post restart, turn the nl-cache option off and then on again.
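
For example, with the volume named 'negative' used above
(performance.nl-cache is the option shown in the volume info output):

    gluster volume set negative performance.nl-cache off
    gluster volume set negative performance.nl-cache on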

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-11
09:51:31 EDT ---

This bug is automatically being proposed for the current release of Red Hat
Gluster Storage 3 under active development, by setting the release flag
'rhgs-3.3.0' to '?'.

If this bug should be proposed for a different release, please manually change
the proposed release flag.

--- Additional comment from Poornima G on 2017-05-12 08:50:26 EDT ---

RCA:

The cache is stored in the directory's inode_ctx, and the cache time is set
to the time the inode_ctx was created; it is not updated as and when new
cache entries are added to the inode_ctx (it is cumbersome to keep a timeout
per entry, and updating the global time would keep stale older entries
alive). Consider a sequence of operations as below:

At T1, lookup dir1/f1 (f1 doesn't exist)
       dir1 inode_ctx->cache_time = T1
            cache = "f1"
       lookup dir1/f2 (f2 doesn't exist)
       dir1 inode_ctx->cache_time remains unchanged
            cache = "f1", "f2"

CHILD_DOWN (all or some of the bricks go down) at T2;
       anything cached before T2 is thus considered invalid, but is not
       cleared from memory.

At T3, lookup dir1/f1
       dir1 inode_ctx->cache_time (T1) < T2, hence not served from cache;
       the lookup is sent to the brick.

At T4, lookup dir1/f3
       dir1 inode_ctx->cache_time (T1) < T2, hence not served from cache;
       the lookup is sent to the brick.

Thus the cache of dir1 is invalid until T1 + 10 min (the cache timeout),
and for those 10 min no new cache is added to dir1 either.

Consequence:
After any brick down/up, the cache of every directory that already had a
cache is invalid and cannot be corrected for up to 10 min; for that period
the nl-cache effectively does not work for those directories.



Solution:
If inode_ctx->cache_time < the brick-down time, delete the existing cache
and start populating a fresh cache.
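
A minimal C sketch of the timeout check and the proposed fix (names and the
refresh-on-clear detail are illustrative assumptions, not the actual
nl-cache xlator code):

    #include <stdbool.h>
    #include <time.h>

    /* Illustrative stand-in for the per-directory inode_ctx cache state. */
    struct nlc_ctx_sketch {
            time_t cache_time;  /* set once, when the ctx is created */
            int    num_entries; /* cached negative names: "f1", "f2", ... */
    };

    static time_t last_child_down;     /* time of the last CHILD_DOWN */
    static const time_t timeout = 600; /* cache timeout: 10 min */

    /* Buggy behaviour: a cache older than the last CHILD_DOWN is only
     * skipped, never cleared, so it stays useless until it ages out. */
    static bool
    serve_from_cache_old (struct nlc_ctx_sketch *ctx, time_t now)
    {
            if (ctx->cache_time <= last_child_down)
                    return false;      /* stale, but left in memory */
            return (now - ctx->cache_time) < timeout;
    }

    /* Fixed behaviour: drop the stale cache and restart caching, so the
     * next negative lookups repopulate it under a fresh cache_time. */
    static bool
    serve_from_cache_fixed (struct nlc_ctx_sketch *ctx, time_t now)
    {
            if (ctx->cache_time <= last_child_down) {
                    ctx->num_entries = 0;   /* delete existing cache */
                    ctx->cache_time  = now; /* start a fresh cache */
                    return false;
            }
            return (now - ctx->cache_time) < timeout;
    }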

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-15
05:48:25 EDT ---

This bug is automatically being provided 'pm_ack+' for the release flag
'rhgs-3.3.0', the current release of Red Hat Gluster Storage 3 under active
development, having been appropriately marked for the release, and having
been provided ACK from Development and QE.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-15
06:03:05 EDT ---

Since this bug has been approved for the RHGS 3.3.0 release of Red Hat Gluster
Storage 3, through release flag 'rhgs-3.3.0+', and through the Internal
Whiteboard entry of '3.3.0', the Target Release is being automatically set to
'RHGS 3.3.0'

--- Additional comment from Worker Ant on 2017-06-04 01:36:22 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and
stale cache) posted (#1) for review on master by Poornima G
(pgurusid at redhat.com)

--- Additional comment from Worker Ant on 2017-06-08 01:24:49 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and
stale cache) posted (#2) for review on master by Poornima G
(pgurusid at redhat.com)

--- Additional comment from Worker Ant on 2017-06-09 01:33:27 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and
stale cache) posted (#3) for review on master by Poornima G
(pgurusid at redhat.com)

--- Additional comment from Worker Ant on 2017-06-12 01:03:12 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and
stale cache) posted (#4) for review on master by Poornima G
(pgurusid at redhat.com)

--- Additional comment from Worker Ant on 2017-06-12 01:14:44 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and
stale cache) posted (#5) for review on master by Poornima G
(pgurusid at redhat.com)

--- Additional comment from Worker Ant on 2017-06-13 01:01:21 EDT ---

COMMIT: https://review.gluster.org/17453 committed in master by Atin Mukherjee
(amukherj at redhat.com) 
------
commit 7674584fa53944a4e982e217798f31a3d1ef313b
Author: Poornima G <pgurusid at redhat.com>
Date:   Fri May 26 15:45:57 2017 +0530

    nl-cache: Fix a possible crash and stale cache

    Issue1:
    Consider the following sequence of operations:
       ...
       nlc_ctx = nlc_ctx_get (inode i1)
       ....... -> nlc_clear_cache (i1) gets called as a part of nlc_invalidate
                                       or any other callers
                  ...
                  GF_FREE (i1's nlc_ctx)
       LOCK (nlc_ctx->lock);  ->  This will result in a crash, as the ctx
                                  got freed in nlc_clear_cache.

    Issue2:
       lookup on dir1/file1 results in ENOENT
       add cache to dir1 at time T1
       ....
       CHILD_DOWN at T2
       lookup on dir1/file2 results in ENOENT
       add cache to dir1, but the cache time is still T1
       lookup on dir1/file2 - should have been served from the cache,
                              but the cache time is T1 < T2, hence the
                              cache is considered invalid.
    So, after CHILD_DOWN the right thing to do is clear the cache
    and restart caching on that inode.

    Solution:
    Do not free the nlc_ctx in nlc_clear_cache(), but only in inode_forget().
    The fixes for Issue1 and Issue2 are interleaved, hence sending them as a
    single patch.

    Change-Id: I83d8ed36c049a93567c6d7e63d045dc14ccbb397
    BUG: 1458539
    Signed-off-by: Poornima G <pgurusid at redhat.com>
    Reviewed-on: https://review.gluster.org/17453
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
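
To illustrate Issue1 and the lifetime fix, a minimal C sketch (a
hypothetical simplification; the actual patch ties the free to the
inode_forget() path):

    #include <pthread.h>
    #include <stdlib.h>

    struct nlc_ctx_sketch {
            pthread_mutex_t lock;
            /* ... cached negative entries ... */
    };

    /* Old behaviour (buggy): freeing the ctx here races with any thread
     * that already fetched the ctx and is about to take ctx->lock,
     * producing a use-after-free crash. */
    static void
    nlc_clear_cache_old (struct nlc_ctx_sketch *ctx)
    {
            /* drop cached entries ... */
            free (ctx);     /* another thread may still LOCK this ctx */
    }

    /* Fixed behaviour: only empty the cache here; the ctx outlives the
     * clear and is freed once the inode itself is forgotten, when no
     * concurrent fop can still reach it. */
    static void
    nlc_clear_cache_fixed (struct nlc_ctx_sketch *ctx)
    {
            pthread_mutex_lock (&ctx->lock);
            /* drop cached entries, keep the ctx ... */
            pthread_mutex_unlock (&ctx->lock);
    }

    static void
    nlc_forget_sketch (struct nlc_ctx_sketch *ctx)
    {
            pthread_mutex_destroy (&ctx->lock);
            free (ctx);     /* safe: the inode is going away */
    }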


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1450080
[Bug 1450080] [Negative Lookup]: negative lookup features doesn't seem to
work on restart of volume
https://bugzilla.redhat.com/show_bug.cgi?id=1458539
[Bug 1458539] [Negative Lookup]: negative lookup features doesn't seem to
work on restart of volume