[Bugs] [Bug 1460899] New: [Negative Lookup]: negative lookup feature doesn't seem to work on restart of volume
bugzilla at redhat.com
Tue Jun 13 05:13:39 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1460899
Bug ID: 1460899
Summary: [Negative Lookup]: negative lookup feature doesn't
seem to work on restart of volume
Product: GlusterFS
Version: 3.11
Component: gluster-smb
Assignee: bugs at gluster.org
Reporter: pgurusid at redhat.com
CC: amukherj at redhat.com, bugs at gluster.org,
nchilaka at redhat.com, rhinduja at redhat.com,
rhs-smb at redhat.com, sbhaloth at redhat.com
Depends On: 1450080, 1458539
+++ This bug was initially created as a clone of Bug #1458539 +++
+++ This bug was initially created as a clone of Bug #1450080 +++
Description of problem:
========================
It seems that if we restart a volume, the negative lookup caching feature is
lost.
I did a lookup of a non-existent file with nl caching enabled.
Generally, the first lookup takes about 7-13 LOOKUP FOPs, depending on the dht
hashed/unhashed brick.
On a second lookup, I noticed that it generally took about 1 LOOKUP FOP on
each brick.
I then did a volume restart, reran the above checks, and found that even on
repetitive negative lookups the lookup FOPs take the same count as without nl
cache enabled.
I checked the trusted vol file and didn't note any discrepancies, as the
nl-cache translators got loaded.
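(For reference, one way to confirm the nl-cache xlator is wired into the
generated volfiles on a server node; this assumes a default glusterd layout,
and the exact volfile names vary by access protocol:
    grep -n nl-cache /var/lib/glusterd/vols/negative/*.vol
A non-empty result means the nl-cache translator is part of the graph
definition.)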
Version-Release number of selected component (if applicable):
======
[root@dhcp35-192 negative]# rpm -qa|grep gluster
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
python-gluster-3.8.4-24.el7rhgs.noarch
glusterfs-libs-3.8.4-24.el7rhgs.x86_64
glusterfs-fuse-3.8.4-24.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-24.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-24.el7rhgs.x86_64
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.8.4-24.el7rhgs.x86_64
glusterfs-api-3.8.4-24.el7rhgs.x86_64
glusterfs-cli-3.8.4-24.el7rhgs.x86_64
glusterfs-devel-3.8.4-24.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-24.el7rhgs.x86_64
glusterfs-rdma-3.8.4-24.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-24.el7rhgs.x86_64
glusterfs-server-3.8.4-24.el7rhgs.x86_64
glusterfs-events-3.8.4-24.el7rhgs.x86_64
[root@dhcp35-192 negative]# rpm -qa|grep samba
samba-common-4.6.3-0.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
samba-common-tools-4.6.3-0.el7rhgs.x86_64
samba-common-libs-4.6.3-0.el7rhgs.x86_64
samba-4.6.3-0.el7rhgs.x86_64
samba-client-4.6.3-0.el7rhgs.x86_64
samba-libs-4.6.3-0.el7rhgs.x86_64
samba-client-libs-4.6.3-0.el7rhgs.x86_64
[root@dhcp35-192 negative]#
How reproducible:
===============
Consistently reproducible on my setup.
Steps to Reproduce:
1. had a 3-node cluster with samba set up using
https://mojo.redhat.com/docs/DOC-1027985 (no ctdb setup)
2. created a 2x2 volume as below ===> the vol bricks are on nodes n2 and n3,
while n1 was the samba server node
[root@dhcp35-192 ~]# gluster v info
Volume Name: negative
Type: Distributed-Replicate
Volume ID: 48c809a8-8a82-4806-aaf0-5ea0bbf423cb
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: :/rhs/brick1/negative
Brick2: :/rhs/brick1/negative
Brick3: :/rhs/brick2/negative
Brick4: :/rhs/brick2/negative
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.cache-invalidation-timeout: 3000
features.cache-invalidation: on
performance.nl-cache: on
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
storage.batch-fsync-delay-usec: 0
3. started volume profiling
4. mounted using cifs on a rhel client
5. did some amount of testing wrt nl caching (for about half a day)
6. created a directory dir4
7. did a negative lookup by issuing "ls x2" (x2 does not exist);
   ls says the file does not exist, as expected;
   profiling showed about 7-13 LOOKUP FOPs
8. again did "ls x2" to see nl caching working;
   FOPs reduced to about 1 per brick
9. restarted the volume
10. repeated steps 7 and 8 again and again, but the lookup FOP count is not
reducing
Before restart:
First negative lookup:
[root@dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 16 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
9.69 60.00 us 60.00 us 60.00 us 1 GETXATTR
90.31 111.80 us 82.00 us 143.00 us 5 LOOKUP
Duration: 4 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 16 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
12.04 124.00 us 124.00 us 124.00 us 1 GETXATTR
87.96 113.25 us 63.00 us 252.00 us 8 LOOKUP
Duration: 4 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 16 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
9.87 70.00 us 70.00 us 70.00 us 1 GETXATTR
90.13 127.80 us 100.00 us 152.00 us 5 LOOKUP
Duration: 5 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 16 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
12.18 147.00 us 147.00 us 147.00 us 1 GETXATTR
87.82 132.50 us 79.00 us 299.00 us 8 LOOKUP
Duration: 5 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Second negative lookup:
[root@dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 17 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
100.00 336.00 us 336.00 us 336.00 us 1 LOOKUP
Duration: 6 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 17 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
100.00 338.00 us 338.00 us 338.00 us 1 LOOKUP
Duration: 6 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 17 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
100.00 144.00 us 144.00 us 144.00 us 1 LOOKUP
Duration: 6 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 17 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
100.00 263.00 us 263.00 us 263.00 us 1 LOOKUP
Duration: 6 seconds
Data Read: 0 bytes
Data Written: 0 bytes
[root@dhcp35-192 ~]#
#############################
After restart:
First negative lookup:
[root@dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
100.00 121.29 us 77.00 us 251.00 us 7 LOOKUP
Duration: 8 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
100.00 116.08 us 53.00 us 446.00 us 13 LOOKUP
Duration: 8 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
100.00 125.92 us 69.00 us 477.00 us 13 LOOKUP
Duration: 8 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
100.00 113.86 us 74.00 us 249.00 us 7 LOOKUP
Duration: 8 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Second negative lookup:
[root@dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
100.00 121.29 us 77.00 us 251.00 us 7 LOOKUP
Duration: 8 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
100.00 116.08 us 53.00 us 446.00 us 13 LOOKUP
Duration: 8 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
100.00 125.92 us 69.00 us 477.00 us 13 LOOKUP
Duration: 8 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
100.00 113.86 us 74.00 us 249.00 us 7 LOOKUP
Duration: 8 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Workaround:
========
After the restart, turn the nl-cache option off and then back on.
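For example, with this report's volume (standard gluster CLI, using the same
option name shown in the volume info above):
[root@dhcp35-192 ~]# gluster volume set negative performance.nl-cache off
[root@dhcp35-192 ~]# gluster volume set negative performance.nl-cache on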
--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-11
09:51:31 EDT ---
This bug is automatically being proposed for the current release of Red Hat
Gluster Storage 3 under active development, by setting the release flag
'rhgs-3.3.0' to '?'.
If this bug should be proposed for a different release, please manually change
the proposed release flag.
--- Additional comment from Poornima G on 2017-05-12 08:50:26 EDT ---
RCA:
The cache is stored in the directory's inode_ctx, and the cache time is set
to the time the inode_ctx was created; it is not updated as and when new
cache entries are added to the inode_ctx (it is cumbersome to keep a timeout
for each entry, and updating a global time would result in stale cache for
the older entries).
Consider a sequence of operations as below:
At T1, lookup dir1/f1 (f1 doesn't exist)
    dir1 inode_ctx->cache_time = T1
    cache = "f1"
lookup dir1/f2 (f2 doesn't exist)
    dir1 inode_ctx->cache_time remains unchanged
    cache = "f1", "f2"
CHILD_DOWN (all/some of the bricks go down) at T2;
    anything cached before T2 is considered invalid, but is not cleared
    from memory.
At T3, lookup dir1/f1
    dir1 inode_ctx->cache_time (T1) < T2, hence do not serve from cache;
    the lookup is sent to the brick.
At T4, lookup dir1/f3
    dir1 inode_ctx->cache_time (T1) < T2, hence do not serve from cache;
    the lookup is sent to the brick.
Thus the cache of dir1 stays invalid until T1 + 10 min (the cache timeout),
and for those 10 min no usable cache accumulates for dir1 either.
Consequence:
After any brick down/up, the cache of all the existing directories is invalid
and cannot be corrected for up to 10 min. Thus, after a brick down/up, the
nl-cache does not work for directories that already had a cache, for a
maximum of 10 min.
Solution:
If inode_ctx->cache_time < the brick-down time, then delete the existing
cache and start populating a fresh cache.
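To illustrate, a minimal C sketch of the intended check; the struct and
function names below are simplified stand-ins, not the actual nl-cache
xlator symbols:

    #include <time.h>

    /* Simplified stand-in for the per-directory-inode nl cache context. */
    struct nlc_ctx {
            time_t cache_time;  /* when this inode's cache was (re)started */
            /* ... list of cached negative entries ... */
    };

    /* Time of the most recent CHILD_DOWN seen by the xlator. */
    static time_t last_child_down;

    /* Drop the cached entries, but keep the ctx itself allocated. */
    static void
    nlc_clear_cache (struct nlc_ctx *ctx)
    {
            /* free the cached negative entries here */
            ctx->cache_time = 0;
    }

    /* Called before serving a lookup from cache: if the cache predates the
     * last CHILD_DOWN, wipe it and restart caching immediately, instead of
     * leaving it stale (and useless) until the cache timeout expires. */
    static int
    nlc_cache_is_usable (struct nlc_ctx *ctx)
    {
            if (ctx->cache_time != 0 && ctx->cache_time >= last_child_down)
                    return 1;               /* serve from cache */

            nlc_clear_cache (ctx);          /* stale or empty: drop it... */
            ctx->cache_time = time (NULL);  /* ...and start caching afresh */
            return 0;                       /* send this lookup to the brick */
    }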
--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-15
05:48:25 EDT ---
This bug is automatically being provided 'pm_ack+' for the release flag
'rhgs-3.3.0', the current release of Red Hat Gluster Storage 3 under active
development, having been appropriately marked for the release, and having been
provided ACK from Development and QE.
--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-15
06:03:05 EDT ---
Since this bug has been approved for the RHGS 3.3.0 release of Red Hat Gluster
Storage 3, through release flag 'rhgs-3.3.0+', and through the Internal
Whiteboard entry of '3.3.0', the Target Release is being automatically set to
'RHGS 3.3.0'
--- Additional comment from Worker Ant on 2017-06-04 01:36:22 EDT ---
REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and
stale cache) posted (#1) for review on master by Poornima G
(pgurusid at redhat.com)
--- Additional comment from Worker Ant on 2017-06-08 01:24:49 EDT ---
REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and
stale cache) posted (#2) for review on master by Poornima G
(pgurusid at redhat.com)
--- Additional comment from Worker Ant on 2017-06-09 01:33:27 EDT ---
REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and
stale cache) posted (#3) for review on master by Poornima G
(pgurusid at redhat.com)
--- Additional comment from Worker Ant on 2017-06-12 01:03:12 EDT ---
REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and
stale cache) posted (#4) for review on master by Poornima G
(pgurusid at redhat.com)
--- Additional comment from Worker Ant on 2017-06-12 01:14:44 EDT ---
REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and
stale cache) posted (#5) for review on master by Poornima G
(pgurusid at redhat.com)
--- Additional comment from Worker Ant on 2017-06-13 01:01:21 EDT ---
COMMIT: https://review.gluster.org/17453 committed in master by Atin Mukherjee
(amukherj at redhat.com)
------
commit 7674584fa53944a4e982e217798f31a3d1ef313b
Author: Poornima G <pgurusid at redhat.com>
Date: Fri May 26 15:45:57 2017 +0530
nl-cache: Fix a possible crash and stale cache
Issue1:
Consider the following sequence of operations:
    ...
    nlc_ctx = nlc_ctx_get (inode i1)
    ....... -> nlc_clear_cache (i1) gets called as a part of nlc_invalidate
               or any other callers
    ...
    GF_FREE (i1's nlc_ctx)
    LOCK (nlc_ctx->lock); -> This will result in a crash, as the ctx
                             got freed in nlc_clear_cache.
Issue2:
    lookup on dir1/file1 results in ENOENT
    add cache to dir1 at time T1
    ....
    CHILD_DOWN at T2
    lookup on dir1/file2 results in ENOENT
    add cache to dir1, but the cache time is still T1
    lookup on dir1/file2 - should have been served from cache,
                           but the cache time is T1 < T2, hence the
                           cache is considered invalid.
So, after CHILD_DOWN the right thing would be to clear the cache
and restart caching on that inode.
Solution:
Do not free the nlc_ctx in nlc_clear_cache, but only in inode_forget().
The fixes for issue1 and issue2 are interleaved, hence sending them as a
single patch.
Change-Id: I83d8ed36c049a93567c6d7e63d045dc14ccbb397
BUG: 1458539
Signed-off-by: Poornima G <pgurusid at redhat.com>
Reviewed-on: https://review.gluster.org/17453
Smoke: Gluster Build System <jenkins at build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
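Illustrative only: a minimal C sketch of the Issue1 lifetime rule in the
commit above, with simplified names; in the actual xlator the ctx hangs off
the inode and is released from the forget() path:

    #include <pthread.h>
    #include <stdlib.h>

    struct nlc_ctx {
            pthread_mutex_t lock;
            /* ... cached negative entries ... */
    };

    /* Clearing the cache must NOT free the ctx: another thread that has
     * already fetched the ctx pointer may be about to take ctx->lock, and
     * locking freed memory is exactly the crash described in Issue1. */
    static void
    nlc_clear_cache (struct nlc_ctx *ctx)
    {
            pthread_mutex_lock (&ctx->lock);
            /* drop the cached entries; the ctx itself stays allocated */
            pthread_mutex_unlock (&ctx->lock);
    }

    /* The ctx is freed only once the inode is forgotten, i.e. when no
     * lookup or invalidate path can still hold a pointer to it. */
    static void
    nlc_forget (struct nlc_ctx *ctx)
    {
            pthread_mutex_destroy (&ctx->lock);
            free (ctx);
    }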
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1450080
[Bug 1450080] [Negative Lookup]: negative lookup feature doesn't seem to
work on restart of volume
https://bugzilla.redhat.com/show_bug.cgi?id=1458539
[Bug 1458539] [Negative Lookup]: negative lookup feature doesn't seem to
work on restart of volume
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.