[Bugs] [Bug 1394635] New: errors appear in brick and nfs logs and getting stale files on NFS clients
bugzilla at redhat.com
Mon Nov 14 06:01:50 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1394635
Bug ID: 1394635
Summary: errors appear in brick and nfs logs and getting stale
files on NFS clients
Product: GlusterFS
Version: 3.8
Component: nfs
Severity: urgent
Priority: urgent
Assignee: bugs at gluster.org
Reporter: rkavunga at redhat.com
CC: bugs at gluster.org, hklein at redhat.com,
jbuchta at redhat.com, jthottan at redhat.com,
kkeithle at redhat.com, mmalhotr at redhat.com,
ndevos at redhat.com, olim at redhat.com,
pkarampu at redhat.com, rgowdapp at redhat.com,
rhs-bugs at redhat.com, sashinde at redhat.com,
skoduri at redhat.com, storage-qa-internal at redhat.com,
tpetr at redhat.com
Depends On: 1379720
Blocks: 1358096, 1394634
+++ This bug was initially created as a clone of Bug #1379720 +++
Description of problem:
Rotated logs are not accessible through NFS mounts served by a replicated
Gluster node.
We run Gluster with a distributed replica setup spanning two sites. After a
server at site A rotates a shared log, many (but not all) nodes that access the
replicated log through a node at site B get a "cannot access .log: Stale file
handle" error.
RCA from Soumya and Pranith:
The working theory is as follows.
The NFS server may not be doing a fresh lookup after receiving ESTALE because
cs->lookuptype was set to GF_NFS3_FRESH instead of GF_NFS3_REVALIDATE.
The nfs_lookup() fop starts by setting lookuptype to GF_NFS3_REVALIDATE.
However, with the code changes in the patch mentioned in comment #52, before
doing STACK_WIND to the child xlator's lookup, if we find a cached inode for
that file/entry name in the inode table but its inode_ctx is not set, we reset
lookuptype to GF_NFS3_FRESH. This may have led to the nfs xlator not sending a
fresh lookup on receiving ESTALE.
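The decision described above can be illustrated with a small toy model. This
is only a sketch of the theory, not GlusterFS code: the function names
pick_lookup_type and handle_lookup_reply and the returned strings are
hypothetical, and the assumption that an ESTALE reply is retried only for a
revalidate lookup comes from the RCA text, not from reading the source.

```python
import errno

# Labels mirroring the constants named in the RCA (values are illustrative).
GF_NFS3_REVALIDATE = "GF_NFS3_REVALIDATE"
GF_NFS3_FRESH = "GF_NFS3_FRESH"


def pick_lookup_type(inode_cached: bool, inode_ctx_set: bool) -> str:
    """Hypothetical model of how lookuptype is chosen before STACK_WIND."""
    lookuptype = GF_NFS3_REVALIDATE  # the lookup starts as a revalidate
    if inode_cached and not inode_ctx_set:
        # Behaviour suspected in the RCA: a cached inode without inode_ctx
        # turns the revalidate into a fresh lookup.
        lookuptype = GF_NFS3_FRESH
    return lookuptype


def handle_lookup_reply(lookuptype: str, op_errno: int) -> str:
    """Hypothetical model of the reply path (assumption from the RCA)."""
    if op_errno == 0:
        return "ok"
    if op_errno == errno.ESTALE and lookuptype == GF_NFS3_REVALIDATE:
        # Only a failed revalidate is retried as a fresh lookup.
        return "retry with fresh lookup"
    return "return error to client"


if __name__ == "__main__":
    for ctx_set in (True, False):
        kind = pick_lookup_type(inode_cached=True, inode_ctx_set=ctx_set)
        print(ctx_set, kind, handle_lookup_reply(kind, errno.ESTALE))
```

In this model, a cached inode with inode_ctx missing never reaches the retry
branch, so the ESTALE is passed straight back to the NFS client.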
There could be several reasons for inode_ctx not being set. The underlying
xlators could have done an inode_link, but we are ruling that out for now
because we do not see inode_link being done with a file entry name (except for
tiered volumes). The other possibility is that we do not set inode_ctx in the
readdirp_cbk path. A possible sequence:
* client1 does a readdirp on the directory 'jmsdomain6' and gets the file
handle of the file 'gcjrockit_jms06admin00a.log' as part of the readdirp
response. That means we have an inode entry for this file, but with no
inode_ctx set.
* Meanwhile, client2 deletes and re-creates this log file (probably as part
of logrotate).
* client1 then does a lookup on the file handle it received earlier, resulting
in ESTALE.
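The sequence above can be replayed with a minimal in-memory simulation. It is
a sketch under stated assumptions, not a reproducer against a real volume: the
Brick class, its methods and run_scenario are hypothetical, gfids are modelled
as random UUIDs, and a lookup by a gfid that no longer exists is assumed to
fail with ESTALE.

```python
import errno
import uuid


class Brick:
    """Hypothetical stand-in for the backend: maps entry names to gfids."""

    def __init__(self):
        self.by_name = {}

    def create(self, name):
        gfid = uuid.uuid4().hex  # every (re)created file gets a new gfid
        self.by_name[name] = gfid
        return gfid

    def unlink(self, name):
        del self.by_name[name]

    def lookup_by_gfid(self, gfid):
        # A handle that points at a deleted gfid is stale.
        if gfid not in self.by_name.values():
            raise OSError(errno.ESTALE, "Stale file handle")
        return gfid

    def lookup_by_name(self, name):
        return self.by_name[name]


def run_scenario():
    brick = Brick()
    name = "gcjrockit_jms06admin00a.log"
    brick.create(name)

    # client1: readdirp returns the file handle (no inode_ctx is set).
    old_handle = dict(brick.by_name)[name]

    # client2: logrotate deletes and re-creates the file.
    brick.unlink(name)
    new_gfid = brick.create(name)

    # client1: lookup on the earlier handle fails with ESTALE.
    try:
        brick.lookup_by_gfid(old_handle)
        got_estale = False
    except OSError as exc:
        got_estale = exc.errno == errno.ESTALE

    # A fresh, name-based lookup would resolve to the re-created file.
    recovered = brick.lookup_by_name(name)
    return got_estale, recovered == new_gfid, recovered != old_handle


if __name__ == "__main__":
    print(run_scenario())
```

The last step shows why the missing retry matters: the name still resolves,
only the old handle is stale.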
--- Additional comment from Niels de Vos on 2016-09-29 09:06:45 EDT ---
Patch posted: http://review.gluster.org/15580
--- Additional comment from Worker Ant on 2016-10-12 03:57:17 EDT ---
REVIEW: http://review.gluster.org/15580 (nfs: revalidate lookup converted to
fresh lookup) posted (#2) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Worker Ant on 2016-10-12 06:54:36 EDT ---
REVIEW: http://review.gluster.org/15580 (nfs: revalidate lookup converted to
fresh lookup) posted (#3) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Worker Ant on 2016-10-12 06:56:47 EDT ---
REVIEW: http://review.gluster.org/15580 (nfs: revalidate lookup converted to
fresh lookup) posted (#4) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Worker Ant on 2016-10-13 08:02:54 EDT ---
REVIEW: http://review.gluster.org/15580 (nfs: revalidate lookup converted to
fresh lookup) posted (#5) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Worker Ant on 2016-11-10 17:19:13 EST ---
COMMIT: http://review.gluster.org/15580 committed in master by Kaleb KEITHLEY
(kkeithle at redhat.com)
------
commit ba7a737b1260bbafe22097bea08814035c8b655d
Author: Mohammed Rafi KC <rkavunga at redhat.com>
Date: Tue Sep 27 19:01:48 2016 +0530
nfs: revalidate lookup converted to fresh lookup
when an inode ctx is missing for a linked inode the revalidate
lookups are converted to fresh.
This could result in sending ESTALE when the gfid are recreated
We are not able to reproduce the issue with normal setup, most part of
RCA was done with code reading.
Possible scenario in which this bug can reproduce,
Delete a file and recreate a new file with same name, at the same time
from another client process try to list/or access the file.
In this case the second client may throw an ESTALE error for such files
Thanks to Soumya and Pranith for doing the complete RCA
Change-Id: I73992a65844b09a169cefaaedc0dcfb129d66ea1
BUG: 1379720
Signed-off-by: Mohammed Rafi KC <rkavunga at redhat.com>
Reviewed-on: http://review.gluster.org/15580
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
Smoke: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: soumya k <skoduri at redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle at redhat.com>
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1379720
[Bug 1379720] errors appear in brick and nfs logs and getting stale files
on NFS clients
https://bugzilla.redhat.com/show_bug.cgi?id=1394634
[Bug 1394634] errors appear in brick and nfs logs and getting stale files
on NFS clients