[Bugs] [Bug 1191537] New: With afrv2 + ext4, lookups on directories with large offsets could result in duplicate/missing entries
bugzilla at redhat.com
Wed Feb 11 13:23:28 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1191537
Bug ID: 1191537
Summary: With afrv2 + ext4, lookups on directories with large
offsets could result in duplicate/missing entries
Product: GlusterFS
Version: 3.6.2
Component: core
Keywords: Triaged
Severity: high
Priority: high
Assignee: bugs at gluster.org
Reporter: pkarampu at redhat.com
CC: bugs at gluster.org, gluster-bugs at redhat.com,
hchiramm at redhat.com, jbyers at stonefly.com,
jhoffman at afrl.hpc.mil, pkarampu at redhat.com,
skoduri at redhat.com
Depends On: 1163161
+++ This bug was initially created as a clone of Bug #1163161 +++
Description of problem:
'ext4' uses large offsets, which may include the bits used by GlusterFS to
encode the brick-id. As a result, a few offsets may be modified when they are
given back to the filesystem, resulting in missing files and other such
discrepancies.
Avati has proposed a solution to overcome this issue, based on the assumption
that "both EXT4/XFS are tolerant in terms of the accuracy of the value
presented back in seekdir(). i.e., a seekdir(val) actually seeks to the entry
which has the "closest" true offset." For more info, please check
http://review.gluster.org/#/c/4711/.
However, now that afr uses the same itransform/deitransform logic, the brick-id
stored in the afr_global_d_off gets zeroed out when the offset is re-encoded in
dht. This happens only when the offsets are huge (i.e. with the ext4
filesystem): in such cases the low n bits of the offset are replaced with the
brick-id, and those same bits are then overwritten when the offset is
re-encoded in dht, where
n = log2(N)
N = no. of DHT/AFR subvolumes.
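
The loss described above can be reproduced with a deliberately simplified
model. The sketch below is not the GlusterFS itransform/deitransform code: the
helper names (id_bits, encode, decode) and the sample offset are hypothetical,
it models only the large-offset case where the low n bits are replaced, and it
assumes n is rounded up when N is not a power of two.

```python
MASK64 = (1 << 64) - 1  # d_off values are treated as 64-bit quantities


def id_bits(n_subvols):
    """Bits needed to hold a subvolume id: n = ceil(log2(N)) for N >= 2."""
    return (n_subvols - 1).bit_length()


def encode(d_off, subvol_id, n_subvols):
    """Replace the low n bits of d_off with subvol_id (large-offset case only)."""
    low_mask = (1 << id_bits(n_subvols)) - 1
    return ((d_off & ~low_mask) | subvol_id) & MASK64


def decode(d_off, n_subvols):
    """Return (offset with the low n bits cleared, subvolume id)."""
    low_mask = (1 << id_bits(n_subvols)) - 1
    return d_off & ~low_mask & MASK64, d_off & low_mask


# Hypothetical large backend offset, of the kind ext4 can return.
backend_off = 0x7A3F95C10E4B6D27

afr_off = encode(backend_off, 1, 2)  # AFR layer: child 1 of 2 replicas
dht_off = encode(afr_off, 2, 4)      # DHT layer: subvolume 2 of 4

# On the way back down, DHT decodes first, then AFR.
after_dht, dht_id = decode(dht_off, 4)
_, afr_id = decode(after_dht, 2)

print(dht_id)  # DHT recovers its own id
print(afr_id)  # AFR no longer sees the id it stored
```

In this model DHT recovers its own id, but the bit in which AFR stored child 1
has been overwritten and then cleared, so AFR reads back child 0 and would
resume the readdir on the wrong subvolume.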
--- Additional comment from Anand Avati on 2014-11-12 07:55:16 EST ---
REVIEW: http://review.gluster.org/8201 (dht/afr: Modify itransform/deitransform
to prevent loss of brick-id incase of both dht & afr involved.) posted (#4) for
review on master by soumya k (skoduri at redhat.com)
--- Additional comment from Anand Avati on 2014-12-23 13:23:55 EST ---
REVIEW: http://review.gluster.org/9332 (afr: stop encoding subvolume id in
readdir d_off) posted (#1) for review on master by Anand Avati
(avati at redhat.com)
--- Additional comment from Anand Avati on 2014-12-26 04:59:57 EST ---
REVIEW: http://review.gluster.org/9332 (afr: stop encoding subvolume id in
readdir d_off) posted (#2) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)
--- Additional comment from Anand Avati on 2014-12-26 09:21:49 EST ---
COMMIT: http://review.gluster.org/9332 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
------
commit 7926fe6f7df664bbe5e050a8e66240dd67155eec
Author: Anand Avati <avati at redhat.com>
Date: Tue Dec 23 10:04:00 2014 -0800
afr: stop encoding subvolume id in readdir d_off
The purpose of encoding d_off in AFR is to indicate the
selected subvolume for the first readdir, and continue all
further readdirs of the session on the same subvolume. This is
required because, unlike files, dir d_offs are specific to the
backend and cannot be re-used on another subvolume. The d_off
transformation encodes the subvolume id and prevents such
invalid use of d_offs on other servers.
However, this approach could be quite wasteful of precious d_off
bit-space. Unlike DHT, where server id can change from entry to
entry and thus encoding the server id in the transformed d_off
is necessary, we could take a slightly relaxed approach in AFR.
The approach is to save the subvolume where the last readdir
request was sent in the fd_ctx. This consumes constant space (i.e
no per-entry cache), and serves the purpose of avoiding d_off
"misuse" (i.e using d_off from one server on another).
The compromise here is NFS resuming readdir from a non-0 cookie
after an extended delay (either anonymous FD has been reclaimed,
or server has restarted). In such cases a subvolume is picked
freshly. To make this fresh picking more deterministic (i.e, to
pick the same subvolume whenever possible, even after reboots),
the function afr_hash_child (used by afr_read_subvol_select_by_policy)
is modified to skip all dynamic inputs (i.e PID) for the case
of directories.
Change-Id: I46ad95feaeb21fb811b7e8d772866a646330c9d8
BUG: 1163161
Signed-off-by: Anand Avati <avati at redhat.com>
Reviewed-on: http://review.gluster.org/9332
Tested-by: Gluster Build System <jenkins at build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
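
The fd_ctx approach from the commit message above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the AFR
implementation: FdCtx, pick_subvol and readdir_subvol are hypothetical names,
pick_subvol merely stands in for afr_hash_child, and CRC32 is an arbitrary
choice of hash.

```python
import zlib


class FdCtx:
    """Hypothetical per-fd context: constant space, no per-entry cache."""

    def __init__(self):
        self.readdir_subvol = None  # subvolume used by the last readdir


def pick_subvol(gfid, n_children, is_dir, pid=0):
    """Stand-in for afr_hash_child; CRC32 is used purely for illustration.

    For directories, dynamic inputs such as the PID are skipped so that the
    same subvolume is picked again after a restart.
    """
    key = gfid if is_dir else gfid + pid.to_bytes(4, "little")
    return zlib.crc32(key) % n_children


def readdir_subvol(ctx, gfid, n_children, offset):
    """Choose the subvolume that should serve a readdir at this offset."""
    if offset == 0 or ctx.readdir_subvol is None:
        # First readdir of a session, or the fd context was lost
        # (e.g. anonymous fd reclaimed): pick a subvolume freshly.
        ctx.readdir_subvol = pick_subvol(gfid, n_children, is_dir=True)
    # Otherwise stay on the subvolume that produced the earlier d_offs.
    return ctx.readdir_subvol


gfid = bytes(range(16))  # hypothetical 16-byte directory gfid
ctx = FdCtx()
first = readdir_subvol(ctx, gfid, 3, 0)
again = readdir_subvol(ctx, gfid, 3, 4096)        # same session
resumed = readdir_subvol(FdCtx(), gfid, 3, 4096)  # fresh context, non-0 cookie
```

Because the directory hash ignores the PID, a readdir resumed with a fresh
context lands on the same subvolume as the original session, as long as the
set of children is unchanged.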
--- Additional comment from Anand Avati on 2015-02-11 07:58:03 EST ---
REVIEW: http://review.gluster.org/9638 (afr: stop encoding subvolume id in
readdir d_off) posted (#1) for review on release-3.6 by Pranith Kumar Karampuri
(pkarampu at redhat.com)
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1163161
[Bug 1163161] With afrv2 + ext4, lookups on directories with large offsets
could result in duplicate/missing entries
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.