[Bugs] [Bug 1469971] New: cluster/dht: Fix hardlink migration failures
bugzilla at redhat.com
Wed Jul 12 07:47:09 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1469971
Bug ID: 1469971
Summary: cluster/dht: Fix hardlink migration failures
Product: Red Hat Gluster Storage
Version: 3.3
Component: distribute
Assignee: nbalacha at redhat.com
Reporter: spalai at redhat.com
QA Contact: tdesala at redhat.com
CC: bugs at gluster.org, rhs-bugs at redhat.com,
storage-qa-internal at redhat.com
Depends On: 1469964
+++ This bug was initially created as a clone of Bug #1469964 +++
Description of problem:
There are a few races in the remove-brick hardlink migration code path,
detailed below.
A brief description of how hardlink migration works:
- Different hardlinks (to the same file) may hash to different bricks,
but their cached subvol will be the same. Rebalance picks up the first
hardlink, calculates its hash (call it TARGET), and sets the hashed
subvolume as an xattr on the data file.
- All hardlinks processed after this fetch that xattr and create linkto
files on TARGET (all linkto files for the hardlinks will be hardlinks to
each other on TARGET).
- When the number of hardlinks on the source is equal to the number of
hardlinks on TARGET, the data migration happens.
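The three steps above can be sketched as a toy model. This is not GlusterFS
code: the brick names, the DataFile class, and the crc32-based hash are
assumptions made for illustration, and the real DHT hash and xattr handling
differ.

```python
import zlib

BRICKS = ["brick-0", "brick-1", "brick-2"]  # hypothetical subvolume names


def hashed_subvol(name):
    """Stand-in for the DHT hash: map a link name to a brick (not the real algorithm)."""
    return BRICKS[zlib.crc32(name.encode()) % len(BRICKS)]


class DataFile:
    """One data file on the source brick, shared by all of its hardlinks."""

    def __init__(self, links):
        self.links = list(links)        # hardlink names on the source
        self.target_xattr = None        # TARGET, recorded by the first link processed
        self.linkto_on_target = set()   # linkto files created on TARGET so far
        self.migrated = False


def migrate_hardlink(f, name):
    """Process one hardlink; return True once the data has been migrated."""
    if f.target_xattr is None:
        # First hardlink seen: its hashed subvolume becomes TARGET.
        f.target_xattr = hashed_subvol(name)
    # Every hardlink gets a linkto file on TARGET.
    f.linkto_on_target.add(name)
    # Data moves only when TARGET holds as many links as the source.
    if len(f.linkto_on_target) == len(f.links):
        f.migrated = True
    return f.migrated


f = DataFile(["a", "b", "c"])
results = [migrate_hardlink(f, n) for n in f.links]
print(f.target_xattr, results)
```

In this model only the last hardlink processed triggers the data move, which
is the behaviour both races below disturb.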
RACE:1
Since rebalance is multi-threaded, the first lookup (which decides where
the TARGET subvol should be) can be issued by two hardlink migrations in
parallel, and they may end up creating linkto files on two different
TARGET subvols. Hence, the hardlinks won't be migrated, because the link
count on neither TARGET reaches the source link count.
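RACE:1 is a check-then-set problem, which the sketch below replays with a
fixed interleaving instead of real threads so the outcome is repeatable. The
XattrStore class and its set_if_absent method are hypothetical names, and
making the check and the set one atomic step is only a generic remedy; this
report does not say how the posted patch addresses the race.

```python
import threading


class XattrStore:
    """Toy stand-in for the TARGET xattr on the data file (hypothetical)."""

    def __init__(self):
        self.target = None
        self._lock = threading.Lock()

    def set_if_absent(self, candidate):
        """Atomically record TARGET once; later callers get the recorded value."""
        with self._lock:
            if self.target is None:
                self.target = candidate
            return self.target


def racy_interleaving(hash_a, hash_b):
    """Both workers read the xattr before either writes it (the bad schedule)."""
    xattr = None
    seen_by_a = xattr   # worker A: xattr not set yet
    seen_by_b = xattr   # worker B: xattr not set yet either
    target_a = hash_a if seen_by_a is None else seen_by_a
    target_b = hash_b if seen_by_b is None else seen_by_b
    return target_a, target_b   # each worker would create its linkto file here


def serialized(hash_a, hash_b):
    """Same two workers, but the check and the set happen as one step."""
    store = XattrStore()
    return store.set_if_absent(hash_a), store.set_if_absent(hash_b)


print(racy_interleaving("brick-1", "brick-2"))  # two different TARGETs
print(serialized("brick-1", "brick-2"))         # one TARGET for both
```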
RACE:2
The linkto files on TARGET can also be created by other clients if they
are doing lookups on the hardlinks. Consider a scenario where you have 100
hardlinks. While rebalance is migrating the 99th hardlink, continuous
lookups from other clients can make the linkcount on TARGET equal to the
source linkcount, so rebalance migrates the data on the 99th hardlink
itself.
When the 100th hardlink is migrated, it will have TARGET as its cached
subvolume. If its hash is also TARGET, then a migration will be
triggered from TARGET to TARGET, leading to data loss.
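The 100-hardlink scenario can be walked through with plain counters. The
function names, brick names, and the "skip when cached equals hashed" guard
below are assumptions for illustration only; the report does not describe the
actual fix.

```python
def should_migrate(source_links, target_links):
    """The trigger described above: link counts on source and TARGET match."""
    return source_links == target_links


def migrate(cached_subvol, hashed_subvol_name, guard=True):
    """Return a short description of what a migration attempt would do."""
    if guard and cached_subvol == hashed_subvol_name:
        # Hypothetical guard: the file already lives where it hashes to.
        return "skip"
    if cached_subvol == hashed_subvol_name:
        return "self-migration"   # source and destination are the same file
    return "migrate"


TOTAL = 100
TARGET = "brick-1"   # hypothetical subvolume names
SOURCE = "brick-0"

# Rebalance has processed 99 links, but lookups from other clients have
# already created the 100th linkto file on TARGET.
linkto_by_rebalance = 99
linkto_by_clients = 1
early = should_migrate(TOTAL, linkto_by_rebalance + linkto_by_clients)

# After the early data migration the 100th link is cached on TARGET.
cached = TARGET if early else SOURCE
print(early, migrate(cached, TARGET, guard=False), migrate(cached, TARGET))
```

Without the guard, the last call models the TARGET-to-TARGET migration that
the report associates with data loss.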
This is reproducible intermittently. Since it is related to hardlink
migration, it happens only with the remove-brick process.
--- Additional comment from Worker Ant on 2017-07-12 12:44:13 MVT ---
REVIEW: https://review.gluster.org/17755 (cluster/rebalance: Fix hardlink
migration failures) posted (#1) for review on master by Susant Palai
(spalai at redhat.com)
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1469964
[Bug 1469964] cluster/dht: Fix hardlink migration failures