[Bugs] [Bug 1323042] New: Inconsistent directory structure on dht subvols caused by parent layouts going stale during entry create operations because of fix-layout

bugzilla at redhat.com bugzilla at redhat.com
Fri Apr 1 04:44:34 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1323042

            Bug ID: 1323042
           Summary: Inconsistent directory structure on dht subvols caused
                    by parent layouts going stale during entry create
                    operations because of fix-layout
           Product: Red Hat Gluster Storage
         Component: gluster-dht
          Assignee: rhs-bugs at redhat.com
          Reporter: rgowdapp at redhat.com
        QA Contact: storage-qa-internal at redhat.com
                CC: bugs at gluster.org
        Depends On: 1323040



+++ This bug was initially created as a clone of Bug #1323040 +++

Description of problem:
After rebalance changes the on-disk layouts of directories, the client's
in-memory layout becomes stale. Unless a lookup is sent on the directory
(driven by higher layers), dht goes ahead and uses the stale layout. This has
serious consequences for entry operations, which normally rely on the layout to
determine the hashed subvol. Some of the manifestations of this problem we've
seen are:

1. A directory having different gfids on different subvolumes (resulting from
parallel mkdirs of the same path from different clients, some having an
up-to-date layout and some a stale layout).
2. A file whose data file is present on different subvols with different
gfids (resulting from parallel creates of the same file from different
clients, some having an up-to-date layout and some a stale layout).
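
To illustrate why a stale layout sends an entry operation to the wrong
subvol, here is a toy model, not DHT's actual code. It assumes the layout
splits the 32-bit hash space into equal ranges, one per subvol, and uses cksum
as a stand-in for DHT's real hash function. The function name, the directory
names and the subvol counts (6 before adding bricks, 8 after) are made up for
the example.

```shell
# Toy model only: cksum stands in for DHT's real hash function, and the
# layout is assumed to split the 32-bit hash space into equal ranges.
hashed_subvol() {
    name=$1
    nsubvols=$2
    # cksum prints "<crc> <size>"; keep only the 32-bit CRC.
    hash=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    # Index of the equal-sized range that contains the hash value.
    echo $(( hash * nsubvols / 4294967296 ))
}

# A client holding a stale 6-subvol layout vs. one with a fresh 8-subvol layout.
for name in dir1 dir2 dir3 dir4; do
    echo "$name: stale=$(hashed_subvol "$name" 6) fresh=$(hashed_subvol "$name" 8)"
done
```

Whenever the two indices differ for a name, the two clients would pick
different hashed subvols for the same mkdir or create, which is the race
described above.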

Version-Release number of selected component (if applicable):


How reproducible:
Quite consistently

Steps to Reproduce:

Set up a dist-rep volume, maybe 6x2.
1. Create a data set with a large number of directories: fairly deep, with
several dirs at each level.
2. Add several bricks.
3. From multiple NFS clients, run the same script to create multiple dirs
inside the ones already created. Different clients should try to create the
same dirs, so that only one of them can succeed for each dir.
4. While the script is running, start a rebalance.

What we want to test is mkdir during rebalance when different clients have
different in-memory layouts for the parent dirs.
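
The script used in step 3 is not part of this report. A minimal sketch of
what it could look like follows; the function name and the newdir.N naming are
hypothetical. It is meant to be run at the same time from every client, each
against its own mount of the same volume.

```shell
# Hypothetical helper: under every directory that already exists below $1,
# try to create the same $2 child directories (default 10).
create_racing_dirs() {
    root=$1
    count=${2:-10}
    # Snapshot the existing tree first so directories created below are
    # not visited again during this run.
    list=$(mktemp)
    find "$root" -type d > "$list"
    while IFS= read -r parent; do
        i=1
        while [ "$i" -le "$count" ]; do
            # Plain mkdir (no -p): it fails if the directory already exists,
            # e.g. because another client won the race.
            if mkdir "$parent/newdir.$i" 2>/dev/null; then
                echo "created $parent/newdir.$i"
            fi
            i=$((i + 1))
        done
    done < "$list"
    rm -f "$list"
}

# Example (mount point is an assumption):
#   create_racing_dirs /mnt/nfs 10 > "/tmp/created.$(hostname).log"
```

Comparing the "created" lines logged by the different clients shows whether
more than one client believed it had created the same path.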

Actual results:

The same dir has different gfids on different subvols.

To detect the issue, use the following steps once the test is complete.

1. Have a fuse mount with use-readdirp=no and disable attribute/entry caching.

[root at unused ~]# mount -t glusterfs -o entry-timeout=0,attribute-timeout=0,use-readdirp=no localhost:/dist /mnt/glusterfs
[root at unused ~]# ps ax | grep -i readdirp
30801 ?        Ssl    0:00 /usr/local/sbin/glusterfs --use-readdirp=no --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-id=/dist /mnt/glusterfs

2. Turn off md-cache/stat-prefetch.

[root at unused ~]# gluster volume set dist performance.stat-prefetch off
volume set: success

3. Now crawl the entire glusterfs mount.

[root at unused ~]# find /mnt/glusterfs > /dev/null

4. Search the mount log for MSGID: 109009.

[root at unused ~]# grep "MSGID: 109009" /var/log/glusterfs/mnt-glusterfs.log 

[2016-03-30 06:00:18.762188] W [MSGID: 109009]
[dht-common.c:571:dht_lookup_dir_cbk] 0-dist-dht: /dir: gfid different on
dist-client-9. gfid local = cd4adbd2-823b-4feb-82eb-b0011d71cfec, gfid subvol =
cafedbd2-823b-4feb-82eb-b0011d71babe
[2016-03-30 06:00:22.596947] W [MSGID: 109009]
[dht-common.c:571:dht_lookup_dir_cbk] 0-dist-dht: /dir: gfid different on
dist-client-9. gfid local = cd4adbd2-823b-4feb-82eb-b0011d71cfec, gfid subvol =
cafedbd2-823b-4feb-82eb-b0011d71babe
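
On a large tree the same warning repeats for every lookup of an affected
path. A small filter can reduce the log to one line per affected path and
subvol. This is a sketch that assumes each entry occupies a single line in the
actual log file (the wrapping above comes from the mail) and that the message
text matches the entries shown here; the function name is made up.

```shell
# Hypothetical helper: read a mount log on stdin and print one line per
# distinct mismatch as "<subvol> <local gfid> <subvol gfid> <path>".
# The path comes last so that paths containing spaces stay readable.
summarize_gfid_mismatches() {
    sed -n 's/.*MSGID: 109009.* [^ ]*-dht: \(.*\): gfid different on \([^ ]*\)\. gfid local = \([0-9a-f-]*\), gfid subvol = \([0-9a-f-]*\).*/\2 \3 \4 \1/p' | sort -u
}

# Example:
#   summarize_gfid_mismatches < /var/log/glusterfs/mnt-glusterfs.log
```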

Expected results:
1. At most one of the mkdirs issued on the same path from multiple clients
should succeed.
2. No directory should have different gfids on different subvols.

Additional info:


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1323040
[Bug 1323040] Inconsistent directory structure on dht subvols caused by
parent layouts going stale during entry create operations because of
fix-layout