[Bugs] [Bug 1342375] New: [quota+snapshot]: Directories are inaccessible from activated snapshot, when the snapshot was created during directory creation

bugzilla at redhat.com bugzilla at redhat.com
Fri Jun 3 06:12:27 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1342375

            Bug ID: 1342375
           Summary: [quota+snapshot]: Directories are inaccessible from
                    activated snapshot, when the snapshot was created
                    during directory creation
           Product: GlusterFS
           Version: 3.6.10
         Component: snapshot
          Keywords: Regression, ZStream
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: rkavunga at redhat.com
                CC: bugs at gluster.org, kramdoss at redhat.com,
                    nbalacha at redhat.com, rcyriac at redhat.com,
                    rhinduja at redhat.com, rjoseph at redhat.com,
                    rkavunga at redhat.com
        Depends On: 1341034, 1341796, 1342372, 1342374
            Blocks: 1311817



+++ This bug was initially created as a clone of Bug #1342374 +++

+++ This bug was initially created as a clone of Bug #1342372 +++

+++ This bug was initially created as a clone of Bug #1341796 +++

+++ This bug was initially created as a clone of Bug #1341034 +++

Description of problem:
From the snapshot taken during directory creation, the directories that were
being created aren't accessible.
Snapshots taken later, without any I/O in progress, seem to have consistent data.

Volume Name: superman
Type: Tier
Volume ID: ba49611f-1cbc-4a25-a1a8-8a0eecfe6f76
Status: Started
Number of Bricks: 20
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick1: 10.70.35.133:/bricks/brick7/reg-tier-3
Brick2: 10.70.35.10:/bricks/brick7/reg-tier-3
Brick3: 10.70.35.11:/bricks/brick7/reg-tier-3
Brick4: 10.70.35.225:/bricks/brick7/reg-tier-3
Brick5: 10.70.35.239:/bricks/brick7/reg-tier-3
Brick6: 10.70.37.60:/bricks/brick7/reg-tier-3
Brick7: 10.70.37.120:/bricks/brick7/reg-tier-3
Brick8: 10.70.37.101:/bricks/brick7/reg-tier-3
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick9: 10.70.37.101:/bricks/brick0/l1
Brick10: 10.70.37.120:/bricks/brick0/l1
Brick11: 10.70.37.60:/bricks/brick0/l1
Brick12: 10.70.35.239:/bricks/brick0/l1
Brick13: 10.70.35.225:/bricks/brick0/l1
Brick14: 10.70.35.11:/bricks/brick0/l1
Brick15: 10.70.35.10:/bricks/brick0/l1
Brick16: 10.70.35.133:/bricks/brick0/l1
Brick17: 10.70.37.101:/bricks/brick1/l1
Brick18: 10.70.37.120:/bricks/brick1/l1
Brick19: 10.70.37.60:/bricks/brick1/l1
Brick20: 10.70.35.239:/bricks/brick1/l1
Options Reconfigured:
features.barrier: disable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: disable

'ls -l' output from the mount point of the activated snapshot:

???????????   ? ?    ?           ?            ? dir-1
???????????   ? ?    ?           ?            ? dir-10
???????????   ? ?    ?           ?            ? dir-11
???????????   ? ?    ?           ?            ? dir-12
???????????   ? ?    ?           ?            ? dir-13
???????????   ? ?    ?           ?            ? dir-14
???????????   ? ?    ?           ?            ? dir-15
???????????   ? ?    ?           ?            ? dir-16
???????????   ? ?    ?           ?            ? dir-17

gluster snapshot list
snapshot-superman-1_GMT-2016.05.31-04.54.11
snapshot-superman-2_GMT-2016.05.31-05.02.13
snapshot-superman-3_GMT-2016.05.31-05.08.25
snapshot-superman-4_GMT-2016.05.31-05.24.10

Snapshot 'snapshot-superman-1_GMT-2016.05.31-04.54.11' was taken during
directory creation. The rest of the snapshots were taken later, without I/O in
progress.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-6.el7rhgs.x86_64

How reproducible:
1/1; reproducibility is yet to be determined

Steps to Reproduce:
1. Create a 2 x (4+2) disperse volume.
2. Start a Linux untar operation and mkdir -p dir-{1..1000}/sd-{1..100} from two
different clients.
3. Attach a 4x2 hot tier.
4. Create a snapshot.
5. Activate the snapshot and list the directories.

Actual results:
directories are inaccessible

Expected results:
directories should be accessible

Additional info:
sosreports shall be attached shortly.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-05-31
02:55:40 EDT ---

This bug is automatically being proposed for the current z-stream release of
Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change
the proposed release flag.

--- Additional comment from krishnaram Karthick on 2016-05-31 04:44:36 EDT ---

 - Tried reproducing this issue, but couldn't reproduce it.
 - This is possible when fix-layout on the hot tier is not complete at the time a
snapshot is taken; this theory still has to be confirmed.

--- Additional comment from krishnaram Karthick on 2016-05-31 04:45:28 EDT ---



--- Additional comment from krishnaram Karthick on 2016-06-01 01:58:14 EDT ---

snapshot-1 was activated and mounted on 10.70.47.161 on '/mnt/superman'

[root at dhcp47-161 ~]# mount
...
10.70.37.120:/snaps/snapshot-superman-1_GMT-2016.05.31-04.54.11/superman on
/mnt/superman type fuse.glusterfs
(ro,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
...

--- Additional comment from krishnaram Karthick on 2016-06-01 03:15:19 EDT ---

Although this issue is not seen consistently, it was not seen in the 3.1.2
release.

Proposing this bug as a blocker, to discuss and decide whether to take it in
3.1.3.

--- Additional comment from Nithya Balachandran on 2016-06-01 04:34:04 EDT ---

The logs indicate that the DHT selfheal, and hence the lookup, fails for the
directories that are not accessible.



From mnt-superman.log on 10.70.47.161:

[2016-06-01 05:21:39.776823] I [MSGID: 109063]
[dht-layout.c:718:dht_layout_normalize]
0-87fda0f4b2404018904c3d49718497c5-tier-dht: Found anomalies in /dir-9 (gfid =
94224520-61a4-4d26-a2fa-152f2631a295). Holes=1 overlaps=0
[2016-06-01 05:21:39.783140] E [MSGID: 114031]
[client-rpc-fops.c:321:client3_3_mkdir_cbk] 0-superman-client-17: remote
operation failed. Path: /dir-9 [Invalid argument]
[2016-06-01 05:21:39.783278] E [MSGID: 114031]
[client-rpc-fops.c:321:client3_3_mkdir_cbk] 0-superman-client-16: remote
operation failed. Path: /dir-9 [Invalid argument]
[2016-06-01 05:21:39.784359] W [MSGID: 109005]
[dht-selfheal.c:1172:dht_selfheal_dir_mkdir_cbk]
0-87fda0f4b2404018904c3d49718497c5-tier-dht: Directory selfheal failed: path =
/dir-9, gfid = 94224520-61a4-4d26-a2fa-152f2631a295 [Invalid argument]
[2016-06-01 05:21:39.790632] W [fuse-resolve.c:66:fuse_resolve_entry_cbk]
0-fuse: 00000000-0000-0000-0000-000000000001/dir-9: failed to resolve (Invalid
argument)



From the snapshot brick
(/run/gluster/snaps/87fda0f4b2404018904c3d49718497c5/brick3/reg-tier-3) logs
for 0-superman-client-17:

[2016-06-01 05:19:24.493623] W [MSGID: 120022]
[quota-enforcer-client.c:236:quota_enforcer_lookup_cbk]
0-87fda0f4b2404018904c3d49718497c5-quota: Getting cluster-wide size of
directory failed (path: / gfid:00000000-0000-0000-0000-000000000001) [Invalid
argument]
[2016-06-01 05:19:24.493696] E [MSGID: 115056]
[server-rpc-fops.c:515:server_mkdir_cbk]
0-87fda0f4b2404018904c3d49718497c5-server: 5516: MKDIR /dir-9
(00000000-0000-0000-0000-000000000001/dir-9) client:
dhcp47-161.lab.eng.blr.redhat.com-1099-2016/05/31-05:21:53:395888-superman-client-17-0-0
[Invalid argument]



On examining the quotad process using gdb, the operation fails in
quotad_aggregator_lookup () -> qd_nameless_lookup ().


qd_nameless_lookup ()
{
        ...
        subvol = qd_find_subvol (this, volume_uuid);
        if (subvol == NULL) {
                op_errno = EINVAL;     /* <-- fails here */
                goto out;
        }
        ...
}


This is because snapshot volumes are not part of the quotad graph.
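
To illustrate the failure mode described above, here is a minimal,
self-contained C sketch. It is not GlusterFS code: struct sketch_graph and the
sketch_* functions are hypothetical stand-ins. The only behaviour taken from
the excerpt is that a volume missing from the graph makes the lookup return
EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a quotad graph: one identifier per volume
 * that the aggregator knows about. Not the real GlusterFS structure. */
struct sketch_graph {
        const char **volume_ids;
        size_t       count;
};

/* Returns the index of volume_id in the graph, or -1 if it is absent.
 * Plays the role that qd_find_subvol () has in the excerpt. */
int
sketch_find_subvol (const struct sketch_graph *graph, const char *volume_id)
{
        size_t i;

        assert (graph != NULL && volume_id != NULL);
        for (i = 0; i < graph->count; i++) {
                if (strcmp (graph->volume_ids[i], volume_id) == 0)
                        return (int) i;
        }
        return -1;
}

/* Returns 0 on success, or EINVAL when the volume is not in the graph,
 * mirroring the op_errno = EINVAL branch in the excerpt. */
int
sketch_nameless_lookup (const struct sketch_graph *graph,
                        const char *volume_id)
{
        if (sketch_find_subvol (graph, volume_id) < 0)
                return EINVAL;
        return 0;
}
```

With a graph that holds only "superman", sketch_nameless_lookup () returns 0
for "superman" and EINVAL for the snapshot volume identifier
87fda0f4b2404018904c3d49718497c5 that appears in the logs.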


This is unrelated to tiering. Modifying the description accordingly.

--- Additional comment from Rejy M Cyriac on 2016-06-01 07:41:59 EDT ---

Accepted as Blocker for RHGS 3.1.3 release at the Blocker Bug Triage meeting on
01 June 2016

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-06-01
09:58:07 EDT ---

Since this bug has been approved for the z-stream release of Red Hat Gluster
Storage 3, through release flag 'rhgs-3.1.z+', and has been marked for RHGS 3.1
Update 3 release through the Internal Whiteboard entry of '3.1.3', the Target
Release is being automatically set to 'RHGS 3.1.3'

--- Additional comment from Vijay Bellur on 2016-06-01 15:00:02 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota
related options from snap volfile) posted (#1) for review on master by mohammed
rafi  kc (rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-06-02 03:16:08 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota
related options from snap volfile) posted (#2) for review on master by mohammed
rafi  kc (rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-06-02 07:14:57 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota
related options from snap volfile) posted (#3) for review on master by mohammed
rafi  kc (rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-06-02 07:54:07 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota
related options from snap volfile) posted (#4) for review on master by mohammed
rafi  kc (rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-06-03 02:09:03 EDT ---

COMMIT: http://review.gluster.org/14608 committed in master by Rajesh Joseph
(rjoseph at redhat.com) 
------
commit 03d523504230c336cf585159266e147945f31153
Author: Mohammed Rafi KC <rkavunga at redhat.com>
Date:   Wed Jun 1 23:01:37 2016 +0530

    glusterd/snapshot: remove quota related options from snap volfile

    enabling inode-quota on a snapshot volume is unnecessary, because
    snapshot is a read-only volume. So we don't need to enforce quota
    on a snapshot volume.

    This patch will remove the quota related options from snapshot
    volfile.

    Change-Id: Iddabcb83820dac2384924a01d45abe1ef1e95600
    BUG: 1341796
    Signed-off-by: Mohammed Rafi KC <rkavunga at redhat.com>
    Reviewed-on: http://review.gluster.org/14608
    Reviewed-by: Atin Mukherjee <amukherj at redhat.com>
    Reviewed-by: N Balachandran <nbalacha at redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    Smoke: Gluster Build System <jenkins at build.gluster.com>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Rajesh Joseph <rjoseph at redhat.com>
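
The idea in the commit message above, dropping quota related options when the
snapshot volfile is generated, can be sketched as follows. This is a hedged
illustration and not the committed patch: struct sketch_option and the
sketch_* functions are hypothetical, and the list of keys is an assumption
taken from the "Options Reconfigured" output earlier in this report, not from
the change itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value pair for one volume option. */
struct sketch_option {
        const char *key;
        const char *value;
};

/* Assumed quota-related keys, copied from the volume info in this
 * report. Which options the real patch removes is not shown here. */
static const char *const sketch_quota_keys[] = {
        "features.quota",
        "features.inode-quota",
        "features.quota-deem-statfs",
};

/* Returns 1 if key is one of the assumed quota-related options. */
int
sketch_is_quota_option (const char *key)
{
        size_t i;
        size_t n = sizeof (sketch_quota_keys) / sizeof (sketch_quota_keys[0]);

        for (i = 0; i < n; i++) {
                if (strcmp (key, sketch_quota_keys[i]) == 0)
                        return 1;
        }
        return 0;
}

/* Copies every non-quota option from in[0..count) to out, preserving
 * order, and returns how many were copied. out must hold count entries. */
size_t
sketch_strip_quota_options (const struct sketch_option *in, size_t count,
                            struct sketch_option *out)
{
        size_t i;
        size_t kept = 0;

        assert (count == 0 || (in != NULL && out != NULL));
        for (i = 0; i < count; i++) {
                if (!sketch_is_quota_option (in[i].key))
                        out[kept++] = in[i];
        }
        return kept;
}
```

Applied to the options listed for volume "superman", this would keep entries
such as features.barrier and performance.readdir-ahead and drop the three
quota keys, so the snapshot bricks would not attempt quota enforcement.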


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1341034
[Bug 1341034] [quota+snapshot]: Directories are inaccessible from activated
snapshot, when the snapshot was created during directory creation
https://bugzilla.redhat.com/show_bug.cgi?id=1341796
[Bug 1341796] [quota+snapshot]: Directories are inaccessible from activated
snapshot, when the snapshot was created during directory creation
https://bugzilla.redhat.com/show_bug.cgi?id=1342372
[Bug 1342372] [quota+snapshot]: Directories are inaccessible from activated
snapshot, when the snapshot was created during directory creation
https://bugzilla.redhat.com/show_bug.cgi?id=1342374
[Bug 1342374] [quota+snapshot]: Directories are inaccessible from activated
snapshot, when the snapshot was created during directory creation
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

