[Bugs] [Bug 1450728] New: Brick Multiplexing: seeing Input/Output Error for .trashcan

bugzilla at redhat.com bugzilla at redhat.com
Mon May 15 04:31:28 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1450728

            Bug ID: 1450728
           Summary: Brick Multiplexing: seeing Input/Output Error for
                    .trashcan
           Product: GlusterFS
           Version: 3.10
         Component: core
          Keywords: Triaged
          Assignee: bugs at gluster.org
          Reporter: amukherj at redhat.com
                CC: amukherj at redhat.com, anoopcs at redhat.com,
                    bugs at gluster.org, jthottan at redhat.com,
                    nchilaka at redhat.com, rhinduja at redhat.com,
                    rhs-bugs at redhat.com, rkavunga at redhat.com,
                    storage-qa-internal at redhat.com
        Depends On: 1447389
            Blocks: 1443941



+++ This bug was initially created as a clone of Bug #1447389 +++

+++ This bug was initially created as a clone of Bug #1443941 +++

Description of problem:
=====================
Observation: with brick multiplexing enabled, I am seeing EIO for the .trashcan
folder on the mount of an EC volume, as shown below:
# ls -la
total 12
drwxr-xr-x.  4 root root 4096 Apr 20 15:22 .
drwxr-xr-x. 15 root root 4096 Apr 20 15:17 ..
drwxr-xr-x.  2 root root 4096 Apr 20 15:22 dir1
# ls -lA
ls: cannot access .trashcan: Input/output error
total 4
drwxr-xr-x. 2 root root 4096 Apr 20 15:22 dir1
d?????????? ? ?    ?       ?            ? .trashcan




Steps
=====
Step1:
I had a 6-node setup on which I created the volumes below.

Step2:
Enabled brick multiplexing.

Step3:
Created the volumes below:
ecv82 --> an EC volume of 2 x (8+2) spanning nodes n1..n5
distrep3 --> a distributed-replicate (replica 3) volume of 3 x 3 spanning nodes n1..n3

As expected, the bricks hosted on a given node all share the same PID because
brick multiplexing is enabled (check under logs).
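To make the expected effect concrete: per the `gluster v info` output in the logs section, distrep3 has 3 bricks on each of 10.70.35.138, 10.70.35.130 and 10.70.35.122, and ecv82 has 4 bricks on each of those three nodes plus 10.70.35.23 and 10.70.35.112, i.e. 29 bricks on 5 nodes. The sketch below counts brick processes under the simplifying assumption that, with multiplexing on, every brick on a node attaches to a single glusterfsd process. `count_brick_processes` is a hypothetical helper for illustration, not part of any Gluster tool.

```c
#include <string.h>

/* Hypothetical helper (illustration only): number of brick processes
 * expected for a list of bricks, given the host of each brick.
 * Assumption: without multiplexing there is one process per brick; with
 * multiplexing, all bricks on the same host share one process. */
size_t count_brick_processes(const char *const hosts[], size_t nbricks,
                             int multiplexed)
{
    if (!multiplexed)
        return nbricks;

    /* Count distinct host names: one multiplexed process per host. */
    size_t distinct = 0;
    for (size_t i = 0; i < nbricks; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (strcmp(hosts[i], hosts[j]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}
```

Fed the 29 brick hosts from the volume info, this returns 29 with `multiplexed` unset and 5 with it set, which is why bricks on one node show the same PID.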

Step4:
I then changed the brick log level to DEBUG for distrep3 (which should use the
same brick log as ecv82).

Step5:
I then mounted distrep3 on a FUSE client.
=======> NOTE: I do not see the .trashcan folder on the mount, and I don't know why.

Step6:
Ran some I/O.

Step7:
Set the min-free-disk limit to 50% for distrep3.

Step8:
Ran I/O to check whether I get a warning for breaching min-free-disk, and got the
following in the client FUSE log:

[2017-04-20 09:45:58.717617] W [MSGID: 109033]
[dht-diskusage.c:263:dht_is_subvol_filled] 0-distrep3-dht: disk space on
subvolume 'distrep3-replicate-1' is getting full (55.00 %), consider adding
more bricks
[2017-04-20 09:46:50.749409] W [MSGID: 109033]
[dht-diskusage.c:263:dht_is_subvol_filled] 0-distrep3-dht: disk space on
subvolume 'distrep3-replicate-2' is getting full (54.00 %), consider adding
more bricks
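These warnings are what the min-free-disk setting should produce: with `cluster.min-free-disk: 50`, a subvolume that is 55% full has only 45% free, which is below the threshold. A minimal sketch of that check, assuming the option value is a percentage of free space and that the logged figure is the used percentage (100 minus the free percentage); the helper names are hypothetical and this is not the `dht-diskusage.c` code.

```c
#include <stdbool.h>

/* Hypothetical helper: true when a subvolume's free space (percent) has
 * dropped below the min-free-disk threshold (assumed to be a percentage). */
bool subvol_below_min_free(double avail_percent, double min_free_disk)
{
    return avail_percent < min_free_disk;
}

/* Hypothetical helper: the "getting full (NN %)" figure in the warning,
 * assumed to be the used percentage, i.e. 100 - free percentage. */
double subvol_used_percent(double avail_percent)
{
    return 100.0 - avail_percent;
}
```

For distrep3-replicate-1 in the log above, 45% free gives `subvol_below_min_free(45.0, 50.0) == true` and a reported 55% full.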


Step9:
Mounted ecv82 on a FUSE client.

Step10:
Ran ls -lA and got the EIO:


[root@dhcp35-103 ecv82]# ls -lA
ls: cannot access .trashcan: Input/output error
total 4
drwxr-xr-x. 2 root root 4096 Apr 20 15:22 dir1
d?????????? ? ?    ?       ?            ? .trashcan
[root@dhcp35-103 ecv82]# 






#############logs ##############

Task Status of Volume ecv82
------------------------------------------------------------------------------
There are no active volume tasks
 # gluster v info

Volume Name: distrep3
Type: Distributed-Replicate
Volume ID: 28a6c08e-b7a0-4135-88fa-4b9ae250d609
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 10.70.35.138:/rhs/brick11/distrep3
Brick2: 10.70.35.130:/rhs/brick11/distrep3
Brick3: 10.70.35.122:/rhs/brick11/distrep3
Brick4: 10.70.35.138:/rhs/brick12/distrep3
Brick5: 10.70.35.130:/rhs/brick12/distrep3
Brick6: 10.70.35.122:/rhs/brick12/distrep3
Brick7: 10.70.35.138:/rhs/brick13/distrep3
Brick8: 10.70.35.130:/rhs/brick13/distrep3
Brick9: 10.70.35.122:/rhs/brick13/distrep3
Options Reconfigured:
cluster.min-free-disk: 50
cluster.quorum-count: 1
diagnostics.brick-log-level: DEBUG
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable

Volume Name: ecv82
Type: Distributed-Disperse
Volume ID: c2a84a0f-a95f-4264-984b-2e0879da7f99
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (8 + 2) = 20
Transport-type: tcp
Bricks:
Brick1: 10.70.35.138:/rhs/brick1/ecv82
Brick2: 10.70.35.130:/rhs/brick1/ecv82
Brick3: 10.70.35.122:/rhs/brick1/ecv82
Brick4: 10.70.35.23:/rhs/brick1/ecv82
Brick5: 10.70.35.112:/rhs/brick1/ecv82
Brick6: 10.70.35.138:/rhs/brick2/ecv82
Brick7: 10.70.35.130:/rhs/brick2/ecv82
Brick8: 10.70.35.122:/rhs/brick2/ecv82
Brick9: 10.70.35.23:/rhs/brick2/ecv82
Brick10: 10.70.35.112:/rhs/brick2/ecv82
Brick11: 10.70.35.138:/rhs/brick3/ecv82
Brick12: 10.70.35.130:/rhs/brick3/ecv82
Brick13: 10.70.35.122:/rhs/brick3/ecv82
Brick14: 10.70.35.23:/rhs/brick3/ecv82
Brick15: 10.70.35.112:/rhs/brick3/ecv82
Brick16: 10.70.35.138:/rhs/brick4/ecv82
Brick17: 10.70.35.130:/rhs/brick4/ecv82
Brick18: 10.70.35.122:/rhs/brick4/ecv82
Brick19: 10.70.35.23:/rhs/brick4/ecv82
Brick20: 10.70.35.112:/rhs/brick4/ecv82
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable
[root@dhcp35-45 ~]# 




Rationale of the testing:
I wanted to check the behavior when brick multiplexing is in effect and we change
some brick settings.




--- Additional comment from nchilaka on 2017-04-20 06:28:57 EDT ---

fuse mount log:
[2017-04-20 09:48:19.851779] W [fuse-resolve.c:61:fuse_resolve_entry_cbk]
0-fuse: 00000000-0000-0000-0000-000000000001/.trashcan: failed to resolve
(Input/output error)
[2017-04-20 09:48:19.854563] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 0-ecv82-dht: Found anomalies in
/.trashcan (gfid = 00000000-0000-0000-0000-000000000005). Holes=1 overlaps=0
[2017-04-20 09:48:19.855996] W [MSGID: 109065]
[dht-selfheal.c:1410:dht_selfheal_dir_mkdir_lock_cbk] 0-ecv82-dht: acquiring
inodelk failed for /.trashcan [Input/output error]
[2017-04-20 09:48:19.856077] W [fuse-bridge.c:471:fuse_entry_cbk]
0-glusterfs-fuse: 23: LOOKUP() /.trashcan => -1 (Input/output error)
[2017-04-20 09:52:35.145993] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 0-ecv82-dht: Found anomalies in
/.trashcan (gfid = 00000000-0000-0000-0000-000000000005). Holes=1 overlaps=0
[2017-04-20 09:52:35.147543] W [MSGID: 109065]
[dht-selfheal.c:1410:dht_selfheal_dir_mkdir_lock_cbk] 0-ecv82-dht: acquiring
inodelk failed for /.trashcan [Input/output error]
[2017-04-20 09:52:35.147583] W [fuse-resolve.c:61:fuse_resolve_entry_cbk]
0-fuse: 00000000-0000-0000-0000-000000000001/.trashcan: failed to resolve
(Input/output error)
[2017-04-20 09:52:35.152434] W [fuse-bridge.c:471:fuse_entry_cbk]
0-glusterfs-fuse: 866: LOOKUP() /.trashcan => -1 (Input/output error)
[2017-04-20 09:52:35.150938] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 0-ecv82-dht: Found anomalies in
/.trashcan (gfid = 00000000-0000-0000-0000-000000000005). Holes=1 overlaps=0
[2017-04-20 09:52:35.152401] W [MSGID: 109065]
[dht-selfheal.c:1410:dht_selfheal_dir_mkdir_lock_cbk] 0-ecv82-dht: acquiring
inodelk failed for /.trashcan [Input/output error]
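The `dht_layout_normalize` messages report `Holes=1 overlaps=0` for `/.trashcan`. ecv82 is 2 x (8+2), so DHT splits the 32-bit hash space between two disperse subvolumes; a hole means part of that space is not covered by any subvolume's layout for the directory, for example because the directory is missing on one subvolume. The following is a simplified model of hole counting, assuming each subvolume owns one contiguous range and the ranges are sorted and non-overlapping. `struct layout_range` and `count_layout_holes` are hypothetical names; this is not the `dht-layout.c` implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-subvolume layout entry for one directory. */
struct layout_range {
    uint32_t start;  /* first hash value owned by this subvolume */
    uint32_t stop;   /* last hash value owned by this subvolume */
    int present;     /* 0 if the directory/layout is missing on it */
};

/* Count gaps in coverage of [0, UINT32_MAX]. Ranges must be sorted by
 * start and must not overlap. Adjacent missing ranges form one hole. */
int count_layout_holes(const struct layout_range *r, size_t n)
{
    int holes = 0;
    uint64_t next = 0; /* first hash value not yet covered */

    for (size_t i = 0; i < n; i++) {
        if (!r[i].present)
            continue;              /* missing subvolume: leaves a gap */
        if (r[i].start > next)
            holes++;               /* gap before this range */
        next = (uint64_t)r[i].stop + 1;
    }
    if (next <= UINT32_MAX)
        holes++;                   /* gap at the end of the hash space */
    return holes;
}
```

With two halves of the hash space and the directory missing on either subvolume, this model reports exactly one hole, matching the `Holes=1` in the log.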

--- Additional comment from Jiffin on 2017-04-27 06:17:27 EDT ---

While trying out this bug, I found the following: when the volume is started
with brick multiplexing enabled, ".trashcan" is created on only three of the
20 bricks (2 x (8+2)), not on all the subvolumes. I have a gut feeling that this
might be the cause of this bug and of bz1443939.

P.S.: I don't have enough knowledge about brick multiplexing to comment on why
this is happening, and the test was performed on my workstation.

Also, if possible, I request QA to retest the above scenario using the following steps:
1.) create the volume
2.) start the volume
3.) enable brick multiplexing
4.) restart the volume(stop and start)
5.) Then retest the case

--- Additional comment from Worker Ant on 2017-05-09 11:47:40 EDT ---

REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick
attach) posted (#1) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Worker Ant on 2017-05-11 01:11:47 EDT ---

REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick
attach) posted (#2) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Worker Ant on 2017-05-11 07:55:23 EDT ---

REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick
attach) posted (#3) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Worker Ant on 2017-05-11 14:10:52 EDT ---

REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick
attach) posted (#4) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Worker Ant on 2017-05-12 04:36:01 EDT ---

REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick
attach) posted (#5) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Worker Ant on 2017-05-12 11:56:39 EDT ---

REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick
attach) posted (#6) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Worker Ant on 2017-05-14 09:29:26 EDT ---

REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick
attach) posted (#7) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Worker Ant on 2017-05-14 17:10:34 EDT ---

COMMIT: https://review.gluster.org/17225 committed in master by Jeff Darcy
(jeff at pl.atyp.us) 
------
commit 86ad032949cb80b6ba3df9dc8268243529d4eb84
Author: Atin Mukherjee <amukherj at redhat.com>
Date:   Tue May 9 21:05:50 2017 +0530

    glusterfsd: send PARENT_UP on brick attach

    With brick multiplexing enabled, if a brick instance is attached to a
    process, a PARENT_UP event is needed so that it propagates all the way
    down to the posix layer; posix then sends a CHILD_UP event back up through
    the translator stack.

    Change-Id: Ic341086adb3bbbde0342af518e1b273dd2f669b9
    BUG: 1447389
    Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
    Reviewed-on: https://review.gluster.org/17225
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Jeff Darcy <jeff at pl.atyp.us>
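The commit message describes a round trip: PARENT_UP has to travel from the top of a newly attached brick's translator stack down to posix, and posix answers with CHILD_UP, which travels back up. The toy model below, assuming a simple linear stack, shows why skipping the initial PARENT_UP leaves every layer waiting. `struct layer`, `layer_notify` and `attach_brick_stack` are hypothetical names for illustration, not the real `xlator_t`/`notify()` interface.

```c
#include <stddef.h>

enum xl_event { EV_PARENT_UP, EV_CHILD_UP };

/* Hypothetical, simplified translator layer in a linear brick stack. */
struct layer {
    struct layer *parent; /* NULL for the top of the stack */
    struct layer *child;  /* NULL for the bottom layer (posix) */
    int up;               /* set once the layer knows its child is up */
};

static void layer_notify(struct layer *xl, enum xl_event ev)
{
    switch (ev) {
    case EV_PARENT_UP:
        if (xl->child != NULL) {
            /* Not at the bottom yet: keep pushing PARENT_UP down. */
            layer_notify(xl->child, EV_PARENT_UP);
        } else {
            /* Bottom layer (posix): ready, so announce CHILD_UP upward. */
            xl->up = 1;
            if (xl->parent != NULL)
                layer_notify(xl->parent, EV_CHILD_UP);
        }
        break;
    case EV_CHILD_UP:
        /* Mark this layer up and pass CHILD_UP on to its parent. */
        xl->up = 1;
        if (xl->parent != NULL)
            layer_notify(xl->parent, EV_CHILD_UP);
        break;
    }
}

/* Link n layers (index 0 = top, n - 1 = posix), optionally send PARENT_UP
 * to the top layer, and return how many layers ended up marked up. */
size_t attach_brick_stack(struct layer *stack, size_t n, int send_parent_up)
{
    for (size_t i = 0; i < n; i++) {
        stack[i].parent = (i > 0) ? &stack[i - 1] : NULL;
        stack[i].child = (i + 1 < n) ? &stack[i + 1] : NULL;
        stack[i].up = 0;
    }
    if (send_parent_up && n > 0)
        layer_notify(&stack[0], EV_PARENT_UP);

    size_t up = 0;
    for (size_t i = 0; i < n; i++)
        up += (size_t)stack[i].up;
    return up;
}
```

In this model, `attach_brick_stack(stack, 4, 1)` marks all four layers up, while `attach_brick_stack(stack, 4, 0)`, mimicking an attach that never sends PARENT_UP, leaves all of them down.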


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1443941
[Bug 1443941] Brick Multiplexing: seeing Input/Output Error for .trashcan
https://bugzilla.redhat.com/show_bug.cgi?id=1447389
[Bug 1447389] Brick Multiplexing: seeing Input/Output Error for .trashcan