[Bugs] [Bug 1219547] New: I/O failure on attaching tier
bugzilla at redhat.com
Thu May 7 14:52:50 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1219547
Bug ID: 1219547
Summary: I/O failure on attaching tier
Product: GlusterFS
Version: 3.7.0
Component: tiering
Severity: urgent
Priority: urgent
Assignee: bugs at gluster.org
Reporter: dlambrig at redhat.com
QA Contact: bugs at gluster.org
CC: annair at redhat.com, bugs at gluster.org,
dlambrig at redhat.com, josferna at redhat.com,
nchilaka at redhat.com
Depends On: 1214289
Blocks: 1186580 (qe_tracker_everglades), 1199352
(glusterfs-3.7.0)
+++ This bug was initially created as a clone of Bug #1214289 +++
Description of problem:
I/O failure on attaching tier
Version-Release number of selected component (if applicable):
glusterfs-server-3.7dev-0.994.git0d36d4f.el6.x86_64
How reproducible:
Steps to Reproduce:
1. Create a replica volume
2. Start a 100% write I/O workload on the volume
3. Attach a tier while the I/O is in progress
4. Attach tier is successful, but I/O fails
Actual results:
The I/O operations fail. Here is the console output:
linux-2.6.31.1/arch/ia64/include/asm/sn/mspec.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/mspec.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/nodepda.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/nodepda.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/pcibr_provider.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/pcibr_provider.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/pcibus_provider_defs.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/pcibus_provider_defs.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/pcidev.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/pcidev.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/pda.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/pda.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/pic.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/pic.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/rw_mmr.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/rw_mmr.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/shub_mmr.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/shub_mmr.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/shubio.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/shubio.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/simulator.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/simulator.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/sn2/
Expected results:
I/O should continue normally while the tier is being attached. Additionally, all
new writes after the tier is attached should go to the hot tier.
Additional info:
--- Additional comment from Anoop on 2015-04-22 07:05:58 EDT ---
Volume info before attach:
Volume Name: vol1
Type: Replicate
Volume ID: b77d4050-7fdc-45ff-a084-f85eec2470fc
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.35.56:/rhs/brick1
Brick2: 10.70.35.67:/rhs/brick1
Volume info after attach:
Volume Name: vol1
Type: Tier
Volume ID: b77d4050-7fdc-45ff-a084-f85eec2470fc
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.35.67:/rhs/brick2
Brick2: 10.70.35.56:/rhs/brick2
Brick3: 10.70.35.56:/rhs/brick1
Brick4: 10.70.35.67:/rhs/brick1
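For reference, the brick lists above suggest roughly the following command
sequence for the reproduction steps. This is only a sketch: the helper name
repro_commands is hypothetical, the order of the hot-tier bricks on the command
line is a guess from the listing, and the attach-tier syntax is assumed to be
the 3.7-era form "gluster volume attach-tier <VOLNAME> [<replica COUNT>]
<NEW-BRICK>...". The script only prints the commands; it does not run them.

```python
def repro_commands(volume, cold_bricks, hot_bricks, replica=2):
    """Build (but do not run) the gluster CLI calls for the reproduction.

    All arguments are illustrative; adjust hosts and paths to the test setup.
    """
    create = ["gluster", "volume", "create", volume,
              "replica", str(replica), *cold_bricks]
    start = ["gluster", "volume", "start", volume]
    # Assumed 3.7-era syntax; issue this while the write workload is running.
    attach = ["gluster", "volume", "attach-tier", volume,
              "replica", str(replica), *hot_bricks]
    return [create, start, attach]


cmds = repro_commands(
    "vol1",
    cold_bricks=["10.70.35.56:/rhs/brick1", "10.70.35.67:/rhs/brick1"],
    hot_bricks=["10.70.35.67:/rhs/brick2", "10.70.35.56:/rhs/brick2"],
)
for cmd in cmds:
    print(" ".join(cmd))
```

The write workload in the report was a tar extraction of a kernel source tree
on the mounted volume; the attach command would be issued while that is still
running.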
--- Additional comment from Dan Lambright on 2015-04-22 15:46:08 EDT ---
When we attach a tier, the newly added translator has no cached subvolume for
I/Os in flight, so I/Os to open files fail. I believe the solution is to
recompute the cached subvolume for all open FDs with a lookup in tier_init. I
am working on a fix.
--- Additional comment from Anand Avati on 2015-04-28 16:28:27 EDT ---
REVIEW: http://review.gluster.org/10435 (cluster/tier: don't use hot tier until
subvolumes ready (WIP)) posted (#1) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Anand Avati on 2015-04-29 16:22:55 EDT ---
REVIEW: http://review.gluster.org/10435 (cluster/tier: don't use hot tier until
subvolumes ready (WIP)) posted (#2) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Anand Avati on 2015-04-29 18:05:44 EDT ---
REVIEW: http://review.gluster.org/10435 (cluster/tier: don't use hot tier until
subvolumes ready (WIP)) posted (#3) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Anand Avati on 2015-05-04 14:55:52 EDT ---
REVIEW: http://review.gluster.org/10435 (cluster/tier: don't use hot tier until
subvolumes ready) posted (#4) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Dan Lambright on 2015-05-04 14:57:34 EDT ---
There may still be windows in which an I/O error can happen, but this fix
should close most of them. The remaining window can be closed completely once
BZ 1156637 is resolved.
--- Additional comment from Anand Avati on 2015-05-05 11:36:32 EDT ---
COMMIT: http://review.gluster.org/10435 committed in master by Kaleb KEITHLEY
(kkeithle at redhat.com)
------
commit 377505a101eede8943f5a345e11a6901c4f8f420
Author: Dan Lambright <dlambrig at redhat.com>
Date: Tue Apr 28 16:26:33 2015 -0400
cluster/tier: don't use hot tier until subvolumes ready
When we attach a tier, the hot tier becomes the hashed
subvolume. But directories may not yet have been replicated by
the fix layout process. Hence lookups to those directories
will fail on the hot subvolume. We should only go to the hashed
subvolume once the layout has been fixed. This is known if the
layout for the parent directory does not have an error. If
there is an error, the cold tier is considered the hashed
subvolume. The exception to this rule is ENOTCONN, in which
case we do not know where the file is and must abort.
Note we may revalidate a lookup for a directory even if the
inode has not yet been populated by FUSE. This case can
happen in tiering (where one tier has completed a lookup
but the other has not, in which case we revalidate one tier
when we call lookup the second time). Such inodes are
still invalid and should not be consulted for validation.
Change-Id: Ia2bc62e1d807bd70590bd2a8300496264d73c523
BUG: 1214289
Signed-off-by: Dan Lambright <dlambrig at redhat.com>
Reviewed-on: http://review.gluster.org/10435
Tested-by: Gluster Build System <jenkins at build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp at redhat.com>
Reviewed-by: N Balachandran <nbalacha at redhat.com>
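The selection rule described in the commit message can be sketched as follows.
This is an illustration only, not the actual translator code (which is C): the
names pick_hashed_subvol and LookupAbort are hypothetical, the parent layout's
state is modelled as a plain errno value with 0 meaning "no error", and the
not-connected errno is taken here to be ENOTCONN.

```python
import errno

HOT = "hot"
COLD = "cold"


class LookupAbort(Exception):
    """Raised when the file's location cannot be determined."""


def pick_hashed_subvol(parent_layout_errno):
    """Return the tier to treat as the hashed subvolume for a lookup.

    parent_layout_errno is 0 when the parent directory's layout has no
    error, otherwise the errno recorded for it (hypothetical model).
    """
    if parent_layout_errno == 0:
        # Layout has been fixed, so the hot tier can be used as hashed.
        return HOT
    if parent_layout_errno == errno.ENOTCONN:
        # A subvolume is unreachable; we cannot know where the file is.
        raise LookupAbort("subvolume not connected")
    # Any other layout error: fall back to the cold tier.
    return COLD


print(pick_hashed_subvol(0))
print(pick_hashed_subvol(errno.ENOENT))
```

The point of the fallback is that directories that the fix-layout process has
not yet created on the hot tier are still resolved on the cold tier, rather
than failing the lookup.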
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1186580
[Bug 1186580] QE tracker bug for Everglades
https://bugzilla.redhat.com/show_bug.cgi?id=1199352
[Bug 1199352] GlusterFS 3.7.0 tracker
https://bugzilla.redhat.com/show_bug.cgi?id=1214289
[Bug 1214289] I/O failure on attaching tier