[Bugs] [Bug 1292046] New: Renames/deletes failed with "No such file or directory" when few of the bricks from the hot tier went offline
bugzilla at redhat.com
Wed Dec 16 10:36:25 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1292046
Bug ID: 1292046
Summary: Renames/deletes failed with "No such file or
directory" when few of the bricks from the hot tier
went offline
Product: GlusterFS
Version: 3.7.6
Component: replicate
Keywords: Triaged
Severity: high
Assignee: bugs at gluster.org
Reporter: kdhananj at redhat.com
CC: bugs at gluster.org, gluster-bugs at redhat.com,
kdhananj at redhat.com, spandura at redhat.com,
storage-qa-internal at redhat.com
Depends On: 1291560, 1291701
+++ This bug was initially created as a clone of Bug #1291701 +++
+++ This bug was initially created as a clone of Bug #1291560 +++
Description of problem:
===========================
On a tiered volume with a 2x2 dis-rep cold tier and a 2x3 dis-rep hot tier,
renames/deletes were being performed on files/dirs. When bricks went offline,
the renames/deletes failed with "No such file or directory". All bricks in the
cold tier stayed online, as quorum was set; only one brick from each
sub-volume of the hot tier went offline.
Version-Release number of selected component (if applicable):
===============================================================
glusterfs 3.7.5 built on Dec 3 2015 11:30:45
How reproducible:
====================
Often
Steps to Reproduce:
======================
1. Create 2x2 dis-rep cold-tier and 2x3 dis-rep hot-tier volume. Start the
volume. Mount the volume.
2. From mount, create files/dirs.
3. Rename a few of the created files/dirs.
4. While the renames are in progress, crash the filesystems backing the bricks
using the "godown" utility (available in xfsprogs).
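The steps above can be sketched with gluster CLI commands roughly as follows.
This is a hypothetical sketch, not the exact reproduction script: host names
and brick paths are placeholders, and the attach-tier syntax may vary
slightly across 3.7.x releases.

```shell
# create and start the 2x2 distributed-replicate cold tier
gluster volume create testvol replica 2 \
    host1:/bricks/brick0/b0 host2:/bricks/brick0/b1 \
    host3:/bricks/brick0/b2 host4:/bricks/brick0/b3
gluster volume start testvol

# attach a 2x3 distributed-replicate hot tier
gluster volume attach-tier testvol replica 3 \
    host1:/bricks/brick1/t0 host2:/bricks/brick1/t1 host3:/bricks/brick1/t2 \
    host1:/bricks/brick2/t3 host2:/bricks/brick2/t4 host3:/bricks/brick2/t5

# mount the volume and create, then rename, some files
mount -t glusterfs host1:/testvol /mnt/testvol
for i in $(seq 1 100); do touch /mnt/testvol/f_$i; done
for i in $(seq 1 100); do mv /mnt/testvol/f_$i /mnt/testvol/g_$i; done &

# on one brick node per hot-tier sub-volume, force the brick's XFS
# filesystem down mid-rename (godown ships with xfsprogs/xfstests)
godown /bricks/brick1
```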
Actual results:
================
On the mount, renames fail with "No such file or directory".
Expected results:
====================
Renames/deletes shouldn't fail
Additional info:
===================
Volume Name: testvol
Type: Tier
Volume ID: 5a2f042d-ee04-4b3d-b5d5-d36e29cea325
Status: Started
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 3 = 6
Brick1: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier5
Brick2: rhsauto017.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier4
Brick3: rhsauto038.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier3
Brick4: rhsauto021.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier2
Brick5: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier1
Brick6: rhsauto017.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier0
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick7: rhsauto017.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick0
Brick8: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick1
Brick9: rhsauto021.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick2
Brick10: rhsauto038.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick3
Options Reconfigured:
diagnostics.brick-log-level: DEBUG
diagnostics.client-log-level: DEBUG
performance.readdir-ahead: on
features.ctr-enabled: on
cluster.tier-mode: cache
cluster.watermark-low: 75
cluster.watermark-hi: 90
Error messages seen in client log:
=================================
[2015-12-14 07:45:12.156546] E [MSGID: 114031]
[client-rpc-fops.c:251:client3_3_mknod_cbk] 0-testvol-client-9: remote
operation failed. Path: (null) [Input/output error]
[2015-12-14 07:45:12.159452] W [MSGID: 114031]
[client-rpc-fops.c:664:client3_3_unlink_cbk] 0-testvol-client-0: remote
operation failed [Device or resource busy]
[2015-12-14 07:45:12.159480] W [MSGID: 114031]
[client-rpc-fops.c:664:client3_3_unlink_cbk] 0-testvol-client-1: remote
operation failed [Device or resource busy]
[2015-12-14 07:45:12.160293] I [MSGID: 109069]
[dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 0-testvol-tier-dht:
Returned with op_ret -1 and op_errno 16 for /E_file_44
--- Additional comment from Vijay Bellur on 2015-12-15 08:28:17 EST ---
REVIEW: http://review.gluster.org/12973 (cluster/afr: During name heal,
propagate EIO only on gfid or type mismatch) posted (#1) for review on master
by Krutika Dhananjay (kdhananj at redhat.com)
--- Additional comment from Vijay Bellur on 2015-12-16 05:33:42 EST ---
COMMIT: http://review.gluster.org/12973 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
------
commit 7ba6469eee3118cc4ece905d2538ef778320ae63
Author: Krutika Dhananjay <kdhananj at redhat.com>
Date: Tue Dec 15 18:48:20 2015 +0530
cluster/afr: During name heal, propagate EIO only on gfid or type mismatch
When the disk associated with a brick returns EIO during lookup, chances are
that name heal would return an EIO because one of the syncop_XXX() operations
performed as part of it returned an EIO. This is inherently treated by
afr_lookup_selfheal_wrap() as a split-brain, and the lookup is aborted
prematurely with EIO even if it succeeded on the other replica(s).
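The intent of the fix can be illustrated with a small sketch (hypothetical
Python, not GlusterFS code; the names `LookupReply` and `name_heal_result`
are invented for illustration): per-replica lookup replies are aggregated,
and EIO is propagated only when successful replicas disagree on gfid or file
type, instead of letting a single disk EIO abort the whole lookup.

```python
from dataclasses import dataclass
from typing import List, Optional

EIO, ENOENT = 5, 2  # errno values

@dataclass
class LookupReply:
    op_errno: int = 0            # 0 means the replica answered successfully
    gfid: Optional[str] = None   # file's gfid as reported by this replica
    ftype: Optional[str] = None  # "file", "dir", ...

def name_heal_result(replies: List[LookupReply]) -> int:
    """Aggregate per-replica lookup replies for name heal.

    EIO is returned only on a genuine gfid or type mismatch between
    successful replicas; an EIO from one replica's disk is ignored as
    long as another replica succeeded."""
    ok = [r for r in replies if r.op_errno == 0]
    if not ok:
        # every replica failed: report that errno, not a blanket EIO
        return replies[0].op_errno
    gfids = {r.gfid for r in ok}
    types = {r.ftype for r in ok}
    if len(gfids) > 1 or len(types) > 1:
        return EIO   # true split-brain: replicas disagree
    return 0         # lookup succeeds despite one replica's disk EIO
```

Before the fix, the first EIO reply aborted the lookup; with this logic, a
reply set like `[LookupReply(op_errno=EIO), LookupReply(gfid="a1",
ftype="file")]` resolves to success because the surviving replica answered
consistently.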
Change-Id: Ib9b7f2974bff8e206897bb4f689f0482264c61e5
BUG: 1291701
Signed-off-by: Krutika Dhananjay <kdhananj at redhat.com>
Reviewed-on: http://review.gluster.org/12973
Tested-by: NetBSD Build System <jenkins at build.gluster.org>
Tested-by: Gluster Build System <jenkins at build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1291560
[Bug 1291560] Renames/deletes failed with "No such file or directory" when
few of the bricks from the hot tier went offline
https://bugzilla.redhat.com/show_bug.cgi?id=1291701
[Bug 1291701] Renames/deletes failed with "No such file or directory" when
few of the bricks from the hot tier went offline