[Bugs] [Bug 1219846] New: Data Tiering: glusterd(management) communication issues seen on tiering setup
bugzilla at redhat.com
Fri May 8 13:11:17 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1219846
Bug ID: 1219846
Summary: Data Tiering: glusterd(management) communication
issues seen on tiering setup
Product: GlusterFS
Version: 3.7.0
Component: tiering
Keywords: Triaged
Severity: urgent
Priority: urgent
Assignee: bugs at gluster.org
Reporter: rkavunga at redhat.com
QA Contact: bugs at gluster.org
CC: bugs at gluster.org, dlambrig at redhat.com,
nchilaka at redhat.com, vagarwal at redhat.com
Depends On: 1211264
Blocks: 1186580 (qe_tracker_everglades), 1199352
(glusterfs-3.7.0)
+++ This bug was initially created as a clone of Bug #1211264 +++
Description of problem:
======================
While executing commands such as quota enable, attach-tier, or detach-tier on a
cluster with at least one tiered volume, errors are observed in updating the
volume tables on the other nodes of the cluster.
Some examples are:
1) volume remove-brick unknown: failed: Commit failed on localhost. Please
check the log file for more details.
2) Sometimes, when a command such as detach-tier or quota disable is issued on
a multi-node cluster, the command is executed only on the local node and the
change fails to propagate to the tables or graphs of the other nodes.
We have sometimes seen this issue even on a non-tiered volume, but only after
tiering commands have been executed on that cluster.
There seems to be an issue with management daemon (glusterd) communication
between nodes.
In more detail, I issued a detach-tier command from one node's CLI, and the
following is the output seen from each node's respective CLI:
(local node, where I have been executing all the commands so far)
[root at rhs-client6 glusterd]# gluster v info disperse
Volume Name: disperse
Type: Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse
[root at yarrow glusterd]# gluster v info disperse
Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse
It can clearly be seen that the other node hasn't been updated.
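A minimal sketch for confirming the divergence on disk, assuming the default
glusterd working directory /var/lib/glusterd (the exact key names inside the
info file vary by version): run this on each node and diff the results, since
peers that are in sync store identical volume definitions and checksums:
# on each node:
cat /var/lib/glusterd/vols/disperse/info
cat /var/lib/glusterd/vols/disperse/cksum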
Version-Release number of selected component (if applicable):
============================================================
[root at rhs-client6 glusterd]# gluster --version
glusterfs 3.7dev built on Apr 13 2015 07:14:27
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General
Public License.
[root at rhs-client6 glusterd]# rpm -qa|grep gluster
glusterfs-api-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-libs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-fuse-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-cli-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-server-3.7dev-0.994.gitf522001.el6.x86_64
How reproducible:
================
Quite easily.
Steps to Reproduce:
==================
1. Install the latest nightly build.
2. Create a cluster with at least two nodes.
3. Create a tiered volume.
4. Try to enable and then disable quota, and the issue can be seen (a command
sketch follows below). Sometimes even detach-tier alone can reproduce the
issue.
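A minimal command sketch of the steps above, assuming two peers named node1
and node2 with illustrative brick paths (the layout mirrors the volume in this
report; force is needed because two bricks share a node):
gluster volume create disperse disperse 3 redundancy 1 \
    node1:/bricks/b1 node1:/bricks/b2 node2:/bricks/b3 force
gluster volume start disperse
gluster volume attach-tier disperse replica 2 \
    node1:/ssd/t1 node2:/ssd/t2
gluster volume quota disperse enable
gluster volume quota disperse disable
# then compare 'gluster v info disperse' on node1 and node2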
--- Additional comment from nchilaka on 2015-04-13 09:19:18 EDT ---
CLI executed logs:
==================
[root at yarrow glusterfs]# gluster v info disperse
Volume Name: disperse
Type: Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Created
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse
[root at yarrow glusterfs]# gluster v start disperse
volume start: disperse: success
[root at yarrow glusterfs]# gluster v attach-tier disperse replica 2
yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse
volume add-brick: failed: Commit failed on localhost. Please check the log file
for more details.
[root at yarrow glusterfs]# gluster v info disperse
Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse
[root at yarrow glusterfs]# gluster v detach-tier disperse
volume remove-brick unknown: failed: Commit failed on localhost. Please check
the log file for more details.
[root at yarrow glusterfs]# gluster v info disperse
Volume Name: disperse
Type: Distributed-Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse
[root at yarrow glusterfs]# gluster v attach-tier disperse replica 2
yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a
volume
[root at yarrow glusterfs]# gluster v attach-tier disperse replica 2
yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse force
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a
volume
[root at yarrow glusterfs]# gluster v info disperse
Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse
[root at yarrow glusterfs]# gluster v detach-tier disperse
volume remove-brick unknown: failed: Commit failed on localhost. Please check
the log file for more details.
[root at yarrow glusterfs]# gluster v attach-tier disperse replica 2
yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse force
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a
volume
[root at yarrow glusterfs]#
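For reference, the "already part of a volume" failures above are consistent
with yarrow's stale view that the tier is still attached; in general that
message means the brick root still carries the trusted.glusterfs.volume-id
xattr from a previous (or half-finished) attach. A commonly used cleanup
sketch, destructive to anything under the brick path, would be:
setfattr -x trusted.glusterfs.volume-id /yarrow_ssd_75G_2/disperse
setfattr -x trusted.gfid /yarrow_ssd_75G_2/disperse
rm -rf /yarrow_ssd_75G_2/disperse/.glusterfs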
--- Additional comment from nchilaka on 2015-04-13 09:20:02 EDT ---
sosreports at rhsqe-repo:/home/repo/sosreports/1211264
--- Additional comment from Dan Lambright on 2015-04-22 11:15:28 EDT ---
We have submitted fix 10108, which is not yet merged. The issues with
detach-tier may no longer exist (I do not see them). Returning to QE to retest.
--- Additional comment from nchilaka on 2015-04-28 03:07:55 EDT ---
Dan,
We are stilling seeing issues with glusted communication b/w nodes with a
tiered volume as of 28th April.
Kindly put it "ON_QA" only when the fix is availble for testing
--- Additional comment from Anand Avati on 2015-04-29 09:06:14 EDT ---
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info
during glusterd handshake) posted (#1) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-05-02 09:23:09 EDT ---
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info
during glusterd handshake) posted (#2) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-05-05 04:41:29 EDT ---
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info
during glusterd handshake) posted (#3) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-05-05 07:27:10 EDT ---
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info
during glusterd handshake) posted (#4) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1186580
[Bug 1186580] QE tracker bug for Everglades
https://bugzilla.redhat.com/show_bug.cgi?id=1199352
[Bug 1199352] GlusterFS 3.7.0 tracker
https://bugzilla.redhat.com/show_bug.cgi?id=1211264
[Bug 1211264] Data Tiering: glusterd(management) communication issues seen
on tiering setup