[Bugs] [Bug 1219846] New: Data Tiering: glusterd(management) communication issues seen on tiering setup

bugzilla at redhat.com bugzilla at redhat.com
Fri May 8 13:11:17 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1219846

            Bug ID: 1219846
           Summary: Data Tiering: glusterd(management) communication
                    issues seen on tiering setup
           Product: GlusterFS
           Version: 3.7.0
         Component: tiering
          Keywords: Triaged
          Severity: urgent
          Priority: urgent
          Assignee: bugs at gluster.org
          Reporter: rkavunga at redhat.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org, dlambrig at redhat.com,
                    nchilaka at redhat.com, vagarwal at redhat.com
        Depends On: 1211264
            Blocks: 1186580 (qe_tracker_everglades), 1199352
                    (glusterfs-3.7.0)



+++ This bug was initially created as a clone of Bug #1211264 +++

Description of problem:
======================
While executing commands such as quota enable, attach-tier, and detach-tier on a
cluster with at least one tiered volume, errors are observed when updating the
volume tables on the other nodes of the cluster.
Some examples are:
1) volume remove-brick unknown: failed: Commit failed on localhost. Please check
the log file for more details.

2) Sometimes when a command such as detach-tier or quota disable is issued on a
multi-node cluster, the command is executed only on the local node and the
change fails to propagate to the tables or graphs of the other nodes.

We have sometimes seen this issue even on a non-tiered volume, but only after
tiering commands have been executed on that cluster.
There appears to be a communication issue in the management daemon (glusterd)
between nodes.

In more detail, I issued a detach-tier command from one node's CLI, and the
following is the output seen from each node's respective CLI:

(local node, where I have been executing all the commands so far)
[root at rhs-client6 glusterd]# gluster v info disperse

Volume Name: disperse
Type: Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse

[root at yarrow glusterd]# gluster v info disperse

Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse


It can be clearly seen that the other node has not been updated.
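One quick way to spot this kind of divergence is to compare the `Type:` field that each node reports for the same volume. The sketch below embeds the two captured outputs from this report in here-docs; on a live cluster you would instead collect each node's `gluster v info <VOL>` output (e.g. over ssh), which is an assumption for illustration:

```shell
# Compare the volume Type reported by two nodes; a mismatch means their
# volume tables have diverged. Here-docs stand in for live output such as:
#   ssh "$node" gluster v info disperse
local_info=$(cat <<'EOF'
Volume Name: disperse
Type: Disperse
Status: Started
EOF
)

remote_info=$(cat <<'EOF'
Volume Name: disperse
Type: Tier
Status: Started
EOF
)

# Extract the value after "Type: " from each capture.
local_type=$(printf '%s\n' "$local_info" | awk -F': ' '/^Type:/ {print $2}')
remote_type=$(printf '%s\n' "$remote_info" | awk -F': ' '/^Type:/ {print $2}')

if [ "$local_type" != "$remote_type" ]; then
    echo "MISMATCH: local=$local_type remote=$remote_type"
fi
```

With the outputs from this report, the check prints a mismatch (Disperse vs. Tier), which is exactly the stale-table symptom described above.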


Version-Release number of selected component (if applicable):
============================================================
[root at rhs-client6 glusterd]# gluster --version
glusterfs 3.7dev built on Apr 13 2015 07:14:27
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General
Public License.

[root at rhs-client6 glusterd]# rpm -qa|grep gluster
glusterfs-api-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-libs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-fuse-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-cli-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-server-3.7dev-0.994.gitf522001.el6.x86_64




How reproducible:
================
quite easily


Steps to Reproduce:
==================
1. Install the latest nightly build.
2. Create a cluster with at least two nodes.
3. Create a tiered volume.
4. Enable and then disable quota; the issue appears. Sometimes even a
detach-tier alone can reproduce the issue.
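The steps above can be sketched as a shell sequence. Node and brick names are borrowed from the transcripts in this report, and the disperse parameters of the create command are an assumption for illustration; `run` echoes each command so the sequence can be reviewed before executing it on a real two-node cluster:

```shell
# Dry-run repro sketch: drop the echo in run() to execute for real.
run() { echo "+ $*"; }

run gluster peer probe rhs-client6
run gluster volume create disperse disperse 3 redundancy 1 \
    yarrow:/yarrow_200G_7/disperse yarrow:/yarrow_200G_8/disperse \
    rhs-client6:/brick15/disperse
run gluster volume start disperse
run gluster volume attach-tier disperse replica 2 \
    yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse
run gluster volume quota disperse enable
run gluster volume quota disperse disable   # issue typically shows up here
run gluster volume detach-tier disperse     # or here
```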

--- Additional comment from nchilaka on 2015-04-13 09:19:18 EDT ---

CLI executed logs:
==================
[root at yarrow glusterfs]# gluster v info disperse

Volume Name: disperse
Type: Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Created
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse
[root at yarrow glusterfs]# gluster v start disperse
volume start: disperse: success
[root at yarrow glusterfs]# gluster v attach-tier disperse replica 2 
yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse
volume add-brick: failed: Commit failed on localhost. Please check the log file
for more details.
[root at yarrow glusterfs]# gluster v info disperse

Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse
[root at yarrow glusterfs]# gluster v detach-tier disperse
volume remove-brick unknown: failed: Commit failed on localhost. Please check
the log file for more details.
[root at yarrow glusterfs]# gluster v info disperse

Volume Name: disperse
Type: Distributed-Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse
[root at yarrow glusterfs]# gluster v attach-tier disperse replica 2 
yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a
volume
[root at yarrow glusterfs]# gluster v attach-tier disperse replica 2 
yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse force
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a
volume
[root at yarrow glusterfs]# gluster v info disperse

Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse
[root at yarrow glusterfs]# gluster v detach-tier disperse
volume remove-brick unknown: failed: Commit failed on localhost. Please check
the log file for more details.
[root at yarrow glusterfs]# gluster v attach-tier disperse replica 2 
yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse force
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a
volume
[root at yarrow glusterfs]#

--- Additional comment from nchilaka on 2015-04-13 09:20:02 EDT ---

sosreports at rhsqe-repo:/home/repo/sosreports/1211264

--- Additional comment from Dan Lambright on 2015-04-22 11:15:28 EDT ---

We have submitted fix 10108, which is not merged. The issues with detach-tier
may no longer exist (I do not see them). Returning to QE to retest.

--- Additional comment from nchilaka on 2015-04-28 03:07:55 EDT ---

Dan,
We are still seeing issues with glusterd communication between nodes with a
tiered volume as of 28th April.
Kindly move it to "ON_QA" only when the fix is available for testing.

--- Additional comment from Anand Avati on 2015-04-29 09:06:14 EDT ---

REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info
during glusterd handshake) posted (#1) for review on master by mohammed rafi kc
(rkavunga at redhat.com)

--- Additional comment from Anand Avati on 2015-05-02 09:23:09 EDT ---

REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info
during glusterd handshake) posted (#2) for review on master by mohammed rafi kc
(rkavunga at redhat.com)

--- Additional comment from Anand Avati on 2015-05-05 04:41:29 EDT ---

REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info
during glusterd handshake) posted (#3) for review on master by mohammed rafi kc
(rkavunga at redhat.com)

--- Additional comment from Anand Avati on 2015-05-05 07:27:10 EDT ---

REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info
during glusterd handshake) posted (#4) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
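The reviews above propose exchanging tier info during the glusterd handshake. Conceptually, peers compare a checksum of their stored volume info when they reconnect and resync when the checksums differ. The sketch below only illustrates that idea with stand-in strings and `cksum`; none of it is glusterd's actual implementation:

```shell
# Illustration of handshake-time reconciliation: checksum each side's
# volume info and flag a resync when they disagree. The volinfo strings
# are stand-ins, not real glusterd state.
volinfo_local='Type: Disperse'   # this node's stored volume info
volinfo_peer='Type: Tier'        # the peer's stored volume info

sum_local=$(printf '%s' "$volinfo_local" | cksum | awk '{print $1}')
sum_peer=$(printf '%s' "$volinfo_peer" | cksum | awk '{print $1}')

if [ "$sum_local" != "$sum_peer" ]; then
    echo "handshake: volinfo checksums differ; peer must resync volume tables"
fi
```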


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1186580
[Bug 1186580] QE tracker bug for Everglades
https://bugzilla.redhat.com/show_bug.cgi?id=1199352
[Bug 1199352] GlusterFS 3.7.0 tracker
https://bugzilla.redhat.com/show_bug.cgi?id=1211264
[Bug 1211264] Data Tiering: glusterd(management) communication issues seen
on tiering setup