[Bugs] [Bug 1297305] New: [GlusterD]: Peer detach happening with a node which is hosting volume bricks

bugzilla at redhat.com bugzilla at redhat.com
Mon Jan 11 07:01:27 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1297305

            Bug ID: 1297305
           Summary: [GlusterD]: Peer detach happening with a node which
                    is hosting volume bricks
           Product: GlusterFS
           Version: 3.7.7
         Component: glusterd
          Keywords: Triaged
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: amukherj at redhat.com
                CC: amukherj at redhat.com, bsrirama at redhat.com,
                    bugs at gluster.org, gluster-bugs at redhat.com,
                    nlevinki at redhat.com, rhs-bugs at redhat.com,
                    vbellur at redhat.com
        Depends On: 1293273, 1293414



+++ This bug was initially created as a clone of Bug #1293414 +++

+++ This bug was initially created as a clone of Bug #1293273 +++

Description of problem:
=======================
Had a one-node (Node-1) cluster with a Distributed volume with one brick,
expanded the cluster and the volume by adding a brick from a newly added node
(Node-2), then peer probed a third node and tried to peer detach the second
node (Node-2) from the third node; the detach removed the second node from the
cluster even though it hosts a brick of the volume.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-12


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Have a one-node (Node-1) cluster with a Distributed volume
2. Add one more node (Node-2) to the cluster
3. Add a brick on Node-2 to the volume
4. Peer probe one more node (Node-3)
5. From Node-3, detach the second node (Node-2)  // the detach goes through


Actual results:
===============
Peer detach succeeds for a node that is hosting bricks of the volume.


Expected results:
==================
Peer detach should not happen if the node is hosting bricks of a volume.



Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-12-21
04:38:48 EST ---

This bug is automatically being proposed for the current z-stream release of
Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change
the proposed release flag.

--- Additional comment from Atin Mukherjee on 2015-12-21 08:38:57 EST ---

Looks like this is a day-zero bug, and here is why:

When step 4 was executed, the probed node (say N3) goes for importing volumes
from the probing node (N1), but at that point it does not yet have membership
information about the other node (N2), since peer updates happen after volume
updates, and hence it fails to update that brick's uuid. Even after N2
subsequently updates N3 about its membership, the brick's uuid is never
generated. As a consequence, when N3 initiates a detach of N2, it checks
whether the node to be detached has any bricks configured by its respective
uuid, which is NULL in this case, so it goes ahead and removes the peer, which
it ideally should not have.
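
To make the failure mode concrete, below is a minimal, self-contained sketch of
that detach-time check, modeled loosely on glusterd_friend_contains_vol_bricks();
the struct and function names are illustrative stand-ins, not the actual glusterd
types. With the brick's owner uuid left all-zero (unresolved), the check counts
zero bricks for N2 and the detach goes through (build with -luuid):

/* Simplified stand-in for the detach-time brick-ownership check. */
#include <stdio.h>
#include <string.h>
#include <uuid/uuid.h>

typedef struct brickinfo {
        uuid_t uuid;       /* owner of the brick; stays all-zero ("NULL")
                              when it was never updated during import */
        char   path[256];
} brickinfo_t;

/* Return the number of bricks owned by friend_uuid. */
static int
volume_contains_friend_bricks (brickinfo_t *bricks, int count,
                               uuid_t friend_uuid)
{
        int i, owned = 0;

        for (i = 0; i < count; i++) {
                if (uuid_compare (bricks[i].uuid, friend_uuid) == 0)
                        owned++;
        }
        return owned;
}

int
main (void)
{
        brickinfo_t bricks[1];
        uuid_t      n2_uuid;

        uuid_generate (n2_uuid);      /* N2's real identity          */
        uuid_clear (bricks[0].uuid);  /* uuid never filled in on N3  */
        strcpy (bricks[0].path, "/bricks/b1");

        /* Reports 0 bricks owned by N2, so the detach wrongly proceeds
           even though N2 actually hosts a brick of the volume. */
        printf ("bricks owned by N2 as seen from N3: %d\n",
                volume_contains_friend_bricks (bricks, 1, n2_uuid));
        return 0;
}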

I think we'd need to consider doing a peer list update first, before the volume
data, to fix these types of inconsistencies, which is itself a sizeable effort.

--- Additional comment from Atin Mukherjee on 2015-12-21 08:39:26 EST ---

Given that it's a complex fix, I'd prefer to mark it as a known issue for 3.1.2.

--- Additional comment from Atin Mukherjee on 2015-12-21 08:47:51 EST ---

Another way of fixing it would be to import the brick's uuid and simply update
it, instead of resolving it on the probed node. Need to validate that though.
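
A rough sketch of that alternative is below, assuming a dict-style key/value
exchange between the nodes: the tiny key/value store stands in for glusterd's
dict_t, and the key name "brick%d.uuid" is hypothetical, used only to
illustrate shipping the uuid alongside the brick instead of resolving it on
the probed node (build with -luuid):

/* Probing node exports the brick owner's uuid; probed node imports it. */
#include <stdio.h>
#include <string.h>
#include <uuid/uuid.h>

#define MAX_KV 16

static struct { char key[64]; char val[64]; } store[MAX_KV];
static int nkv;

static void
kv_set (const char *key, const char *val)
{
        snprintf (store[nkv].key, sizeof (store[nkv].key), "%s", key);
        snprintf (store[nkv].val, sizeof (store[nkv].val), "%s", val);
        nkv++;
}

static const char *
kv_get (const char *key)
{
        for (int i = 0; i < nkv; i++)
                if (strcmp (store[i].key, key) == 0)
                        return store[i].val;
        return NULL;
}

/* On the probing node (N1): export the brick owner's uuid with the brick. */
static void
export_brick_uuid (int idx, uuid_t owner)
{
        char key[64], val[64];

        snprintf (key, sizeof (key), "brick%d.uuid", idx);
        uuid_unparse (owner, val);
        kv_set (key, val);
}

/* On the probed node (N3): import the uuid directly, with no resolution
   against a peer list that may still be incomplete. */
static int
import_brick_uuid (int idx, uuid_t out)
{
        char        key[64];
        const char *val;

        snprintf (key, sizeof (key), "brick%d.uuid", idx);
        val = kv_get (key);
        return val ? uuid_parse (val, out) : -1;
}

int
main (void)
{
        uuid_t n2, imported;
        char   str[40];

        uuid_generate (n2);
        export_brick_uuid (1, n2);

        if (import_brick_uuid (1, imported) == 0) {
                uuid_unparse (imported, str);
                printf ("brick1 owner imported as %s\n", str);
        }
        return 0;
}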

--- Additional comment from Vijay Bellur on 2015-12-21 12:49:52 EST ---

REVIEW: http://review.gluster.org/13047 (glusterd: import/export
brickinfo->uuid) posted (#1) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Vijay Bellur on 2015-12-22 10:12:12 EST ---

REVIEW: http://review.gluster.org/13047 (glusterd: import/export
brickinfo->uuid) posted (#2) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Vijay Bellur on 2015-12-28 12:10:35 EST ---

REVIEW: http://review.gluster.org/13047 (glusterd: import/export
brickinfo->uuid) posted (#3) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Vijay Bellur on 2015-12-30 00:28:39 EST ---

REVIEW: http://review.gluster.org/13047 (glusterd: import/export
brickinfo->uuid) posted (#4) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Vijay Bellur on 2016-01-05 02:21:32 EST ---

REVIEW: http://review.gluster.org/13047 (glusterd: import/export
brickinfo->uuid) posted (#5) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Vijay Bellur on 2016-01-07 22:51:49 EST ---

REVIEW: http://review.gluster.org/13047 (glusterd: import/export
brickinfo->uuid) posted (#6) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Vijay Bellur on 2016-01-11 01:46:53 EST ---

COMMIT: http://review.gluster.org/13047 committed in master by Atin Mukherjee
(amukherj at redhat.com) 
------
commit c449b7520c6f1ac6ea1bc4119dbbbe9ebb80bf93
Author: Atin Mukherjee <amukherj at redhat.com>
Date:   Mon Dec 21 23:13:43 2015 +0530

    glusterd: import/export brickinfo->uuid

    Given a two node cluster with nodes N1 & N2, if a dummy node N3 is peer
    probed, the probed node N3 goes for importing volumes from the probing
    node (N1), but at that point it does not yet have membership information
    about the other node (N2), since peer updates happen after volume
    updates, and hence it fails to update that brick's uuid. Even after N2
    subsequently updates N3 about its membership, the brick's uuid is never
    generated. As a consequence, when N3 initiates a detach of N2, it checks
    whether the node to be detached has any bricks configured by its
    respective uuid, which is NULL in this case, and hence it goes ahead and
    removes the peer, which it ideally should not have (refer to
    glusterd_friend_contains_vol_bricks () for the logic)

    Fix is to export brick's uuid and import it at the probed node instead of
    resolving it.

    Change-Id: I2d88c72175347550a45ab12aff0ae248e56baa87
    BUG: 1293414
    Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
    Reviewed-on: http://review.gluster.org/13047
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Tested-by: NetBSD Build System <jenkins at build.gluster.org>
    Reviewed-by: Gaurav Kumar Garg <ggarg at redhat.com>
    Reviewed-by: Avra Sengupta <asengupt at redhat.com>
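
For completeness, a small illustrative follow-up (again with made-up names, not
the actual glusterd code): once the brick's owner uuid arrives via import rather
than being left unresolved, the ownership check on N3 matches N2's uuid and the
detach can be refused (build with -luuid):

#include <stdio.h>
#include <uuid/uuid.h>

int
main (void)
{
        uuid_t n2_uuid, brick_owner;

        uuid_generate (n2_uuid);
        uuid_copy (brick_owner, n2_uuid);  /* uuid came in via import */

        if (uuid_compare (brick_owner, n2_uuid) == 0)
                printf ("peer hosts bricks of a volume: refusing detach\n");
        else
                printf ("peer hosts no bricks: detach allowed\n");
        return 0;
}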


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1293273
[Bug 1293273] [GlusterD]: Peer detach happening with a node which is
hosting volume bricks
https://bugzilla.redhat.com/show_bug.cgi?id=1293414
[Bug 1293414] [GlusterD]: Peer detach happening with a node which is
hosting volume bricks
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

