[Bugs] [Bug 1417042] New: glusterd restart is starting the offline shd daemon on other node in the cluster

Fri Jan 27 05:05:16 UTC 2017

https://bugzilla.redhat.com/show_bug.cgi?id=1417042

            Bug ID: 1417042
           Summary: glusterd restart is starting the offline shd daemon on
                    other node in the cluster
           Product: GlusterFS
           Version: 3.10
         Component: glusterd
          Keywords: Triaged
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: amukherj at redhat.com
                CC: bsrirama at redhat.com, bugs at gluster.org,
                    rhs-bugs at redhat.com, sasundar at redhat.com,
                    storage-qa-internal at redhat.com, vbellur at redhat.com
        Depends On: 1383893
            Blocks: 1381825

+++ This bug was initially created as a clone of Bug #1383893 +++

+++ This bug was initially created as a clone of Bug #1381825 +++

Description of problem:
=======================

Restarting glusterd on one of the cluster nodes restarts the offline self-heal
daemon (shd) on another cluster node.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-2

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a 3-node cluster.
2. Create a 1*3 volume using bricks from the cluster nodes and start it.
3. Kill the shd daemon using kill -15 on one of the cluster nodes.
4. Restart glusterd on another cluster node, where step 3 was not done.
5. Check the volume status on any cluster node; shd is running again on the
node where it was killed in step 3.

Actual results:
===============
A glusterd restart starts the offline shd daemon on another node in the
cluster.

Expected results:
=================
A glusterd restart should not start the offline shd daemon on another node in
the cluster.

Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-10-05
02:54:14 EDT ---

This bug is automatically being proposed for the current release of Red Hat
Gluster Storage 3 under active development, by setting the release flag
'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change
the proposed release flag.

--- Additional comment from Atin Mukherjee on 2016-10-12 01:10:22 EDT ---

RCA:

This is not a regression; the behaviour has been there since server-side
quorum was introduced. Unlike brick processes, daemon services are (re)started
irrespective of the quorum state. In this particular case, the glusterd
instance on N1 was brought down and the shd service on N2 was explicitly
killed. When the glusterd service on N1 is restarted, N2 gets a friend update
request, which calls glusterd_restart_bricks () and eventually ends up spawning
the shd daemon. If the same reproducer is applied to one of the brick
processes, the brick doesn't come up, because for bricks the logic is to start
the processes only if quorum is regained and to skip them otherwise. To fix
this behaviour, the other daemons should follow the same logic as bricks.
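
The difference described above can be illustrated with a small Python model.
This is only a sketch of the decision logic, not glusterd code: NodeState, the
two on_friend_update_* functions and the quorum_regained flag are hypothetical
names introduced for illustration, and the real check lives in
glusterd_restart_bricks ().

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical, simplified view of the services glusterd manages on a node."""
    brick_running: bool = False
    shd_running: bool = False


def on_friend_update_before_fix(state: NodeState, quorum_regained: bool) -> NodeState:
    # Bricks are started only when quorum has been regained.
    if quorum_regained:
        state.brick_running = True
    # Daemons such as shd are (re)started irrespective of the quorum state.
    state.shd_running = True
    return state


def on_friend_update_after_fix(state: NodeState, quorum_regained: bool) -> NodeState:
    # Assumed post-fix behaviour: bricks and daemons share the same quorum gate.
    if quorum_regained:
        state.brick_running = True
        state.shd_running = True
    return state


# shd was killed on N2; glusterd restarts on N1 and quorum is not regained.
before = on_friend_update_before_fix(NodeState(), quorum_regained=False)
after = on_friend_update_after_fix(NodeState(), quorum_regained=False)
print(before.shd_running, after.shd_running)  # True False
```

In this model the killed shd comes back only in the "before" variant, which
matches the reported symptom; with the shared gate it stays down, as the brick
does.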

--- Additional comment from Worker Ant on 2016-10-12 03:25:42 EDT ---

REVIEW: http://review.gluster.org/15626 (glusterd: daemon restart logic should
adhere server side quorum) posted (#1) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Worker Ant on 2016-10-13 01:55:51 EDT ---

REVIEW: http://review.gluster.org/15626 (glusterd: daemon restart logic should
adhere server side quorum) posted (#2) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Worker Ant on 2017-01-27 00:04:33 EST ---

COMMIT: https://review.gluster.org/15626 committed in master by Atin Mukherjee
(amukherj at redhat.com) 
------
commit a9f660bc9d2d7c87b3306a35a2088532de000015
Author: Atin Mukherjee <amukherj at redhat.com>
Date:   Wed Oct 5 14:59:51 2016 +0530

    glusterd: daemon restart logic should adhere server side quorum

    Just like brick processes, other daemon services should also follow the
    same logic of quorum checks to see if a particular service needs to come
    up if glusterd is restarted or the incoming friend add/update request is
    received (in glusterd_restart_bricks () function)

    Change-Id: I54a1fbdaa1571cc45eed627181b81463fead47a3
    BUG: 1383893
    Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
    Reviewed-on: https://review.gluster.org/15626
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Prashanth Pai <ppai at redhat.com>

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1381825
[Bug 1381825] glusterd restart is starting the offline shd daemon on other
node in the cluster
https://bugzilla.redhat.com/show_bug.cgi?id=1383893
[Bug 1383893] glusterd restart is starting the offline shd daemon on other
node in the cluster