[Bugs] [Bug 1560957] After performing add-brick followed by replace-brick operation, brick went offline state

bugzilla at redhat.com bugzilla at redhat.com
Tue Mar 27 11:13:26 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1560957

Atin Mukherjee <amukherj at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |Triaged
             Status|NEW                         |ASSIGNED
             Blocks|                            |1560955
         Depends On|1560955                     |
           Assignee|bugs at gluster.org         |amukherj at redhat.com



--- Comment #1 from Atin Mukherjee <amukherj at redhat.com> ---
Description of problem:

On a three-node cluster, enable brick multiplexing and create a replica 3
volume. Stop glusterd on node 3 and perform replace-brick on node 1.
Replace-brick succeeds. Now start glusterd on node 3 and perform add-brick
(3 bricks) on the volume. Add-brick succeeds, but a brick on node 2 goes
offline.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
2/2

Steps to Reproduce:
1. Create a replica 3 volume, mount it, and start I/O.
2. Stop glusterd on one node (N3).
3. Perform a replace-brick operation on node N1.
4. Start glusterd on the node where it was stopped (N3).
5. Add 3 bricks to the volume; perform this operation on node N1.
6. One brick on node N2 is offline.
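
The steps above can be sketched as a command sequence. The snippet below only
builds and prints the commands; it does not run anything. The volume name,
host names (N1..N3), brick paths and mount point are placeholders that do not
come from this report, and `systemctl` assumes the nodes use systemd.

```python
def repro_commands(volume="testvol", nodes=("N1", "N2", "N3")):
    """Return (node, command) pairs for the reproduction steps; nothing is run.

    All names and paths are hypothetical placeholders.
    """
    n1, n3 = nodes[0], nodes[2]
    # Initial replica set and the three bricks added later (one per node).
    first = " ".join(f"{n}:/bricks/{volume}_b1" for n in nodes)
    extra = " ".join(f"{n}:/bricks/{volume}_b2" for n in nodes)
    old_brick = f"{n1}:/bricks/{volume}_b1"
    new_brick = f"{n1}:/bricks/{volume}_b1_new"
    return [
        # Precondition from the description: brick multiplexing enabled.
        (n1, "gluster volume set all cluster.brick-multiplex on"),
        # Step 1: create, start and mount a replica 3 volume (then start I/O).
        (n1, f"gluster volume create {volume} replica 3 {first}"),
        (n1, f"gluster volume start {volume}"),
        (n1, f"mount -t glusterfs {n1}:/{volume} /mnt/{volume}"),
        # Step 2: stop glusterd on N3.
        (n3, "systemctl stop glusterd"),
        # Step 3: replace-brick on N1.
        (n1, f"gluster volume replace-brick {volume} {old_brick} {new_brick} commit force"),
        # Step 4: start glusterd on N3 again.
        (n3, "systemctl start glusterd"),
        # Step 5: add three bricks from N1.
        (n1, f"gluster volume add-brick {volume} {extra}"),
        # Step 6: check which bricks are online.
        (n1, f"gluster volume status {volume}"),
    ]


if __name__ == "__main__":
    for node, cmd in repro_commands():
        print(f"[{node}] {cmd}")
```

Keeping the sequence as data makes it easy to review the order of operations
before running the commands by hand on a test cluster.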

Actual results:
Brick on node (N2) is offline

Expected results:
All bricks should be online in the volume


RCA:

glusterd maintains a boolean flag 'port_registered' which is used to determine
if a brick has completed its portmap sign-in process. This flag is set in the
pmap_signin event and reset in the pmap_signout event. With brick multiplexing,
this flag is the identifier that tells whether the very first brick, the one
with which the process was spawned, has completed its sign-in. However, when
glusterd restarts and a brick is already identified as running, glusterd does
a pmap_registry_bind to ensure its portmap table is updated, but this flag is
not updated. That is fine in the non-multiplexed case but causes an issue with
brick multiplexing, since a subsequent brick attach can depend on this flag.
With the replace-brick operation, I think this is more visible: the new
(replacement) brick is first attached and then the old brick is brought down,
so there is eventually no provision for a pmap_signin here, because in brick
multiplexing pmap_signin happens only for the very first brick.
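
The flag behaviour described above can be illustrated with a toy model. This is
not glusterd code: the class names, the `restart_with_running_brick` method, the
`set_flag_on_bind` switch and the port number are all hypothetical, and the
switch only shows one conceivable way to avoid the problem, not necessarily the
change that was made for this bug.

```python
class Brick:
    """A brick as tracked by the management daemon (toy model)."""

    def __init__(self, name):
        self.name = name
        self.port_registered = False  # set on sign-in, reset on sign-out


class ToyGlusterd:
    """Toy model of the portmap bookkeeping described in the RCA."""

    def __init__(self):
        self.portmap = {}

    def pmap_signin(self, brick, port):
        # Sign-in records the port and marks the brick as registered.
        self.portmap[brick.name] = port
        brick.port_registered = True

    def pmap_signout(self, brick):
        # Sign-out drops the port and clears the flag.
        self.portmap.pop(brick.name, None)
        brick.port_registered = False

    def restart_with_running_brick(self, brick, port, set_flag_on_bind=False):
        # After a restart the in-memory flag starts out unset (assumption).
        brick.port_registered = False
        # Models pmap_registry_bind: the portmap table is refreshed...
        self.portmap[brick.name] = port
        # ...but the flag is only touched if the hypothetical switch is on.
        if set_flag_on_bind:
            brick.port_registered = True

    def can_attach_to(self, first_brick):
        # With multiplexing, attaching depends on the first brick's flag.
        return first_brick.port_registered


def demo(set_flag_on_bind):
    """Sign in, restart the daemon, then ask whether an attach may proceed."""
    daemon = ToyGlusterd()
    first = Brick("N2:/bricks/b1")
    daemon.pmap_signin(first, 49152)
    daemon.restart_with_running_brick(first, 49152, set_flag_on_bind)
    return daemon.can_attach_to(first)


if __name__ == "__main__":
    print("attach allowed, flag untouched on bind:", demo(False))
    print("attach allowed, flag set on bind:", demo(True))
```

In the model, the portmap entry survives the restart in both cases; only the
flag differs, which is the mismatch the RCA points at.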


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1560955
[Bug 1560955] After performing add-brick followed by replace-brick
operation, brick went offline state
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

