[Bugs] [Bug 1323564] [scale] Brick process does not start after node reboot

Thu May 5 03:23:22 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1323564


--- Comment #7 from Vijay Bellur <vbellur at redhat.com> ---
COMMIT: http://review.gluster.org/14208 committed in release-3.7 by Raghavendra
G (rgowdapp at redhat.com) 
------
commit f9c59e29ccd770ae212da76b5e6f6ce3d8d09e61
Author: Prasanna Kumar Kalever <prasanna.kalever at redhat.com>
Date:   Wed Apr 27 19:12:19 2016 +0530

    glusterd: add defence mechanism to avoid brick port clashes

    Intro:
    Currently glusterd maintains a portmap registry that tracks which ports in
    the range 49152 - 65535 are free to use. The registry is initialized once
    and updated as and when glusterd sees that ports are in use.

    Glusterd first looks in the portmap registry for a port marked FREE, then
    checks that the port is currently free using connect(), and then passes it
    to the brick process, which has to bind to it.
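
    The allocation flow described above can be sketched roughly as follows.
    This is an illustration in Python, not glusterd's C code: the dict-based
    registry, the state names "FREE", "LEASED" and "IN_USE", and the function
    names port_is_listening() and allocate_port() are all assumptions made
    for the example.

```python
import socket


def port_is_listening(port, host="127.0.0.1"):
    """Probe with connect(): True if something accepts connections on the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        return probe.connect_ex((host, port)) == 0


def allocate_port(registry):
    """Return the lowest port marked FREE that also passes the connect() probe.

    `registry` is a hypothetical dict mapping port number -> state string.
    """
    for port in sorted(registry):
        if registry[port] != "FREE":
            continue
        if port_is_listening(port):
            # Seen in use by someone else: record that and keep looking.
            registry[port] = "IN_USE"
            continue
        registry[port] = "LEASED"
        return port
    return None
```

    Note that a port returned by allocate_port() was free only at the moment
    of the probe; nothing reserves it for the caller afterwards.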

    Problem:
    There is a time gap between glusterd checking the port with connect() and
    the brick process actually binding to it. In this gap any other process
    may occupy the port, in which case the brick fails to bind and exits.
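
    The gap can be reproduced in a single process. The sketch below is only
    a demonstration of the check-then-bind race on the loopback interface;
    demo_race() and the three sockets named probe, foreign and brick are
    hypothetical stand-ins for glusterd, an unrelated process and the brick.

```python
import errno
import socket

HOST = "127.0.0.1"


def demo_race():
    """Return (looked_free, bind_errno) for a check-then-bind sequence."""
    # Ask the OS for an unused port number, then release it again.
    tmp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tmp.bind((HOST, 0))
    port = tmp.getsockname()[1]
    tmp.close()

    # Step 1: the checker probes with connect(); a refusal means "free".
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        looked_free = probe.connect_ex((HOST, port)) != 0

    # Step 2: inside the gap, an unrelated process takes the port.
    foreign = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    foreign.bind((HOST, port))
    foreign.listen(1)

    # Step 3: the brick now tries to bind to the port it was handed.
    brick = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        brick.bind((HOST, port))
        bind_errno = 0
    except OSError as exc:
        bind_errno = exc.errno
    finally:
        brick.close()
        foreign.close()
    return looked_free, bind_errno
```

    The probe reports the port as free, yet the later bind() fails with
    errno.EADDRINUSE because the port was taken in between.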

    Case 1:
    To avoid a gluster client process occupying the port supplied by glusterd:

    The client port range has been separated from the brick port range; see
    http://review.gluster.org/#/c/13998/

    Case 2: (Handled by this patch)
    To avoid some other, foreign process occupying the port supplied by
    glusterd:

    This patch implements a mechanism that returns the EADDRINUSE error code to
    glusterd, upon which glusterd allocates a new port and tries to restart the
    brick process with it.
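
    A minimal sketch of this retry idea, assuming the brick start can report
    the errno of a failed bind back to its caller. The names
    try_start_brick(), start_with_retry(), next_port and max_attempts are
    invented for the example and do not appear in the patch.

```python
import errno
import os
import socket

HOST = "127.0.0.1"


def try_start_brick(port):
    """Stand-in for a brick start: bind and listen, or report the errno."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((HOST, port))
        sock.listen(1)
        return sock, 0
    except OSError as exc:
        sock.close()
        return None, exc.errno


def start_with_retry(next_port, max_attempts=3):
    """Ask next_port() for a candidate; retry only when bind hit EADDRINUSE."""
    for _ in range(max_attempts):
        port = next_port()
        sock, err = try_start_brick(port)
        if err == 0:
            return sock, port
        if err != errno.EADDRINUSE:
            # Any other failure is not a port clash, so do not mask it.
            raise OSError(err, os.strerror(err))
    return None, None
```

    Retrying only on EADDRINUSE keeps unrelated start-up failures visible
    instead of silently burning through ports.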

    Note: In case of a glusterd restart, i.e. runner_run_nowait(), there is no
    way to handle Case 2, because runner_run_nowait() does not wait for the
    return/exit code of the executed command (the brick process). Hence, as of
    now, in that case we cannot know with what error the brick has failed.
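
    The difference between waiting and not waiting for the child can be
    illustrated with Python's subprocess module. This is an analogy only:
    run_nowait() and run_and_wait() are hypothetical helpers, not the
    glusterfs runner API, and CHILD is a dummy command that exits with
    EADDRINUSE as its status.

```python
import errno
import subprocess
import sys

# Dummy "brick" that fails immediately with EADDRINUSE as its exit status.
CHILD = [sys.executable, "-c", f"import sys; sys.exit({errno.EADDRINUSE})"]


def run_nowait(cmd):
    """Start the command and return at once; no exit status is collected."""
    return subprocess.Popen(cmd)


def run_and_wait(cmd):
    """Start the command, block until it exits, and return its exit status."""
    return subprocess.run(cmd).returncode
```

    The handle returned by run_nowait() has returncode set to None until
    someone waits on it, so the caller cannot react to EADDRINUSE, whereas
    run_and_wait() returns the status and a retry becomes possible.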

    This patch also fixes runner_end() to perform some cleanup w.r.t.
    return values.

    Backport of:
    > Change-Id: Iec52e7f5d87ce938d173f8ef16aa77fd573f2c5e
    > BUG: 1322805
    > Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever at redhat.com>
    > Reviewed-on: http://review.gluster.org/14043
    > Tested-by: Prasanna Kumar Kalever <pkalever at redhat.com>
    > Reviewed-by: Atin Mukherjee <amukherj at redhat.com>
    > Smoke: Gluster Build System <jenkins at build.gluster.com>
    > NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    > CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
    > Reviewed-by: Raghavendra G <rgowdapp at redhat.com>
    > Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever at redhat.com>

    Change-Id: Ief247b4d4538c1ca03e73aa31beb5fa99853afd6
    BUG: 1323564
    Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever at redhat.com>
    Reviewed-on: http://review.gluster.org/14208
    Tested-by: Prasanna Kumar Kalever <pkalever at redhat.com>
    Smoke: Gluster Build System <jenkins at build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Raghavendra G <rgowdapp at redhat.com>
