[Bugs] [Bug 1462447] brick maintenance - no client reconnect

bugzilla at redhat.com bugzilla at redhat.com
Tue Jul 11 13:25:35 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1462447



--- Comment #11 from Worker Ant <bugzilla-bot at gluster.org> ---
COMMIT: https://review.gluster.org/17733 committed in release-3.8 by Niels de
Vos (ndevos at redhat.com) 
------
commit 9d5328e0e07353501392b33e9553300bd34a32ad
Author: Niels de Vos <ndevos at redhat.com>
Date:   Mon Jul 10 09:45:02 2017 +0200

    rpc/clnt: remove locks while notifying CONNECT/DISCONNECT

    Locking during notify was introduced as part of commit
    aa22f24f5db7659387704998ae01520708869873 [1]. The fix was introduced
    to fix out-of-order CONNECT/DISCONNECT events from rpc-clnt to parent
    xlators [2]. However as part of handling DISCONNECT protocol/client
    does unwind saved frames (with failure) waiting for responses. This
    saved_frames_unwind can be a costly operation and hence ideally
    shouldn't be included in the critical section of notifylock, as it
    unnecessarily delays the reconnection to same brick. Also, its not a
    good practise to pass control to other xlators holding a lock as it
    can lead to deadlocks. So, this patch removes locking in rpc-clnt
    while notifying parent xlators.

    To fix [2], two changes are present in this patch:

    * notify DISCONNECT before cleaning up rpc connection (same as commit
      a6b63e11b7758cf1bfcb6798, patch [3]).
    * protocol/client uses rpc_clnt_cleanup_and_start, which cleans up rpc
      connection and does a start while handling a DISCONNECT event from
      rpc. Note that patch [3] was reverted as rpc_clnt_start called in
      quick_reconnect path of protocol/client didn't invoke connect on
      transport as the connection was not cleaned up _yet_ (as cleanup was
      moved post notification in rpc-clnt). This resulted in clients never
      attempting connect to bricks.

    Note that one of the neater ways to fix [2] (without using locks) is
    to introduce generation numbers to map CONNECT and DISCONNECTS across
    epochs and ignore DISCONNECT events if they don't belong to current
    epoch. However, this approach is a bit complex to implement and
    requires time. So, current patch is a hacky stop-gap fix till we come
    up with a more cleaner solution.

    [1] http://review.gluster.org/15916
    [2] https://bugzilla.redhat.com/show_bug.cgi?id=1386626
    [3] http://review.gluster.org/15681

    Cherry picked from commit 773f32caf190af4ee48818279b6e6d3c9f2ecc79:
    > Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418
    > Signed-off-by: Raghavendra G <rgowdapp at redhat.com>
    > BUG: 1427012
    > Reviewed-on: https://review.gluster.org/16784
    > Smoke: Gluster Build System <jenkins at build.gluster.org>
    > Reviewed-by: Milind Changire <mchangir at redhat.com>
    > NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    > CentOS-regression: Gluster Build System <jenkins at build.gluster.org>

    Change-Id: I62daeee8bb1430004e28558f6eb133efd4ccf418
    Reported-by: Markus Stockhausen <mst at collogia.de>
    Signed-off-by: Niels de Vos <ndevos at redhat.com>
    BUG: 1462447
    Reviewed-on: https://review.gluster.org/17733
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Milind Changire <mchangir at redhat.com>
    Reviewed-by: Raghavendra G <rgowdapp at redhat.com>

-- 
You are receiving this mail because:
You are on the CC list for the bug.
Unsubscribe from this bug https://bugzilla.redhat.com/token.cgi?t=Wbn1PVY3RY&a=cc_unsubscribe


More information about the Bugs mailing list