[Bugs] [Bug 1448416] Halo Replication feature for AFR translator

bugzilla at redhat.com bugzilla at redhat.com
Fri May 5 13:14:19 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1448416



--- Comment #2 from Worker Ant <bugzilla-bot at gluster.org> ---
COMMIT: https://review.gluster.org/17192 committed in release-3.11 by Kaushal M
(kaushal at redhat.com) 
------
commit b6cc5261d5809aa509eecd082aefb7a0a14ca74b
Author: Kevin Vigor <kvigor at fb.com>
Date:   Tue Mar 21 08:23:25 2017 -0700

    Halo Replication feature for AFR translator

        Backport of https://review.gluster.org/16177
                https://review.gluster.org/17174

    Merged both of these patches to make sure the IPv6 changes don't make it
    into 3.11 at all.

    Summary:
    Halo Geo-replication is a feature which allows Gluster or NFS clients to
    write locally to their region (as defined by a latency "halo", or
    threshold), and have their writes propagate asynchronously from their
    origin to the rest of the cluster.  Clients can also write synchronously
    to the cluster simply by specifying a very large halo-latency (e.g.
    10 seconds), which will include all bricks.

    In other words, it allows clients to decide at mount time whether they
    want synchronous or asynchronous IO into a cluster, and the cluster can
    support both of these modes for any number of clients simultaneously.

    There are a few new volume options due to this feature:
      halo-shd-latency:  The threshold below which self-heal daemons will
      consider children (bricks) connected.

      halo-nfsd-latency: The threshold below which NFS daemons will consider
      children (bricks) connected.

      halo-latency: The threshold below which all other clients will
      consider children (bricks) connected.

      halo-min-replicas: The minimum number of replicas which are to
      be enforced regardless of latency specified in the above 3 options.
      If the number of children falls below this threshold the next
      best (chosen by latency) shall be swapped in.
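
    The interaction between a latency threshold and halo-min-replicas can be
    sketched as follows.  This is an illustrative Python model, not the AFR
    implementation: the function name, the brick names, the use of
    milliseconds and the strict "<" comparison are all assumptions made for
    the example.

```python
from typing import Dict, List


def select_halo_children(latencies_ms: Dict[str, float],
                         halo_latency_ms: float,
                         min_replicas: int) -> List[str]:
    """Return the bricks a client would treat as connected (hypothetical model)."""
    # Rank bricks from fastest to slowest; ties are broken by name so the
    # result is deterministic.
    ranked = sorted(latencies_ms, key=lambda b: (latencies_ms[b], b))
    # Bricks whose latency is below the threshold fall inside the halo.
    chosen = [b for b in ranked if latencies_ms[b] < halo_latency_ms]
    # If too few bricks qualify, swap in the next-best ones by latency
    # until the minimum replica count is met (or no bricks remain).
    for brick in ranked:
        if len(chosen) >= min_replicas:
            break
        if brick not in chosen:
            chosen.append(brick)
    return chosen


# Hypothetical measurements: two bricks in the local region, two remote.
bricks = {"local-a": 2.0, "local-b": 3.5, "remote-a": 80.0, "remote-b": 120.0}

# A small halo keeps writes inside the local region.
print(select_halo_children(bricks, halo_latency_ms=5.0, min_replicas=2))
# A very large halo includes every brick, i.e. synchronous writes.
print(select_halo_children(bricks, halo_latency_ms=10000.0, min_replicas=2))
# A halo that is too tight still yields min_replicas bricks.
print(select_halo_children(bricks, halo_latency_ms=1.0, min_replicas=2))
```

    In this model the same routine would serve the self-heal daemon, the NFS
    daemon and ordinary clients; only the threshold passed in would differ.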

    New FUSE mount options:
      halo-latency & halo-min-replicas: As described above.

    This feature combined with multi-threaded SHD support (D1271745) results in
    some pretty cool geo-replication possibilities.

    Operational Notes:
    - Global consistency is guaranteed for synchronous clients; this is
      provided by the existing entry-locking mechanism.
    - Asynchronous clients, on the other hand, are merely consistent within
      their region.  Writes & deletes will be protected via entry-locks as
      usual, preventing concurrent writes into files which are undergoing
      replication.  Read operations, on the other hand, should never block.
    - Writes are allowed from _any_ region and propagated from the origin to
      all other regions.  The takeaway is that care should be taken to ensure
      multiple writers do not write the same files, which would result in a
      gfid split-brain requiring resolution via split-brain policies
      (majority, mtime & size).  The recommended method for preventing this
      is to use the nfs-auth feature to define which region has RW
      permissions for each share; tiers not in the origin region should have
      RO perms.
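
    To make the three split-brain policies named above concrete, here is a
    rough Python sketch of how a winning copy might be chosen under each.
    It is an assumption-laden illustration rather than Gluster's resolution
    code: the Replica type, the pick_winner function and the exact rule used
    for "majority" (more than half of the copies agreeing on mtime and size)
    are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    brick: str     # hypothetical brick name
    mtime: float   # modification time, seconds since the epoch
    size: int      # file size in bytes


def pick_winner(replicas: List[Replica], policy: str) -> Optional[Replica]:
    """Choose the copy to keep under a policy, or None if undecidable."""
    if not replicas:
        return None
    if policy == "mtime":
        # Keep the most recently modified copy.
        return max(replicas, key=lambda r: r.mtime)
    if policy == "size":
        # Keep the largest copy.
        return max(replicas, key=lambda r: r.size)
    if policy == "majority":
        # Keep a copy whose (mtime, size) is shared by more than half
        # of the replicas; otherwise give up.
        counts = Counter((r.mtime, r.size) for r in replicas)
        key, votes = counts.most_common(1)[0]
        if votes * 2 <= len(replicas):
            return None
        return next(r for r in replicas if (r.mtime, r.size) == key)
    raise ValueError(f"unknown policy: {policy}")


copies = [
    Replica("region1-brick", mtime=100.0, size=10),
    Replica("region2-brick", mtime=200.0, size=5),
    Replica("region3-brick", mtime=100.0, size=10),
]
for policy in ("majority", "mtime", "size"):
    winner = pick_winner(copies, policy)
    print(policy, winner.brick if winner else None)
```

    Note that the policies can disagree on the same input, which is one more
    reason to restrict each share to a single writable region.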

    TODO:
    - Synchronous clients (including the SHD) should choose clients from
      their own region as preferred sources for reads.  Most of the plumbing
      is in place for this via the child_latency array.
    - Better GFID split-brain handling & better dent type split-brain
      handling (i.e. create a trash can and move the offending files into it).
    - Tagging in addition to latency as a means of defining which children
      you wish to synchronously write to.
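
    The first TODO item could look roughly like the sketch below, which
    picks the lowest-latency readable child as the read source.  Only the
    child_latency name comes from the text above; the function, the
    "readable" flags and the convention that a negative latency means "not
    yet measured" are assumptions for illustration.

```python
from typing import Optional, Sequence


def preferred_read_child(child_latency: Sequence[float],
                         readable: Sequence[bool]) -> Optional[int]:
    """Return the index of the closest readable child, or None."""
    best: Optional[int] = None
    for idx, (latency, ok) in enumerate(zip(child_latency, readable)):
        # Skip children that cannot serve reads or have no measurement yet
        # (negative latency is this sketch's "unknown" marker).
        if not ok or latency < 0:
            continue
        if best is None or latency < child_latency[best]:
            best = idx
    return best


# Child 1 is in the local region and readable, so it is preferred.
print(preferred_read_child([80.0, 2.0, 3.0], [True, True, True]))
# If child 1 is not a valid read source, the next-closest child is used.
print(preferred_read_child([80.0, 2.0, 3.0], [True, False, True]))
```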

    Test Plan:
    - The usual suspects, clang, gcc w/ address sanitizer & valgrind
    - Prove tests

    Reviewers: jackl, dph, cjh, meyering

    Reviewed By: meyering

    Subscribers: ethanr

    Differential Revision: https://phabricator.fb.com/D1272053

    Tasks: 4117827

     >Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
     >BUG: 1428061
     >Signed-off-by: Kevin Vigor <kvigor at fb.com>
     >Reviewed-on: http://review.gluster.org/16099
     >Reviewed-on: https://review.gluster.org/16177
     >Tested-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
     >Smoke: Gluster Build System <jenkins at build.gluster.org>
     >NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
     >CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
     >Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>

    BUG: 1448416
    Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
    Reviewed-on: https://review.gluster.org/17192
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Kaushal M <kaushal at redhat.com>


