[Bugs] [Bug 1196632] dist-geo-rep: Concurrent renames and node reboots results in slave having both source and destination of file with destination being 0 byte sticky file

bugzilla at redhat.com bugzilla at redhat.com
Mon Mar 16 04:54:03 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1196632



--- Comment #5 from Anand Avati <aavati at redhat.com> ---
COMMIT: http://review.gluster.org/9759 committed in master by Vijay Bellur
(vbellur at redhat.com) 
------
commit f0224ce93ae9ad420e23612fe6e6707a821f9cab
Author: Kotresh HR <khiremat at redhat.com>
Date:   Mon Feb 23 14:46:48 2015 +0530

    feature/geo-rep: Active Passive Switching logic flock

    CURRENT DESIGN AND ITS LIMITATIONS:
    -----------------------------------
    Geo-replication syncs changes across geography using changelogs captured
    by the changelog translator. The changelog translator sits on the server
    side, just above the posix translator. Hence, in a distributed replicated
    setup, both bricks of a replica pair collect changelogs for their
    respective bricks. Geo-replication syncs the changes using only one brick
    of the replica pair at a time; that brick is called "ACTIVE" and the
    other, non-syncing brick "PASSIVE".

    Let's consider the example below of a distributed replicated setup where
    NODE-1 has brick b1 and its replica brick b1r is on NODE-2:

            NODE-1                         NODE-2
              b1                            b1r

    At the beginning, geo-replication chooses to sync changes from NODE-1:b1,
    and NODE-2:b1r is "PASSIVE". The logic depends on the virtual getxattr
    'trusted.glusterfs.node-uuid', which always returns the first up
    subvolume, i.e., NODE-1. When NODE-1 goes down, the above xattr returns
    NODE-2, which is then made 'ACTIVE'. But when NODE-1 comes back, the
    xattr returns NODE-1 again and it is made 'ACTIVE' once more. So, for a
    brief interval, if NODE-2 has not finished processing its changelogs,
    both NODE-1 and NODE-2 will be ACTIVE, causing the rename race described
    in this bug.
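
    The old selection logic can be pictured with the following minimal Python
    sketch (illustrative only; the mount path and helper names are
    assumptions, not gsyncd's actual code):

        # Pre-patch ACTIVE/PASSIVE selection, sketched (assumptions marked).
        import os

        MOUNT = "/mnt/master-vol"   # hypothetical auxiliary mount of the master volume

        def first_up_node_uuid(path=MOUNT):
            # Virtual xattr answered by glusterfs: node UUID of the first up subvolume.
            val = os.getxattr(path, "trusted.glusterfs.node-uuid")
            return val.decode().strip("\x00").split()[0]

        def is_active(local_node_uuid):
            # The worker on the "first up" node syncs (ACTIVE); its replica
            # peer is expected to stay PASSIVE.
            return first_up_node_uuid() == local_node_uuid

    The race arises because the xattr result flips back to NODE-1 as soon as
    it rejoins, while NODE-2, still draining its changelogs, has not yet
    switched itself to PASSIVE.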

    SOLUTION:
    ---------
    1. Have a shared replicated storage: a glusterfs management volume
       specific to geo-replication.

    2. Geo-rep creates a file per replica set on the management volume.

    3. An fcntl lock on the above file is used for synchronization between
       geo-rep workers belonging to the same replica set.

    4. If the management volume is not configured, geo-replication falls
       back to the previous logic of using the first up subvolume.

    Each worker tries to lock the file on the shared storage; whoever wins
    becomes ACTIVE. With this we are able to solve the problem, but there is
    an issue when the shared replicated storage goes down (i.e., when all of
    its replicas go down): the lock state is lost. So AFR needs to rebuild
    the lock state after the bricks come back up.
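
    The arbitration can be sketched in Python roughly as follows (the meta
    volume mount point, lock file name and retry interval are assumptions
    for illustration; this is not the patch's actual worker code):

        # Lock-based ACTIVE/PASSIVE arbitration, sketched (assumptions marked).
        import fcntl
        import os
        import time

        META_MNT = "/var/run/gluster/geo-rep-meta"   # hypothetical mount of the meta volume

        def try_become_active(replica_set_id):
            """Return (is_active, fd); fd must stay open to keep holding the lock."""
            path = os.path.join(META_MNT, "%s.lock" % replica_set_id)
            fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
            try:
                # Non-blocking exclusive fcntl (POSIX) lock on the whole file.
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return True, fd          # we won the lock: this worker is ACTIVE
            except OSError:
                return False, fd         # a peer holds the lock: stay PASSIVE

        if __name__ == "__main__":
            while True:
                active, fd = try_become_active("replica-set-0")
                if active:
                    break
                os.close(fd)             # lost the race; retry after a while
                time.sleep(5)
            # ... sync as the ACTIVE worker while keeping fd open ...

    Because the lock is tied to an open fd on the meta volume, an ACTIVE node
    that crashes or reboots drops it automatically and a PASSIVE peer can
    take over on its next retry; losing all replicas of the meta volume,
    however, loses the lock state, which is the AFR rebuild case noted above.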

    NOTE:
    -----
    This patch brings in the pre-requisite step of setting up a management
    volume for geo-replication during session creation.

    1. Create the mgmt-vol for geo-replication and start it. The management
       volume should be part of the master cluster; a three-way replicated
       volume with each brick on a different node is recommended for
       availability.
    2. Create the geo-rep session.
    3. Configure the created mgmt-vol with the geo-replication session as
       follows:
       gluster vol geo-rep <mastervol> slavenode::<slavevol> config meta_volume \
       <meta-vol-name>
    4. Start the geo-rep session.

    Backward Compatibility:
    -----------------------
    If the management volume is not configured, geo-replication falls back to
    the previous logic of using the node-uuid virtual xattr, but this is not
    recommended.

    Change-Id: I7319d2289516f534b69edd00c9d0db5a3725661a
    BUG: 1196632
    Signed-off-by: Kotresh HR <khiremat at redhat.com>
    Reviewed-on: http://review.gluster.org/9759
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur at redhat.com>

-- 
You are receiving this mail because:
You are on the CC list for the bug.

