[Gluster-devel] Geo-replication: Resolving GFID Conflict

Aravinda avishwan at redhat.com
Fri Jul 17 09:12:31 UTC 2015


GFID conflicts are a common problem that Geo-replication faces today.
The reasons for GFID conflicts are:

1. The ignore_deletes option is set in Geo-replication. If a file/dir
    is created, deleted and created again, the Slave Volume still has
    the file with the old GFID because ignore_deletes is set. When
    Geo-rep tries to sync the new file, it fails with a GFID conflict.
2. Files are copied to the Slave Volume from the Master Volume using
    external tools other than Geo-replication. The GFID will be
    different for the same file in the Master and Slave Volumes.
3. Unlink of a file fails, and the same file is created again.
4. Rename failures. The old-named file still exists in the Slave
    Volume, and a file with the same name as the old name is created
    again in the Master Volume.
5. A pre-existing Slave Volume has the same data as the Master, but
    the data was synced via external tools.
6. Cluster Failover and Failback. When the Slave Volume becomes Master,
    IOs on the Slave Volume can change the GFIDs of the files.
7. Files are edited in the Slave Volume (for example, opened in the vi
    editor, edited, saved and closed).
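
To make reason 1 concrete, here is a toy model in Python. It is only a
sketch: ToyVolume and replay are hypothetical names, a "volume" is
reduced to a dict of path -> GFID, and none of this is Geo-rep code.

```python
import uuid


class ToyVolume:
    """Toy stand-in for a volume: maps path -> GFID. Not a Gluster API."""

    def __init__(self):
        self.entries = {}

    def create(self, path, gfid=None):
        if path in self.entries:
            raise FileExistsError(path)
        # A fresh GFID is assigned on every create unless one is supplied.
        self.entries[path] = gfid or str(uuid.uuid4())
        return self.entries[path]

    def unlink(self, path):
        del self.entries[path]


def replay(changelog, slave, ignore_deletes):
    """Apply (op, path, gfid) records to the slave; return conflicting paths."""
    conflicts = []
    for op, path, gfid in changelog:
        if op == "UNLINK":
            if not ignore_deletes:
                slave.entries.pop(path, None)
        elif op == "CREATE":
            try:
                slave.create(path, gfid)
            except FileExistsError:
                # Same name already present, but with a different GFID.
                if slave.entries[path] != gfid:
                    conflicts.append(path)
    return conflicts


master, slave = ToyVolume(), ToyVolume()
changelog = []

old = master.create("dir/f")
changelog.append(("CREATE", "dir/f", old))
master.unlink("dir/f")
changelog.append(("UNLINK", "dir/f", old))
new = master.create("dir/f")
changelog.append(("CREATE", "dir/f", new))

conflicts = replay(changelog, slave, ignore_deletes=True)
```

With ignore_deletes set, the slave keeps "dir/f" with the old GFID, so
the second CREATE is reported as a conflict. Replaying the same
changelog with ignore_deletes=False produces no conflict.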

If we add intelligence to Geo-rep to auto-resolve GFID conflicts,
then

1. Rsync/Tarssh will not fail and skip files with error 23.
2. The Master Volume and Slave Volume will be in sync.


How to fix
==========
gfid-heal
---------
To solve this problem, we need to add gfid-heal capabilities to
Geo-replication.


When entry creation on the Slave Volume fails with a GFID conflict:

0. Entry creation on the Slave Volume gets EEXIST and the GFID on disk
    is not the same as the GFID from the Changelog.
1. Check whether PGFID/basename exists in the Master Volume.
2. If it does not exist, ignore.
3. If it exists, compare the GFID from the Changelog with the GFID on
    disk. If both GFIDs are the same, send a GFID heal request to the
    Slave.
4. If the GFID on disk is not the same as the Changelog GFID, ignore.
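
The steps above can be sketched as a small decision function. This is
only an illustration, under one assumption about the steps: the "disk
GFID" in step 0 is read as the GFID of the existing entry on the Slave,
and the "disk GFID" in steps 3 and 4 as the GFID of PGFID/basename on
the Master. The names Action and decide_gfid_heal are hypothetical and
are not part of Geo-rep.

```python
from enum import Enum


class Action(Enum):
    NO_CONFLICT = "no-conflict"
    IGNORE = "ignore"
    HEAL = "heal"


def decide_gfid_heal(changelog_gfid, slave_gfid, master_gfid):
    """Decide what to do after entry creation on the Slave got EEXIST.

    changelog_gfid: GFID recorded in the Changelog for this entry.
    slave_gfid:     GFID of the existing entry on the Slave (assumed
                    reading of "disk GFID" in step 0).
    master_gfid:    GFID of PGFID/basename on the Master, or None if
                    that entry does not exist there (assumed reading
                    of "disk GFID" in steps 3 and 4).
    """
    # Step 0: it is only a conflict if the Slave entry has another GFID.
    if slave_gfid == changelog_gfid:
        return Action.NO_CONFLICT
    # Steps 1-2: the entry no longer exists on the Master, so ignore.
    if master_gfid is None:
        return Action.IGNORE
    # Step 3: the Master still has the entry the Changelog describes.
    if master_gfid == changelog_gfid:
        return Action.HEAL
    # Step 4: the Master entry has moved on to another GFID, so ignore.
    return Action.IGNORE
```

For example, decide_gfid_heal("g-new", "g-old", "g-new") returns
Action.HEAL, while decide_gfid_heal("g-new", "g-old", None) returns
Action.IGNORE.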


Archive it
----------
Vijay suggested archiving the conflicting file instead of healing the
GFID.

0. Entry creation on the Slave Volume gets EEXIST and the GFID on disk
    is not the same as the GFID from the Changelog.
1. Check whether PGFID/basename exists in the Master Volume.
2. If it does not exist, ignore.
3. If it exists, compare the GFID from the Changelog with the GFID on
    disk. If both GFIDs are the same, rename the conflicting
    file/directory into the .gfid_conflicts directory in the mount. Add
    a timestamp to the moved file.
4. If the GFID on disk is not the same as the Changelog GFID, ignore.
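
The rename in step 3 could look roughly like the sketch below. The
function name archive_conflict, the flattened file name and the
timestamp format are assumptions made for illustration; only the
.gfid_conflicts directory name comes from the proposal.

```python
import os
import time

CONFLICT_DIR = ".gfid_conflicts"


def archive_conflict(mount_root, rel_path, now=None):
    """Move mount_root/rel_path into mount_root/.gfid_conflicts.

    A UTC timestamp is appended to the archived name. Returns the new
    path. `now` is seconds since the epoch; None means the current time.
    """
    ts = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    archive_dir = os.path.join(mount_root, CONFLICT_DIR)
    os.makedirs(archive_dir, exist_ok=True)

    # Flatten the relative path so every archived entry sits directly
    # under .gfid_conflicts (a naming choice assumed for this sketch).
    flat_name = rel_path.strip("/").replace("/", "_")
    dest = os.path.join(archive_dir, f"{flat_name}.{ts}")

    # os.rename works for both files and directories as long as source
    # and destination are on the same filesystem (here, the same mount).
    os.rename(os.path.join(mount_root, rel_path), dest)
    return dest
```

After the move, the original name is free on the Slave, so the entry
creation can be retried with the GFID from the Changelog.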



The second approach looks cleaner: old files are archived and not
overwritten as in the first approach. The admin can periodically look
in the .gfid_conflicts directory and clean up the files/dirs.


Challenges
----------
1. Race between AFR self-heal and RENAME of directory. (BZ 1240333)


Let me know your thoughts. Thanks.

-- 
regards
Aravinda


