[Gluster-devel] RENAME issues in Geo-replication

Aravinda avishwan at redhat.com
Wed Nov 12 06:04:26 UTC 2014


I have updated the approaches to fix the RENAME problems in geo-replication 
(see inline below). Please let me know if you have any suggestions.

--
regards
Aravinda

On 09/19/2014 02:09 PM, Aravinda wrote:
> Hi All,
>
> I have summarized the RENAME issues we have in geo-replication. Feel 
> free to add any that I missed :)
>
> GlusterFS changelogs are stored on each brick and record the changes 
> that happened on that brick. Georep runs on all the nodes of the 
> master and processes the changelogs independently. Changelogs are 
> processed at brick level, but all the fops are replayed on the mount.
>
> Internal fops are not recorded in the changelog. For a RENAME, only 
> the RENAME itself is recorded, in the hashed brick's changelog (DHT's 
> internal fops, such as creating the linkto file and the unlink, are 
> not recorded).
>
> We need to start fixing these issues to stabilize geo-replication. 
> Comments and suggestions are welcome.
>
>
> Renamed file falls into other brick
> -----------------------------------
> Two bricks(distribute)
> CREATE f1
> RENAME f1 f2  -> f2 falls in other brick
>
> Now there is a race between b1 and b2:
>
> In b1 CREATE f1
>
> In b2 RENAME f1 f2
>
> Issue: Actually not an issue. Georep sends the stat along with RENAME 
> entry ops; if the source itself is not there on the slave, Georep 
> creates the target file using the stat.
> We have a problem only when the RENAME falls on the other brick and 
> the file has been unlinked on the master.
>
> Possible fix: ?
Fail CREATE with EEXIST if any file with the same GFID already exists. If 
neither the source nor the target file exists, create the target file (use 
a default stat if the stat is not available because the file was unlinked 
on the master).
>
>
> Multiple Renames
> ----------------
>
> CREATE f1
> RENAME f1 f2
> RENAME f2 f1
>
> f1 falls in brick1 and f2 falls in brick2, changelogs are
>
> Brick1
> CREATE f1
> RENAME f2 f1
>
> Brick2
> RENAME f1 f2
>
> Issue: If Brick1's changelogs are replayed first and then Brick2's, 
> the slave will have f2, while the master has f1.
>
> Possible fix: ?
Same as the previous approach. Along with the stat, send the current_name 
of that GFID in the master volume (maybe using the pathinfo xattr?), and 
RENAME only if the target file matches the current_name sent by the master.
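
A small simulation of the current_name check, using the changelogs from the 
example above. It is a sketch under assumptions: the slave is a plain dict, 
the replay function and its tuple format are hypothetical, and every RENAME is 
assumed to carry the name the master reports at processing time (f1 here).

```python
def replay(ops, guard):
    """Replay entry ops on a toy slave (name -> gfid) and return it.

    CREATE ops are ("CREATE", gfid, name).
    RENAME ops are ("RENAME", gfid, src, dst, current_name), where
    current_name is the name the master currently has for that GFID.
    """
    slave = {}
    for op in ops:
        if op[0] == "CREATE":
            _, gfid, name = op
            if gfid not in slave.values():  # otherwise EEXIST
                slave[name] = gfid
        else:
            _, gfid, src, dst, current_name = op
            if guard and dst != current_name:
                continue  # stale RENAME: target is not the master's name
            if slave.get(src) == gfid:
                del slave[src]
                slave[dst] = gfid
            elif dst not in slave:
                slave[dst] = gfid  # source and target missing: create target
    return slave


brick1 = [("CREATE", "g1", "f1"), ("RENAME", "g1", "f2", "f1", "f1")]
brick2 = [("RENAME", "g1", "f1", "f2", "f1")]

print(replay(brick1 + brick2, guard=False))  # {'f2': 'g1'}  (wrong name)
print(replay(brick1 + brick2, guard=True))   # {'f1': 'g1'}
print(replay(brick2 + brick1, guard=True))   # {'f1': 'g1'}
```

With the check, the slave ends up with f1 in either replay order, because the 
RENAME to f2 never matches the current_name.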
>
>
> Active Passive switch in georeplication
> ---------------------------------------
> Setup: Distribute Replica
>
> In any one of the replica sets, a RENAME is recorded in the Passive 
> brick while the Active brick is down. When the Active brick comes 
> back, it becomes Active again immediately.
>
> Passive Brick
> RENAME
>
> Active Brick
> MKNOD (From self heal traffic)
>
>
> Two issues:
> 1. If the MKNOD is for a sticky bit file, it will create a sticky bit 
> file on the slave (under the renamed name), and the old-named file 
> will still be there. Two files with the same GFID: the old file and 
> the sticky bit file (target name).
>
> 2. If the MKNOD is for the actual file, it will create a new file on 
> the slave. The slave will have the old file as well as the new file 
> with the same GFID.
>
> Possible Fix: If a node failed previously, it should not become 
> Active again; continue with the current Passive. (Don't know yet how 
> to do this; as of now we decide Active/Passive based on node-uuid.)
Kotresh is working on new logic to choose the Active node from replica 
pairs. With the new logic, a node will not participate in sync 
immediately after it comes back.

>
>
> RENAME repeat - If two replica bricks are active
> ------------------------------------------------
>
> From one brick it processes,
> CREATE f1
> RENAME f1 f2
>
> From other brick it processes same changelogs again,
>
> CREATE f1
> RENAME f1 f2
>
> Issue: The slave will have both f1 and f2 with the same GFID.
> Possible fix: modify MKNOD/CREATE to check the on-disk GFID first and 
> only then create the file. Return EEXIST when a file exists with the 
> same GFID but a different name.
Fail CREATE if a file exists with the same GFID.
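
To show the effect on the repeated replay above, here is a toy simulation. It 
is only a sketch: the slave is a dict, the replay function is hypothetical, and 
it assumes that a RENAME whose target already has the same GFID is a no-op (as 
POSIX rename() behaves for two links to the same file).

```python
def replay(ops, check_gfid):
    """Replay ("CREATE", gfid, name) / ("RENAME", gfid, src, dst) ops
    on a toy slave (name -> gfid) and return the final state."""
    slave = {}
    for op in ops:
        if op[0] == "CREATE":
            _, gfid, name = op
            if check_gfid and gfid in slave.values():
                continue  # proposed fix: CREATE fails with EEXIST
            slave[name] = gfid
        else:
            _, gfid, src, dst = op
            # Assumption: renaming onto an entry with the same GFID
            # does nothing, so both names survive.
            if slave.get(src) == gfid and slave.get(dst) != gfid:
                slave[dst] = slave.pop(src)
    return slave


changelog = [("CREATE", "g1", "f1"), ("RENAME", "g1", "f1", "f2")]
twice = changelog + changelog  # both replica bricks replay the same ops

print(replay(twice, check_gfid=False))  # f1 and f2 with the same GFID
print(replay(twice, check_gfid=True))   # {'f2': 'g1'}
```

With the GFID check, the second CREATE f1 is rejected and the second RENAME 
finds no source, so the replay becomes idempotent.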

>
> -- 
> regards
> Aravinda
> http://aravindavk.in