[Gluster-devel] rm -rf issues in Geo-replication

Raghavendra Gowdappa rgowdapp at redhat.com
Fri May 22 11:05:31 UTC 2015



----- Original Message -----
> From: "Aravinda" <avishwan at redhat.com>
> To: "Gluster Devel" <gluster-devel at gluster.org>
> Cc: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> Sent: Friday, 22 May, 2015 12:42:11 PM
> Subject: rm -rf issues in Geo-replication
> 
> Problem:
> --------
> Each geo-rep worker processes the Changelogs available in its own
> brick; if a worker sees an RMDIR, it tries to remove that directory
> recursively. Since the rmdir is recorded on all the bricks, the
> rm -rf ends up being executed in parallel.
> 
> Due to DHT's open issues with parallel rm -rf, some of the
> directories will not get deleted in the Slave Volume (stale directory
> layout). If a directory with the same name is then created in the
> Master, Geo-rep will end up in an inconsistent state, since the GFID
> of the new directory differs from that of the directory still present
> in the Slave.
> 
> 
> Solution - Fix in DHT:
> ---------------------
> Hold a lock during rmdir, so that parallel rmdirs get blocked and no
> stale layouts are left behind.
> 
> 
> Solution - Fix in Geo-rep:
> --------------------------
> Until DHT fixes this issue, we can temporarily fix it in Geo-rep.
> Since a Meta Volume is available with each cluster, Geo-rep can hold
> a lock keyed on the GFID of the directory to be deleted.

If it fixes a currently pressing problem in geo-rep, we can use this.
However, please note that the underlying problem is the directory
self-heal done during lookup racing with rmdir. So, theoretically, any
path-based operation (stat, opendir, chmod, chown, etc., not just
rmdir) can result in this bug. Even with this solution you can still
see the issue (for example, doing find <dir> while doing rmdir <dir>).
There is an age-old patch at [1] that is supposed to resolve this, but
it has not been merged for various reasons (one being the
synchronization required to prevent stale layouts being stored in the
inode-ctx).

[1] http://review.gluster.org/#/c/4846/
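
To make the race concrete, here is a sketch of the kind of workload
that can still trip the bug even with the Geo-rep lock in place: one
client walking the tree (every path-based lookup can trigger directory
self-heal on DHT) while another removes it. The mount path is
hypothetical; this is just the find-vs-rmdir scenario above written
out:

    import os
    import shutil
    import threading

    MOUNT = "/mnt/glustervol/dir"   # hypothetical glusterfs mount

    def walker():
        # Path-based lookups; on DHT each one can trigger directory
        # self-heal, racing with the recursive delete below.
        for root, dirs, files in os.walk(MOUNT):
            for name in files:
                try:
                    os.stat(os.path.join(root, name))
                except OSError:
                    pass    # entries vanish while we walk; expected

    t = threading.Thread(target=walker)
    t.start()
    shutil.rmtree(MOUNT, ignore_errors=True)
    t.join()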

> 
> For example,
> 
> when rmdir:
>     while True:
>         try:
>             # fcntl lock file in Meta Volume:
>             # $METAVOL/.rmdirlocks/<GFID>
>             get_lock(GFID)
>             recursive_delete()
>             release_and_del_lock_file()
>             break
>         except OSError as e:
>             if e.errno in (EACCES, EAGAIN):
>                 continue    # another worker holds the lock; retry
>             raise
> 
> One worker will succeed and all other workers will get ENOENT/ESTALE,
> which can be safely ignored.
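
For concreteness, here is a minimal runnable sketch of the scheme
above, as I read it: a per-GFID lock file under .rmdirlocks/ on the
mounted Meta Volume, acquired with a non-blocking fcntl lock. The
mount point and function names (META_MOUNT, rmdir_with_lock) are
illustrative only, not the actual Geo-rep code:

    import errno
    import fcntl
    import os
    import shutil

    # Illustrative mount point of the Meta Volume on the worker node.
    META_MOUNT = "/var/run/gluster/shared_storage"

    def rmdir_with_lock(gfid, target_dir):
        lock_path = os.path.join(META_MOUNT, ".rmdirlocks", gfid)
        os.makedirs(os.path.dirname(lock_path), exist_ok=True)
        fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
        try:
            while True:
                try:
                    # Non-blocking exclusive lock: exactly one worker
                    # wins; the rest see EACCES/EAGAIN and retry.
                    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    break
                except OSError as e:
                    if e.errno not in (errno.EACCES, errno.EAGAIN):
                        raise
            try:
                shutil.rmtree(target_dir)
            except OSError as e:
                # Losers get the lock only after the winner has already
                # deleted the directory, so ENOENT/ESTALE is expected.
                if e.errno not in (errno.ENOENT, errno.ESTALE):
                    raise
            try:
                os.unlink(lock_path)
            except OSError as e:
                if e.errno != errno.ENOENT:
                    raise
        finally:
            os.close(fd)

Note that releasing by unlinking the lock file lets a late worker lock
a freshly created file while a loser still holds the old inode; that
is harmless here, since both just see ENOENT on the delete.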
> 
> 
> Let us know your thoughts.
> 
> --
> regards
> Aravinda
> 

