[Bugs] [Bug 1218922] [dist-geo-rep]:Directory not empty and Stale file handle errors in geo-rep logs during deletes from master in history/changelog crawl

Wed May 6 17:02:38 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1218922


--- Comment #2 from Anand Avati <aavati at redhat.com> ---
COMMIT: http://review.gluster.org/10599 committed in release-3.7 by Vijay
Bellur (vbellur at redhat.com) 
------
commit 7ca09c3d23f224efd139b372a88788c7cbe90522
Author: Aravinda VK <avishwan at redhat.com>
Date:   Sun Apr 12 17:46:45 2015 +0530

    geo-rep: Minimize rm -rf race in Geo-rep

    During RMDIR, a worker can get ENOTEMPTY because the same directory
    still contains files from other bricks that have not been deleted yet,
    since the workers for those bricks are slower. In that case geo-rep
    falls back to recursive_delete.

    Recursive delete was done using shutil.rmtree. Once started, it does
    not check disk_gfid in between, so it can end up deleting new files
    created by other workers. Also, if another worker creates files after
    one worker has obtained the list of files to delete, the first worker
    gets ENOTEMPTY again.

    To fix these races, a retry is added on ENOTEMPTY/ESTALE/ENODATA, and
    a disk_gfid check is added for the original path on which
    recursive_delete is called. This disk GFID check runs before every
    Unlink/Rmdir. If the disk GFID does not match the GFID from the
    Changelog, another worker has already deleted the directory; even if
    a subdir/file is still present, it belongs to a different parent, so
    exit without performing further deletes.
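
The GFID-guarded delete described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function name `guarded_recursive_delete` and the `get_disk_gfid` callable are hypothetical, and how the on-disk GFID is actually read is left to the caller.

```python
import errno
import os


def guarded_recursive_delete(path, expected_gfid, get_disk_gfid):
    """Delete `path` recursively, re-checking its GFID before every removal.

    `get_disk_gfid` is a hypothetical callable that returns the current
    on-disk GFID of a path. Returns True if the tree was removed, False if
    the walk stopped because the GFID at `path` no longer matches.
    """
    def still_ours():
        try:
            return get_disk_gfid(path) == expected_gfid
        except OSError as e:
            # The directory vanished or went stale: another worker won.
            if e.errno in (errno.ENOENT, errno.ESTALE):
                return False
            raise

    # Bottom-up walk so children are removed before their parent.
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files:
            if not still_ours():
                return False
            os.unlink(os.path.join(root, name))
        for name in dirs:
            if not still_ours():
                return False
            full = os.path.join(root, name)
            if os.path.islink(full):
                os.unlink(full)  # symlink to a directory, not a real dir
            else:
                os.rmdir(full)

    if not still_ours():
        return False
    os.rmdir(path)
    return True
```

Entries created by another worker after the walk has listed a directory are not seen, so the final rmdir can still raise ENOTEMPTY; that case is what the retry is for.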

    ENOENT during create is no longer retried, since a CREATE/MKNOD/MKDIR
    that failed with ENOENT will not succeed unless the parent directory
    is created again.
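
The retry policy from the two paragraphs above might look like the sketch below. The helper `call_with_retry`, the constant names, and the retry limit are assumptions for illustration only. The commit message does not say which errnos, if any, creates still retry on, so the create set is left empty here.

```python
import errno
import time

# Errnos named in the commit message as retryable for deletes.
DELETE_RETRY_ERRNOS = (errno.ENOTEMPTY, errno.ESTALE, errno.ENODATA)
# Assumption: creates retry on nothing; in particular ENOENT is not retried.
CREATE_RETRY_ERRNOS = ()


def call_with_retry(fn, retry_errnos, max_retries=10, delay=0.0):
    """Call fn(), retrying while it raises OSError with a retryable errno.

    Hypothetical helper: any other errno, or exhausting max_retries,
    re-raises the original exception to the caller.
    """
    attempts = 0
    while True:
        try:
            return fn()
        except OSError as e:
            attempts += 1
            if e.errno not in retry_errnos or attempts > max_retries:
                raise
            time.sleep(delay)
```

A delete would be wrapped as `call_with_retry(lambda: os.rmdir(path), DELETE_RETRY_ERRNOS)`, while a create that hits ENOENT fails on the first attempt because retrying cannot help until the parent directory exists again.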

    Rsync error handling tracked unlinked_gfids_list for only one
    Changelog, so when Changelogs were processed in a batch it failed to
    detect unlinked GFIDs, retried, and finally skipped all the Changelogs
    in that batch. Fixed by moving the self.unlinked_gfids reset logic to
    before batch start and after batch end.
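
The effect of moving the reset to the batch boundaries can be illustrated with a small sketch. The class `ChangelogBatch`, its `process` method, and the `(op, gfid)` tuple representation are hypothetical; only the attribute name `unlinked_gfids` comes from the commit message.

```python
class ChangelogBatch:
    """Tracks GFIDs unlinked anywhere in a batch of changelogs."""

    def __init__(self):
        self.unlinked_gfids = set()

    def process(self, changelogs, failed_sync_gfids):
        """Return the GFIDs whose sync failure really needs a retry.

        changelogs: list of changelogs, each a list of (op, gfid) tuples.
        failed_sync_gfids: GFIDs the data sync reported as failed.
        """
        self.unlinked_gfids = set()  # reset once, before the batch starts
        try:
            for entries in changelogs:
                # No per-changelog reset: unlinks accumulate over the batch.
                for op, gfid in entries:
                    if op in ("UNLINK", "RMDIR"):
                        self.unlinked_gfids.add(gfid)
            # A failed sync of a file unlinked later in the same batch is
            # expected and should not trigger a retry.
            return set(failed_sync_gfids) - self.unlinked_gfids
        finally:
            self.unlinked_gfids = set()  # reset again after the batch ends
```

With a per-changelog reset, a file created in one changelog and unlinked in the next would still show up as a sync failure and cause the whole batch to be retried.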

    Most of the Geo-rep races with rm -rf are eliminated by this patch,
    but in some cases stale directories are left on some bricks and
    ENOTEMPTY is returned on the mount point. (This is a DHT issue; the
    error will be logged in the Slave log.)

    BUG: 1218922
    Change-Id: I8716b88e4c741545f526095bf789f7c1e28008cb
    Signed-off-by: Aravinda VK <avishwan at redhat.com>
    Reviewed-on: http://review.gluster.org/10204
    Reviewed-by: Kotresh HR <khiremat at redhat.com>
    Reviewed-by: Vijay Bellur <vbellur at redhat.com>
    Reviewed-on: http://review.gluster.org/10599
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
