[Bugs] [Bug 1218922] New: [dist-geo-rep]:Directory not empty and Stale file handle errors in geo-rep logs during deletes from master in history/changelog crawl
bugzilla at redhat.com
Wed May 6 07:54:42 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1218922
Bug ID: 1218922
Summary: [dist-geo-rep]:Directory not empty and Stale file
handle errors in geo-rep logs during deletes from
master in history/changelog crawl
Product: GlusterFS
Version: 3.7.0
Component: geo-replication
Severity: urgent
Priority: high
Assignee: bugs at gluster.org
Reporter: avishwan at redhat.com
CC: aavati at redhat.com, avishwan at redhat.com,
bugs at gluster.org, csaba at redhat.com,
gluster-bugs at redhat.com, nlevinki at redhat.com,
rhinduja at redhat.com, rhs-bugs at redhat.com,
smanjara at redhat.com, storage-qa-internal at redhat.com,
vagarwal at redhat.com
Depends On: 1201732, 1211037
+++ This bug was initially created as a clone of Bug #1211037 +++
+++ This bug was initially created as a clone of Bug #1201732 +++
Description of problem:
"Directory not empty" and "Stale file handle" errors appear in the geo-rep logs
during the history crawl while files are being deleted on the master, and the
geo-rep session goes faulty.
Version-Release number of selected component (if applicable):
glusterfs-3.6.0.51-1.el6rhs.x86_64
How reproducible:
Always
Steps to Reproduce:
1. Create and start a geo-rep session.
2. Create data (10k files) on the master mount.
3. Ensure the files are synced to the slave volume.
4. Start deleting the files, and while the deletes are running, stop the
geo-rep session.
5. Once all the deletes have finished on the master volume, start the geo-rep
session again.
6. The session switches to history crawl and starts syncing to the slave volume.
7. On the master node, check the geo-rep slave.log for errors and tracebacks
(a rough scripted version of these steps is sketched below).
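A rough scripted version of the steps above, for reference only; it is not part
of the original report. The mount point /mnt/master, the session name
master -> falcon::slave, the file counts and the sleep intervals are assumptions.

#!/usr/bin/env python3
# Sketch of the reproduction steps. Assumes the master volume is fuse-mounted
# at /mnt/master and a geo-rep session master -> falcon::slave is already
# created and started (step 1).
import os
import subprocess
import time

MOUNT = "/mnt/master"                                   # assumed fuse mount
SESSION = ["gluster", "volume", "geo-replication", "master", "falcon::slave"]

def georep(action):
    subprocess.check_call(SESSION + [action])

# step 2: create ~10k small files spread across a few directories
for d in range(10):
    dirpath = os.path.join(MOUNT, "thread%d" % d)
    os.makedirs(dirpath)
    for f in range(1000):
        with open(os.path.join(dirpath, "file%d" % f), "w") as fh:
            fh.write("x")

# step 3: crude stand-in for "ensure files are synced"; in practice compare
# file counts on the slave or watch CRAWL STATUS in the geo-rep status output
time.sleep(600)

# step 4: start deleting, then stop the session while the deletes are running
rm = subprocess.Popen(["rm", "-rf"] +
                      [os.path.join(MOUNT, "thread%d" % d) for d in range(10)])
time.sleep(5)
georep("stop")
rm.wait()

# steps 5-6: restart the session; it resumes via history crawl and replays
# the deletes on the slave
georep("start")

# step 7: now inspect the geo-rep slave.log on the master nodes for tracebacks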
Actual results:
Found errors like "Stale file handle:
'.gfid/cee167df-dfc0-406a-bd0d-ed593faeaa9c/5502b36a%%JZKEPVPX8R'" and "OSError:
[Errno 39] Directory not empty:
'.gfid/00000000-0000-0000-0000-000000000001/thread4'".
The data was nevertheless synced to the slave; the geo-rep session went faulty
but came back to Active after a while.
Expected results:
These traceback errors should not appear in the logs.
Additional info:
[2015-03-13 15:48:23.847239] E
[repce(/bricks/brick0/master_brick0):207:__call__] RepceClient: call
16550:140374215104256:1426241903.22 (entry_ops) failed on peer with
OSError
[2015-03-13 15:48:23.847932] E
[syncdutils(/bricks/brick0/master_brick0):270:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
main_i()
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 645, in
main_i
local.service_loop(*[r for r in [remote] if r])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1329, in
service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 553, in
crawlwrap
self.crawl(no_stime_update=no_stime_update)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1334, in
crawl
self.process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1017, in
process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 981, in
process_change
self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in
__call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in
__call__
raise res
OSError: [Errno 39] Directory not empty:
'.gfid/00000000-0000-0000-0000-000000000001/thread4'
[2015-03-13 15:48:39.740008] E
[repce(/bricks/brick0/master_brick0):207:__call__] RepceClient: call
16662:140258876471040:1426241919.52 (entry_ops) failed on peer with OSError
[2015-03-13 15:48:39.740557] E
[syncdutils(/bricks/brick0/master_brick0):270:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
main_i()
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 645, in
main_i
local.service_loop(*[r for r in [remote] if r])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1329, in
service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 553, in
crawlwrap
self.crawl(no_stime_update=no_stime_update)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1334, in
crawl
self.process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1017, in
process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 981, in
process_change
self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in
__call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in
__call__
raise res
OSError: [Errno 116] Stale file handle:
'.gfid/cee167df-dfc0-406a-bd0d-ed593faeaa9c/5502b36a%%JZKEPVPX8R'
[2015-03-13 15:46:27.528390] I
[syncdutils(/bricks/brick0/master_brick0):214:finalize] <top>: exiting.
[2015-03-13 15:46:27.532459] I [repce(agent):92:service_loop] RepceServer:
terminating on reaching EOF.
[2015-03-13 15:46:27.532964] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-03-13 15:46:28.258347] I [monitor(monitor):280:monitor] Monitor:
worker(/bricks/brick0/master_brick0) died in startup phase
[2015-03-13 15:46:28.259366] I [monitor(monitor):146:set_state] Monitor: new
state: faulty
[2015-03-13 15:46:38.598536] I [monitor(monitor):220:monitor] Monitor:
------------------------------------------------------------
[2015-03-13 15:46:38.599097] I [monitor(monitor):221:monitor] Monitor: starting
gsyncd worker
[2015-03-13 15:46:38.887062] I
[gsyncd(/bricks/brick0/master_brick0):635:main_i] <top>: syncing:
gluster://localhost:master -> ssh://root@falcon:gluster://localhost:slave
[2015-03-13 15:46:38.901046] I [changelogagent(agent):72:__init__]
ChangelogAgent: Agent listining...
# gluster v i
Volume Name: master
Type: Distributed-Replicate
Volume ID: 82b646f9-1f38-443f-862d-dadd24b668c5
Status: Started
Snap Volume: no
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: spitfire:/bricks/brick0/master_brick0
Brick2: mustang:/bricks/brick0/master_brick1
Brick3: harrier:/bricks/brick0/master_brick2
Brick4: typhoon:/bricks/brick0/master_brick3
Brick5: ccr:/bricks/brick0/master_brick4
Brick6: metallica:/bricks/brick0/master_brick5
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.quota: on
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256
# gluster v geo-rep master falcon::slave status
MASTER NODE    MASTER VOL    MASTER BRICK                    SLAVE USER    SLAVE                 STATUS     CHECKPOINT STATUS    CRAWL STATUS
---------------------------------------------------------------------------------------------------------------------------------------------
spitfire       master        /bricks/brick0/master_brick0    root          falcon::slave         Faulty     N/A                  Changelog Crawl
typhoon        master        /bricks/brick0/master_brick3    root          lightning::slave      Faulty     N/A                  Changelog Crawl
metallica      master        /bricks/brick0/master_brick5    root          interceptor::slave    Passive    N/A                  N/A
mustang        master        /bricks/brick0/master_brick1    root          interceptor::slave    Passive    N/A                  N/A
harrier        master        /bricks/brick0/master_brick2    root          hornet::slave         Passive    N/A                  N/A
ccr            master        /bricks/brick0/master_brick4    root          falcon::slave         Passive    N/A                  N/A
--- Additional comment from shilpa on 2015-03-13 13:34:15 EDT ---
Also saw "OSError: [Errno 61] No data available" errors for hardlinks:
[2015-03-13 22:47:46.456286] E
[syncdutils(/bricks/brick0/master_brick0):270:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
main_i()
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 645, in
main_i
local.service_loop(*[r for r in [remote] if r])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1329, in
service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 553, in
crawlwrap
self.crawl(no_stime_update=no_stime_update)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1334, in
crawl
self.process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1017, in
process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 981, in
process_change
self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in
__call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in
__call__
raise res
OSError: [Errno 61] No data available:
'.gfid/b8c9ca6c-cb69-4930-b4d5-c50ca7710f66/hardlink_to_files/55030214%%8W5FSWYQIO'
[2015-03-13 22:47:46.459382] I
[syncdutils(/bricks/brick0/master_brick0):214:finalize] <top>: exiting.
[2015-03-13 22:47:46.464014] I [repce(agent):92:service_loop] RepceServer:
terminating on reaching EOF.
[2015-03-13 22:47:46.464807] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-03-13 22:47:46.887911] I [monitor(monitor):280:monitor] Monitor:
worker(/bricks/brick0/master_brick0) died in startup phase
--- Additional comment from Rahul Hinduja on 2015-03-16 04:45:12 EDT ---
Hit the same issue with just the changelog crawl; the geo-rep session was never
stopped. Executed the following from a fuse client and an NFS client of the
master volume:
Fuse:
====
[root at wingo master]# for i in {1..10}; do cp -rf /etc etc.1 ; sleep 10 ; rm -rf
etc.1 ; sleep 10 ; done
NFS:
====
[root at wingo master_nfs]# for i in {1..10}; do cp -rf /etc etc.2 ; sleep 10 ; rm
-rf etc.2 ; sleep 10 ; done
Status moved from ACTIVE to Faulty as:
======================================
[root at georep1 ~]# gluster volume geo-replication vol0 10.70.46.100::vol1 status
MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                 STATUS     CHECKPOINT STATUS    CRAWL STATUS
--------------------------------------------------------------------------------------------------------------------------------
georep1        vol0          /rhs/brick1/b1    root          10.70.46.101::vol1    Active     N/A                  Changelog Crawl
georep1        vol0          /rhs/brick2/b2    root          10.70.46.100::vol1    Active     N/A                  Changelog Crawl
georep3        vol0          /rhs/brick1/b1    root          10.70.46.101::vol1    Passive    N/A                  N/A
georep3        vol0          /rhs/brick2/b2    root          10.70.46.100::vol1    Passive    N/A                  N/A
georep2        vol0          /rhs/brick1/b1    root          10.70.46.100::vol1    Passive    N/A                  N/A
georep2        vol0          /rhs/brick2/b2    root          10.70.46.101::vol1    Passive    N/A                  N/A
[root at georep1 ~]#
[root at georep1 ~]# gluster volume geo-replication vol0 10.70.46.100::vol1 status
MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                 STATUS     CHECKPOINT STATUS    CRAWL STATUS
--------------------------------------------------------------------------------------------------------------------------------
georep1        vol0          /rhs/brick1/b1    root          10.70.46.101::vol1    faulty     N/A                  N/A
georep1        vol0          /rhs/brick2/b2    root          10.70.46.100::vol1    faulty     N/A                  N/A
georep3        vol0          /rhs/brick1/b1    root          10.70.46.101::vol1    Passive    N/A                  N/A
georep3        vol0          /rhs/brick2/b2    root          10.70.46.100::vol1    Passive    N/A                  N/A
georep2        vol0          /rhs/brick1/b1    root          10.70.46.100::vol1    Passive    N/A                  N/A
georep2        vol0          /rhs/brick2/b2    root          10.70.46.101::vol1    Passive    N/A                  N/A
[root at georep1 ~]
From Faulty it went back to Active and changed the crawl to History Crawl:
======================================================================
[root at georep1 ~]# gluster volume geo-replication vol0 10.70.46.100::vol1 status
MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                 STATUS     CHECKPOINT STATUS    CRAWL STATUS
--------------------------------------------------------------------------------------------------------------------------------
georep1        vol0          /rhs/brick1/b1    root          10.70.46.101::vol1    Active     N/A                  History Crawl
georep1        vol0          /rhs/brick2/b2    root          10.70.46.100::vol1    Active     N/A                  History Crawl
georep3        vol0          /rhs/brick1/b1    root          10.70.46.101::vol1    Passive    N/A                  N/A
georep3        vol0          /rhs/brick2/b2    root          10.70.46.100::vol1    Passive    N/A                  N/A
georep2        vol0          /rhs/brick1/b1    root          10.70.46.100::vol1    Passive    N/A                  N/A
georep2        vol0          /rhs/brick2/b2    root          10.70.46.101::vol1    Passive    N/A                  N/A
[root at georep1 ~]#
Log Snippet:
============
[2015-03-16 18:53:29.265461] I [syncdutils(/rhs/brick1/b1):214:finalize] <top>:
exiting.
[2015-03-16 18:53:29.264503] E
[syncdutils(/rhs/brick2/b2):270:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
main_i()
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 645, in
main_i
local.service_loop(*[r for r in [remote] if r])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1329, in
service_loop
g3.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 553, in
crawlwrap
self.crawl(no_stime_update=no_stime_update)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1334, in
crawl
self.process(changes)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1017, in
process
self.process_change(change, done, retry)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 981, in
process_change
self.slave.server.entry_ops(entries)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in
__call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in
__call__
raise res
OSError: [Errno 39] Directory not empty:
'.gfid/70b3b3b8-3e8d-4f32-9123-fab73574ce91/yum'
[2015-03-16 18:53:29.266381] I [syncdutils(/rhs/brick2/b2):214:finalize] <top>:
exiting.
[2015-03-16 18:53:29.268383] I [repce(agent):92:service_loop] RepceServer:
terminating on reaching EOF.
[2015-03-16 18:53:29.268708] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-03-16 18:53:29.269619] I [repce(agent):92:service_loop] RepceServer:
terminating on reaching EOF.
[2015-03-16 18:53:29.270034] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-03-16 18:53:29.685185] I [monitor(monitor):280:monitor] Monitor:
worker(/rhs/brick1/b1) died in startup phase
[2015-03-16 18:53:30.220942] I [monitor(monitor):280:monitor] Monitor:
worker(/rhs/brick2/b2) died in startup phase
Output of history log:
======================
[root at georep1
ssh%3A%2F%2Froot%4010.70.46.100%3Agluster%3A%2F%2F127.0.0.1%3Avol1]# find .
.
./c19b89ac45352ab8c894d210d136dd56
./c19b89ac45352ab8c894d210d136dd56/xsync
./c19b89ac45352ab8c894d210d136dd56/.history
./c19b89ac45352ab8c894d210d136dd56/.history/tracker
./c19b89ac45352ab8c894d210d136dd56/.history/.current
./c19b89ac45352ab8c894d210d136dd56/.history/.processed
./c19b89ac45352ab8c894d210d136dd56/.history/.processing
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512222
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512176
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512161
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512131
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512191
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512207
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512146
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512116
./c19b89ac45352ab8c894d210d136dd56/.history/.processing/CHANGELOG.1426512101
./c19b89ac45352ab8c894d210d136dd56/tracker
./c19b89ac45352ab8c894d210d136dd56/.current
./c19b89ac45352ab8c894d210d136dd56/.processed
./c19b89ac45352ab8c894d210d136dd56/.processed/archive_201503.tar
./c19b89ac45352ab8c894d210d136dd56/.processing
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512372
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512342
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512282
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512357
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512327
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512267
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512387
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512312
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512237
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512252
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1426512297
./764586b145d7206a154a778f64bd2f50
./764586b145d7206a154a778f64bd2f50/xsync
./764586b145d7206a154a778f64bd2f50/.history
./764586b145d7206a154a778f64bd2f50/.history/tracker
./764586b145d7206a154a778f64bd2f50/.history/.current
./764586b145d7206a154a778f64bd2f50/.history/.processed
./764586b145d7206a154a778f64bd2f50/.history/.processing
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512222
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512176
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512161
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512131
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512207
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512146
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512192
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512116
./764586b145d7206a154a778f64bd2f50/.history/.processing/CHANGELOG.1426512101
./764586b145d7206a154a778f64bd2f50/tracker
./764586b145d7206a154a778f64bd2f50/.current
./764586b145d7206a154a778f64bd2f50/.processed
./764586b145d7206a154a778f64bd2f50/.processed/archive_201503.tar
./764586b145d7206a154a778f64bd2f50/.processing
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512372
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512342
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512282
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512357
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512327
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512267
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512387
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512312
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512237
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512252
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1426512297
[root at georep1
ssh%3A%2F%2Froot%4010.70.46.100%3Agluster%3A%2F%2F127.0.0.1%3Avol1]#
--- Additional comment from Anand Avati on 2015-04-12 12:49:54 EDT ---
REVIEW: http://review.gluster.org/10204 (geo-rep: Minimize rm -rf race in
Geo-rep) posted (#1) for review on master by Aravinda VK (avishwan at redhat.com)
--- Additional comment from Anand Avati on 2015-04-28 04:40:52 EDT ---
REVIEW: http://review.gluster.org/10204 (geo-rep: Minimize rm -rf race in
Geo-rep) posted (#2) for review on master by Aravinda VK (avishwan at redhat.com)
--- Additional comment from Anand Avati on 2015-05-05 09:45:07 EDT ---
COMMIT: http://review.gluster.org/10204 committed in master by Vijay Bellur
(vbellur at redhat.com)
------
commit 08107796c89f5f201b24d689ab6757237c743c0d
Author: Aravinda VK <avishwan at redhat.com>
Date: Sun Apr 12 17:46:45 2015 +0530
geo-rep: Minimize rm -rf race in Geo-rep
While doing RMDIR a worker gets ENOTEMPTY because the same directory
still holds files from other bricks that have not been deleted yet by
the slower workers, so geo-rep falls back to recursive_delete.

Recursive delete was done using shutil.rmtree which, once started, does
not re-check disk_gfid along the way, so it can end up deleting new
files created by other workers. Also, if another worker creates files
after one worker has collected its list of files to delete, the first
worker gets ENOTEMPTY again.

To fix these races, a retry is added when ENOTEMPTY/ESTALE/ENODATA is
hit, and a disk_gfid check is added for the original path on which
recursive_delete is called. The disk gfid check is executed before
every Unlink/Rmdir. If the disk gfid does not match the GFID from the
Changelog, another worker has already deleted the directory; even if a
subdir/file is still present, it belongs to a different parent, so the
worker exits without performing further deletes.

Retry on ENOENT during create is not attempted, since a
CREATE/MKNOD/MKDIR that failed with ENOENT cannot succeed unless the
parent directory is created again.

Rsync error handling processed unlinked_gfids_list only for a single
Changelog; when changelogs were processed in a batch it failed to
detect unlinked_gfids, retried, and finally skipped the entire batch of
Changelogs. Fixed by resetting self.unlinked_gfids before the batch
starts and after it ends.

Most of the Geo-rep races with rm -rf are eliminated by this patch, but
in some cases stale directories are left on some bricks and ENOTEMPTY
is seen at the mount point (a DHT issue; the error is logged in the
slave log).
BUG: 1211037
Change-Id: I8716b88e4c741545f526095bf789f7c1e28008cb
Signed-off-by: Aravinda VK <avishwan at redhat.com>
Reviewed-on: http://review.gluster.org/10204
Reviewed-by: Kotresh HR <khiremat at redhat.com>
Tested-by: Gluster Build System <jenkins at build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Vijay Bellur <vbellur at redhat.com>
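For readers following the commit message above, here is a minimal sketch of the
retry-plus-disk_gfid-check idea it describes. This is not the actual syncdaemon
code; the helper names, the retry budget, and reading the GFID through the
glusterfs.gfid.string virtual xattr are assumptions of the sketch.

import errno
import os
import time

def disk_gfid_matches(path, expected_gfid):
    # Compare the on-disk GFID with the GFID recorded in the Changelog before
    # every unlink/rmdir, so a worker stops once another worker has already
    # removed or replaced the directory. Reading it via the virtual xattr is
    # an assumption of this sketch (Python 3, os.getxattr).
    try:
        return os.getxattr(path, "glusterfs.gfid.string").decode() == expected_gfid
    except OSError:
        return False

def entry_rmdir(path, gfid, retries=5):
    # Hypothetical helper: retry on ENOTEMPTY/ESTALE/ENODATA instead of raising,
    # since files from other bricks may still be arriving or departing.
    for _ in range(retries):
        if not disk_gfid_matches(path, gfid):
            return              # another worker deleted it; nothing more to do
        try:
            os.rmdir(path)
            return
        except OSError as e:
            if e.errno in (errno.ENOTEMPTY, errno.ESTALE, errno.ENODATA):
                time.sleep(1)   # back off and retry this entry
                continue
            if e.errno == errno.ENOENT:
                return          # already gone
            raise
    # retry budget exhausted: in the real worker this would be logged and the
    # entry skipped rather than failing the whole changelog batch

The key point, per the commit, is that the GFID check runs before every
individual delete, so a worker that lost the race exits quietly instead of
tearing down entries that now belong to a different parent.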
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1201732
[Bug 1201732] [dist-geo-rep]:Directory not empty and Stale file handle
errors in geo-rep logs during deletes from master in history/changelog
crawl
https://bugzilla.redhat.com/show_bug.cgi?id=1211037
[Bug 1211037] [dist-geo-rep]:Directory not empty and Stale file handle
errors in geo-rep logs during deletes from master in history/changelog
crawl
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.