[Gluster-users] Geo-replication fails on self.slave.server.set_stime() with OSError: [Errno 2] No such file or directory
Morten Johansen
morten at cerum.no
Fri Nov 7 20:23:29 UTC 2014
Hi, list
We’re having some issues with geo-replication, which I _think_ are related to delete operations.
Sometimes replication goes into a faulty state and then recovers on its own after a while.
Changelog change detection fails, and it falls back to xsync. The slave volume does not replicate deleted files.
My research led me to this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1073844
The bug lists a traceback which is very similar to the one we’re seeing in our logs.
We’re running version 3.5.2, which includes the fix for this bug. Inspecting master.py on our servers confirms that the patch is present: http://review.gluster.org/#/c/7207/2/geo-replication/syncdaemon/master.py
In our case, the failure occurs in the call on the line BEFORE the patched one, i.e. the call to self.slave.server.set_stime() on line 152 of master.py.
This is an example traceback from our logs:
<SNIP>
[2014-11-07 12:47:07.516124] I [master(/media/slot2/geotest):1124:crawl] _GMaster: starting hybrid crawl...
[2014-11-07 12:47:07.518146] I [master(/media/slot2/geotest):1133:crawl] _GMaster: processing xsync changelog /var/run/gluster/geotest/ssh%3A%2F%2Froot%4010.32.0.101%3Agluster%3A%2F%2F127.0.0.1%3Ageotest/d531d53915b53c130ad434b5295ebf7c/xsync/XSYNC-CHANGELOG.1415360827
[2014-11-07 12:47:07.520725] E [syncdutils(/media/slot2/geotest):240:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 542, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1177, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 467, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1137, in crawl
    self.upd_stime(item[1][1], item[1][0])
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 884, in upd_stime
    self.sendmark(path, stime)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 658, in sendmark
    self.set_slave_xtime(path, mark)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 152, in set_slave_xtime
    self.slave.server.set_stime(path, self.uuid, mark)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1163, in <lambda>
    slave.server.set_stime = types.MethodType(lambda _self, path, uuid, mark: brickserver.set_stime(path, uuid + '.' + gconf.slave_id, mark), slave.server)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 299, in ff
    return f(*a)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 496, in set_stime
    Xattr.lsetxattr(path, '.'.join([cls.GX_NSPACE, uuid, 'stime']), struct.pack('!II', *mark))
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 66, in lsetxattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 25, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 2] No such file or directory
[2014-11-07 12:47:07.522511] I [syncdutils(/media/slot2/geotest):192:finalize] <top>: exiting.
</SNIP>
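To illustrate what I believe the last frames of the traceback are doing, here is a minimal standalone sketch. It is not gsyncd code. It packs a (seconds, nanoseconds) mark with the same struct.pack('!II', ...) format seen in set_stime and then tries to set an extended attribute on a path that does not exist. The helper names, the attribute name user.example.stime and the file name are made up for the example, and it uses Python 3's os.setxattr (Linux only) in place of the libcxattr wrapper.

```python
import errno
import os
import struct
import tempfile


def pack_stime(mark):
    """Pack a (seconds, nanoseconds) pair as two big-endian 32-bit ints,
    the same '!II' format that appears in the traceback."""
    return struct.pack('!II', *mark)


def try_set_stime(path, xattr_name, mark):
    """Hypothetical helper: set the xattr without following symlinks
    (like lsetxattr) and return None on success or the errno on failure."""
    try:
        os.setxattr(path, xattr_name, pack_stime(mark), follow_symlinks=False)
    except OSError as e:
        return e.errno
    return None


with tempfile.TemporaryDirectory() as tmp:
    # A path that was never created, standing in for a file that has been deleted.
    missing = os.path.join(tmp, 'deleted-file')
    result = try_set_stime(missing, 'user.example.stime', (1415360827, 0))

print(errno.errorcode.get(result))
```

If the path passed to set_stime has already been removed on the slave, the xattr call would fail in this way, which would match my suspicion that deletes are involved.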
Any ideas on this one? What breaks if I comment out line 152 too?
Any quick fixes on this would be much appreciated.
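Rather than commenting the line out entirely, one alternative I could imagine is to swallow only the "No such file or directory" case and let every other error propagate. The sketch below is generic Python, not a patch against master.py. call_ignoring_enoent and fake_set_stime are hypothetical names, and I have not tested whether skipping the stime update is actually safe for geo-replication.

```python
import errno
import os


def call_ignoring_enoent(func, *args):
    """Call func(*args). Return True on success, False if it failed with
    ENOENT, and re-raise any other OSError unchanged."""
    try:
        func(*args)
    except OSError as e:
        if e.errno == errno.ENOENT:
            return False
        raise
    return True


def fake_set_stime(path, uuid, mark):
    """Stand-in for the real set_stime call; always fails like our traceback."""
    raise OSError(errno.ENOENT, os.strerror(errno.ENOENT))


ok = call_ignoring_enoent(fake_set_stime, '/no/such/path', 'some-uuid', (0, 0))
print(ok)
```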
Best regards,
--
Morten Johansen
Systems developer, Cerum AS