[Gluster-users] Geo-replication fails on self.slave.server.set_stime() with OSError: [Errno 2] No such file or directory

Morten Johansen morten at cerum.no
Fri Nov 7 20:23:29 UTC 2014


Hi, list

We’re having some issues with geo-replication, which I _think_ are related to delete operations.
Sometimes the replication goes into a faulty state and then, after a while, recovers.
Change detection via the changelog fails, and it falls back to xsync. Deletions are not replicated: files deleted on the master remain on the slave volume.

My research led me to this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1073844

The bug lists a traceback which is very similar to the one we’re seeing in our logs.

We’re running version 3.5.2, which has this bug fix in it, and inspecting the master.py file on our actual servers confirms we do have this patch: http://review.gluster.org/#/c/7207/2/geo-replication/syncdaemon/master.py

In our case, the failure occurs in the call on the line BEFORE the patched one, i.e. the call to self.slave.server.set_stime() on line 152 of master.py.
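For reference, the resource.py frames in the traceback (lines 1163 and 496) show how the stime xattr is named and encoded: the master uuid is joined with gconf.slave_id, the key is '.'.join([GX_NSPACE, uuid, 'stime']), and the mark is packed with struct.pack('!II', *mark). A minimal sketch of that encoding follows. It assumes GX_NSPACE is "trusted.glusterfs" (its actual value is not shown in the traceback), and the helper names are hypothetical.

```python
import struct

# Assumption: namespace prefix gsyncd uses when running privileged.
GX_NSPACE = "trusted.glusterfs"


def stime_key(master_uuid, slave_id):
    """Hypothetical helper: build the stime xattr name as in the traceback."""
    # resource.py line 1163: uuid + '.' + gconf.slave_id
    uuid = master_uuid + "." + slave_id
    # resource.py line 496: '.'.join([cls.GX_NSPACE, uuid, 'stime'])
    return ".".join([GX_NSPACE, uuid, "stime"])


def pack_mark(mark):
    """Pack a two-element mark as two unsigned 32-bit ints, big-endian ('!II')."""
    return struct.pack("!II", *mark)


def unpack_mark(blob):
    """Inverse of pack_mark: decode the 8-byte xattr value back to a tuple."""
    return struct.unpack("!II", blob)
```

For example, stime_key("<master-uuid>", "<slave-id>") yields "trusted.glusterfs.<master-uuid>.<slave-id>.stime", and the packed value is always 8 bytes.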

This is an example traceback from our logs:
<SNIP>
[2014-11-07 12:47:07.516124] I [master(/media/slot2/geotest):1124:crawl] _GMaster: starting hybrid crawl...
[2014-11-07 12:47:07.518146] I [master(/media/slot2/geotest):1133:crawl] _GMaster: processing xsync changelog /var/run/gluster/geotest/ssh%3A%2F%2Froot%4010.32.0.101%3Agluster%3A%2F%2F127.0.0.1%3Ageotest/d531d53915b53c130ad434b5295ebf7c/xsync/XSYNC-CHANGELOG.1415360827
[2014-11-07 12:47:07.520725] E [syncdutils(/media/slot2/geotest):240:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 542, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1177, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 467, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1137, in crawl
    self.upd_stime(item[1][1], item[1][0])
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 884, in upd_stime
    self.sendmark(path, stime)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 658, in sendmark
    self.set_slave_xtime(path, mark)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 152, in set_slave_xtime
    self.slave.server.set_stime(path, self.uuid, mark)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1163, in <lambda>
    slave.server.set_stime = types.MethodType(lambda _self, path, uuid, mark: brickserver.set_stime(path, uuid + '.' + gconf.slave_id, mark), slave.server)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 299, in ff
    return f(*a)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 496, in set_stime
    Xattr.lsetxattr(path, '.'.join([cls.GX_NSPACE, uuid, 'stime']), struct.pack('!II', *mark))
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 66, in lsetxattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 25, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 2] No such file or directory
[2014-11-07 12:47:07.522511] I [syncdutils(/media/slot2/geotest):192:finalize] <top>: exiting.
</SNIP>
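The last frames show lsetxattr() on the brick path returning errno 2 (ENOENT), i.e. the path no longer exists when the stime is written. That would fit the delete-related suspicion above if a file or directory is removed between the crawl and the xattr call, though this is an assumption, not something the log proves. The sketch below is a hypothetical diagnostic helper, not gsyncd code and not the upstream fix: it uses Python's os.setxattr with follow_symlinks=False (which behaves like lsetxattr on Linux) and reports ENOENT instead of raising, so one can test whether the error is only a vanished path.

```python
import errno
import os
import struct


def try_set_stime(path, key, mark):
    """Hypothetical helper: set an stime-style xattr without following
    symlinks (like lsetxattr) and return False if the path is gone."""
    value = struct.pack("!II", *mark)
    try:
        os.setxattr(path, key, value, follow_symlinks=False)
    except OSError as e:
        if e.errno == errno.ENOENT:
            # The path disappeared (e.g. was deleted) before the xattr call.
            return False
        # Any other error (permissions, unsupported xattrs, ...) is re-raised.
        raise
    return True
```

Note that swallowing ENOENT this way simply records no stime for that path; whether skipping it is safe for the geo-replication state is not established here. Writing trusted.* keys also requires root, so a user.* key is easier for local experiments.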

Any ideas on this one? What breaks if I comment out line 152 too?
Any quick fixes on this would be much appreciated.

Best regards,

-- 
Morten Johansen
Systems developer, Cerum AS

