[Gluster-users] Geo-rep failing
Adrian Carpenter
tac12 at wbic.cam.ac.uk
Tue Jun 28 10:04:19 UTC 2011
Thanks Csaba,
So far as I am aware, nothing tampered with the xattrs, and all the bricks etc. are time-synchronised. Anyway, I did as you suggested; now, for one volume (I have three being geo-rep'd), I consistently get this:
OSError: [Errno 12] Cannot allocate memory
[2011-06-28 06:04:48.524348] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:app-volume -> file:///geo-tank/app-volume
[2011-06-28 06:04:54.480377] I [master:181:crawl] GMaster: new master is eb9f50ba-f17c-4109-ae87-4162925d1db2
[2011-06-28 06:04:54.480622] I [master:187:crawl] GMaster: primary master with volume id eb9f50ba-f17c-4109-ae87-4162925d1db2 ...
[2011-06-28 07:38:41.134073] E [syncdutils:131:exception] <top>: FAIL:
Traceback (most recent call last):
  File "/opt/glusterfs/3.2.1/local/libexec//glusterfs/python/syncdaemon/gsyncd.py", line 102, in main
    main_i()
  File "/opt/glusterfs/3.2.1/local/libexec//glusterfs/python/syncdaemon/gsyncd.py", line 296, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/resource.py", line 401, in service_loop
    GMaster(self, args[0]).crawl_loop()
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 113, in crawl_loop
    self.crawl()
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 291, in crawl
    True)[-1], blame=e) == False:
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 257, in indulgently
    return fnc(e)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 289, in <lambda>
    if indulgently(e, lambda e: (self.add_job(path, 'cwait', self.wait, e, xte, adct),
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 291, in crawl
    True)[-1], blame=e) == False:
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 257, in indulgently
    return fnc(e)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 289, in <lambda>
    if indulgently(e, lambda e: (self.add_job(path, 'cwait', self.wait, e, xte, adct),
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 291, in crawl
    True)[-1], blame=e) == False:
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 257, in indulgently
    return fnc(e)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 289, in <lambda>
    if indulgently(e, lambda e: (self.add_job(path, 'cwait', self.wait, e, xte, adct),
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 248, in crawl
    xte = self.xtime(e)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/master.py", line 57, in xtime
    xt = rsc.server.xtime(path, self.uuid)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/resource.py", line 145, in xtime
    return struct.unpack('!II', Xattr.lgetxattr(path, '.'.join([cls.GX_NSPACE, uuid, 'xtime']), 8))
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 34, in lgetxattr
    return cls._query_xattr( path, siz, 'lgetxattr', attr)
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 26, in _query_xattr
    cls.raise_oserr()
  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 16, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 12] Cannot allocate memory
[2011-06-28 07:38:51.194791] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
[2011-06-28 07:38:51.203562] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
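
For what it's worth, the failing call is the lgetxattr() in resource.py above. Here is a minimal sketch (assuming Python 3 on Linux, run as root on one of the master's bricks; the brick path is a placeholder, and the 'trusted.glusterfs' namespace is my assumption about what GX_NSPACE expands to for a privileged gsyncd) that queries the same per-volume xtime xattr directly, which may help narrow down which entry trips the ENOMEM:

    import os
    import struct

    # Volume UUID taken from the "new master" log line above; the brick
    # root is hypothetical -- substitute a real brick path.
    UUID = 'eb9f50ba-f17c-4109-ae87-4162925d1db2'
    XTIME_KEY = 'trusted.glusterfs.' + UUID + '.xtime'  # gsyncd builds GX_NSPACE.<uuid>.xtime
    BRICK_ROOT = '/path/to/brick'

    def read_xtime(path):
        # Mirrors resource.py: the xattr value is two big-endian
        # uint32s (seconds, nanoseconds).
        try:
            raw = os.getxattr(path, XTIME_KEY, follow_symlinks=False)
        except OSError as e:
            print('%s: errno %d (%s)' % (path, e.errno, e.strerror))
            return None
        if len(raw) < 8:
            print('%s: short xattr value (%d bytes)' % (path, len(raw)))
            return None
        return struct.unpack('!II', raw[:8])

    for dirpath, dirnames, filenames in os.walk(BRICK_ROOT):
        read_xtime(dirpath)
        for name in filenames:
            read_xtime(os.path.join(dirpath, name))

Entries that report errno 12, or a short/missing value, would be the suspects to look at first.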
Regards,
Adrian
On 27 Jun 2011, at 23:23, Csaba Henk wrote:
> This means that the geo-replication indexing (the "xtime" extended attributes) has become inconsistent. If these xattrs weren't tampered with by an outside actor (i.e., anything other than the gsyncd process spawned upon "geo-replication start", and its children), then this happens when the clock of the master box (more precisely, of any brick which belongs to the master volume) is set backwards. In that case the whole index is corrupt, and to fix it you should reset the index with
>
> # gluster volume set <master volume> geo-replication.indexing off
> # gluster volume set <master volume> geo-replication.indexing on
>
> (for this you should first stop any geo-rep sessions that have <master volume> as master; they can be restarted after the index reset). The side effect of this operation is that a full rsync-style synchronization will be performed once, i.e., files will be checked for matching content by means of checksums computed on both sides.
>
> Regards,
> Csaba
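
To make the failure mode Csaba describes concrete, here is a rough sketch (not gsyncd's actual code, just the comparison its crawl implies) of why a backwards clock poisons the index:

    # Rough sketch: the crawl descends into an entry only when the
    # master's xtime is ahead of the slave's. xtimes are
    # (seconds, nanoseconds) tuples, so Python's lexicographic tuple
    # comparison matches the on-disk '!II' encoding.
    def needs_sync(master_xtime, slave_xtime):
        if slave_xtime is None:        # never synced: always descend
            return True
        return master_xtime > slave_xtime

    # If the master clock jumps backwards, new changes get stamped with
    # an xtime "older" than what the slave already recorded, so
    # needs_sync() returns False and the changed subtree is silently
    # skipped -- hence the full index reset and the one-off checksum
    # resync it triggers.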