[Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")

mabi mabi at protonmail.ch
Mon Apr 10 20:45:54 UTC 2017


Hi Kotresh,

I am using the official Debian 8 (jessie) package which has rsync version 3.1.1.
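In case it is useful, this is a sketch of how the version can be confirmed (the third field of the first line of `rsync --version` is the version number; on Debian 8 "jessie" it reports 3.1.1):

```shell
# Print just the rsync version number, e.g. "3.1.1" on Debian 8 "jessie".
# The first line of `rsync --version` looks like:
#   rsync  version 3.1.1  protocol version 31
rsync_version() {
    rsync --version | head -n1 | awk '{print $3}'
}

# Guard so this is a no-op on hosts without rsync installed
if command -v rsync >/dev/null; then
    rsync_version
fi
```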

Regards,
M.

-------- Original Message --------
Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
Local Time: April 10, 2017 6:33 AM
UTC Time: April 10, 2017 4:33 AM
From: khiremat at redhat.com
To: mabi <mabi at protonmail.ch>
Gluster Users <gluster-users at gluster.org>

Hi Mabi,

What's the rsync version being used?

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "mabi" <mabi at protonmail.ch>
> To: "Gluster Users" <gluster-users at gluster.org>
> Sent: Saturday, April 8, 2017 4:20:25 PM
> Subject: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
>
> Hello,
>
> I am using distributed geo-replication with two of my GlusterFS 3.7.20
> replicated volumes and just noticed that geo-replication for one volume
> is no longer working. It has been stuck since 2017-02-23 22:39. I tried
> stopping and restarting geo-replication, but it stays stuck at that
> specific date and time. In the output of the geo-replication "status
> detail" command I can see 3879 under the DATA field and "Active" as the
> STATUS, but still nothing happens. I noticed that the rsync process is
> running but does not do anything, so I ran strace on the rsync PID and
> saw the following:
>
> write(2, "rsync: link_stat \"(unreachable)/"..., 114
>
> It looks like rsync can't read or find a file and stays stuck on that. In the
> geo-replication log files on the GlusterFS master I can't find any error
> messages, only informational ones. For example, when I restart
> geo-replication I see the following log entries:
>
> [2017-04-07 21:43:05.664541] I [monitor(monitor):443:distribute] <top>: slave
> bricks: [{'host': 'gfs1geo.domain', 'dir': '/data/private-geo/brick'}]
> [2017-04-07 21:43:05.666435] I [monitor(monitor):468:distribute] <top>:
> worker specs: [('/data/private/brick', 'ssh:// root at gfs1geo.domain
> :gluster://localhost:private-geo', '1', False)]
> [2017-04-07 21:43:05.823931] I [monitor(monitor):267:monitor] Monitor:
> ------------------------------------------------------------
> [2017-04-07 21:43:05.824204] I [monitor(monitor):268:monitor] Monitor:
> starting gsyncd worker
> [2017-04-07 21:43:05.930124] I [gsyncd(/data/private/brick):733:main_i]
> <top>: syncing: gluster://localhost:private -> ssh:// root at gfs1geo.domain
> :gluster://localhost:private-geo
> [2017-04-07 21:43:05.931169] I [changelogagent(agent):73:__init__]
> ChangelogAgent: Agent listining...
> [2017-04-07 21:43:08.558648] I
> [master(/data/private/brick):83:gmaster_builder] <top>: setting up xsync
> change detection mode
> [2017-04-07 21:43:08.559071] I [master(/data/private/brick):367:__init__]
> _GMaster: using 'rsync' as the sync engine
> [2017-04-07 21:43:08.560163] I
> [master(/data/private/brick):83:gmaster_builder] <top>: setting up changelog
> change detection mode
> [2017-04-07 21:43:08.560431] I [master(/data/private/brick):367:__init__]
> _GMaster: using 'rsync' as the sync engine
> [2017-04-07 21:43:08.561105] I
> [master(/data/private/brick):83:gmaster_builder] <top>: setting up
> changeloghistory change detection mode
> [2017-04-07 21:43:08.561391] I [master(/data/private/brick):367:__init__]
> _GMaster: using 'rsync' as the sync engine
> [2017-04-07 21:43:11.354417] I [master(/data/private/brick):1249:register]
> _GMaster: xsync temp directory:
> /var/lib/misc/glusterfsd/private/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Aprivate-geo/616931ac8f39da5dc5834f9d47fc7b1a/xsync
> [2017-04-07 21:43:11.354751] I
> [resource(/data/private/brick):1528:service_loop] GLUSTER: Register time:
> 1491601391
> [2017-04-07 21:43:11.357630] I [master(/data/private/brick):510:crawlwrap]
> _GMaster: primary master with volume id e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
> ...
> [2017-04-07 21:43:11.489355] I [master(/data/private/brick):519:crawlwrap]
> _GMaster: crawl interval: 1 seconds
> [2017-04-07 21:43:11.516710] I [master(/data/private/brick):1163:crawl]
> _GMaster: starting history crawl... turns: 1, stime: (1487885974, 0), etime:
> 1491601391
> [2017-04-07 21:43:12.607836] I [master(/data/private/brick):1192:crawl]
> _GMaster: slave's time: (1487885974, 0)
>
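For reference, the stop/restart and strace steps described above were roughly the following (a sketch; the volume names are taken from the logs, i.e. "private" on the master and "private-geo" on the slave gfs1geo.domain, and the pgrep/head helper just picks one rsync PID):

```shell
# Session names assumed from the log entries above
MASTERVOL=private
SLAVE=gfs1geo.domain::private-geo

# Inspect the session (shows the DATA and STATUS columns mentioned above)
gluster volume geo-replication "$MASTERVOL" "$SLAVE" status detail

# Stop and restart the geo-replication session
gluster volume geo-replication "$MASTERVOL" "$SLAVE" stop
gluster volume geo-replication "$MASTERVOL" "$SLAVE" start

# Attach strace to the (apparently hung) rsync worker
strace -p "$(pgrep -x rsync | head -n1)"
```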
> Does anyone know how I can find out the root cause of this problem and make
> geo replication work again from the time point it got stuck?
>
> Many thanks in advance for your help.
>
> Best regards,
> Mabi
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

