<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
Dear Users,<br>
geo-replication is still broken, which is not a
comfortable situation.<br>
Has anyone had the same experience and can share a
possible workaround?<br>
We are currently running Gluster v6.0.<br>
Regards,<br>
<p>Felix</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 25/06/2020 10:04, Shwetha Acharya
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAERh03pZY-xMfMtpVxk2uZwJwjsEMgZwMVte+SmuDVOGxKUxDQ@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">Hi Rob and Felix,<br>
<br>
Please share the *-changes.log files and brick logs, which will
help in analysing the issue.<br>
<br>
Regards,
<div>Shwetha</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Jun 25, 2020 at 1:26
PM Felix Kölzow &lt;<a href="mailto:felix.koelzow@gmx.de"
moz-do-not-send="true">felix.koelzow@gmx.de</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hey Rob,</p>
<p><br>
</p>
<p>We see the same issue on our third volume. Have a look at the logs
from just now (below).</p>
<p>Question: you removed the htime files and the old
changelogs. Did you simply rm the files, or is there something
that needs attention before removing the changelog files and
the htime file?</p>
<p>Regards,</p>
<p>Felix<br>
</p>
<p>[2020-06-25 07:51:53.795430] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1435:connect_remote]
SSH: SSH connection between master and slave
established. duration=1.2341<br>
[2020-06-25 07:51:53.795639] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1105:connect]
GLUSTER: Mounting gluster volume locally...<br>
[2020-06-25 07:51:54.520601] I
[monitor(monitor):280:monitor] Monitor: worker died in
startup phase
brick=/gluster/vg01/dispersed_fuse1024/brick<br>
[2020-06-25 07:51:54.535809] I
[gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change status=Faulty<br>
[2020-06-25 07:51:54.882143] I [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1128:connect]
GLUSTER: Mounted gluster volume duration=1.0864<br>
[2020-06-25 07:51:54.882388] I [subcmds(worker
/gluster/vg00/dispersed_fuse1024/brick):84:subcmd_worker]
&lt;top&gt;: Worker spawn successful. Acknowledging back
to monitor<br>
[2020-06-25 07:51:56.911412] E [repce(agent
/gluster/vg00/dispersed_fuse1024/brick):121:worker]
&lt;top&gt;: call failed: <br>
Traceback (most recent call last):<br>
File
"/usr/libexec/glusterfs/python/syncdaemon/repce.py", line
117, in worker<br>
res = getattr(self.obj, rmeth)(*in_data[2:])<br>
File
"/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
line 40, in register<br>
return Changes.cl_register(cl_brick, cl_dir, cl_log,
cl_level, retries)<br>
File
"/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 46, in cl_register<br>
cls.raise_changelog_err()<br>
File
"/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 30, in raise_changelog_err<br>
raise ChangelogException(errn, os.strerror(errn))<br>
ChangelogException: [Errno 2] No such file or directory<br>
[2020-06-25 07:51:56.912056] E [repce(worker
/gluster/vg00/dispersed_fuse1024/brick):213:__call__]
RepceClient: call failed
call=75086:140098349655872:1593071514.91
method=register error=ChangelogException<br>
[2020-06-25 07:51:56.912396] E [resource(worker
/gluster/vg00/dispersed_fuse1024/brick):1286:service_loop]
GLUSTER: Changelog register failed error=[Errno 2] No
such file or directory<br>
[2020-06-25 07:51:56.928031] I [repce(agent
/gluster/vg00/dispersed_fuse1024/brick):96:service_loop]
RepceServer: terminating on reaching EOF.<br>
[2020-06-25 07:51:57.886126] I
[monitor(monitor):280:monitor] Monitor: worker died in
startup phase
brick=/gluster/vg00/dispersed_fuse1024/brick<br>
[2020-06-25 07:51:57.895920] I
[gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change status=Faulty<br>
[2020-06-25 07:51:58.607405] I [gsyncdstatus(worker
/gluster/vg00/dispersed_fuse1024/brick):287:set_passive]
GeorepStatus: Worker Status Change status=Passive<br>
[2020-06-25 07:51:58.607768] I [gsyncdstatus(worker
/gluster/vg01/dispersed_fuse1024/brick):287:set_passive]
GeorepStatus: Worker Status Change status=Passive<br>
[2020-06-25 07:51:58.608004] I [gsyncdstatus(worker
/gluster/vg00/dispersed_fuse1024/brick):281:set_active]
GeorepStatus: Worker Status Change status=Active<br>
<br>
</p>
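<p>A minimal sketch for condensing a gsyncd log like the one above into
worker status changes and error lines. The line layout the regular
expression expects is an assumption taken from the excerpts in this
thread, and the names summarize, LINE_RE and STATUS_RE are made up for
illustration; they are not part of Gluster.</p>
<pre>
```python
import re

# Assumed layout of one gsyncd log line, based on the excerpts in this thread:
# [timestamp] LEVEL [module(role brick):line:function] message
LINE_RE = re.compile(r"^\[([^\]]+)\]\s+([A-Z])\s+\[([^\]]+)\]\s+(.*)$")
STATUS_RE = re.compile(r"status=(\S+)")


def summarize(lines):
    """Return (status_changes, errors) found in an iterable of log lines."""
    status_changes = []
    errors = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # Traceback lines and blank lines do not match and are skipped.
            continue
        timestamp, level, origin, message = match.groups()
        status = STATUS_RE.search(message)
        if "Worker Status Change" in message and status:
            status_changes.append((timestamp, origin, status.group(1)))
        if level == "E":
            errors.append((timestamp, origin, message))
    return status_changes, errors


# Two lines copied from the log above, plus one traceback line.
sample = [
    "[2020-06-25 07:51:54.535809] I [gsyncdstatus(monitor):248:set_worker_status] "
    "GeorepStatus: Worker Status Change status=Faulty",
    "[2020-06-25 07:51:56.912396] E [resource(worker "
    "/gluster/vg00/dispersed_fuse1024/brick):1286:service_loop] "
    "GLUSTER: Changelog register failed error=[Errno 2] No such file or directory",
    "Traceback (most recent call last):",
]
changes, errors = summarize(sample)
for entry in changes + errors:
    print(entry)
```
</pre>
<p>Passing an open log file instead of the sample list should work the
same way, since the function only iterates over lines.</p>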
<p><br>
</p>
<div>On 25/06/2020 09:15, <a
href="mailto:Rob.Quagliozzi@rabobank.com"
target="_blank" moz-do-not-send="true">Rob.Quagliozzi@rabobank.com</a>
wrote:<br>
</div>
<blockquote type="cite">
<div>
<p class="MsoNormal">Hi All,</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">We’ve got two six-node RHEL 7.8
clusters, and geo-replication appears to be
completely broken between them. I’ve deleted the
session, removed &amp; recreated the pem files, removed the old
changelogs/htime (after removing the relevant options from
the volume), and completely set up geo-rep from scratch,
but the new session comes up as Initializing, then
goes Faulty, and starts looping. The volume (on both
sides) is a 4 x 2 disperse, running Gluster v6 (RH
latest). Gsyncd reports:</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">[2020-06-25 07:07:14.701423] I
[gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change
status=Initializing...</p>
<p class="MsoNormal">[2020-06-25 07:07:14.701744] I
[monitor(monitor):159:monitor] Monitor: starting
gsyncd worker brick=/rhgs/brick20/brick
slave_node=<a href="http://bxts470194.eu.rabonet.com"
target="_blank" moz-do-not-send="true">bxts470194.eu.rabonet.com</a></p>
<p class="MsoNormal">[2020-06-25 07:07:14.707997] D
[monitor(monitor):230:monitor] Monitor: Worker would
mount volume privately</p>
<p class="MsoNormal">[2020-06-25 07:07:14.757181] I
[gsyncd(agent /rhgs/brick20/brick):318:main]
&lt;top&gt;: Using session config file
path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf</p>
<p class="MsoNormal">[2020-06-25 07:07:14.758126] D
[subcmds(agent /rhgs/brick20/brick):107:subcmd_agent]
&lt;top&gt;: RPC FD rpc_fd='5,12,11,10'</p>
<p class="MsoNormal">[2020-06-25 07:07:14.758627] I
[changelogagent(agent
/rhgs/brick20/brick):72:__init__] ChangelogAgent:
Agent listining...</p>
<p class="MsoNormal">[2020-06-25 07:07:14.764234] I
[gsyncd(worker /rhgs/brick20/brick):318:main]
&lt;top&gt;: Using session config file
path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf</p>
<p class="MsoNormal">[2020-06-25 07:07:14.779409] I
[resource(worker
/rhgs/brick20/brick):1386:connect_remote] SSH:
Initializing SSH connection between master and
slave...</p>
<p class="MsoNormal">[2020-06-25 07:07:14.841793] D
[repce(worker /rhgs/brick20/brick):195:push]
RepceClient: call 6799:140380783982400:1593068834.84
__repce_version__() ...</p>
<p class="MsoNormal">[2020-06-25 07:07:16.148725] D
[repce(worker /rhgs/brick20/brick):215:__call__]
RepceClient: call 6799:140380783982400:1593068834.84
__repce_version__ -> 1.0</p>
<p class="MsoNormal">[2020-06-25 07:07:16.148911] D
[repce(worker /rhgs/brick20/brick):195:push]
RepceClient: call 6799:140380783982400:1593068836.15
version() ...</p>
<p class="MsoNormal">[2020-06-25 07:07:16.149574] D
[repce(worker /rhgs/brick20/brick):215:__call__]
RepceClient: call 6799:140380783982400:1593068836.15
version -> 1.0</p>
<p class="MsoNormal">[2020-06-25 07:07:16.149735] D
[repce(worker /rhgs/brick20/brick):195:push]
RepceClient: call 6799:140380783982400:1593068836.15
pid() ...</p>
<p class="MsoNormal">[2020-06-25 07:07:16.150588] D
[repce(worker /rhgs/brick20/brick):215:__call__]
RepceClient: call 6799:140380783982400:1593068836.15
pid -> 30703</p>
<p class="MsoNormal">[2020-06-25 07:07:16.150747] I
[resource(worker
/rhgs/brick20/brick):1435:connect_remote] SSH: SSH
connection between master and slave established.
duration=1.3712</p>
<p class="MsoNormal">[2020-06-25 07:07:16.150819] I
[resource(worker /rhgs/brick20/brick):1105:connect]
GLUSTER: Mounting gluster volume locally...</p>
<p class="MsoNormal">[2020-06-25 07:07:16.265860] D
[resource(worker /rhgs/brick20/brick):879:inhibit]
DirectMounter: auxiliary glusterfs mount in place</p>
<p class="MsoNormal">[2020-06-25 07:07:17.272511] D
[resource(worker /rhgs/brick20/brick):953:inhibit]
DirectMounter: auxiliary glusterfs mount prepared</p>
<p class="MsoNormal">[2020-06-25 07:07:17.272708] I
[resource(worker /rhgs/brick20/brick):1128:connect]
GLUSTER: Mounted gluster volume duration=1.1218</p>
<p class="MsoNormal">[2020-06-25 07:07:17.272794] I
[subcmds(worker /rhgs/brick20/brick):84:subcmd_worker]
&lt;top&gt;: Worker spawn successful. Acknowledging
back to monitor</p>
<p class="MsoNormal">[2020-06-25 07:07:17.272973] D
[master(worker
/rhgs/brick20/brick):104:gmaster_builder] &lt;top&gt;:
setting up change detection mode mode=xsync</p>
<p class="MsoNormal">[2020-06-25 07:07:17.273063] D
[monitor(monitor):273:monitor] Monitor:
worker(/rhgs/brick20/brick) connected</p>
<p class="MsoNormal">[2020-06-25 07:07:17.273678] D
[master(worker
/rhgs/brick20/brick):104:gmaster_builder] &lt;top&gt;:
setting up change detection mode mode=changelog</p>
<p class="MsoNormal">[2020-06-25 07:07:17.274224] D
[master(worker
/rhgs/brick20/brick):104:gmaster_builder] &lt;top&gt;:
setting up change detection mode mode=changeloghistory</p>
<p class="MsoNormal">[2020-06-25 07:07:17.276484] D
[repce(worker /rhgs/brick20/brick):195:push]
RepceClient: call 6799:140380783982400:1593068837.28
version() ...</p>
<p class="MsoNormal">[2020-06-25 07:07:17.276916] D
[repce(worker /rhgs/brick20/brick):215:__call__]
RepceClient: call 6799:140380783982400:1593068837.28
version -> 1.0</p>
<p class="MsoNormal">[2020-06-25 07:07:17.277009] D
[master(worker
/rhgs/brick20/brick):777:setup_working_dir] _GMaster:
changelog working dir
/var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick</p>
<p class="MsoNormal">[2020-06-25 07:07:17.277098] D
[repce(worker /rhgs/brick20/brick):195:push]
RepceClient: call 6799:140380783982400:1593068837.28
init() ...</p>
<p class="MsoNormal">[2020-06-25 07:07:17.292944] D
[repce(worker /rhgs/brick20/brick):215:__call__]
RepceClient: call 6799:140380783982400:1593068837.28
init -> None</p>
<p class="MsoNormal">[2020-06-25 07:07:17.293097] D
[repce(worker /rhgs/brick20/brick):195:push]
RepceClient: call 6799:140380783982400:1593068837.29
register('/rhgs/brick20/brick',
'/var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick',
'/var/log/glusterfs/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/changes-rhgs-brick20-brick.log',
8, 5) ...</p>
<p class="MsoNormal">[2020-06-25 07:07:19.296294] E
[repce(agent /rhgs/brick20/brick):121:worker]
&lt;top&gt;: call failed:</p>
<p class="MsoNormal">Traceback (most recent call last):</p>
<p class="MsoNormal"> File
"/usr/libexec/glusterfs/python/syncdaemon/repce.py",
line 117, in worker</p>
<p class="MsoNormal"> res = getattr(self.obj,
rmeth)(*in_data[2:])</p>
<p class="MsoNormal"> File
"/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
line 40, in register</p>
<p class="MsoNormal"> return
Changes.cl_register(cl_brick, cl_dir, cl_log,
cl_level, retries)</p>
<p class="MsoNormal"> File
"/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 46, in cl_register</p>
<p class="MsoNormal"> cls.raise_changelog_err()</p>
<p class="MsoNormal"> File
"/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
line 30, in raise_changelog_err</p>
<p class="MsoNormal"> raise ChangelogException(errn,
os.strerror(errn))</p>
<p class="MsoNormal">ChangelogException: [Errno 2] No
such file or directory</p>
<p class="MsoNormal">[2020-06-25 07:07:19.297161] E
[repce(worker /rhgs/brick20/brick):213:__call__]
RepceClient: call failed
call=6799:140380783982400:1593068837.29
method=register error=ChangelogException</p>
<p class="MsoNormal">[2020-06-25 07:07:19.297338] E
[resource(worker
/rhgs/brick20/brick):1286:service_loop] GLUSTER:
Changelog register failed error=[Errno 2] No such
file or directory</p>
<p class="MsoNormal">[2020-06-25 07:07:19.315074] I
[repce(agent /rhgs/brick20/brick):96:service_loop]
RepceServer: terminating on reaching EOF.</p>
<p class="MsoNormal">[2020-06-25 07:07:20.275701] I
[monitor(monitor):280:monitor] Monitor: worker died in
startup phase brick=/rhgs/brick20/brick</p>
<p class="MsoNormal">[2020-06-25 07:07:20.277383] I
[gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change status=Faulty</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">We’ve done everything we can think
of, including an “strace -f” on the pid, and we can’t
really find anything. I’m about to lose the last of my
hair over this, so does anyone have any ideas at all?
We’ve even removed the entire slave vol and rebuilt
it.</p>
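<p class="MsoNormal">Since the failure is [Errno 2] (ENOENT) raised from
the register call, one cheap check is whether the paths passed to that
call exist on the node. The sketch below is hypothetical: missing_paths
is not a Gluster function, and the .glusterfs/changelogs and htime
locations under the brick are assumptions about the default layout, so
it can only narrow things down, not confirm the cause.</p>
<pre>
```python
import os


def missing_paths(brick, working_dir, log_file,
                  changelog_subdir=".glusterfs/changelogs"):
    """Return the candidate paths that do not exist on this node.

    changelog_subdir is an assumption about where the brick keeps its
    changelogs; adjust it if your layout differs.
    """
    candidates = [
        brick,
        os.path.join(brick, changelog_subdir),
        os.path.join(brick, changelog_subdir, "htime"),
        working_dir,
        os.path.dirname(log_file),
    ]
    return [path for path in candidates if not os.path.exists(path)]


if __name__ == "__main__":
    # Arguments copied from the register(...) call in the log above.
    for path in missing_paths(
        "/rhgs/brick20/brick",
        "/var/lib/misc/gluster/gsyncd/"
        "prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick",
        "/var/log/glusterfs/geo-replication/"
        "prd_mx_intvol_bxts470190_prd_mx_intvol/changes-rhgs-brick20-brick.log",
    ):
        print("missing:", path)
```
</pre>
<p class="MsoNormal">It would need to be run on the master node that
owns the brick; an empty output only means none of these particular
paths is missing.</p>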
<p class="MsoNormal"> </p>
<p class="MsoNormal">Thanks</p>
<p class="MsoNormal">Rob</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal" style="line-height:9pt"><b><span
style="font-size:7.5pt;font-family:Verdana,sans-serif;color:navy"
lang="EN-US">Rob Quagliozzi</span></b></p>
<p class="MsoNormal" style="line-height:9pt"><b><span
style="font-size:7.5pt;font-family:Verdana,sans-serif;color:navy"
lang="EN-US">Specialised Application Support</span></b><span
style="font-size:7.5pt;font-family:Verdana,sans-serif;color:navy"
lang="EN-US"></span></p>
<p class="MsoNormal" style="line-height:9pt"><span
lang="EN-US"><br>
<br>
</span><span style="font-size:7.5pt"></span></p>
<p class="MsoNormal"> </p>
</div>
<p> </p>
<br>
<fieldset></fieldset>
<pre>________
Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: <a href="https://bluejeans.com/441850968" target="_blank" moz-do-not-send="true">https://bluejeans.com/441850968</a>
Gluster-users mailing list
<a href="mailto:Gluster-users@gluster.org" target="_blank" moz-do-not-send="true">Gluster-users@gluster.org</a>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-users" target="_blank" moz-do-not-send="true">https://lists.gluster.org/mailman/listinfo/gluster-users</a>
</pre>
</blockquote>
</div>
</blockquote>
</div>
<br>
</blockquote>
</body>
</html>