<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:#954F72;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri",sans-serif;
        mso-fareast-language:EN-US;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal">Hi All,<o:p></o:p></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">We&#8217;ve got two six-node RHEL 7.8 clusters, and geo-replication appears to be completely broken between them. I&#8217;ve deleted the session, removed and recreated the pem files, removed the old changelogs/htime (after removing the relevant options from the volume),
 and set up geo-rep again from scratch, but the new session comes up as Initializing, then goes Faulty, and loops. The volume (on both sides) is a 4 x 2 disperse, running Gluster v6 (RH latest). Gsyncd reports:<o:p></o:p></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:14.701423] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:14.701744] I [monitor(monitor):159:monitor] Monitor: starting gsyncd worker&nbsp;&nbsp; brick=/rhgs/brick20/brick&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; slave_node=bxts470194.eu.rabonet.com<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:14.707997] D [monitor(monitor):230:monitor] Monitor: Worker would mount volume privately<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:14.757181] I [gsyncd(agent /rhgs/brick20/brick):318:main] &lt;top&gt;: Using session config file&nbsp;&nbsp;&nbsp; path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:14.758126] D [subcmds(agent /rhgs/brick20/brick):107:subcmd_agent] &lt;top&gt;: RPC FD&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; rpc_fd='5,12,11,10'<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:14.758627] I [changelogagent(agent /rhgs/brick20/brick):72:__init__] ChangelogAgent: Agent listining...<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:14.764234] I [gsyncd(worker /rhgs/brick20/brick):318:main] &lt;top&gt;: Using session config file&nbsp;&nbsp; path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:14.779409] I [resource(worker /rhgs/brick20/brick):1386:connect_remote] SSH: Initializing SSH connection between master and slave...<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:14.841793] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068834.84 __repce_version__() ...<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:16.148725] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068834.84 __repce_version__ -&gt; 1.0<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:16.148911] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068836.15 version() ...<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:16.149574] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068836.15 version -&gt; 1.0<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:16.149735] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068836.15 pid() ...<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:16.150588] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068836.15 pid -&gt; 30703<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:16.150747] I [resource(worker /rhgs/brick20/brick):1435:connect_remote] SSH: SSH connection between master and slave established.&nbsp;&nbsp;&nbsp;&nbsp; duration=1.3712<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:16.150819] I [resource(worker /rhgs/brick20/brick):1105:connect] GLUSTER: Mounting gluster volume locally...<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:16.265860] D [resource(worker /rhgs/brick20/brick):879:inhibit] DirectMounter: auxiliary glusterfs mount in place<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:17.272511] D [resource(worker /rhgs/brick20/brick):953:inhibit] DirectMounter: auxiliary glusterfs mount prepared<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:17.272708] I [resource(worker /rhgs/brick20/brick):1128:connect] GLUSTER: Mounted gluster volume&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; duration=1.1218<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:17.272794] I [subcmds(worker /rhgs/brick20/brick):84:subcmd_worker] &lt;top&gt;: Worker spawn successful. Acknowledging back to monitor<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:17.272973] D [master(worker /rhgs/brick20/brick):104:gmaster_builder] &lt;top&gt;: setting up change detection mode mode=xsync<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:17.273063] D [monitor(monitor):273:monitor] Monitor: worker(/rhgs/brick20/brick) connected<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:17.273678] D [master(worker /rhgs/brick20/brick):104:gmaster_builder] &lt;top&gt;: setting up change detection mode mode=changelog<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:17.274224] D [master(worker /rhgs/brick20/brick):104:gmaster_builder] &lt;top&gt;: setting up change detection mode mode=changeloghistory<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:17.276484] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068837.28 version() ...<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:17.276916] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068837.28 version -&gt; 1.0<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:17.277009] D [master(worker /rhgs/brick20/brick):777:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:17.277098] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068837.28 init() ...<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:17.292944] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068837.28 init -&gt; None<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:17.293097] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068837.29 register('/rhgs/brick20/brick', '/var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick',
 '/var/log/glusterfs/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/changes-rhgs-brick20-brick.log', 8, 5) ...<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:19.296294] E [repce(agent /rhgs/brick20/brick):121:worker] &lt;top&gt;: call failed:<o:p></o:p></p>
<p class="MsoNormal">Traceback (most recent call last):<o:p></o:p></p>
<p class="MsoNormal">&nbsp; File &quot;/usr/libexec/glusterfs/python/syncdaemon/repce.py&quot;, line 117, in worker<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp; res = getattr(self.obj, rmeth)(*in_data[2:])<o:p></o:p></p>
<p class="MsoNormal">&nbsp; File &quot;/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py&quot;, line 40, in register<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp; return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries)<o:p></o:p></p>
<p class="MsoNormal">&nbsp; File &quot;/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py&quot;, line 46, in cl_register<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp; cls.raise_changelog_err()<o:p></o:p></p>
<p class="MsoNormal">&nbsp; File &quot;/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py&quot;, line 30, in raise_changelog_err<o:p></o:p></p>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp; raise ChangelogException(errn, os.strerror(errn))<o:p></o:p></p>
<p class="MsoNormal">ChangelogException: [Errno 2] No such file or directory<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:19.297161] E [repce(worker /rhgs/brick20/brick):213:__call__] RepceClient: call failed&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; call=6799:140380783982400:1593068837.29 method=register error=ChangelogException<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:19.297338] E [resource(worker /rhgs/brick20/brick):1286:service_loop] GLUSTER: Changelog register failed&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; error=[Errno 2] No such file or directory<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:19.315074] I [repce(agent /rhgs/brick20/brick):96:service_loop] RepceServer: terminating on reaching EOF.<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:20.275701] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase&nbsp;&nbsp;&nbsp;&nbsp; brick=/rhgs/brick20/brick<o:p></o:p></p>
<p class="MsoNormal">[2020-06-25 07:07:20.277383] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty<o:p></o:p></p>
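<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">To cut the noise, here is a minimal sketch for pulling only the E-level lines out of a gsyncd log. It assumes the line layout shown above ([timestamp] LEVEL [origin] message); error_lines is a hypothetical helper name, not part of gsyncd, and the two sample lines are copied from the log above.<o:p></o:p></p>
<pre>
```python
import re

# Assumed layout, taken from the log above: [timestamp] LEVEL [origin] message
LOG_LINE = re.compile(r"^\[([^\]]+)\] ([A-Z]) \[([^\]]+)\] (.*)$")


def error_lines(lines):
    """Return (timestamp, origin, message) for every E-level log line."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group(2) == "E":
            timestamp, _level, origin, message = match.groups()
            found.append((timestamp, origin, message))
    return found


# Two lines copied from the log above: one informational, one error.
sample = [
    "[2020-06-25 07:07:17.272708] I [resource(worker /rhgs/brick20/brick):1128:connect] GLUSTER: Mounted gluster volume duration=1.1218",
    "[2020-06-25 07:07:19.297338] E [resource(worker /rhgs/brick20/brick):1286:service_loop] GLUSTER: Changelog register failed error=[Errno 2] No such file or directory",
]

for timestamp, origin, message in error_lines(sample):
    print(timestamp, origin, message)
```
</pre>
<p class="MsoNormal">Traceback lines carry no timestamp prefix, so this sketch skips them; only the bracketed log entries are matched.<o:p></o:p></p>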
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">We&#8217;ve done everything we can think of, including an &#8220;strace -f&#8221; on the pid, and we can&#8217;t really find anything. We&#8217;ve even removed the entire slave vol
 and rebuilt it. I&#8217;m about to lose the last of my hair over this, so does anyone have any ideas at all?<o:p></o:p></p>
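<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">Since register() fails with ENOENT, one cheap check is whether every path handed to that call exists on the node. A minimal sketch, not a diagnosis: missing_paths is a hypothetical helper, the paths are the ones from the register() call in the log above, and the .glusterfs/changelogs directory under the brick is an assumption about where the changelog files live.<o:p></o:p></p>
<pre>
```python
import os


def missing_paths(paths):
    """Return the subset of paths that do not exist on this node."""
    return [path for path in paths if not os.path.exists(path)]


brick = "/rhgs/brick20/brick"
working_dir = "/var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick"
changes_log = "/var/log/glusterfs/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/changes-rhgs-brick20-brick.log"

candidates = [
    brick,
    # Assumption: the changelog files are kept under the brick here.
    os.path.join(brick, ".glusterfs", "changelogs"),
    working_dir,
    # Only the log's parent directory needs to exist beforehand.
    os.path.dirname(changes_log),
]

for path in missing_paths(candidates):
    print("missing:", path)
```
</pre>
<p class="MsoNormal">This would need running on the master node that owns the brick; an empty result only rules out these particular paths.<o:p></o:p></p>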
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">Thanks<o:p></o:p></p>
<p class="MsoNormal">Rob<o:p></o:p></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal" style="line-height:9.0pt;mso-line-height-rule:exactly"><b><span lang="EN-US" style="font-size:7.5pt;font-family:&quot;Verdana&quot;,sans-serif;color:navy;mso-fareast-language:EN-GB">Rob Quagliozzi<o:p></o:p></span></b></p>
<p class="MsoNormal" style="line-height:9.0pt;mso-line-height-rule:exactly"><b><span lang="EN-US" style="font-size:7.5pt;font-family:&quot;Verdana&quot;,sans-serif;color:navy;mso-fareast-language:EN-GB">Specialised Application Support</span></b><span lang="EN-US" style="font-size:7.5pt;font-family:&quot;Verdana&quot;,sans-serif;color:navy;mso-fareast-language:EN-GB"><o:p></o:p></span></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
</body>
</html>