[Gluster-users] Geo-replication completely broken
Strahil Nikolov
hunter86_bg at yahoo.com
Fri Jul 3 12:54:52 UTC 2020
Hi Felix,
It seems I missed your reply with the changelog files that Shwetha requested.
Best Regards,
Strahil Nikolov
On 3 July 2020 at 11:16:30 GMT+03:00, "Felix Kölzow" <felix.koelzow at gmx.de> wrote:
>Dear Users,
>the geo-replication is still broken. This is not really a comfortable
>situation.
>Has any other user had the same experience and is able to share a
>possible workaround?
>We are currently running Gluster v6.0.
>Regards,
>
>Felix
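The failure that keeps recurring in the logs below is libgfchangelog's cl_register raising [Errno 2] (ENOENT). A minimal diagnostic sketch, assuming placeholder session names (mastervol, slavehost and slavevol are stand-ins, not names from this thread), might look like:

```shell
# Placeholder session names -- substitute your own master volume,
# slave host and slave volume.

# Current session state (look for Faulty/Initializing loops):
gluster volume geo-replication mastervol slavehost::slavevol status detail

# cl_register fails with ENOENT, so check that the changelog
# metadata actually exists under each brick (brick path is a
# placeholder taken from the log excerpts below):
ls -l /gluster/vg00/dispersed_fuse1024/brick/.glusterfs/changelogs/htime/

# Confirm changelogging is still enabled on the master volume:
gluster volume get mastervol changelog.changelog
```

If the htime directory is empty or missing while changelog.changelog is on, that would be consistent with the register failure seen in the tracebacks.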
>
>
>On 25/06/2020 10:04, Shwetha Acharya wrote:
>> Hi Rob and Felix,
>>
>> Please share the *-changes.log files and brick logs, which will help
>> in analysis of the issue.
>>
>> Regards,
>> Shwetha
>>
>> On Thu, Jun 25, 2020 at 1:26 PM Felix Kölzow <felix.koelzow at gmx.de> wrote:
>>
>> Hey Rob,
>>
>>
>> same issue for our third volume. Have a look at the logs from
>> just now (below).
>>
>> Question: You removed the htime files and the old changelogs. Did
>> you just rm the files, or is there anything to pay particular
>> attention to before removing the changelog files and the htime
>> file?
>>
>> Regards,
>>
>> Felix
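One cautious ordering for the removal being asked about, as a sketch only (an assumption about a sensible sequence, not an official procedure; volume, host and brick names are placeholders):

```shell
# Assumed-safe ordering: stop the session and disable changelogging
# before touching the files, and back them up rather than rm them.
gluster volume geo-replication mastervol slavehost::slavevol stop
gluster volume set mastervol changelog.changelog off

# Per brick, move the changelog/htime data aside instead of deleting
# (brick path is a placeholder):
BRICK=/gluster/vg00/dispersed_fuse1024/brick
mv "$BRICK/.glusterfs/changelogs" "$BRICK/.glusterfs/changelogs.bak"

# Re-enabling changelog should create a fresh htime file, after which
# the session can be restarted:
gluster volume set mastervol changelog.changelog on
gluster volume geo-replication mastervol slavehost::slavevol start
```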
>>
>> [2020-06-25 07:51:53.795430] I [resource(worker
>> /gluster/vg00/dispersed_fuse1024/brick):1435:connect_remote] SSH:
>> SSH connection between master and slave established.
>> duration=1.2341
>> [2020-06-25 07:51:53.795639] I [resource(worker
>> /gluster/vg00/dispersed_fuse1024/brick):1105:connect] GLUSTER:
>> Mounting gluster volume locally...
>> [2020-06-25 07:51:54.520601] I [monitor(monitor):280:monitor]
>> Monitor: worker died in startup phase
>> brick=/gluster/vg01/dispersed_fuse1024/brick
>> [2020-06-25 07:51:54.535809] I
>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
>> Status Change status=Faulty
>> [2020-06-25 07:51:54.882143] I [resource(worker
>> /gluster/vg00/dispersed_fuse1024/brick):1128:connect] GLUSTER:
>> Mounted gluster volume duration=1.0864
>> [2020-06-25 07:51:54.882388] I [subcmds(worker
>> /gluster/vg00/dispersed_fuse1024/brick):84:subcmd_worker] <top>:
>> Worker spawn successful. Acknowledging back to monitor
>> [2020-06-25 07:51:56.911412] E [repce(agent
>> /gluster/vg00/dispersed_fuse1024/brick):121:worker] <top>: call
>> failed:
>> Traceback (most recent call last):
>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line
>> 117, in worker
>> res = getattr(self.obj, rmeth)(*in_data[2:])
>> File
>> "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
>> line 40, in register
>> return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level,
>> retries)
>> File
>> "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
>> line 46, in cl_register
>> cls.raise_changelog_err()
>> File
>> "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
>> line 30, in raise_changelog_err
>> raise ChangelogException(errn, os.strerror(errn))
>> ChangelogException: [Errno 2] No such file or directory
>> [2020-06-25 07:51:56.912056] E [repce(worker
>> /gluster/vg00/dispersed_fuse1024/brick):213:__call__] RepceClient:
>> call failed call=75086:140098349655872:1593071514.91
>> method=register error=ChangelogException
>> [2020-06-25 07:51:56.912396] E [resource(worker
>> /gluster/vg00/dispersed_fuse1024/brick):1286:service_loop]
>> GLUSTER: Changelog register failed error=[Errno 2] No such file
>> or directory
>> [2020-06-25 07:51:56.928031] I [repce(agent
>> /gluster/vg00/dispersed_fuse1024/brick):96:service_loop]
>> RepceServer: terminating on reaching EOF.
>> [2020-06-25 07:51:57.886126] I [monitor(monitor):280:monitor]
>> Monitor: worker died in startup phase
>> brick=/gluster/vg00/dispersed_fuse1024/brick
>> [2020-06-25 07:51:57.895920] I
>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus:
>Worker
>> Status Change status=Faulty
>> [2020-06-25 07:51:58.607405] I [gsyncdstatus(worker
>> /gluster/vg00/dispersed_fuse1024/brick):287:set_passive]
>> GeorepStatus: Worker Status Change status=Passive
>> [2020-06-25 07:51:58.607768] I [gsyncdstatus(worker
>> /gluster/vg01/dispersed_fuse1024/brick):287:set_passive]
>> GeorepStatus: Worker Status Change status=Passive
>> [2020-06-25 07:51:58.608004] I [gsyncdstatus(worker
>> /gluster/vg00/dispersed_fuse1024/brick):281:set_active]
>> GeorepStatus: Worker Status Change status=Active
>>
>>
>> On 25/06/2020 09:15, Rob.Quagliozzi at rabobank.com wrote:
>>>
>>> Hi All,
>>>
>>> We’ve got two six-node RHEL 7.8 clusters, and geo-replication
>>> appears to be completely broken between them. I’ve deleted the
>>> session, removed & recreated the pem files and the old
>>> changelogs/htime files (after removing the relevant options from
>>> the volume), and completely set up geo-rep from scratch, but the
>>> new session comes up as Initializing, then goes Faulty and starts
>>> looping. The volume (on both sides) is a 4 x 2 disperse, running
>>> Gluster v6 (RH latest). Gsyncd reports:
>>>
>>> [2020-06-25 07:07:14.701423] I
>>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus:
>>> Worker Status Change status=Initializing...
>>>
>>> [2020-06-25 07:07:14.701744] I [monitor(monitor):159:monitor]
>>> Monitor: starting gsyncd worker brick=/rhgs/brick20/brick
>>> slave_node=bxts470194.eu.rabonet.com
>>> <http://bxts470194.eu.rabonet.com>
>>>
>>> [2020-06-25 07:07:14.707997] D [monitor(monitor):230:monitor]
>>> Monitor: Worker would mount volume privately
>>>
>>> [2020-06-25 07:07:14.757181] I [gsyncd(agent
>>> /rhgs/brick20/brick):318:main] <top>: Using session config file
>>> path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf
>>>
>>> [2020-06-25 07:07:14.758126] D [subcmds(agent
>>> /rhgs/brick20/brick):107:subcmd_agent] <top>: RPC FD
>>> rpc_fd='5,12,11,10'
>>>
>>> [2020-06-25 07:07:14.758627] I [changelogagent(agent
>>> /rhgs/brick20/brick):72:__init__] ChangelogAgent: Agent
>>> listining...
>>>
>>> [2020-06-25 07:07:14.764234] I [gsyncd(worker
>>> /rhgs/brick20/brick):318:main] <top>: Using session config file
>>> path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf
>>>
>>> [2020-06-25 07:07:14.779409] I [resource(worker
>>> /rhgs/brick20/brick):1386:connect_remote] SSH: Initializing SSH
>>> connection between master and slave...
>>>
>>> [2020-06-25 07:07:14.841793] D [repce(worker
>>> /rhgs/brick20/brick):195:push] RepceClient: call
>>> 6799:140380783982400:1593068834.84 __repce_version__() ...
>>>
>>> [2020-06-25 07:07:16.148725] D [repce(worker
>>> /rhgs/brick20/brick):215:__call__] RepceClient: call
>>> 6799:140380783982400:1593068834.84 __repce_version__ -> 1.0
>>>
>>> [2020-06-25 07:07:16.148911] D [repce(worker
>>> /rhgs/brick20/brick):195:push] RepceClient: call
>>> 6799:140380783982400:1593068836.15 version() ...
>>>
>>> [2020-06-25 07:07:16.149574] D [repce(worker
>>> /rhgs/brick20/brick):215:__call__] RepceClient: call
>>> 6799:140380783982400:1593068836.15 version -> 1.0
>>>
>>> [2020-06-25 07:07:16.149735] D [repce(worker
>>> /rhgs/brick20/brick):195:push] RepceClient: call
>>> 6799:140380783982400:1593068836.15 pid() ...
>>>
>>> [2020-06-25 07:07:16.150588] D [repce(worker
>>> /rhgs/brick20/brick):215:__call__] RepceClient: call
>>> 6799:140380783982400:1593068836.15 pid -> 30703
>>>
>>> [2020-06-25 07:07:16.150747] I [resource(worker
>>> /rhgs/brick20/brick):1435:connect_remote] SSH: SSH connection
>>> between master and slave established. duration=1.3712
>>>
>>> [2020-06-25 07:07:16.150819] I [resource(worker
>>> /rhgs/brick20/brick):1105:connect] GLUSTER: Mounting gluster
>>> volume locally...
>>>
>>> [2020-06-25 07:07:16.265860] D [resource(worker
>>> /rhgs/brick20/brick):879:inhibit] DirectMounter: auxiliary
>>> glusterfs mount in place
>>>
>>> [2020-06-25 07:07:17.272511] D [resource(worker
>>> /rhgs/brick20/brick):953:inhibit] DirectMounter: auxiliary
>>> glusterfs mount prepared
>>>
>>> [2020-06-25 07:07:17.272708] I [resource(worker
>>> /rhgs/brick20/brick):1128:connect] GLUSTER: Mounted gluster
>>> volume duration=1.1218
>>>
>>> [2020-06-25 07:07:17.272794] I [subcmds(worker
>>> /rhgs/brick20/brick):84:subcmd_worker] <top>: Worker spawn
>>> successful. Acknowledging back to monitor
>>>
>>> [2020-06-25 07:07:17.272973] D [master(worker
>>> /rhgs/brick20/brick):104:gmaster_builder] <top>: setting up
>>> change detection mode mode=xsync
>>>
>>> [2020-06-25 07:07:17.273063] D [monitor(monitor):273:monitor]
>>> Monitor: worker(/rhgs/brick20/brick) connected
>>>
>>> [2020-06-25 07:07:17.273678] D [master(worker
>>> /rhgs/brick20/brick):104:gmaster_builder] <top>: setting up
>>> change detection mode mode=changelog
>>>
>>> [2020-06-25 07:07:17.274224] D [master(worker
>>> /rhgs/brick20/brick):104:gmaster_builder] <top>: setting up
>>> change detection mode mode=changeloghistory
>>>
>>> [2020-06-25 07:07:17.276484] D [repce(worker
>>> /rhgs/brick20/brick):195:push] RepceClient: call
>>> 6799:140380783982400:1593068837.28 version() ...
>>>
>>> [2020-06-25 07:07:17.276916] D [repce(worker
>>> /rhgs/brick20/brick):215:__call__] RepceClient: call
>>> 6799:140380783982400:1593068837.28 version -> 1.0
>>>
>>> [2020-06-25 07:07:17.277009] D [master(worker
>>> /rhgs/brick20/brick):777:setup_working_dir] _GMaster: changelog
>>> working dir
>>> /var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick
>>>
>>> [2020-06-25 07:07:17.277098] D [repce(worker
>>> /rhgs/brick20/brick):195:push] RepceClient: call
>>> 6799:140380783982400:1593068837.28 init() ...
>>>
>>> [2020-06-25 07:07:17.292944] D [repce(worker
>>> /rhgs/brick20/brick):215:__call__] RepceClient: call
>>> 6799:140380783982400:1593068837.28 init -> None
>>>
>>> [2020-06-25 07:07:17.293097] D [repce(worker
>>> /rhgs/brick20/brick):195:push] RepceClient: call
>>> 6799:140380783982400:1593068837.29
>>> register('/rhgs/brick20/brick',
>>> '/var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick',
>>> '/var/log/glusterfs/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/changes-rhgs-brick20-brick.log',
>>> 8, 5) ...
>>>
>>> [2020-06-25 07:07:19.296294] E [repce(agent
>>> /rhgs/brick20/brick):121:worker] <top>: call failed:
>>>
>>> Traceback (most recent call last):
>>>
>>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line
>>> 117, in worker
>>>
>>> res = getattr(self.obj, rmeth)(*in_data[2:])
>>>
>>> File
>>> "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py",
>>> line 40, in register
>>>
>>> return Changes.cl_register(cl_brick, cl_dir, cl_log,
>>> cl_level, retries)
>>>
>>> File
>>> "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
>>> line 46, in cl_register
>>>
>>> cls.raise_changelog_err()
>>>
>>> File
>>> "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py",
>>> line 30, in raise_changelog_err
>>>
>>> raise ChangelogException(errn, os.strerror(errn))
>>>
>>> ChangelogException: [Errno 2] No such file or directory
>>>
>>> [2020-06-25 07:07:19.297161] E [repce(worker
>>> /rhgs/brick20/brick):213:__call__] RepceClient: call failed
>>> call=6799:140380783982400:1593068837.29 method=register
>>> error=ChangelogException
>>>
>>> [2020-06-25 07:07:19.297338] E [resource(worker
>>> /rhgs/brick20/brick):1286:service_loop] GLUSTER: Changelog
>>> register failed error=[Errno 2] No such file or directory
>>>
>>> [2020-06-25 07:07:19.315074] I [repce(agent
>>> /rhgs/brick20/brick):96:service_loop] RepceServer: terminating on
>>> reaching EOF.
>>>
>>> [2020-06-25 07:07:20.275701] I [monitor(monitor):280:monitor]
>>> Monitor: worker died in startup phase brick=/rhgs/brick20/brick
>>>
>>> [2020-06-25 07:07:20.277383] I
>>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus:
>>> Worker Status Change status=Faulty
>>>
>>> We’ve done everything we can think of, including an “strace -f”
>>> on the pid, and we can’t really find anything. I’m about to lose
>>> the last of my hair over this, so does anyone have any ideas at
>>> all? We’ve even removed the entire slave volume and rebuilt it.
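The from-scratch recreation described above roughly corresponds to the standard geo-rep CLI sequence, sketched here using the session names that appear in the logs (a sketch only; exact flags and hostnames may differ in your environment):

```shell
# Tear down the old session (names taken from the gsyncd.conf path in
# the logs; "force" variants may be needed if the session is Faulty):
gluster volume geo-replication prd_mx_intvol bxts470190::prd_mx_intvol stop force
gluster volume geo-replication prd_mx_intvol bxts470190::prd_mx_intvol delete reset-sync-time

# Regenerate and distribute the pem keys, then recreate and start the
# session:
gluster system:: execute gsec_create
gluster volume geo-replication prd_mx_intvol bxts470190::prd_mx_intvol create push-pem force
gluster volume geo-replication prd_mx_intvol bxts470190::prd_mx_intvol start
```

If the session still loops Initializing -> Faulty with the ChangelogException after this, the problem is likely the brick-side changelog/htime state rather than the session setup itself.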
>>>
>>> Thanks
>>>
>>> Rob
>>>
>>> *Rob Quagliozzi*
>>>
>>> *Specialised Application Support*
>>>
>>>
>>>
>>>
>>>
>>>
>>> ________
>>>
>>>
>>>
>>> Community Meeting Calendar:
>>>
>>> Schedule -
>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>> Bridge:https://bluejeans.com/441850968
>>>
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users