[Gluster-users] Geo-Replication Issue while upgrading

Sunny Kumar sunkumar at redhat.com
Thu Nov 28 12:09:42 UTC 2019


Hi Deepu,

It looks like this error is generated by an SSH restriction (the login banner below is being written into the gsyncd stream). Can you please check and confirm that SSH is properly configured?


[2019-11-28 11:59:12.934436] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
**************************************************************************************************************************

[2019-11-28 11:59:12.934703] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING:
This system is a restricted access system.  All activity on this
system is subject to monitoring.  If information collected reveals
possible criminal activity or activity that exceeds privileges,
evidence of such activity may be providedto the relevant authorities
for further action.

[2019-11-28 11:59:12.934967] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By
continuing past this point, you expressly consent to   this
monitoring.- ZOHO Corporation

[2019-11-28 11:59:12.935194] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
**************************************************************************************************************************

[2019-11-28 11:59:12.944369] I [repce(agent
/home/sas/gluster/data/code-misc):97:service_loop] RepceServer:
terminating on reaching EOF.
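One way to check is to run the same non-interactive SSH connection that gsyncd uses and see whether the banner appears. This is a sketch only: the key path, user, and slave IP are taken from the logs above, and the suggestion to adjust the `Banner` directive assumes the banner comes from sshd_config on the slave (it could also come from a shell profile).

```shell
# Reproduce what gsyncd does: connect with the geo-rep key, non-interactively.
ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no \
    -i /var/lib/glusterd/geo-replication/secret.pem \
    sas@192.168.185.84 'echo OK'

# If the restricted-access banner is printed before "OK", it will also be
# injected into the gsyncd channel. Check for a Banner directive on the slave:
grep -i '^Banner' /etc/ssh/sshd_config
# Commenting it out (or restricting it to interactive logins) and restarting
# sshd should keep the banner out of geo-replication sessions.
```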

/sunny

On Thu, Nov 28, 2019 at 12:03 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>
>
>
> ---------- Forwarded message ---------
> From: deepu srinivasan <sdeepugd at gmail.com>
> Date: Thu, Nov 28, 2019 at 5:32 PM
> Subject: Geo-Replication Issue while upgrading
> To: gluster-users <gluster-users at gluster.org>
>
>
> Hi Users/Developers,
> I hope you remember the last issue we faced, where geo-replication went into a Faulty state while stopping and starting the session.
>>
>> [2019-11-16 17:29:43.536881] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):281:set_active] GeorepStatus: Worker Status Change       status=Active
>> [2019-11-16 17:29:43.629620] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change   status=History Crawl
>> [2019-11-16 17:29:43.630328] I [master(worker /home/sas/gluster/data/code-misc6):1517:crawl] _GMaster: starting history crawl   turns=1 stime=(1573924576, 0)   entry_stime=(1573924576, 0)     etime=1573925383
>> [2019-11-16 17:29:44.636725] I [master(worker /home/sas/gluster/data/code-misc6):1546:crawl] _GMaster: slave's time     stime=(1573924576, 0)
>> [2019-11-16 17:29:44.778966] I [master(worker /home/sas/gluster/data/code-misc6):898:fix_possible_entry_failures] _GMaster: Fixing ENOENT error in slave. Parent does not exist on master. Safe to ignore, take out entry       retry_count=1   entry=({'uid': 0, 'gfid': 'c02519e0-0ead-4fe8-902b-dcae72ef83a3', 'gid': 0, 'mode': 33188, 'entry': '.gfid/d60aa0d5-4fdf-4721-97dc-9e3e50995dab/368307802', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
>> [2019-11-16 17:29:44.779306] I [master(worker /home/sas/gluster/data/code-misc6):942:handle_entry_failures] _GMaster: Sucessfully fixed entry ops with gfid mismatch    retry_count=1
>> [2019-11-16 17:29:44.779516] I [master(worker /home/sas/gluster/data/code-misc6):1194:process_change] _GMaster: Retry original entries. count = 1
>> [2019-11-16 17:29:44.879321] E [repce(worker /home/sas/gluster/data/code-misc6):214:__call__] RepceClient: call failed  call=151945:140353273153344:1573925384.78       method=entry_ops        error=OSError
>> [2019-11-16 17:29:44.879750] E [syncdutils(worker /home/sas/gluster/data/code-misc6):338:log_raise_exception] <top>: FAIL:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 322, in main
>>     func(args)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 82, in subcmd_worker
>>     local.service_loop(remote)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1277, in service_loop
>>     g3.crawlwrap(oneshot=True)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 599, in crawlwrap
>>     self.crawl()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1555, in crawl
>>     self.changelogs_batch_process(changes)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1455, in changelogs_batch_process
>>     self.process(batch)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1290, in process
>>     self.process_change(change, done, retry)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1195, in process_change
>>     failures = self.slave.server.entry_ops(entries)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 233, in __call__
>>     return self.ins(self.meth, *a)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 215, in __call__
>>     raise res
>> OSError: [Errno 13] Permission denied: '/home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb'
>> [2019-11-16 17:29:44.911767] I [repce(agent /home/sas/gluster/data/code-misc6):97:service_loop] RepceServer: terminating on reaching EOF.
>> [2019-11-16 17:29:45.509344] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase     brick=/home/sas/gluster/data/code-misc6
>> [2019-11-16 17:29:45.511806] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>
>
>
>
> Now, after upgrading from version 5.6 to 7.0, we got an error in geo-replication.
> Scenario:
>
> We had a 1x3 replicated and distributed volume in each DC.
> Both volumes were started, a geo-replication session was set up between them, and the files were synced. The geo-replication session was then deleted.
> We started upgrading each server to 7.0, beginning from the slave end. I followed this link --> https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/
> After starting the glusterd process, I created the geo-replication session again, but it ends up in a Faulty state. Please find the logs below.
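> For reference, the sequence I used to recreate the session after the upgrade was along these lines (volume name, user, and slave host are the ones from the logs; `force` was needed because the session had existed before):
>
> gluster volume geo-replication code-misc sas@192.168.185.118::code-misc \
>     create push-pem force
> gluster volume geo-replication code-misc sas@192.168.185.118::code-misc start
> gluster volume geo-replication code-misc sas@192.168.185.118::code-misc status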
>
>> [2019-11-28 11:59:12.370255] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
>>
>> [2019-11-28 11:59:12.370615] I [monitor(monitor):159:monitor] Monitor: starting gsyncd worker brick=/home/sas/gluster/data/code-misc slave_node=192.168.185.84
>>
>> [2019-11-28 11:59:12.445581] I [gsyncd(agent /home/sas/gluster/data/code-misc):311:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.118_code-misc/gsyncd.conf
>>
>> [2019-11-28 11:59:12.448383] I [changelogagent(agent /home/sas/gluster/data/code-misc):72:__init__] ChangelogAgent: Agent listining...
>>
>> [2019-11-28 11:59:12.453881] I [gsyncd(worker /home/sas/gluster/data/code-misc):311:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.118_code-misc/gsyncd.conf
>>
>> [2019-11-28 11:59:12.472862] I [resource(worker /home/sas/gluster/data/code-misc):1386:connect_remote] SSH: Initializing SSH connection between master and slave...
>>
>> [2019-11-28 11:59:12.933346] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] <top>: connection to peer is broken
>>
>> [2019-11-28 11:59:12.934117] E [syncdutils(worker /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-tKcFQe/5697733f424862ab9d57e019de78aca6.sock sas at 192.168.185.84 /usr/libexec/glusterfs/gsyncd slave code-misc sas at 192.168.185.118::code-misc --master-node 192.168.185.89 --master-node-id a7a9688e-700c-4452-9cd6-e10d6eed5335 --master-brick /home/sas/gluster/data/code-misc --local-node 192.168.185.84 --local-node-id cbafeca3-650b-4c9e-8ea6-2451ea9265dd --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin --master-dist-count 3 error=1
>>
>> [2019-11-28 11:59:12.934436] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
>>
>> [2019-11-28 11:59:12.934703] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING: This system is a restricted access system.  All activity on this system is subject to monitoring.  If information collected reveals possible criminal activity or activity that exceeds privileges, evidence of such activity may be providedto the relevant authorities for further action.
>>
>> [2019-11-28 11:59:12.934967] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By continuing past this point, you expressly consent to   this monitoring.- ZOHO Corporation
>>
>> [2019-11-28 11:59:12.935194] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
>>
>> [2019-11-28 11:59:12.944369] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF.
>>
>> [2019-11-28 11:59:12.944722] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc
>>
>> [2019-11-28 11:59:12.947575] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>
>


