[Gluster-users] Geo Replication stops replicating

deepu srinivasan sdeepugd at gmail.com
Tue Jun 4 11:54:03 UTC 2019


Hi Kotresh,
Please find below the logs for the error reported in my previous mail.
*Master log snippet*

> [2019-06-04 11:52:09.254731] I [resource(worker
> /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing
> SSH connection between master and slave...
>  [2019-06-04 11:52:09.308923] D [repce(worker
> /home/sas/gluster/data/code-misc):196:push] RepceClient: call
> 89724:139652759443264:1559649129.31 __repce_version__() ...
>  [2019-06-04 11:52:09.602792] E [syncdutils(worker
> /home/sas/gluster/data/code-misc):311:log_raise_exception] <top>:
> connection to peer is broken
>  [2019-06-04 11:52:09.603312] E [syncdutils(worker
> /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error
>   cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
> /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock
> sas@192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc
> sas@192.168.185.107::code-misc --master-node 192.168.185.106 --master-node-id
> 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick
> /home/sas/gluster/data/code-misc --local-node 192.168.185.122
> --local-node-id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120
> --slave-log-level DEBUG --slave-gluster-log-level INFO
> --slave-gluster-command-dir /usr/sbin   error=1
>  [2019-06-04 11:52:09.614996] I [repce(agent
> /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating
> on reaching EOF.
>  [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] Monitor:
> worker(/home/sas/gluster/data/code-misc) connected
>  [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] Monitor:
> worker died in startup phase brick=/home/sas/gluster/data/code-misc
>  [2019-06-04 11:52:09.619391] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
> Change status=Faulty
>

*Slave log snippet*

> [2019-06-04 11:50:09.782668] E [syncdutils(slave
> 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] Popen:
> /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
> [2019-06-04 11:50:11.188167] W [gsyncd(slave
> 192.168.185.125/home/sas/gluster/data/code-misc):305:main] <top>: Session
> config file not exists, using the default config
> path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf
> [2019-06-04 11:50:11.201070] I [resource(slave
> 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect] GLUSTER:
> Mounting gluster volume locally...
> [2019-06-04 11:50:11.271231] E [resource(slave
> 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter]
> MountbrokerMounter: glusterd answered mnt=
> [2019-06-04 11:50:11.271998] E [syncdutils(slave
> 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] Popen:
> command returned error cmd=/usr/sbin/gluster --remote-host=localhost
> system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO
> log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log
> volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1
> [2019-06-04 11:50:11.272113] E [syncdutils(slave
> 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] Popen:
> /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
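
For anyone scanning logs like the two snippets above, a minimal Python sketch that keeps only the E-level records might look like the following. It assumes each record sits on a single line in the actual log file (the snippets above are wrapped by the mail client). The regular expression and the names LINE_RE and error_records are illustrative only and are not part of gsyncd.

```python
import re

# Assumed record layout, inferred from the snippets in this mail:
#   [timestamp] LEVEL [origin] message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<origin>[^\]]+)\] "
    r"(?P<message>.*)$"
)


def error_records(lines):
    """Yield (timestamp, origin, message) for every E-level record."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "E":
            yield match.group("ts"), match.group("origin"), match.group("message")


# Two records from the master log above, unwrapped onto single lines.
sample = [
    "[2019-06-04 11:52:09.254731] I [resource(worker /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing SSH connection between master and slave...",
    "[2019-06-04 11:52:09.602792] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] <top>: connection to peer is broken",
]

for ts, origin, message in error_records(sample):
    print(ts, origin, message)
```

To run it against a real log, pass an open file object instead of the sample list; the log path depends on the installation and is not assumed here.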


On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Hi,
> As discussed, I have upgraded Gluster from version 4.1 to 6.2, but
> geo-replication failed to start.
> It stays in the Faulty state.
>
> On Fri, May 31, 2019, 5:32 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>
>> Checked the data. It remains at 2708. No progress.
>>
>> On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar <
>> khiremat at redhat.com> wrote:
>>
>>> That means it could be working and the defunct process might be some old
>>> zombie one. Could you check whether the data is progressing?
>>>
>>> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan <sdeepugd at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> When I change the rsync option, the rsync process doesn't seem to start.
>>>> Only a defunct process is listed in ps aux. Only when I set the rsync option
>>>> to " " and restart all the processes is the rsync process listed in ps aux.
>>>>
>>>>
>>>> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar <
>>>> khiremat at redhat.com> wrote:
>>>>
>>>>> Yes, the rsync config option should have fixed this issue.
>>>>>
>>>>> Could you share the output of the following?
>>>>>
>>>>> 1. gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL>
>>>>> config rsync-options
>>>>> 2. ps -ef | grep rsync
>>>>>
>>>>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan <sdeepugd at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Done.
>>>>>> We got the following result:
>>>>>>
>>>>>>> 1559298781.338234 write(2, "rsync: link_stat
>>>>>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\"
>>>>>>> failed: No such file or directory (2)", 128
>>>>>>
>>>>>> It seems like a file is missing?
>>>>>>
>>>>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <
>>>>>> khiremat at redhat.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Could you take the strace with a larger string size? The argument
>>>>>>> strings are truncated.
>>>>>>>
>>>>>>> strace -s 500 -ttt -T -p <rsync pid>
>>>>>>>
>>>>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan <sdeepugd at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Kotresh
>>>>>>>> The above-mentioned workaround did not work properly.
>>>>>>>>
>>>>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan <
>>>>>>>> sdeepugd at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Kotresh
>>>>>>>>> We have tried the above-mentioned rsync option, and we are planning
>>>>>>>>> to upgrade to version 6.0.
>>>>>>>>>
>>>>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <
>>>>>>>>> khiremat at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> This looks like a hang caused by the stderr buffer filling up with
>>>>>>>>>> error messages while no one is reading it.
>>>>>>>>>> I think this issue is fixed in the latest releases. As a workaround,
>>>>>>>>>> you can do the following and check if it works.
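
(Inline illustration of the kind of hang described above: a minimal Python sketch, not gsyncd's actual code. A child process, here a hypothetical stand-in for rsync, writes more to stderr than a pipe buffer holds by default on Linux (64 KiB). A parent that only waits and never reads the pipe can leave the child blocked in write(2, ...), which would match the strace output further down this thread. Draining the pipe avoids that. The child command and the 1 MiB size are assumptions chosen for the demonstration.)

```python
import subprocess
import sys

# Hypothetical child standing in for rsync: it writes 1 MiB to stderr,
# far more than the default 64 KiB pipe buffer on Linux.
CHILD = "import sys; sys.stderr.write('x' * (1 << 20))"

proc = subprocess.Popen([sys.executable, "-c", CHILD], stderr=subprocess.PIPE)

# Calling proc.wait() here without reading stderr could block forever:
# once the pipe is full, the child sleeps inside write(2, ...) and never exits.
# communicate() reads the pipe while waiting, so the child can finish.
_, err = proc.communicate()

print("exit code:", proc.returncode, "stderr bytes:", len(err))
```
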
>>>>>>>>>>
>>>>>>>>>> Prerequisite:
>>>>>>>>>>  rsync version should be > 3.1.0
>>>>>>>>>>
>>>>>>>>>> Workaround:
>>>>>>>>>> gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL>
>>>>>>>>>> config rsync-options "--ignore-missing-args"
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Kotresh HR
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan <
>>>>>>>>>> sdeepugd at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> We are evaluating Gluster geo-replication between two DCs, one
>>>>>>>>>>> in US West and one in US East. We ran multiple trials with different
>>>>>>>>>>> file sizes.
>>>>>>>>>>> Geo-replication tends to stop replicating, yet the status still
>>>>>>>>>>> shows Active, and the slave volume does not grow in size.
>>>>>>>>>>> So we restarted the geo-replication session and checked the
>>>>>>>>>>> status. It was Active and stayed in History Crawl for a
>>>>>>>>>>> long time. We enabled DEBUG logging and checked for any
>>>>>>>>>>> errors.
>>>>>>>>>>> Around 2000 files appeared as candidates for syncing. The
>>>>>>>>>>> rsync process starts, but nothing is synced to the slave volume.
>>>>>>>>>>> Every time, the rsync process appears in the "ps auxxx" list, but the
>>>>>>>>>>> replication does not happen at the slave end. What could be the cause
>>>>>>>>>>> of this problem? Is there any way to debug it?
>>>>>>>>>>>
>>>>>>>>>>> We have also checked the strace of the rsync program.
>>>>>>>>>>> It displays something like this:
>>>>>>>>>>>
>>>>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> We are using the below specs
>>>>>>>>>>>
>>>>>>>>>>> Gluster version - 4.1.7
>>>>>>>>>>> Sync mode - rsync
>>>>>>>>>>> Volume - 1x3 in each end (master and slave)
>>>>>>>>>>> Intranet Bandwidth - 10 Gig
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Thanks and Regards,
>>>>>>>>>> Kotresh H R
>>>>>>>>>>