[Gluster-users] Geo Replication stops replicating

deepu srinivasan sdeepugd at gmail.com
Tue Jun 4 11:40:28 UTC 2019


Hi
As discussed, I have upgraded Gluster from version 4.1 to 6.2, but geo-replication
failed to start. It stays in a Faulty state.

On Fri, May 31, 2019, 5:32 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Checked the data. It remains at 2708. No progress.
>
> On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar <
> khiremat at redhat.com> wrote:
>
>> That means it could be working, and the defunct process might be some old
>> zombie. Could you check whether the data is progressing?
>>
>> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan <sdeepugd at gmail.com>
>> wrote:
>>
>>> Hi
>>> When I change the rsync option, the rsync process doesn't seem to start;
>>> only a defunct process is listed in ps aux. Only when I set the rsync option
>>> to " " and restart all the processes does the rsync process appear in ps aux.
>>>
>>>
>>> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar <
>>> khiremat at redhat.com> wrote:
>>>
>>>> Yes, the rsync config option should have fixed this issue.
>>>>
>>>> Could you share the output of the following?
>>>>
>>>> 1. gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL>
>>>> config rsync-options
>>>> 2. ps -ef | grep rsync
>>>>
>>>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan <sdeepugd at gmail.com>
>>>> wrote:
>>>>
>>>>> Done.
>>>>> We got the following result:
>>>>>
>>>>>> 1559298781.338234 write(2, "rsync: link_stat
>>>>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\"
>>>>>> failed: No such file or directory (2)", 128
>>>>>
>>>>> Seems like a file is missing?
>>>>>
>>>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <
>>>>> khiremat at redhat.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Could you take the strace with a larger string size? The argument
>>>>>> strings are truncated.
>>>>>>
>>>>>> strace -s 500 -ttt -T -p <rsync pid>
>>>>>>
>>>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan <sdeepugd at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Kotresh
>>>>>>> The above-mentioned workaround did not work properly.
>>>>>>>
>>>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan <sdeepugd at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Kotresh
>>>>>>>> We have tried the above-mentioned rsync option, and we are planning
>>>>>>>> to upgrade to version 6.0.
>>>>>>>>
>>>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <
>>>>>>>> khiremat at redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This looks like a hang caused by the stderr buffer filling up with
>>>>>>>>> error messages while nothing reads it.
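>>>>>>>>>
>>>>>>>>> (For context, a minimal Python sketch of that deadlock; an
>>>>>>>>> illustration only, not the actual gsyncd code. The child floods
>>>>>>>>> stderr; if the parent only waits without draining the pipe, the
>>>>>>>>> child blocks as soon as the ~64 KiB pipe buffer fills.)
>>>>>>>>>
>>>>>>>>> import subprocess
>>>>>>>>>
>>>>>>>>> # Child writes ~450 KiB of error text to stderr, far more than a
>>>>>>>>> # Linux pipe buffer holds.
>>>>>>>>> child = subprocess.Popen(
>>>>>>>>>     ["sh", "-c",
>>>>>>>>>      "i=0; while [ $i -lt 10000 ]; do "
>>>>>>>>>      "echo 'rsync: link_stat failed: No such file (2)' >&2; "
>>>>>>>>>      "i=$((i+1)); done"],
>>>>>>>>>     stderr=subprocess.PIPE,  # pipe is created but never drained
>>>>>>>>> )
>>>>>>>>>
>>>>>>>>> # The hang: child.wait() would deadlock once the pipe fills, since
>>>>>>>>> # nothing reads stderr while the child is still writing.
>>>>>>>>> # child.wait()
>>>>>>>>>
>>>>>>>>> # The fix: drain stderr while waiting, e.g. with communicate().
>>>>>>>>> _, err = child.communicate()
>>>>>>>>> print("exit", child.returncode, "stderr bytes", len(err))
>>>>>>>>>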
>>>>>>>>> I think this issue is fixed in the latest releases. As a workaround,
>>>>>>>>> you can do the following and check whether it works.
>>>>>>>>>
>>>>>>>>> Prerequisite:
>>>>>>>>>  rsync version should be > 3.1.0
>>>>>>>>>
>>>>>>>>> Workaround:
>>>>>>>>> gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL>
>>>>>>>>> config rsync-options "--ignore-missing-args"
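>>>>>>>>>
>>>>>>>>> (A hypothetical repro of why this option helps, with invented temp
>>>>>>>>> paths; it needs rsync >= 3.1.0 on PATH. Geo-rep hands rsync a file
>>>>>>>>> list built from the changelog, so an entry deleted in the meantime
>>>>>>>>> makes plain rsync fail with a link_stat error, while
>>>>>>>>> --ignore-missing-args just skips it.)
>>>>>>>>>
>>>>>>>>> import os
>>>>>>>>> import subprocess
>>>>>>>>> import tempfile
>>>>>>>>>
>>>>>>>>> src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
>>>>>>>>> open(os.path.join(src, "exists.txt"), "w").close()
>>>>>>>>>
>>>>>>>>> # File list with one present entry and one already-deleted entry,
>>>>>>>>> # mimicking a stale gfid list.
>>>>>>>>> flist = os.path.join(tempfile.mkdtemp(), "list.txt")
>>>>>>>>> with open(flist, "w") as f:
>>>>>>>>>     f.write("exists.txt\ndeleted-meanwhile.txt\n")
>>>>>>>>>
>>>>>>>>> for extra in ([], ["--ignore-missing-args"]):
>>>>>>>>>     r = subprocess.run(
>>>>>>>>>         ["rsync", "-a", "--files-from=" + flist, *extra,
>>>>>>>>>          src + "/", dst],
>>>>>>>>>         capture_output=True, text=True)
>>>>>>>>>     # default: rc 23 and a link_stat error on stderr; with
>>>>>>>>>     # --ignore-missing-args: rc 0, the missing entry is skipped.
>>>>>>>>>     print(extra or "default", "-> rc", r.returncode)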
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Kotresh HR
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan <
>>>>>>>>> sdeepugd at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>> We were evaluating Gluster geo-replication between two DCs, one in
>>>>>>>>>> US West and one in US East, and ran multiple trials with different
>>>>>>>>>> file sizes.
>>>>>>>>>> Geo-replication tends to stop replicating, yet when we check the
>>>>>>>>>> status it appears to be Active; the slave volume, however, does not
>>>>>>>>>> grow in size.
>>>>>>>>>> So we restarted the geo-replication session and checked the status
>>>>>>>>>> again. It was Active but stayed in History Crawl for a long time.
>>>>>>>>>> We enabled DEBUG logging and checked for errors.
>>>>>>>>>> Around 2000 files appeared as syncing candidates. The rsync process
>>>>>>>>>> starts, but nothing is synced to the slave volume: every time, the
>>>>>>>>>> rsync process appears in the "ps aux" list, yet no replication
>>>>>>>>>> happens on the slave end. What could cause this problem? Is there
>>>>>>>>>> any way to debug it?
>>>>>>>>>>
>>>>>>>>>> We have also taken an strace of the rsync program.
>>>>>>>>>> It displays something like this:
>>>>>>>>>>
>>>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We are using the specs below:
>>>>>>>>>>
>>>>>>>>>> Gluster version - 4.1.7
>>>>>>>>>> Sync mode - rsync
>>>>>>>>>> Volume - 1x3 on each end (master and slave)
>>>>>>>>>> Intranet bandwidth - 10 Gig
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks and Regards,
>>>>>>>>> Kotresh H R
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks and Regards,
>>>>>> Kotresh H R
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Thanks and Regards,
>>>> Kotresh H R
>>>>
>>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>