[Gluster-users] Geo Replication stops replicating

Kotresh Hiremath Ravishankar khiremat at redhat.com
Thu Jun 6 04:58:43 UTC 2019


Hi,

I think the steps to set up non-root geo-rep were not followed properly. The
following required entry is missing from the glusterd vol file:

The message "E [MSGID: 106061]
[glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
mountbroker-root' missing in glusterd vol file" repeated 33 times between
[2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757]

Could you please follow the steps in the link below?

https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/administration_guide/index#Setting_Up_the_Environment_for_a_Secure_Geo-replication_Slave
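
For reference, on the slave nodes those entries are normally put in place
with the gluster-mountbroker tool. A rough sketch, assuming the
unprivileged user 'sas' and slave volume 'code-misc' that appear in the
logs below ('geogroup' and the mountbroker root path are placeholders;
adjust them to your environment):

  gluster-mountbroker setup /var/mountbroker-root geogroup
  gluster-mountbroker add code-misc sas
  gluster-mountbroker status

After that, /etc/glusterfs/glusterd.vol on the slave nodes should contain
entries along these lines:

  option mountbroker-root /var/mountbroker-root
  option mountbroker-geo-replication.sas code-misc
  option geo-replication-log-group geogroup
  option rpc-auth-allow-insecure on

followed by a glusterd restart on each slave node.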

And let us know if you still face the issue.




On Thu, Jun 6, 2019 at 10:24 AM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Hi Kotresh, Sunny
> I have mailed the logs I found on one of the slave machines. Could this
> have anything to do with permissions? Please help.
>
> On Wed, Jun 5, 2019 at 2:28 PM deepu srinivasan <sdeepugd at gmail.com>
> wrote:
>
>> Hi Kotresh, Sunny
>> Found these log entries on the slave machine:
>>
>>> [2019-06-05 08:49:10.632583] I [MSGID: 106488]
>>> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management:
>>> Received get vol req
>>>
>>> The message "I [MSGID: 106488]
>>> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management:
>>> Received get vol req" repeated 2 times between [2019-06-05 08:49:10.632583]
>>> and [2019-06-05 08:49:10.670863]
>>>
>>> The message "I [MSGID: 106496]
>>> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
>>> mount req" repeated 34 times between [2019-06-05 08:48:41.005398] and
>>> [2019-06-05 08:50:37.254063]
>>>
>>> The message "E [MSGID: 106061]
>>> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
>>> mountbroker-root' missing in glusterd vol file" repeated 34 times between
>>> [2019-06-05 08:48:41.005434] and [2019-06-05 08:50:37.254079]
>>>
>>> The message "W [MSGID: 106176]
>>> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful
>>> mount request [No such file or directory]" repeated 34 times between
>>> [2019-06-05 08:48:41.005444] and [2019-06-05 08:50:37.254080]
>>>
>>> [2019-06-05 08:50:46.361347] I [MSGID: 106496]
>>> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
>>> mount req
>>>
>>> [2019-06-05 08:50:46.361384] E [MSGID: 106061]
>>> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
>>> mountbroker-root' missing in glusterd vol file
>>>
>>> [2019-06-05 08:50:46.361419] W [MSGID: 106176]
>>> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful
>>> mount request [No such file or directory]
>>>
>>> The message "I [MSGID: 106496]
>>> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
>>> mount req" repeated 33 times between [2019-06-05 08:50:46.361347] and
>>> [2019-06-05 08:52:34.019741]
>>>
>>> The message "E [MSGID: 106061]
>>> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
>>> mountbroker-root' missing in glusterd vol file" repeated 33 times between
>>> [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757]
>>>
>>> The message "W [MSGID: 106176]
>>> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful
>>> mount request [No such file or directory]" repeated 33 times between
>>> [2019-06-05 08:50:46.361419] and [2019-06-05 08:52:34.019758]
>>>
>>> [2019-06-05 08:52:44.426839] I [MSGID: 106496]
>>> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
>>> mount req
>>>
>>> [2019-06-05 08:52:44.426886] E [MSGID: 106061]
>>> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
>>> mountbroker-root' missing in glusterd vol file
>>>
>>> [2019-06-05 08:52:44.426896] W [MSGID: 106176]
>>> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful
>>> mount request [No such file or directory]
>>>
>>
>> On Wed, Jun 5, 2019 at 1:06 AM deepu srinivasan <sdeepugd at gmail.com>
>> wrote:
>>
>>> Thank you, Kotresh
>>>
>>> On Tue, Jun 4, 2019, 11:20 PM Kotresh Hiremath Ravishankar <
>>> khiremat at redhat.com> wrote:
>>>
>>>> Ccing Sunny, who was investigating a similar issue.
>>>>
>>>> On Tue, Jun 4, 2019 at 5:46 PM deepu srinivasan <sdeepugd at gmail.com>
>>>> wrote:
>>>>
>>>>> I have already added the path in .bashrc. It is still in a Faulty state.
>>>>>
>>>>> On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar <
>>>>> khiremat at redhat.com> wrote:
>>>>>
>>>>>> Could you please try adding /usr/sbin to $PATH for the user 'sas'? If
>>>>>> it's bash, add 'export PATH=/usr/sbin:$PATH' to
>>>>>> /home/sas/.bashrc
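>>>>>>
>>>>>> A quick way to verify is to check what a non-interactive SSH session
>>>>>> sees, since that is how gsyncd runs commands on the slave (a sketch;
>>>>>> the user and slave host are taken from the logs in this thread):
>>>>>>
>>>>>>   ssh sas@192.168.185.107 'echo $PATH; command -v gluster'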
>>>>>>
>>>>>> On Tue, Jun 4, 2019 at 5:24 PM deepu srinivasan <sdeepugd at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Kotresh
>>>>>>> Please find the logs for the above error.
>>>>>>> *Master log snippet*
>>>>>>>
>>>>>>>> [2019-06-04 11:52:09.254731] I [resource(worker
>>>>>>>> /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing
>>>>>>>> SSH connection between master and slave...
>>>>>>>>  [2019-06-04 11:52:09.308923] D [repce(worker
>>>>>>>> /home/sas/gluster/data/code-misc):196:push] RepceClient: call
>>>>>>>> 89724:139652759443264:1559649129.31 __repce_version__() ...
>>>>>>>>  [2019-06-04 11:52:09.602792] E [syncdutils(worker
>>>>>>>> /home/sas/gluster/data/code-misc):311:log_raise_exception] <top>:
>>>>>>>> connection to peer is broken
>>>>>>>>  [2019-06-04 11:52:09.603312] E [syncdutils(worker
>>>>>>>> /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error
>>>>>>>>   cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
>>>>>>>> /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
>>>>>>>> /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock
>>>>>>>> sas@192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc
>>>>>>>> sas@192.168.185.107::code-misc --master-node 192.168.185.106
>>>>>>>> --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick
>>>>>>>> /home/sas/gluster/data/code-misc --local-node 192.168.185.122
>>>>>>>> --local-node-id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120
>>>>>>>> --slave-log-level DEBUG --slave-gluster-log-level INFO
>>>>>>>> --slave-gluster-command-dir /usr/sbin   error=1
>>>>>>>>  [2019-06-04 11:52:09.614996] I [repce(agent
>>>>>>>> /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating
>>>>>>>> on reaching EOF.
>>>>>>>>  [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor]
>>>>>>>> Monitor: worker(/home/sas/gluster/data/code-misc) connected
>>>>>>>>  [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor]
>>>>>>>> Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc
>>>>>>>>  [2019-06-04 11:52:09.619391] I
>>>>>>>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status
>>>>>>>> Change status=Faulty
>>>>>>>>
>>>>>>>
>>>>>>> *Slave log snippet*
>>>>>>>
>>>>>>>> [2019-06-04 11:50:09.782668] E [syncdutils(slave
>>>>>>>> 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr]
>>>>>>>> Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or
>>>>>>>> directory)
>>>>>>>> [2019-06-04 11:50:11.188167] W [gsyncd(slave
>>>>>>>> 192.168.185.125/home/sas/gluster/data/code-misc):305:main] <top>:
>>>>>>>> Session config file not exists, using the default config
>>>>>>>> path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf
>>>>>>>> [2019-06-04 11:50:11.201070] I [resource(slave
>>>>>>>> 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect]
>>>>>>>> GLUSTER: Mounting gluster volume locally...
>>>>>>>> [2019-06-04 11:50:11.271231] E [resource(slave
>>>>>>>> 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter]
>>>>>>>> MountbrokerMounter: glusterd answered mnt=
>>>>>>>> [2019-06-04 11:50:11.271998] E [syncdutils(slave
>>>>>>>> 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog]
>>>>>>>> Popen: command returned error cmd=/usr/sbin/gluster --remote-host=localhost
>>>>>>>> system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO
>>>>>>>> log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log
>>>>>>>> volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1
>>>>>>>> [2019-06-04 11:50:11.272113] E [syncdutils(slave
>>>>>>>> 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr]
>>>>>>>> Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or
>>>>>>>> directory)
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan <sdeepugd at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>> As discussed, I have upgraded Gluster from version 4.1 to 6.2, but
>>>>>>>> the geo-replication fails to start. It stays in a Faulty state.
>>>>>>>>
>>>>>>>> On Fri, May 31, 2019, 5:32 PM deepu srinivasan <sdeepugd at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Checked the data. It remains at 2708. No progress.
>>>>>>>>>
>>>>>>>>> On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar <
>>>>>>>>> khiremat at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> That means it could be working, and the defunct process might be
>>>>>>>>>> some old zombie. Could you check whether the data is progressing?
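>>>>>>>>>>
>>>>>>>>>> One way to check (a sketch, using the same placeholders as earlier
>>>>>>>>>> in this thread) is the detailed status, which reports per-worker
>>>>>>>>>> sync counters:
>>>>>>>>>>
>>>>>>>>>>   gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> status detail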
>>>>>>>>>>
>>>>>>>>>> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan <
>>>>>>>>>> sdeepugd at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi
>>>>>>>>>>> When I change the rsync option, the rsync process doesn't seem to
>>>>>>>>>>> start. Only a defunct process is listed in ps aux. Only when I set
>>>>>>>>>>> the rsync option to " " and restart all the processes does the
>>>>>>>>>>> rsync process appear in ps aux.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar <
>>>>>>>>>>> khiremat at redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yes, the rsync config option should have fixed this issue.
>>>>>>>>>>>>
>>>>>>>>>>>> Could you share the output of the following?
>>>>>>>>>>>>
>>>>>>>>>>>> 1. gluster volume geo-replication <MASTERVOL>
>>>>>>>>>>>> <SLAVEHOST>::<SLAVEVOL> config rsync-options
>>>>>>>>>>>> 2. ps -ef | grep rsync
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan <
>>>>>>>>>>>> sdeepugd at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Done.
>>>>>>>>>>>>> We got the following result.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1559298781.338234 write(2, "rsync: link_stat
>>>>>>>>>>>>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\"
>>>>>>>>>>>>>> failed: No such file or directory (2)", 128
>>>>>>>>>>>>>
>>>>>>>>>>>>> Seems like a file is missing?
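>>>>>>>>>>>>>
>>>>>>>>>>>>> (For what it's worth, a gfid maps to a backend path under the
>>>>>>>>>>>>> brick's .glusterfs directory, so a sketch of a check against the
>>>>>>>>>>>>> master brick path seen elsewhere in this thread would be:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   ls -l /home/sas/gluster/data/code-misc/.glusterfs/3f/a6/3fa6aed8-802e-4efe-9903-8bc171176d88
>>>>>>>>>>>>>
>>>>>>>>>>>>> to see whether the file still exists on the master.)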
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <
>>>>>>>>>>>>> khiremat at redhat.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could you take the strace with a larger string size? The
>>>>>>>>>>>>>> argument strings are truncated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> strace -s 500 -ttt -T -p <rsync pid>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan <
>>>>>>>>>>>>>> sdeepugd at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Kotresh
>>>>>>>>>>>>>>> The above-mentioned workaround did not work properly.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan <
>>>>>>>>>>>>>>> sdeepugd at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Kotresh
>>>>>>>>>>>>>>>> We have tried the above-mentioned rsync option and we are
>>>>>>>>>>>>>>>> planning to upgrade to version 6.0.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath
>>>>>>>>>>>>>>>> Ravishankar <khiremat at redhat.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This looks like a hang because the stderr buffer filled up
>>>>>>>>>>>>>>>>> with error messages and nothing was reading it.
>>>>>>>>>>>>>>>>> I think this issue is fixed in the latest releases. As a
>>>>>>>>>>>>>>>>> workaround, you can do the following and check if it works.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Prerequisite:
>>>>>>>>>>>>>>>>>  rsync version should be > 3.1.0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Workaround:
>>>>>>>>>>>>>>>>> gluster volume geo-replication <MASTERVOL>
>>>>>>>>>>>>>>>>> <SLAVEHOST>::<SLAVEVOL> config rsync-options
>>>>>>>>>>>>>>>>> "--ignore-missing-args"
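>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> With the volume and host names that appear later in this
>>>>>>>>>>>>>>>>> thread, that would look something like this (a sketch;
>>>>>>>>>>>>>>>>> adjust to your session):
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   rsync --version | head -1   # confirm it is newer than 3.1.0
>>>>>>>>>>>>>>>>>   gluster volume geo-replication code-misc sas@192.168.185.107::code-misc config rsync-options "--ignore-missing-args"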
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Kotresh HR
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan <
>>>>>>>>>>>>>>>>> sdeepugd at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>>> We were evaluating Gluster geo-replication between two
>>>>>>>>>>>>>>>>>> DCs, one in US west and one in US east. We ran multiple
>>>>>>>>>>>>>>>>>> trials with different file sizes.
>>>>>>>>>>>>>>>>>> The geo-replication tends to stop replicating, but when
>>>>>>>>>>>>>>>>>> checking the status it appears to be in the Active state.
>>>>>>>>>>>>>>>>>> The slave volume, however, did not grow in size.
>>>>>>>>>>>>>>>>>> So we restarted the geo-replication session and checked the
>>>>>>>>>>>>>>>>>> status. The status was Active, and it stayed in History
>>>>>>>>>>>>>>>>>> Crawl for a long time. We enabled DEBUG mode in logging and
>>>>>>>>>>>>>>>>>> checked for errors.
>>>>>>>>>>>>>>>>>> Around 2000 files appeared as syncing candidates. The rsync
>>>>>>>>>>>>>>>>>> process starts, but nothing is actually synced to the slave
>>>>>>>>>>>>>>>>>> volume. Every time, the rsync process appears in the
>>>>>>>>>>>>>>>>>> "ps aux" list, but the replication does not happen on the
>>>>>>>>>>>>>>>>>> slave end. What would be the cause of this problem? Is
>>>>>>>>>>>>>>>>>> there any way to debug it?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We have also checked the strace of the rsync program.
>>>>>>>>>>>>>>>>>> It displays something like this:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We are using the below specs
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Gluster version - 4.1.7
>>>>>>>>>>>>>>>>>> Sync mode - rsync
>>>>>>>>>>>>>>>>>> Volume - 1x3 in each end (master and slave)
>>>>>>>>>>>>>>>>>> Intranet Bandwidth - 10 Gig
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>>> Kotresh H R
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>> Kotresh H R
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>> Kotresh H R
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Thanks and Regards,
>>>>>>>>>> Kotresh H R
>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks and Regards,
>>>>>> Kotresh H R
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Thanks and Regards,
>>>> Kotresh H R
>>>>
>>>

-- 
Thanks and Regards,
Kotresh H R