[Gluster-users] geo-replication errors

Fri Mar 14 11:59:16 UTC 2014

I have also tried enabling geo-replication to a plain directory on the
slave server rather than a gluster volume, and it fails in the same way.

I've noticed that every time it fails, the following is logged on the master:

[2014-03-14 11:51:43.155292] I [fuse-bridge.c:3376:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel
7.13
[2014-03-14 11:51:43.155959] I
[afr-common.c:2022:afr_set_root_inode_on_first_lookup]
0-volname-replicate-0: added root inode
[2014-03-14 11:51:53.065063] I [fuse-bridge.c:4091:fuse_thread_proc]
0-fuse: unmounting /tmp/gsyncd-aux-mount-mlNTEe
[2014-03-14 11:51:53.065631] W [glusterfsd.c:838:cleanup_and_exit]
(-->/lib64/libc.so.6(clone+0x6d) [0x30aeee894d]
(-->/lib64/libpthread.so.0() [0x30af207851]
(-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405d8d]))) 0-:
received signum (15), shutting down
[2014-03-14 11:51:53.065683] I [fuse-bridge.c:4655:fini] 0-fuse: Unmounting
'/tmp/gsyncd-aux-mount-mlNTEe'.
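
For anyone reading along, here is a minimal sketch of how the
"error: (9, 'Bad file descriptor')" in the traceback quoted below can arise
from a select() call. It assumes (this is not confirmed against the gsyncd
source) that one of the stderr descriptors being polled has already been
closed, for example because the ssh child exited. The function names
select_errno and demo are made up for illustration, and the sketch targets
Python 3, where select.error is an alias of OSError.

```python
import errno
import os
import select


def select_errno(fds, timeout=1.0):
    """Return None if select() succeeds, else the errno it failed with."""
    try:
        select.select(fds, [], [], timeout)
    except OSError as exc:  # select.error is an alias of OSError on Python 3
        return exc.errno
    return None


def demo():
    """Poll a pipe's read end while it is open, then again after closing it."""
    r, w = os.pipe()
    try:
        os.write(w, b"x")           # make the read end readable
        ok = select_errno([r])      # descriptor is open: select() succeeds
        os.close(r)                 # simulate the pipe going away
        bad = select_errno([r])     # descriptor is closed: select() fails
    finally:
        os.close(w)
    return ok, bad


if __name__ == "__main__":
    ok, bad = demo()
    print("open fd:", ok)
    print("closed fd:", bad, errno.errorcode.get(bad))
```

If that assumption holds, the traceback would be a symptom of the ssh
connection dropping rather than the root cause, which fits the "connection to
peer is broken" line logged just before the worker exits.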

On Thu, Mar 13, 2014 at 10:33 AM, John Ewing <johnewing1 at gmail.com> wrote:

> Hi,
>
> Thanks for the advice. I finally have time to come back to this issue.
>
> It doesn't seem to be sticking on any particular part of the file system
> as far as I can tell.
>
> One thing I've noticed is I always get an error about missing 'option
> transport-type'
>
>
> [2014-03-13 09:57:00.902189] E [resource:194:logerr] Popen: ssh>
> [2014-03-13 09:56:50.093951] W [rpc-transport.c:174:rpc_transport_load]
> 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
>
> On the master I have the following in glusterd.vol:
>
> volume management
>     type mgmt/glusterd
>     option working-directory /var/lib/glusterd
>     option transport-type socket,rdma
>     option transport.socket.keepalive-time 10
>     option transport.socket.keepalive-interval 2
>     option transport.socket.read-fail-log off
> end-volume
>
>
> On the slave I have:
>
> volume management
>     type mgmt/glusterd
>     option working-directory /var/lib/glusterd
>     option transport-type socket,rdma
>     option transport.socket.keepalive-time 10
>     option transport.socket.keepalive-interval 2
>     option transport.socket.read-fail-log off
>
>     option mountbroker-root /var/mountbroker-root
>     option mountbroker-geo-replication.gluster-async geo-ftb-vol,geo-bak-vol,geo-j1h-vol
>     option geo-replication-log-group gluster-async
>
> end-volume
>
>
> What should I change to fix this error?
>
>
>
> master log
>
> [2014-03-13 09:56:47.888899] I [monitor(monitor):80:monitor] Monitor:
> ------------------------------------------------------------
> [2014-03-13 09:56:47.889317] I [monitor(monitor):81:monitor] Monitor:
> starting gsyncd worker
> [2014-03-13 09:56:47.995637] I [gsyncd:354:main_i] <top>: syncing:
> gluster://localhost:volname -> ssh://gluster-async@xx.xx.xx.xx
> :gluster://localhost:geo-ftb-vol
> [2014-03-13 09:56:48.22799] D [repce:175:push] RepceClient: call
> 14516:140653524453120:1394704608.02 __repce_version__() ...
> [2014-03-13 09:57:00.898520] E [syncdutils:173:log_raise_exception] <top>:
> connection to peer is broken
> [2014-03-13 09:57:00.901844] E [resource:191:errlog] Popen: command "ssh
> -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S
> /tmp/gsyncd-aux-ssh-_wRYS3/gsycnd-ssh-%r@%h:%p gluster-async@xx.xx.xx.xx /nonexistent/gsyncd --session-owner acfda6fc-d995-4bf0-b13e-da789afb28c7 -N
> --listen --timeout 120 gluster://localhost:geo-ftb-vol" returned with 1,
> saying:
> [2014-03-13 09:57:00.902189] E [resource:194:logerr] Popen: ssh>
> [2014-03-13 09:56:50.093951] W [rpc-transport.c:174:rpc_transport_load]
> 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
> [2014-03-13 09:57:00.902648] E [resource:194:logerr] Popen: ssh>
> [2014-03-13 09:56:52.136564] I [cli-rpc-ops.c:4318:gf_cli3_1_getwd_cbk]
> 0-cli: Received resp to getwd
> [2014-03-13 09:57:00.902940] E [resource:194:logerr] Popen: ssh>
> [2014-03-13 09:56:52.136782] I [input.c:46:cli_batch] 0-: Exiting with: 0
> [2014-03-13 09:57:00.903209] E [resource:194:logerr] Popen: ssh> failed
> with error.
> [2014-03-13 09:57:00.903844] I [syncdutils:142:finalize] <top>: exiting.
> [2014-03-13 09:57:00.906152] D [monitor(monitor):96:monitor] Monitor:
> worker seems to be connected (?? racy check)
> [2014-03-13 09:57:01.907625] D [monitor(monitor):100:monitor] Monitor:
> worker died in startup phase
> [2014-03-13 09:57:11.918355] I [monitor(monitor):80:monitor] Monitor:
> ------------------------------------------------------------
> [2014-03-13 09:57:11.918920] I [monitor(monitor):81:monitor] Monitor:
> starting gsyncd worker
> [2014-03-13 09:57:12.29169] I [gsyncd:354:main_i] <top>: syncing:
> gluster://localhost:volname -> ssh://gluster-async@xx.xx.xx.xx
> :gluster://localhost:geo-ftb-vol
>
>
> -- lots of entries about syncing files ---
>
> [2014-03-13 10:10:20.670299] E [syncdutils:190:log_raise_exception] <top>:
> FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 216,
> in twrap
>     tf(*aa)
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 123,
> in tailer
>     poe, _ ,_ = select([po.stderr for po in errstore], [], [], 1)
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 276,
> in select
>     return eintr_wrap(oselect.select, oselect.error, *a)
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 269,
> in eintr_wrap
>     return func(*a)
> error: (9, 'Bad file descriptor')
> [2014-03-13 10:10:20.671988] I [syncdutils:142:finalize] <top>: exiting.
> [2014-03-13 10:10:21.624923] D [monitor(monitor):100:monitor] Monitor:
> worker died in startup phase
>
>
> slave log
>
> [2014-03-13 10:08:44.478434] I [gsyncd(slave):354:main_i] <top>: syncing:
> gluster://localhost:geo-ftb-vol
> [2014-03-13 10:08:55.6546] I [resource(slave):453:service_loop] GLUSTER:
> slave listening
> [2014-03-13 10:09:31.698591] I [repce(slave):78:service_loop] RepceServer:
> terminating on reaching EOF.
> [2014-03-13 10:09:31.699101] I [syncdutils(slave):142:finalize] <top>:
> exiting.
> [2014-03-13 10:09:49.26217] I [gsyncd(slave):354:main_i] <top>: syncing:
> gluster://localhost:geo-ftb-vol
> [2014-03-13 10:10:00.252576] I [resource(slave):453:service_loop] GLUSTER:
> slave listening
> [2014-03-13 10:10:20.783905] I [repce(slave):78:service_loop] RepceServer:
> terminating on reaching EOF.
> [2014-03-13 10:10:20.784468] I [syncdutils(slave):142:finalize] <top>:
> exiting.
> [2014-03-13 10:10:37.405524] I [gsyncd(slave):354:main_i] <top>: syncing:
> gluster://localhost:geo-ftb-vol
> [2014-03-13 10:10:46.988630] I [resource(slave):453:service_loop] GLUSTER:
> slave listening
>
> Thanks
>
> J.
>
>
>
>
>
> On Fri, Feb 14, 2014 at 1:51 PM, Venky Shankar <yknev.shankar at gmail.com> wrote:
>
>> Could you try again after changing the log-level to DEBUG using:
>>
>> # gluster volume geo-replication <master> <slave> config log-level DEBUG
>>
>> Also, logs from both master and slave would help.
>>
>> Thanks,
>> -venky
>>
>>
>> On Wed, Feb 12, 2014 at 4:44 PM, John Ewing <johnewing1 at gmail.com> wrote:
>>
>>> No, it's the latest 3.3 series release.
>>>
>>> 3.3.2 on both master and slave.
>>> CentOS 6 on master, Amazon Linux on slave.
>>> rsync 3.0.6 on both.
>>>
>>> Using an unprivileged ssh user set up with mountbroker.
>>>
>>> One thing I noticed was that the 3.3 manual says the base requirement is
>>> rsync 3.0.0 or higher, and the webpage now says 3.0.7. Is this relevant?
>>>
>>>
>>> On Wed, Feb 12, 2014 at 2:12 AM, Venky Shankar <yknev.shankar at gmail.com> wrote:
>>>
>>>> Is this from the latest master branch?
>>>>
>>>>
>>>> On Tue, Feb 11, 2014 at 4:35 PM, John Ewing <johnewing1 at gmail.com> wrote:
>>>>
>>>>> I am trying to use geo-replication, but it is running slowly and the
>>>>> following keeps being logged in the geo-replication log:
>>>>>
>>>>> [2014-02-11 10:56:42.831517] I [monitor(monitor):80:monitor] Monitor:
>>>>> ------------------------------------------------------------
>>>>> [2014-02-11 10:56:42.832226] I [monitor(monitor):81:monitor] Monitor:
>>>>> starting gsyncd worker
>>>>> [2014-02-11 10:56:42.951199] I [gsyncd:354:main_i] <top>: syncing:
>>>>> gluster://localhost:xxxxxxx -> ssh://gluster-async@xx.xx.xx.xx
>>>>> :gluster://localhost:xxxxx
>>>>> [2014-02-11 10:56:53.79632] I [master:284:crawl] GMaster: new master
>>>>> is acfda6fc-d995-4bf0-b13e-da789afb28c7
>>>>> [2014-02-11 10:56:53.80282] I [master:288:crawl] GMaster: primary
>>>>> master with volume id acfda6fc-d995-4bf0-b13e-da789afb28c7 ...
>>>>> [2014-02-11 10:56:57.453376] E [syncdutils:190:log_raise_exception]
>>>>> <top>: FAIL:
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
>>>>> 216, in twrap
>>>>>     tf(*aa)
>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
>>>>> 123, in tailer
>>>>>     poe, _ ,_ = select([po.stderr for po in errstore], [], [], 1)
>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
>>>>> 276, in select
>>>>>     return eintr_wrap(oselect.select, oselect.error, *a)
>>>>>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
>>>>> 269, in eintr_wrap
>>>>>     return func(*a)
>>>>> error: (9, 'Bad file descriptor')
>>>>> [2014-02-11 10:56:57.462110] I [syncdutils:142:finalize] <top>:
>>>>> exiting.
>>>>>
>>>>> I'm unsure what to do to debug and fix this.
>>>>>
>>>>> Thanks
>>>>>
>>>>> John.
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>
>>>>
>>>
>>
>>
>