[Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)
Alexander Iliev
ailiev+gluster at mamul.org
Thu Oct 10 22:31:58 UTC 2019
Hi all,
I ended up reinstalling the nodes with CentOS 7.5 and GlusterFS 6.5
(installed from the SIG).
Now when I try to create a replication session I get the following:
> # gluster volume geo-replication store1 <slave-host>::store2 create push-pem
> Unable to mount and fetch slave volume details. Please check the log: /var/log/glusterfs/geo-replication/gverify-slavemnt.log
> geo-replication command failed
You can find the contents of gverify-slavemnt.log below, but the initial
error seems to be:
> [2019-10-10 22:07:51.578519] E [fuse-bridge.c:5211:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
I only found one related bug report
(https://bugzilla.redhat.com/show_bug.cgi?id=1659824), which doesn't seem
to help: it describes a failure to mount a volume on a regular GlusterFS
client, whereas in my case I need geo-replication, which implies that the
client (the geo-replication master) is on a different network.
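For what it's worth, I believe the mount that gverify.sh attempts can be
reproduced manually from the master node with something like the following
(the mount point path is just an example; the other arguments are taken
from the log below):
    # mkdir -p /mnt/gverify-test
    # glusterfs --volfile-server <slave-host> --volfile-id store2 \
          -l /tmp/gverify-test.log /mnt/gverify-test
I would expect this to fail the same way, since (as far as I understand)
the FUSE client has to reach not only glusterd on <slave-host> but also
every brick process of store2.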
Any help will be appreciated.
Thanks!
gverify-slavemnt.log:
> [2019-10-10 22:07:40.571256] I [MSGID: 100030]
[glusterfsd.c:2847:main] 0-glusterfs: Started running glusterfs version
6.5 (args: glusterfs --xlator-option=*dht.lookup-unhashed=off
--volfile-server <slave-host> --volfile-id store2 -l
/var/log/glusterfs/geo-replication/gverify-slavemnt.log
/tmp/gverify.sh.5nFlRh)
> [2019-10-10 22:07:40.575438] I [glusterfsd.c:2556:daemonize]
0-glusterfs: Pid of current running process is 6021
> [2019-10-10 22:07:40.584282] I [MSGID: 101190]
[event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 0
> [2019-10-10 22:07:40.584299] I [MSGID: 101190]
[event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
> [2019-10-10 22:07:40.928094] I [MSGID: 114020] [client.c:2393:notify]
0-store2-client-0: parent translators are ready, attempting connect on
transport
> [2019-10-10 22:07:40.931121] I [MSGID: 114020] [client.c:2393:notify]
0-store2-client-1: parent translators are ready, attempting connect on
transport
> [2019-10-10 22:07:40.933976] I [MSGID: 114020] [client.c:2393:notify]
0-store2-client-2: parent translators are ready, attempting connect on
transport
> Final graph:
>
+------------------------------------------------------------------------------+
> 1: volume store2-client-0
> 2: type protocol/client
> 3: option ping-timeout 42
> 4: option remote-host 172.31.36.11
> 5: option remote-subvolume /data/gfs/store1/1/brick-store2
> 6: option transport-type socket
> 7: option transport.address-family inet
> 8: option transport.socket.ssl-enabled off
> 9: option transport.tcp-user-timeout 0
> 10: option transport.socket.keepalive-time 20
> 11: option transport.socket.keepalive-interval 2
> 12: option transport.socket.keepalive-count 9
> 13: option send-gids true
> 14: end-volume
> 15:
> 16: volume store2-client-1
> 17: type protocol/client
> 18: option ping-timeout 42
> 19: option remote-host 172.31.36.12
> 20: option remote-subvolume /data/gfs/store1/1/brick-store2
> 21: option transport-type socket
> 22: option transport.address-family inet
> 23: option transport.socket.ssl-enabled off
> 24: option transport.tcp-user-timeout 0
> 25: option transport.socket.keepalive-time 20
> 26: option transport.socket.keepalive-interval 2
> 27: option transport.socket.keepalive-count 9
> 28: option send-gids true
> 29: end-volume
> 30:
> 31: volume store2-client-2
> 32: type protocol/client
> 33: option ping-timeout 42
> 34: option remote-host 172.31.36.13
> 35: option remote-subvolume /data/gfs/store1/1/brick-store2
> 36: option transport-type socket
> 37: option transport.address-family inet
> 38: option transport.socket.ssl-enabled off
> 39: option transport.tcp-user-timeout 0
> 40: option transport.socket.keepalive-time 20
> 41: option transport.socket.keepalive-interval 2
> 42: option transport.socket.keepalive-count 9
> 43: option send-gids true
> 44: end-volume
> 45:
> 46: volume store2-replicate-0
> 47: type cluster/replicate
> 48: option afr-pending-xattr
store2-client-0,store2-client-1,store2-client-2
> 49: option use-compound-fops off
> 50: subvolumes store2-client-0 store2-client-1 store2-client-2
> 51: end-volume
> 52:
> 53: volume store2-dht
> 54: type cluster/distribute
> 55: option lookup-unhashed off
> 56: option lock-migration off
> 57: option force-migration off
> 58: subvolumes store2-replicate-0
> 59: end-volume
> 60:
> 61: volume store2-write-behind
> 62: type performance/write-behind
> 63: subvolumes store2-dht
> 64: end-volume
> 65:
> 66: volume store2-read-ahead
> 67: type performance/read-ahead
> 68: subvolumes store2-write-behind
> 69: end-volume
> 70:
> 71: volume store2-readdir-ahead
> 72: type performance/readdir-ahead
> 73: option parallel-readdir off
> 74: option rda-request-size 131072
> 75: option rda-cache-limit 10MB
> 76: subvolumes store2-read-ahead
> 77: end-volume
> 78:
> 79: volume store2-io-cache
> 80: type performance/io-cache
> 81: subvolumes store2-readdir-ahead
> 82: end-volume
> 83:
> 84: volume store2-open-behind
> 85: type performance/open-behind
> 86: subvolumes store2-io-cache
> 87: end-volume
> 88:
> 89: volume store2-quick-read
> 90: type performance/quick-read
> 91: subvolumes store2-open-behind
> 92: end-volume
> 93:
> 94: volume store2-md-cache
> 95: type performance/md-cache
> 96: subvolumes store2-quick-read
> 97: end-volume
> 98:
> 99: volume store2
> 100: type debug/io-stats
> 101: option log-level INFO
> 102: option latency-measurement off
> 103: option count-fop-hits off
> 104: subvolumes store2-md-cache
> 105: end-volume
> 106:
> 107: volume meta-autoload
> 108: type meta
> 109: subvolumes store2
> 110: end-volume
> 111:
>
+------------------------------------------------------------------------------+
> [2019-10-10 22:07:51.578287] I [fuse-bridge.c:5142:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24
kernel 7.22
> [2019-10-10 22:07:51.578356] I [fuse-bridge.c:5753:fuse_graph_sync]
0-fuse: switched to graph 0
> [2019-10-10 22:07:51.578467] I [MSGID: 108006]
[afr-common.c:5666:afr_local_init] 0-store2-replicate-0: no subvolumes up
> [2019-10-10 22:07:51.578519] E [fuse-bridge.c:5211:fuse_first_lookup]
0-fuse: first lookup on root failed (Transport endpoint is not connected)
> [2019-10-10 22:07:51.578709] W [fuse-bridge.c:1266:fuse_attr_cbk]
0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
> [2019-10-10 22:07:51.578687] I [MSGID: 108006]
[afr-common.c:5666:afr_local_init] 0-store2-replicate-0: no subvolumes up
> [2019-10-10 22:09:48.222459] E [MSGID: 108006]
[afr-common.c:5318:__afr_handle_child_down_event] 0-store2-replicate-0:
All subvolumes are down. Going offline until at least one of them comes
back up.
> The message "E [MSGID: 108006]
[afr-common.c:5318:__afr_handle_child_down_event] 0-store2-replicate-0:
All subvolumes are down. Going offline until at least one of them comes
back up." repeated 2 times between [2019-10-10 22:09:48.222459] and
[2019-10-10 22:09:48.222891]
>
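Looking at the volume graph above, the client does fetch the volfile from
<slave-host>, but the bricks are published with their private addresses
(remote-host 172.31.36.11/12/13), so my suspicion is that the FUSE client
on the master simply cannot connect to the brick processes, hence the "no
subvolumes up" messages. A rough check from a master node would be
something like this (a sketch; brick ports are typically 49152 and up, and
the exact ports can be read from "gluster volume status store2" on the
slave):
    # nc -zv 172.31.36.11 24007
    # nc -zv 172.31.36.11 49152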
alexander iliev
On 9/8/19 4:50 PM, Alexander Iliev wrote:
> Hi all,
>
> Sunny, thank you for the update.
>
> I have applied the patch locally on my slave system and now the
> mountbroker setup is successful.
>
> I am facing another issue though - when I try to create a replication
> session between the two sites I am getting:
>
> # gluster volume geo-replication store1
> glustergeorep@<slave-host>::store1 create push-pem
> Error : Request timed out
> geo-replication command failed
>
> It is still unclear to me if my setup is expected to work at all.
>
> Reading the geo-replication documentation at [1] I see this paragraph:
>
> > A password-less SSH connection is also required for gsyncd between
> every node in the master to every node in the slave. The gluster
> system:: execute gsec_create command creates secret-pem files on all the
> nodes in the master, and is used to implement the password-less SSH
> connection. The push-pem option in the geo-replication create command
> pushes these keys to all the nodes in the slave.
>
> It is not clear to me whether connectivity from each master node to each
> slave node is a requirement in terms of networking. In my setup the
> slave nodes form the Gluster pool over a private network which is not
> reachable from the master site.
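>
> To make the question concrete, from a master node the only path that can
> work today is SSH, e.g. something like
>
>     ssh glustergeorep@<slave-host>
>
> while a direct connection to the slave's Gluster ports, e.g.
>
>     nc -zv 172.31.36.11 24007
>
> would go to the slave's private network and cannot succeed.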
>
> Any ideas how to proceed from here will be greatly appreciated.
>
> Thanks!
>
> Links:
> [1]
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/sect-preparing_to_deploy_geo-replication
>
>
> Best regards,
> --
> alexander iliev
>
> On 9/3/19 2:50 PM, Sunny Kumar wrote:
>> Thank you for the explanation Kaleb.
>>
>> Alexander,
>>
>> This fix will be available with the next release for all supported versions.
>>
>> /sunny
>>
>> On Mon, Sep 2, 2019 at 6:47 PM Kaleb Keithley <kkeithle at redhat.com>
>> wrote:
>>>
>>> Fixes on master (before or after the release-7 branch was taken)
>>> almost certainly warrant a backport IMO to at least release-6, and
>>> probably release-5 as well.
>>>
>>> We used to have a "tracker" BZ for each minor release (e.g. 6.6) to
>>> keep track of backports by cloning the original BZ and changing the
>>> Version, and adding that BZ to the tracker. I'm not sure what
>>> happened to that practice. The last ones I can find are for 6.3 and
>>> 5.7; https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-6.3 and
>>> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.7
>>>
>>> It isn't enough to just backport recent fixes on master to release-7.
>>> We are supposedly continuing to maintain release-6 and release-5
>>> after release-7 GAs. If that has changed, I haven't seen an
>>> announcement to that effect. I don't know why our developers don't
>>> automatically backport to all the actively maintained releases.
>>>
>>> Even if there isn't a tracker BZ, you can always create a backport BZ
>>> by cloning the original BZ and changing the release to 6. That'd be a
>>> good place to start.
>>>
>>> On Sun, Sep 1, 2019 at 8:45 AM Alexander Iliev
>>> <ailiev+gluster at mamul.org> wrote:
>>>>
>>>> Hi Strahil,
>>>>
>>>> Yes, this might be right, but I would still expect fixes like this to
>>>> be released for all supported major versions (which should include 6).
>>>> At least that's how I understand
>>>> https://www.gluster.org/release-schedule/.
>>>>
>>>> Anyway, let's wait for Sunny to clarify.
>>>>
>>>> Best regards,
>>>> alexander iliev
>>>>
>>>> On 9/1/19 2:07 PM, Strahil Nikolov wrote:
>>>>> Hi Alex,
>>>>>
>>>>> I'm not very deep into bugzilla stuff, but for me NEXTRELEASE means
>>>>> v7.
>>>>>
>>>>> Sunny,
>>>>> Am I understanding it correctly?
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>>>>
>>>>> On Sunday, 1 September 2019 at 14:27:32 GMT+3, Alexander Iliev
>>>>> <ailiev+gluster at mamul.org> wrote:
>>>>>
>>>>>
>>>>> Hi Sunny,
>>>>>
>>>>> Thank you for the quick response.
>>>>>
>>>>> It's not clear to me, however, whether the fix has already been
>>>>> released or not.
>>>>>
>>>>> The bug status is CLOSED NEXTRELEASE and according to [1] the
>>>>> NEXTRELEASE resolution means that the fix will be included in the next
>>>>> supported release. The bug is logged against the mainline version
>>>>> though, so I'm not sure what this means exactly.
>>>>>
>>>>> From the 6.4[2] and 6.5[3] release notes it seems it hasn't been
>>>>> released yet.
>>>>>
>>>>> Ideally I would prefer not to patch my systems locally, so if you
>>>>> have an ETA on when this will be released officially I would really
>>>>> appreciate it.
>>>>>
>>>>> Links:
>>>>> [1] https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_status
>>>>> [2] https://docs.gluster.org/en/latest/release-notes/6.4/
>>>>> [3] https://docs.gluster.org/en/latest/release-notes/6.5/
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Best regards,
>>>>>
>>>>> alexander iliev
>>>>>
>>>>> On 8/30/19 9:22 AM, Sunny Kumar wrote:
>>>>> > Hi Alexander,
>>>>> >
>>>>> > Thanks for pointing that out!
>>>>> >
>>>>> > But this issue is fixed now; you can see the BZ and patch links
>>>>> > below.
>>>>> >
>>>>> > BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1709248
>>>>> >
>>>>> > Patch - https://review.gluster.org/#/c/glusterfs/+/22716/
>>>>> >
>>>>> > Hope this helps.
>>>>> >
>>>>> > /sunny
>>>>> >
>>>>> > On Fri, Aug 30, 2019 at 2:30 AM Alexander Iliev
>>>>> > <ailiev+gluster at mamul.org> wrote:
>>>>> >>
>>>>> >> Hello dear GlusterFS users list,
>>>>> >>
>>>>> >> I have been trying to set up geo-replication between two clusters
>>>>> >> for some time now. The desired state is Cluster #1 being replicated
>>>>> >> to Cluster #2.
>>>>> >>
>>>>> >> Here are some details about the setup:
>>>>> >>
>>>>> >> Cluster #1: three nodes connected via a local network
>>>>> (172.31.35.0/24),
>>>>> >> one replicated (3 replica) volume.
>>>>> >>
>>>>> >> Cluster #2: three nodes connected via a local network
>>>>> (172.31.36.0/24),
>>>>> >> one replicated (3 replica) volume.
>>>>> >>
>>>>> >> The two clusters are connected to the Internet via separate
>>>>> network
>>>>> >> adapters.
>>>>> >>
>>>>> >> Only SSH (port 22) is open on cluster #2 nodes' adapters
>>>>> connected to
>>>>> >> the Internet.
>>>>> >>
>>>>> >> All nodes are running Ubuntu 18.04 and GlusterFS 6.3 installed
>>>>> from [1].
>>>>> >>
>>>>> >> The first time I followed the guide[2] everything went fine up until
>>>>> >> I reached the "Create the session" step. That was about a month ago;
>>>>> >> then I had to temporarily stop working on this, and now I am coming
>>>>> >> back to it.
>>>>> >>
>>>>> >> Currently, if I try to see the mountbroker status I get the
>>>>> following:
>>>>> >>
>>>>> >>> # gluster-mountbroker status
>>>>> >>> Traceback (most recent call last):
>>>>> >>> File "/usr/sbin/gluster-mountbroker", line 396, in <module>
>>>>> >>> runcli()
>>>>> >>> File
>>>>> "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line
>>>>> 225,
>>>>> in runcli
>>>>> >>> cls.run(args)
>>>>> >>> File "/usr/sbin/gluster-mountbroker", line 275, in run
>>>>> >>> out = execute_in_peers("node-status")
>>>>> >>> File
>>>>> "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py",
>>>>> >> line 127, in execute_in_peers
>>>>> >>> raise GlusterCmdException((rc, out, err, " ".join(cmd)))
>>>>> >>> gluster.cliutils.cliutils.GlusterCmdException: (1, '',
>>>>> 'Unable to
>>>>> >> end. Error : Success\n', 'gluster system:: execute mountbroker.py
>>>>> >> node-status')
>>>>> >>
>>>>> >> And in /var/log/glusterfs/glusterd.log I have:
>>>>> >>
>>>>> >>> [2019-08-10 15:24:21.418834] E [MSGID: 106336]
>>>>> >> [glusterd-geo-rep.c:5413:glusterd_op_sys_exec] 0-management:
>>>>> Unable to
>>>>> >> end. Error : Success
>>>>> >>> [2019-08-10 15:24:21.418908] E [MSGID: 106122]
>>>>> >> [glusterd-syncop.c:1445:gd_commit_op_phase] 0-management:
>>>>> Commit of
>>>>> >> operation 'Volume Execute system commands' failed on localhost
>>>>> : Unable
>>>>> >> to end. Error : Success
>>>>> >>
>>>>> >> So, I have two questions right now:
>>>>> >>
>>>>> >> 1) Is there anything wrong with my setup (networking, open
>>>>> ports, etc.)?
>>>>> >> Is it expected to work with this setup or should I redo it in a
>>>>> >> different way?
>>>>> >> 2) How can I troubleshoot the current status of my setup? Can
>>>>> I find out
>>>>> >> what's missing/wrong and continue from there or should I just
>>>>> start from
>>>>> >> scratch?
>>>>> >>
>>>>> >> Links:
>>>>> >> [1] http://ppa.launchpad.net/gluster/glusterfs-6/ubuntu
>>>>> >> [2]
>>>>> >>
>>>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/
>>>>>
>>>>> >>
>>>>> >> Thank you!
>>>>> >>
>>>>> >> Best regards,
>>>>> >> --
>>>>> >> alexander iliev