[Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)
Alexander Iliev
ailiev+gluster at mamul.org
Wed Oct 16 16:59:29 UTC 2019
Hi Aravinda,
All bricks of the slave volume are up and the volume seems functional.
Your suggestion about trying to mount the slave volume on a master node
brings up my question about network connectivity again - the GlusterFS
documentation[1] says:
> The server specified in the mount command is only used to fetch the
> gluster configuration volfile describing the volume name. Subsequently,
> the client will communicate directly with the servers mentioned in the
> volfile (which might not even include the one used for mount).
To me this means that the masternode from your example is expected to
have connectivity to the network where the slave volume runs, i.e. to
have network access to the slave nodes. In my geo-replication scenario
this is definitely not the case. The two clusters are running in two
completely different networks that are not interconnected.
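As an aside, the addresses a client will try to contact can be listed from the volfile itself. A minimal sketch in Python, assuming the text format of the "Final graph" dump in the gverify-slavemnt.log quoted further down; the helper name remote_hosts is hypothetical and not part of GlusterFS:

```python
import re

# Matches lines such as "  4:     option remote-host 172.31.36.11",
# with or without a leading graph line number or email quote markers.
REMOTE_HOST = re.compile(r"option\s+remote-host\s+(\S+)")

def remote_hosts(volfile_text):
    """Return the distinct remote-host values in order of appearance."""
    hosts = []
    for line in volfile_text.splitlines():
        match = REMOTE_HOST.search(line)
        if match and match.group(1) not in hosts:
            hosts.append(match.group(1))
    return hosts

# Excerpt of the graph from the log in this thread.
sample = """\
  1: volume store2-client-0
  4:     option remote-host 172.31.36.11
 16: volume store2-client-1
 19:     option remote-host 172.31.36.12
 31: volume store2-client-2
 34:     option remote-host 172.31.36.13
"""
print(remote_hosts(sample))
```

In this thread all three entries are private addresses of the slave cluster, which is what the mounting client would have to reach.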
So my question is - how is the slave volume mount expected to happen if
the client host cannot access the GlusterFS nodes? Or is this
connectivity a requirement even for geo-replication?
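One way to confirm the connectivity gap from a master node is a plain TCP probe. A minimal sketch in Python, assuming glusterd listens on its default management port 24007 and that the brick ports are whatever `gluster volume status` reports on the slave side (the function names and the 3-second timeout are illustrative choices, not GlusterFS APIs):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False

def probe(hosts, ports, timeout=3.0):
    """Map every (host, port) pair to whether it accepted a TCP connection."""
    return {(h, p): can_connect(h, p, timeout) for h in hosts for p in ports}
```

For example, probe(["172.31.36.11", "172.31.36.12", "172.31.36.13"], [24007]) run on a master node would show whether the slave hosts listed in the volfile are reachable at all.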
I'm not sure if I'm missing something, but any help will be highly
appreciated!
Thanks!
Links:
[1] https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Setting%20Up%20Clients/
--
alexander iliev
On 10/16/19 6:03 AM, Aravinda Vishwanathapura Krishna Murthy wrote:
> Hi Alexander,
>
> Please check the status of Volume. Looks like the Slave volume mount is
> failing because bricks are down or not reachable. If Volume status shows
> all bricks are up then try mounting the slave volume using mount command.
>
> ```
> masternode$ mkdir /mnt/vol
> masternode$ mount -t glusterfs <slavehost>:<slavevol> /mnt/vol
> ```
>
> On Fri, Oct 11, 2019 at 4:03 AM Alexander Iliev
> <ailiev+gluster at mamul.org> wrote:
>
> Hi all,
>
> I ended up reinstalling the nodes with CentOS 7.5 and GlusterFS 6.5
> (installed from the SIG.)
>
> Now when I try to create a replication session I get the following:
>
> > # gluster volume geo-replication store1 <slave-host>::store2 create push-pem
> > Unable to mount and fetch slave volume details. Please check the log:
> > /var/log/glusterfs/geo-replication/gverify-slavemnt.log
> > geo-replication command failed
>
> You can find the contents of gverify-slavemnt.log below, but the initial
> error seems to be:
>
> > [2019-10-10 22:07:51.578519] E [fuse-bridge.c:5211:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
>
> I only found [this](https://bugzilla.redhat.com/show_bug.cgi?id=1659824)
> bug report which doesn't seem to help. The reported issue is failure to
> mount a volume on a GlusterFS client, but in my case I need
> geo-replication which implies the client (geo-replication master) being
> on a different network.
>
> Any help will be appreciated.
>
> Thanks!
>
> gverify-slavemnt.log:
>
> > [2019-10-10 22:07:40.571256] I [MSGID: 100030] [glusterfsd.c:2847:main] 0-glusterfs: Started running glusterfs version 6.5 (args: glusterfs --xlator-option=*dht.lookup-unhashed=off --volfile-server <slave-host> --volfile-id store2 -l /var/log/glusterfs/geo-replication/gverify-slavemnt.log /tmp/gverify.sh.5nFlRh)
> > [2019-10-10 22:07:40.575438] I [glusterfsd.c:2556:daemonize] 0-glusterfs: Pid of current running process is 6021
> > [2019-10-10 22:07:40.584282] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
> > [2019-10-10 22:07:40.584299] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
> > [2019-10-10 22:07:40.928094] I [MSGID: 114020] [client.c:2393:notify] 0-store2-client-0: parent translators are ready, attempting connect on transport
> > [2019-10-10 22:07:40.931121] I [MSGID: 114020] [client.c:2393:notify] 0-store2-client-1: parent translators are ready, attempting connect on transport
> > [2019-10-10 22:07:40.933976] I [MSGID: 114020] [client.c:2393:notify] 0-store2-client-2: parent translators are ready, attempting connect on transport
> > Final graph:
> > +------------------------------------------------------------------------------+
> > 1: volume store2-client-0
> > 2: type protocol/client
> > 3: option ping-timeout 42
> > 4: option remote-host 172.31.36.11
> > 5: option remote-subvolume /data/gfs/store1/1/brick-store2
> > 6: option transport-type socket
> > 7: option transport.address-family inet
> > 8: option transport.socket.ssl-enabled off
> > 9: option transport.tcp-user-timeout 0
> > 10: option transport.socket.keepalive-time 20
> > 11: option transport.socket.keepalive-interval 2
> > 12: option transport.socket.keepalive-count 9
> > 13: option send-gids true
> > 14: end-volume
> > 15:
> > 16: volume store2-client-1
> > 17: type protocol/client
> > 18: option ping-timeout 42
> > 19: option remote-host 172.31.36.12
> > 20: option remote-subvolume /data/gfs/store1/1/brick-store2
> > 21: option transport-type socket
> > 22: option transport.address-family inet
> > 23: option transport.socket.ssl-enabled off
> > 24: option transport.tcp-user-timeout 0
> > 25: option transport.socket.keepalive-time 20
> > 26: option transport.socket.keepalive-interval 2
> > 27: option transport.socket.keepalive-count 9
> > 28: option send-gids true
> > 29: end-volume
> > 30:
> > 31: volume store2-client-2
> > 32: type protocol/client
> > 33: option ping-timeout 42
> > 34: option remote-host 172.31.36.13
> > 35: option remote-subvolume /data/gfs/store1/1/brick-store2
> > 36: option transport-type socket
> > 37: option transport.address-family inet
> > 38: option transport.socket.ssl-enabled off
> > 39: option transport.tcp-user-timeout 0
> > 40: option transport.socket.keepalive-time 20
> > 41: option transport.socket.keepalive-interval 2
> > 42: option transport.socket.keepalive-count 9
> > 43: option send-gids true
> > 44: end-volume
> > 45:
> > 46: volume store2-replicate-0
> > 47: type cluster/replicate
> > 48: option afr-pending-xattr store2-client-0,store2-client-1,store2-client-2
> > 49: option use-compound-fops off
> > 50: subvolumes store2-client-0 store2-client-1 store2-client-2
> > 51: end-volume
> > 52:
> > 53: volume store2-dht
> > 54: type cluster/distribute
> > 55: option lookup-unhashed off
> > 56: option lock-migration off
> > 57: option force-migration off
> > 58: subvolumes store2-replicate-0
> > 59: end-volume
> > 60:
> > 61: volume store2-write-behind
> > 62: type performance/write-behind
> > 63: subvolumes store2-dht
> > 64: end-volume
> > 65:
> > 66: volume store2-read-ahead
> > 67: type performance/read-ahead
> > 68: subvolumes store2-write-behind
> > 69: end-volume
> > 70:
> > 71: volume store2-readdir-ahead
> > 72: type performance/readdir-ahead
> > 73: option parallel-readdir off
> > 74: option rda-request-size 131072
> > 75: option rda-cache-limit 10MB
> > 76: subvolumes store2-read-ahead
> > 77: end-volume
> > 78:
> > 79: volume store2-io-cache
> > 80: type performance/io-cache
> > 81: subvolumes store2-readdir-ahead
> > 82: end-volume
> > 83:
> > 84: volume store2-open-behind
> > 85: type performance/open-behind
> > 86: subvolumes store2-io-cache
> > 87: end-volume
> > 88:
> > 89: volume store2-quick-read
> > 90: type performance/quick-read
> > 91: subvolumes store2-open-behind
> > 92: end-volume
> > 93:
> > 94: volume store2-md-cache
> > 95: type performance/md-cache
> > 96: subvolumes store2-quick-read
> > 97: end-volume
> > 98:
> > 99: volume store2
> > 100: type debug/io-stats
> > 101: option log-level INFO
> > 102: option latency-measurement off
> > 103: option count-fop-hits off
> > 104: subvolumes store2-md-cache
> > 105: end-volume
> > 106:
> > 107: volume meta-autoload
> > 108: type meta
> > 109: subvolumes store2
> > 110: end-volume
> > 111:
> > +------------------------------------------------------------------------------+
> > [2019-10-10 22:07:51.578287] I [fuse-bridge.c:5142:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
> > [2019-10-10 22:07:51.578356] I [fuse-bridge.c:5753:fuse_graph_sync] 0-fuse: switched to graph 0
> > [2019-10-10 22:07:51.578467] I [MSGID: 108006] [afr-common.c:5666:afr_local_init] 0-store2-replicate-0: no subvolumes up
> > [2019-10-10 22:07:51.578519] E [fuse-bridge.c:5211:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
> > [2019-10-10 22:07:51.578709] W [fuse-bridge.c:1266:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
> > [2019-10-10 22:07:51.578687] I [MSGID: 108006] [afr-common.c:5666:afr_local_init] 0-store2-replicate-0: no subvolumes up
> > [2019-10-10 22:09:48.222459] E [MSGID: 108006] [afr-common.c:5318:__afr_handle_child_down_event] 0-store2-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
> > The message "E [MSGID: 108006] [afr-common.c:5318:__afr_handle_child_down_event] 0-store2-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up." repeated 2 times between [2019-10-10 22:09:48.222459] and [2019-10-10 22:09:48.222891]
> >
>
> alexander iliev
>
> On 9/8/19 4:50 PM, Alexander Iliev wrote:
> > Hi all,
> >
> > Sunny, thank you for the update.
> >
> > I have applied the patch locally on my slave system and now the
> > mountbroker setup is successful.
> >
> > I am facing another issue though - when I try to create a replication
> > session between the two sites I am getting:
> >
> > # gluster volume geo-replication store1 glustergeorep@<slave-host>::store1 create push-pem
> > Error : Request timed out
> > geo-replication command failed
> >
> > It is still unclear to me if my setup is expected to work at all.
> >
> > Reading the geo-replication documentation at [1] I see this paragraph:
> >
> > > A password-less SSH connection is also required for gsyncd between
> > > every node in the master to every node in the slave. The gluster
> > > system:: execute gsec_create command creates secret-pem files on all the
> > > nodes in the master, and is used to implement the password-less SSH
> > > connection. The push-pem option in the geo-replication create command
> > > pushes these keys to all the nodes in the slave.
> >
> > It is not clear to me whether connectivity from each master node to each
> > slave node is a requirement in terms of networking. In my setup the
> > slave nodes form the Gluster pool over a private network which is not
> > reachable from the master site.
> >
> > Any ideas how to proceed from here will be greatly appreciated.
> >
> > Thanks!
> >
> > Links:
> > [1] https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/sect-preparing_to_deploy_geo-replication
> >
> > Best regards,
> > --
> > alexander iliev
> >
> > On 9/3/19 2:50 PM, Sunny Kumar wrote:
> >> Thank you for the explanation Kaleb.
> >>
> >> Alexander,
> >>
> >> This fix will be available with the next release for all supported versions.
> >>
> >> /sunny
> >>
> >> On Mon, Sep 2, 2019 at 6:47 PM Kaleb Keithley <kkeithle at redhat.com> wrote:
> >>>
> >>> Fixes on master (before or after the release-7 branch was taken)
> >>> almost certainly warrant a backport IMO to at least release-6, and
> >>> probably release-5 as well.
> >>>
> >>> We used to have a "tracker" BZ for each minor release (e.g. 6.6) to
> >>> keep track of backports by cloning the original BZ and changing the
> >>> Version, and adding that BZ to the tracker. I'm not sure what
> >>> happened to that practice. The last ones I can find are for 6.3 and
> >>> 5.7; https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-6.3 and
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.7
> >>>
> >>> It isn't enough to just backport recent fixes on master to release-7.
> >>> We are supposedly continuing to maintain release-6 and release-5
> >>> after release-7 GAs. If that has changed, I haven't seen an
> >>> announcement to that effect. I don't know why our developers don't
> >>> automatically backport to all the actively maintained releases.
> >>>
> >>> Even if there isn't a tracker BZ, you can always create a backport BZ
> >>> by cloning the original BZ and changing the release to 6. That'd be a
> >>> good place to start.
> >>>
> >>> On Sun, Sep 1, 2019 at 8:45 AM Alexander Iliev
> >>> <ailiev+gluster at mamul.org> wrote:
> >>>>
> >>>> Hi Strahil,
> >>>>
> >>>> Yes, this might be right, but I would still expect fixes like this
> >>>> to be released for all supported major versions (which should
> >>>> include 6.) At least that's how I understand
> >>>> https://www.gluster.org/release-schedule/.
> >>>>
> >>>> Anyway, let's wait for Sunny to clarify.
> >>>>
> >>>> Best regards,
> >>>> alexander iliev
> >>>>
> >>>> On 9/1/19 2:07 PM, Strahil Nikolov wrote:
> >>>>> Hi Alex,
> >>>>>
> >>>>> I'm not very deep into bugzilla stuff, but for me NEXTRELEASE means
> >>>>> v7.
> >>>>>
> >>>>> Sunny,
> >>>>> Am I understanding it correctly ?
> >>>>>
> >>>>> Best Regards,
> >>>>> Strahil Nikolov
> >>>>>
> >>>>> On Sunday, 1 September 2019, 14:27:32 GMT+3, Alexander Iliev
> >>>>> <ailiev+gluster at mamul.org> wrote:
> >>>>>
> >>>>>
> >>>>> Hi Sunny,
> >>>>>
> >>>>> Thank you for the quick response.
> >>>>>
> >>>>> It's not clear to me however if the fix has been already released
> >>>>> or not.
> >>>>>
> >>>>> The bug status is CLOSED NEXTRELEASE and according to [1] the
> >>>>> NEXTRELEASE resolution means that the fix will be included in the next
> >>>>> supported release. The bug is logged against the mainline version
> >>>>> though, so I'm not sure what this means exactly.
> >>>>>
> >>>>> From the 6.4[2] and 6.5[3] release notes it seems it hasn't been
> >>>>> released yet.
> >>>>>
> >>>>> Ideally I would not like to patch my systems locally, so if you
> >>>>> have an ETA on when this will be out officially I would really
> >>>>> appreciate it.
> >>>>>
> >>>>> Links:
> >>>>> [1] https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_status
> >>>>> [2] https://docs.gluster.org/en/latest/release-notes/6.4/
> >>>>> [3] https://docs.gluster.org/en/latest/release-notes/6.5/
> >>>>>
> >>>>> Thank you!
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> alexander iliev
> >>>>>
> >>>>> On 8/30/19 9:22 AM, Sunny Kumar wrote:
> >>>>> > Hi Alexander,
> >>>>> >
> >>>>> > Thanks for pointing that out!
> >>>>> >
> >>>>> > But this issue is fixed now; see the links below for the BZ
> >>>>> > and the patch.
> >>>>> >
> >>>>> > BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1709248
> >>>>> >
> >>>>> > Patch - https://review.gluster.org/#/c/glusterfs/+/22716/
> >>>>> >
> >>>>> > Hope this helps.
> >>>>> >
> >>>>> > /sunny
> >>>>> >
> >>>>> > On Fri, Aug 30, 2019 at 2:30 AM Alexander Iliev
> >>>>> > <ailiev+gluster at mamul.org> wrote:
> >>>>> >>
> >>>>> >> Hello dear GlusterFS users list,
> >>>>> >>
> >>>>> >> I have been trying to set up geo-replication between two clusters for
> >>>>> >> some time now. The desired state is (Cluster #1) being replicated to
> >>>>> >> (Cluster #2).
> >>>>> >>
> >>>>> >> Here are some details about the setup:
> >>>>> >>
> >>>>> >> Cluster #1: three nodes connected via a local network (172.31.35.0/24),
> >>>>> >> one replicated (3 replica) volume.
> >>>>> >>
> >>>>> >> Cluster #2: three nodes connected via a local network (172.31.36.0/24),
> >>>>> >> one replicated (3 replica) volume.
> >>>>> >>
> >>>>> >> The two clusters are connected to the Internet via separate network
> >>>>> >> adapters.
> >>>>> >>
> >>>>> >> Only SSH (port 22) is open on cluster #2 nodes' adapters connected to
> >>>>> >> the Internet.
> >>>>> >>
> >>>>> >> All nodes are running Ubuntu 18.04 and GlusterFS 6.3 installed from [1].
> >>>>> >>
> >>>>> >> The first time I followed the guide[2] everything went fine up until I
> >>>>> >> reached the "Create the session" step. That was like a month ago, then I
> >>>>> >> had to temporarily stop working on this and now I am coming back to it.
> >>>>> >>
> >>>>> >> Currently, if I try to see the mountbroker status I get the following:
> >>>>> >>
> >>>>> >>> # gluster-mountbroker status
> >>>>> >>> Traceback (most recent call last):
> >>>>> >>>   File "/usr/sbin/gluster-mountbroker", line 396, in <module>
> >>>>> >>>     runcli()
> >>>>> >>>   File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 225, in runcli
> >>>>> >>>     cls.run(args)
> >>>>> >>>   File "/usr/sbin/gluster-mountbroker", line 275, in run
> >>>>> >>>     out = execute_in_peers("node-status")
> >>>>> >>>   File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 127, in execute_in_peers
> >>>>> >>>     raise GlusterCmdException((rc, out, err, " ".join(cmd)))
> >>>>> >>> gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to end. Error : Success\n', 'gluster system:: execute mountbroker.py node-status')
> >>>>> >>
> >>>>> >> And in /var/log/gluster/glusterd.log I have:
> >>>>> >>
> >>>>> >>> [2019-08-10 15:24:21.418834] E [MSGID: 106336] [glusterd-geo-rep.c:5413:glusterd_op_sys_exec] 0-management: Unable to end. Error : Success
> >>>>> >>> [2019-08-10 15:24:21.418908] E [MSGID: 106122] [glusterd-syncop.c:1445:gd_commit_op_phase] 0-management: Commit of operation 'Volume Execute system commands' failed on localhost : Unable to end. Error : Success
> >>>>> >>
> >>>>> >> So, I have two questions right now:
> >>>>> >>
> >>>>> >> 1) Is there anything wrong with my setup (networking, open ports, etc.)?
> >>>>> >> Is it expected to work with this setup or should I redo it in a
> >>>>> >> different way?
> >>>>> >> 2) How can I troubleshoot the current status of my setup? Can I find out
> >>>>> >> what's missing/wrong and continue from there or should I just start from
> >>>>> >> scratch?
> >>>>> >>
> >>>>> >> Links:
> >>>>> >> [1] http://ppa.launchpad.net/gluster/glusterfs-6/ubuntu
> >>>>> >> [2] https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/
> >>>>> >>
> >>>>> >> Thank you!
> >>>>> >>
> >>>>> >> Best regards,
> >>>>> >> --
> >>>>> >> alexander iliev
> >>>>> >> _______________________________________________
> >>>>> >> Gluster-users mailing list
> >>>>> >> Gluster-users at gluster.org
> >>>>> >> https://lists.gluster.org/mailman/listinfo/gluster-users
> ________
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge: https://bluejeans.com/118564314
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge: https://bluejeans.com/118564314
>
>
>
>
> --
> regards
> Aravinda VK