[Gluster-users] Can't stop (or control) geo-replication?

Danny Sauer danny at dannysauer.com
Thu Apr 24 22:02:33 UTC 2014


No, I still haven't heard anything from the community. As an (incredibly ugly) workaround, I just removed the ssh keys for the broken systems so they can't start up the "bad" replication configs -- roughly the steps sketched below. Someday soon I'm planning to build a test cluster to experiment on, though, and will follow up if I figure out a solution.
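
For anyone who finds this later, the key removal is nothing clever. A rough sketch of what it amounts to, assuming the stock secret.pem key pair referenced in the config output quoted below, and that its public half was appended to root's authorized_keys on the slave (both assumptions about the setup; adjust to taste):

# On each master node: move the geo-rep private key aside so gsyncd can no longer log in
$ sudo mv /var/lib/glusterd/geo-replication/secret.pem /var/lib/glusterd/geo-replication/secret.pem.disabled

# Note the public half so the matching entry can be found on the slave
$ sudo cat /var/lib/glusterd/geo-replication/secret.pem.pub

# On the slave: delete that one line from root's authorized_keys
$ sudo vi /root/.ssh/authorized_keys

Reversing the mv (and restoring the authorized_keys line) should let a session be re-enabled later.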

--Danny

Steve Dainard <sdainard at miovision.com> wrote:

>Hi Danny,
>
>
>Did you get anywhere with this geo-rep issue? I have a similar problem running on CentOS 6.5 when trying anything other than 'start' with geo-rep.
>
>
>Thanks,
>
>
>Steve 
>
>
>On Tue, Feb 25, 2014 at 9:45 AM, Danny Sauer <danny at dannysauer.com> wrote:
>
>I have the current gluster 3.4 running on some RHEL6 systems.  For some reason, all of the geo-replication commands that change a config file (start, stop, config) return failure.  Despite this, "start" actually starts replication.  I'd be mostly OK with that if "stop" also actually stopped it, but it does not; the "command failed" behavior is consistent across all nodes.
>
>The binaries are the result of downloading the source RPM and "rpm --rebuild"ing it, since the packages on the download server still don't install on anything but the latest RHEL6 (that ssl library dependency thing); I didn't change anything, just rebuilt directly from the source package.  I have working ssh between the systems, and files do propagate over; I can see in the logs that ssh connects and starts up gsyncd.  I just have several test configs that I'd like to not have running now, but they won't stay dead. :)
>
>Is there a way to forcibly remove several geo-replication configs outside of the shell tool?  I tried editing the config file to change the ssh command path for one of them, and my changes kept getting overwritten by metadata from the other nodes (yes, time is in sync on all nodes using ntp against the same server), so I'm assuming that deleting the relevant block from the config file won't do it?
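>
>In case it matters, here is roughly the manual cleanup I have in mind; it is untested, and the gsyncd.conf location is my guess at where the CLI keeps the per-session blocks, so treat it as a sketch rather than a recipe:
>
># On every master node: stop glusterd first, so the peers stop pushing the old session config back
>$ sudo service glusterd stop
>
># Kill any gsyncd worker still running for the session (check ps first to see what its command line looks like)
>$ sudo pkill -f 'gsyncd.*geo_sec_73'
>
># Back up the shared config, then delete the block for this session from it
># (assumed path; the state/pid/socket files that block references can presumably go too)
>$ sudo cp /var/lib/glusterd/geo-replication/gsyncd.conf /var/lib/glusterd/geo-replication/gsyncd.conf.bak
>$ sudo vi /var/lib/glusterd/geo-replication/gsyncd.conf
>
># Bring glusterd back up once every node has been cleaned
>$ sudo service glusterd start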
>
>The really weird thing is that other volume management tasks work fine; I can add/remove bricks from volumes, create, start and stop regular volumes, etc.  It's just the geo-replication management part that fails.
>
>Thanks for any input you can provide. :)  Some example output (with username, IP, and hostnames changed to protect the innocent) is below.
>
>--Danny
>
>
>user@gluster1 [/home/user]
>$ sudo gluster v geo sec ssh://slave_73::geo_sec_73 stop
> 
>geo-replication command failed
>user@gluster1 [/home/user]
>$ sudo gluster v geo sec ssh://slave_73::geo_sec_73 config
>gluster_log_file: /var/log/glusterfs/geo-replication/sec/ssh%3A%2F%2Froot%401.2.3.4%3Agluster%3A%2F%2F127.0.0.1%3Ageo_sec_73.gluster.log
>ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
>session_owner: ace6b109-ba88-4c2e-9381-f2fc31aa36b5
>remote_gsyncd: /usr/libexec/glusterfs/gsyncd
>socketdir: /var/run
>state_file: /var/lib/glusterd/geo-replication/sec/ssh%3A%2F%2Froot%401.2.3.4%3Agluster%3A%2F%2F127.0.0.1%3Ageo_sec_73.status
>state_socket_unencoded: /var/lib/glusterd/geo-replication/sec/ssh%3A%2F%2Froot%401.2.3.4%3Agluster%3A%2F%2F127.0.0.1%3Ageo_sec_73.socket
>gluster_command_dir: /usr/sbin/
>pid_file: /var/lib/glusterd/geo-replication/sec/ssh%3A%2F%2Froot%401.2.3.4%3Agluster%3A%2F%2F127.0.0.1%3Ageo_sec_73.pid
>log_file: /var/log/glusterfs/geo-replication/sec/ssh%3A%2F%2Froot%401.2.3.4%3Agluster%3A%2F%2F127.0.0.1%3Ageo_sec_73.log
>gluster_params: xlator-option=*-dht.assert-no-child-down=true
>user@gluster1 [/home/user]
>$ sudo gluster v geo sec ssh://slave_73::geo_sec_73 status
>NODE                 MASTER               SLAVE                                              STATUS
>---------------------------------------------------------------------------------------------------
>gluster1             sec                  ssh://slave_73::geo_sec_73                         faulty
>user@gluster1 [/home/user]
>$ sudo gluster v geo sec ssh://slave_73::geo_sec_73 stop
> 
>geo-replication command failed
>user@gluster1 [/home/user]
>$ sudo gluster v geo sec ssh://slave_73::geo_sec_73 status
>NODE                 MASTER               SLAVE                                              STATUS
>---------------------------------------------------------------------------------------------------
>gluster1             sec                  ssh://slave_73::geo_sec_73                         faulty
> 
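>Short of a real fix, is killing the worker by hand via the pid file from the config output above the expected way to shut one of these down? Something like the following -- although, given that the sessions won't stay dead, I assume whatever respawns them would just bring it back:
>
>$ sudo kill $(sudo cat /var/lib/glusterd/geo-replication/sec/ssh%3A%2F%2Froot%401.2.3.4%3Agluster%3A%2F%2F127.0.0.1%3Ageo_sec_73.pid)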