[Bugs] [Bug 1654118] New: [geo-rep]: Failover / Failback shows fault status in a non-root setup
bugzilla at redhat.com
Wed Nov 28 05:00:57 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1654118
Bug ID: 1654118
Summary: [geo-rep]: Failover / Failback shows fault status in a
non-root setup
Product: GlusterFS
Version: 4.1
Component: geo-replication
Keywords: ZStream
Severity: low
Priority: low
Assignee: bugs at gluster.org
Reporter: khiremat at redhat.com
CC: bugs at gluster.org, csaba at redhat.com, rallan at redhat.com,
rhinduja at redhat.com, rhs-bugs at redhat.com,
sankarshan at redhat.com, storage-qa-internal at redhat.com,
ygoitom at redhat.com
Depends On: 1510752, 1651498
Blocks: 1654117
External Bug ID: Gluster.org Gerrit 21689
+++ This bug was initially created as a clone of Bug #1651498 +++
+++ This bug was initially created as a clone of Bug #1510752 +++
Description of problem:
=======================
During a failover / failback scenario on a non-root geo-rep setup, the status
becomes Faulty when the original non-root session between the master and the
slave is started again.
The logs show the following:
[2017-11-08 06:52:08.899] E [resource(/rhs/brick1/b1):234:errlog] Popen:
command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no
-i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
-S /tmp/gsyncd-aux-ssh-ozvxWN/ab5534f3bb3f74602da3c8c3068a4aa5.sock
geoaccount@10.70.43.175 /nonexistent/gsyncd
--session-owner b4645ef5-836f-4605-98b3-207abd550fc0
--local-id .%2Frhs%2Fbrick1%2Fb1 --local-node 10.70.43.14 -N --listen
--timeout 120 gluster://localhost:slave" returned with 255, saying:
[2017-11-08 06:52:08.1159] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:08.1418] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
@ WARNING: UNPROTECTED PRIVATE KEY FILE! @
[2017-11-08 06:52:08.1662] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:08.1856] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
Permissions 0770 for '/var/lib/glusterd/geo-replication/secret.pem' are too
open.
[2017-11-08 06:52:08.2038] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
It is required that your private key files are NOT accessible by others.
[2017-11-08 06:52:08.2216] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
This private key will be ignored.
[2017-11-08 06:52:08.2465] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
Load key "/var/lib/glusterd/geo-replication/secret.pem": bad permissions
[2017-11-08 06:52:08.2824] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
[2017-11-08 06:52:08.3571] I [syncdutils(/rhs/brick1/b1):237:finalize] <top>:
exiting.
[2017-11-08 06:52:08.7929] I [monitor(monitor):347:monitor] Monitor:
worker(/rhs/brick1/b1) died before establishing connection
[2017-11-08 06:52:08.8866] I [repce(/rhs/brick1/b1):92:service_loop]
RepceServer: terminating on reaching EOF.
[2017-11-08 06:52:08.9479] I [syncdutils(/rhs/brick1/b1):237:finalize] <top>:
exiting.
[2017-11-08 06:52:17.353275] I [monitor(monitor):275:monitor] Monitor: starting
gsyncd worker(/rhs/brick2/b4). Slave node:
ssh://geoaccount@10.70.43.175:gluster://localhost:slave
[2017-11-08 06:52:17.565080] I [resource(/rhs/brick2/b4):1684:connect_remote]
SSH: Initializing SSH connection between master and slave...
[2017-11-08 06:52:17.567466] I [changelogagent(/rhs/brick2/b4):73:__init__]
ChangelogAgent: Agent listining...
[2017-11-08 06:52:17.714301] E
[syncdutils(/rhs/brick2/b4):269:log_raise_exception] <top>: connection to peer
is broken
[2017-11-08 06:52:17.715086] E [resource(/rhs/brick2/b4):234:errlog] Popen:
command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no
-i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
-S /tmp/gsyncd-aux-ssh-S6S9iP/ab5534f3bb3f74602da3c8c3068a4aa5.sock
geoaccount@10.70.43.175 /nonexistent/gsyncd
--session-owner b4645ef5-836f-4605-98b3-207abd550fc0
--local-id .%2Frhs%2Fbrick2%2Fb4 --local-node 10.70.43.14 -N --listen
--timeout 120 gluster://localhost:slave"
returned with 255, saying:
[2017-11-08 06:52:17.715459] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:17.715709] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> @ WARNING: UNPROTECTED PRIVATE KEY FILE! @
[2017-11-08 06:52:17.715914] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:17.716105] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> Permissions 0770 for '/var/lib/glusterd/geo-replication/secret.pem' are
too open.
[2017-11-08 06:52:17.716289] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> It is required that your private key files are NOT accessible by others.
[2017-11-08 06:52:17.716600] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> This private key will be ignored.
[2017-11-08 06:52:17.716799] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> Load key "/var/lib/glusterd/geo-replication/secret.pem": bad permissions
[2017-11-08 06:52:17.717060] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
[2017-11-08 06:52:17.717834] I [syncdutils(/rhs/brick2/b4):237:finalize] <top>:
exiting.
[2017-11-08 06:52:17.721502] I [repce(/rhs/brick2/b4):92:service_loop]
RepceServer: terminating on reaching EOF.
[2017-11-08 06:52:17.722084] I [syncdutils(/rhs/brick2/b4):237:finalize] <top>:
exiting.
[2017-11-08 06:52:17.721748] I [monitor(monitor):347:monitor] Monitor:
worker(/rhs/brick2/b4) died before establishing connection
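The failure above comes from the ssh client's check on the private key file: it refuses a key whose mode grants any access to group or others. A minimal Python sketch of that check, assuming it reduces to testing the group/other permission bits (the function name key_mode_too_open is hypothetical and is not part of gsyncd or OpenSSH):

```python
import stat


def key_mode_too_open(mode: int) -> bool:
    """Return True if any group or other permission bit is set in mode."""
    return bool(stat.S_IMODE(mode) & (stat.S_IRWXG | stat.S_IRWXO))


# 0600 is an owner-only mode; 0770 is the mode reported in the log.
for mode in (0o600, 0o770):
    verdict = "too open" if key_mode_too_open(mode) else "ok"
    print(f"{mode:04o}: {verdict}")
```

With this check, the mode 0770 reported for secret.pem is flagged, while an owner-only mode such as 0600 is accepted.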
Version-Release number of selected component (if applicable):
=============================================================
mainline
Steps to Reproduce:
===================
1. Created a non-root session between the master and the slave
2. Stopped the master volume with the force option
3. Promoted slave to master
4. Brought master back online and stopped original geo-rep session between
original master and slave
5. Set up non-root session from original slave to original master and wrote
some data
6. Stopped IO and set checkpoint
7. Waited for checkpoint to complete
8. Stopped and deleted the geo-rep session from original slave to original
master
9. Reset the options that promoted the slave volume to master volume
10. Resumed the original session between the original master and original slave
Actual results:
===============
Geo-rep status was faulty
Expected results:
================
Geo-rep status should be ACTIVE / PASSIVE
A simple non-root session was set up between the master and slave, and the
following was observed.
While executing "gluster-mountbroker setup /var/mountbroker-root geogroup"
on the slave, the permission of the geo-replication directory under
/var/lib/glusterd/ changed from drwxr-xr-x to drwxrwx---
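For reference, the two symbolic modes above correspond to octal 0755 and 0770. The short Python sketch below only decodes those two values with the standard stat module; the variable names are illustrative.

```python
import stat

# Directory modes before and after running gluster-mountbroker setup,
# taken from the symbolic strings in the observation above.
before = stat.S_IFDIR | 0o755
after = stat.S_IFDIR | 0o770

print(stat.filemode(before), "->", stat.filemode(after))

# Bits that were dropped (read/execute for others) and added (write for group).
removed = stat.S_IMODE(before) & ~stat.S_IMODE(after)
added = stat.S_IMODE(after) & ~stat.S_IMODE(before)
print(f"removed bits: {removed:04o}, added bits: {added:04o}")
```

The change removes read and execute access for others and adds write access for the group, which matches the 0770 mode that ssh later reports for secret.pem.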
--- Additional comment from Worker Ant on 2018-11-20 03:45:02 EST ---
REVIEW: https://review.gluster.org/21689 (geo-rep: Fix permissions with
non-root setup) posted (#1) for review on master by Kotresh HR
--- Additional comment from Kotresh HR on 2018-11-20 03:47:14 EST ---
Summary:
geo-rep: Fix permissions with non-root setup
Problem:
In non-root fail-over/fail-back(FO/FB), when slave is
promoted as master, the session goes to 'Faulty'
Cause:
The command 'gluster-mountbroker setup <mountbroker-root> <group>'
is run as a prerequisite on the slave in a non-root setup.
It recursively modifies the permission and group of the following
required directories and files:
[1] /var/lib/glusterd/geo-replication
[2] /var/log/glusterfs/geo-replication-slaves
In a normal setup, this is executed on the slave node, so
doing it recursively on [1] is not an issue. But when the original
master becomes a slave in a non-root setup during FO/FB, [1] contains
the ssh keys (including the private key secret.pem), and modifying
permissions on them causes geo-rep to fail with incorrect permissions.
Fix:
Don't do permission change recursively. Fix permissions for
required files.
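The difference between the two approaches can be illustrated with a small Python sketch. It is not the actual patch: the helper names are hypothetical, the directory is a temporary stand-in for /var/lib/glusterd/geo-replication, and it assumes a POSIX system. It also does not model which specific files the fix adjusts.

```python
import os
import stat
import tempfile


def chmod_recursive(root: str, mode: int) -> None:
    """Apply mode to root and everything beneath it (the problematic behaviour)."""
    os.chmod(root, mode)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.chmod(os.path.join(dirpath, name), mode)


def chmod_top_only(root: str, mode: int) -> None:
    """Apply mode to root only, leaving existing files untouched."""
    os.chmod(root, mode)


def key_mode_after(change) -> int:
    """Build a throwaway directory holding a 0600 key file, apply the
    given permission change with mode 0770, and return the key's mode."""
    with tempfile.TemporaryDirectory() as tmp:
        georep = os.path.join(tmp, "geo-replication")
        os.mkdir(georep, 0o755)
        key = os.path.join(georep, "secret.pem")
        with open(key, "w") as fh:
            fh.write("placeholder, not a real key\n")
        os.chmod(key, 0o600)
        change(georep, 0o770)
        return stat.S_IMODE(os.stat(key).st_mode)


print(f"recursive:      {key_mode_after(chmod_recursive):04o}")
print(f"top-level only: {key_mode_after(chmod_top_only):04o}")
```

In the recursive case the key file ends up with the same 0770 mode that ssh rejects in the log; in the non-recursive case it keeps its original owner-only mode.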
--- Additional comment from Worker Ant on 2018-11-25 23:21:28 EST ---
REVIEW: https://review.gluster.org/21689 (geo-rep: Fix permissions with
non-root setup) posted (#3) for review on master by Amar Tumballi
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1510752
[Bug 1510752] [geo-rep]: Failover / Failback shows fault status in a
non-root setup
https://bugzilla.redhat.com/show_bug.cgi?id=1651498
[Bug 1651498] [geo-rep]: Failover / Failback shows fault status in a
non-root setup
https://bugzilla.redhat.com/show_bug.cgi?id=1654117
[Bug 1654117] [geo-rep]: Failover / Failback shows fault status in a
non-root setup
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.