[Bugs] [Bug 1654118] New: [geo-rep]: Failover / Failback shows fault status in a non-root setup

bugzilla at redhat.com bugzilla at redhat.com
Wed Nov 28 05:00:57 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1654118

            Bug ID: 1654118
           Summary: [geo-rep]: Failover / Failback shows fault status in a
                    non-root setup
           Product: GlusterFS
           Version: 4.1
         Component: geo-replication
          Keywords: ZStream
          Severity: low
          Priority: low
          Assignee: bugs at gluster.org
          Reporter: khiremat at redhat.com
                CC: bugs at gluster.org, csaba at redhat.com, rallan at redhat.com,
                    rhinduja at redhat.com, rhs-bugs at redhat.com,
                    sankarshan at redhat.com, storage-qa-internal at redhat.com,
                    ygoitom at redhat.com
        Depends On: 1510752, 1651498
            Blocks: 1654117
   External Bug ID: Gluster.org Gerrit 21689



+++ This bug was initially created as a clone of Bug #1651498 +++

+++ This bug was initially created as a clone of Bug #1510752 +++

Description of problem:
=======================

While executing a failover / failback scenario on a non-root geo-rep setup,
restarting the original non-root session between the master and the slave
leaves the session in Faulty status.

The logs show the following:


[2017-11-08 06:52:08.899] E [resource(/rhs/brick1/b1):234:errlog] Popen:
command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=
auto -S /tmp/gsyncd-aux-ssh-ozvxWN/ab5534f3bb3f74602da3c8c3068a4aa5.sock
geoaccount at 10.70.43.175 /nonexistent/gsyncd --session-owner
b4645ef5-836f-4605-98b3-207abd550fc0 --local-id .%2Frhs%2Fbrick1%2Fb1 --local-
node 10.70.43.14 -N --listen --timeout 120 gluster://localhost:slave" returned
with 255, saying:
[2017-11-08 06:52:08.1159] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:08.1418] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
[2017-11-08 06:52:08.1662] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:08.1856] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
Permissions 0770 for '/var/lib/glusterd/geo-replication/secret.pem' are too
open.
[2017-11-08 06:52:08.2038] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
It is required that your private key files are NOT accessible by others.
[2017-11-08 06:52:08.2216] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
This private key will be ignored.
[2017-11-08 06:52:08.2465] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
Load key "/var/lib/glusterd/geo-replication/secret.pem": bad permissions
[2017-11-08 06:52:08.2824] E [resource(/rhs/brick1/b1):238:logerr] Popen: ssh>
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
[2017-11-08 06:52:08.3571] I [syncdutils(/rhs/brick1/b1):237:finalize] <top>:
exiting.
[2017-11-08 06:52:08.7929] I [monitor(monitor):347:monitor] Monitor:
worker(/rhs/brick1/b1) died before establishing connection
[2017-11-08 06:52:08.8866] I [repce(/rhs/brick1/b1):92:service_loop]
RepceServer: terminating on reaching EOF.
[2017-11-08 06:52:08.9479] I [syncdutils(/rhs/brick1/b1):237:finalize] <top>:
exiting.
[2017-11-08 06:52:17.353275] I [monitor(monitor):275:monitor] Monitor: starting
gsyncd worker(/rhs/brick2/b4). Slave node:
ssh://geoaccount@10.70.43.175:gluster://localhost:slave
[2017-11-08 06:52:17.565080] I [resource(/rhs/brick2/b4):1684:connect_remote]
SSH: Initializing SSH connection between master and slave...
[2017-11-08 06:52:17.567466] I [changelogagent(/rhs/brick2/b4):73:__init__]
ChangelogAgent: Agent listining...
[2017-11-08 06:52:17.714301] E
[syncdutils(/rhs/brick2/b4):269:log_raise_exception] <top>: connection to peer
is broken
[2017-11-08 06:52:17.715086] E [resource(/rhs/brick2/b4):234:errlog] Popen:
command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMast
er=auto -S /tmp/gsyncd-aux-ssh-S6S9iP/ab5534f3bb3f74602da3c8c3068a4aa5.sock
geoaccount at 10.70.43.175 /nonexistent/gsyncd --session-owner
b4645ef5-836f-4605-98b3-207abd550fc0 --local-id .%2Frhs%2Fbrick2%2Fb4 --loc
al-node 10.70.43.14 -N --listen --timeout 120 gluster://localhost:slave"
returned with 255, saying:
[2017-11-08 06:52:17.715459] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:17.715709] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> @         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
[2017-11-08 06:52:17.715914] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2017-11-08 06:52:17.716105] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> Permissions 0770 for '/var/lib/glusterd/geo-replication/secret.pem' are
too open.
[2017-11-08 06:52:17.716289] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> It is required that your private key files are NOT accessible by others.
[2017-11-08 06:52:17.716600] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> This private key will be ignored.
[2017-11-08 06:52:17.716799] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> Load key "/var/lib/glusterd/geo-replication/secret.pem": bad permissions
[2017-11-08 06:52:17.717060] E [resource(/rhs/brick2/b4):238:logerr] Popen:
ssh> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
[2017-11-08 06:52:17.717834] I [syncdutils(/rhs/brick2/b4):237:finalize] <top>:
exiting.
[2017-11-08 06:52:17.721502] I [repce(/rhs/brick2/b4):92:service_loop]
RepceServer: terminating on reaching EOF.
[2017-11-08 06:52:17.722084] I [syncdutils(/rhs/brick2/b4):237:finalize] <top>:
exiting.
[2017-11-08 06:52:17.721748] I [monitor(monitor):347:monitor] Monitor:
worker(/rhs/brick2/b4) died before establishing connection


Version-Release number of selected component (if applicable):
=============================================================
mainline


Steps to Reproduce:
===================
1. Created a non-root session between the master and the slave
2. Stopped the master volume with the force option
3. Promoted the slave to master
4. Brought the master back online and stopped the original geo-rep session
between the original master and slave
5. Set up a non-root session from the original slave to the original master
and wrote some data
6. Stopped IO and set a checkpoint
7. Waited for the checkpoint to complete
8. Stopped and deleted the geo-rep session between the original slave and the
original master
9. Reset the options that promoted the slave volume to master
10. Resumed the original session between the original master and the original
slave
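The steps above map roughly onto the gluster CLI sequence below. This is a
sketch, not a verified transcript: the volume names (master, slave), host
names (masterhost, slavehost) and the geoaccount mountbroker user are
placeholders, and the exact volume options reset in step 9 depend on how the
slave was promoted, so that step is left as a comment.

```shell
# 2-3. take the master volume down; the slave is then promoted and serves IO
gluster volume stop master force

# 4. once the master is back, stop the original master -> slave session
gluster volume geo-replication master geoaccount@slavehost::master stop

# 5. reverse session: original slave -> original master
gluster volume geo-replication slave geoaccount@masterhost::master create push-pem
gluster volume geo-replication slave geoaccount@masterhost::master start

# 6-7. after IO stops, set a checkpoint and poll status until it completes
gluster volume geo-replication slave geoaccount@masterhost::master config checkpoint now
gluster volume geo-replication slave geoaccount@masterhost::master status

# 8. tear down the reverse session
gluster volume geo-replication slave geoaccount@masterhost::master stop
gluster volume geo-replication slave geoaccount@masterhost::master delete

# 9. reset the promotion-related volume options on the original slave (elided)

# 10. restart the original session; this is where Faulty shows up
gluster volume geo-replication master geoaccount@slavehost::slave start
gluster volume geo-replication master geoaccount@slavehost::slave status
```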


Actual results:
===============
Geo-rep status was Faulty

Expected results:
================
Geo-rep status should be ACTIVE / PASSIVE


A simple non-root session was set up between the master and the slave, and the
following was observed.

While executing "gluster-mountbroker setup /var/mountbroker-root geogroup" on
the slave, it was noticed that under /var/lib/glusterd/ the permissions of the
geo-replication directory change from drwxr-xr-x to drwxrwx---
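The log above shows why the looser mode breaks the session: ssh rejects a
private key whose mode grants any group/other access ("Permissions 0770 ...
are too open"). A minimal sketch of that check, using a throwaway file in
place of /var/lib/glusterd/geo-replication/secret.pem:

```python
import os
import stat
import tempfile

def key_too_open(path):
    """Mimic ssh's sanity check: a private key file with any
    group/other permission bits set is refused ("bad permissions")."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & 0o077)

# throwaway file standing in for secret.pem
fd, pem = tempfile.mkstemp()
os.close(fd)

os.chmod(pem, 0o770)         # mode left behind by gluster-mountbroker
print(key_too_open(pem))     # True  -> ssh refuses the key
os.chmod(pem, 0o600)         # what ssh expects for a private key
print(key_too_open(pem))     # False -> key accepted
os.remove(pem)
```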

--- Additional comment from Worker Ant on 2018-11-20 03:45:02 EST ---

REVIEW: https://review.gluster.org/21689 (geo-rep: Fix permissions with
non-root setup) posted (#1) for review on master by Kotresh HR

--- Additional comment from Kotresh HR on 2018-11-20 03:47:14 EST ---

Summary:

geo-rep: Fix permissions with non-root setup

Problem:
    In non-root fail-over/fail-back (FO/FB), when the slave is
    promoted as master, the session goes to 'Faulty'.

Cause:
    The command 'gluster-mountbroker <mountbroker-root> <group>'
    is run as a pre-requisite on the slave in a non-root setup.
    It recursively modifies the permissions and group of the
    following required directories and files:

      [1] /var/lib/glusterd/geo-replication
      [2] /var/log/glusterfs/geo-replication-slaves

    In a normal setup, this is executed on the slave node, so doing
    it recursively is not an issue for [1]. But when the original
    master becomes the slave during non-root FO/FB, [1] contains ssh
    key files (e.g. secret.pem), and modifying their permissions
    causes geo-rep to fail with the "bad permissions" error above.

Fix:
    Don't change permissions recursively. Fix permissions only on
    the required files.
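The shape of the fix can be sketched as below. This is an illustration, not
the actual patch: the file name gsyncd_template.conf and the modes are stand-ins
for whatever the setup script actually manages. The point is that only an
explicit allow-list of entries is touched, so a secret.pem dropped into the
directory during FO/FB keeps its owner-only mode.

```python
import os
import tempfile

GEO_DIR_MODE = 0o770   # mountbroker group needs access to the geo-rep dir
KEY_MODE = 0o600       # ssh demands owner-only private keys

def fix_perms_targeted(geo_dir, managed=("gsyncd_template.conf",)):
    """Adjust the directory and only the named managed files,
    leaving everything else (e.g. secret.pem) untouched."""
    os.chmod(geo_dir, GEO_DIR_MODE)
    for name in managed:
        path = os.path.join(geo_dir, name)
        if os.path.exists(path):
            os.chmod(path, GEO_DIR_MODE)

# demo: a pre-existing secret.pem must keep its 0600 mode
d = tempfile.mkdtemp()
open(os.path.join(d, "gsyncd_template.conf"), "w").close()
pem = os.path.join(d, "secret.pem")
open(pem, "w").close()
os.chmod(pem, KEY_MODE)

fix_perms_targeted(d)
print(oct(os.stat(pem).st_mode & 0o777))   # 0o600 -- untouched
```

A recursive os.walk + chmod over the same directory (what the old setup
script effectively did) would have clobbered the key's mode and triggered
the ssh failure in the logs.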

--- Additional comment from Worker Ant on 2018-11-25 23:21:28 EST ---

REVIEW: https://review.gluster.org/21689 (geo-rep: Fix permissions with
non-root setup) posted (#3) for review on master by Amar Tumballi


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1510752
[Bug 1510752] [geo-rep]: Failover / Failback shows fault status in a
non-root setup
https://bugzilla.redhat.com/show_bug.cgi?id=1651498
[Bug 1651498] [geo-rep]: Failover / Failback shows fault status in a
non-root setup
https://bugzilla.redhat.com/show_bug.cgi?id=1654117
[Bug 1654117] [geo-rep]: Failover / Failback shows fault status in a
non-root setup