[Gluster-users] Geo-Replication push-pem actually doesn't append common_secret.pem.pub to authorized_keys file

PEPONNET, Cyril N (Cyril) cyril.peponnet at alcatel-lucent.com
Mon Feb 2 19:44:49 UTC 2015


Update:

It seems the active node is finally fixed, but it also seems the rsync processes are running from nodeA (so I don’t understand the master notion), and nodeA is the most heavily used node, so its load average is becoming dangerously high.

How can I force the geo-replication to be started from a specific node (the master)?

I still don’t understand why I have 3 masters…

--
Cyril Peponnet

On Feb 2, 2015, at 11:00 AM, PEPONNET, Cyril N (Cyril) <cyril.peponnet at alcatel-lucent.com> wrote:

But now I have strange issue:

After creating the geo-rep session and starting it (from nodeB):

[root at nodeB]#  gluster vol geo-replication myvol slaveA::myvol status detail

MASTER NODE     MASTER VOL    MASTER BRICK               SLAVE            STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
nodeB           myvol         /export/raid/myvol         slaveA::myvol    Passive    N/A                  N/A             0              0                0                0                  0
nodeA           myvol         /export/raid/myvol         slaveA::myvol    Passive    N/A                  N/A             0              0                0                0                  0
nodeC           myvol         /export/raid/myvol         slaveA::myvol    Active     N/A                  Hybrid Crawl    0              8191             0                0                  0

[root at nodeB]#  gluster vol geo-replication myvol slaveA::myvol status detail

MASTER NODE     MASTER VOL    MASTER BRICK               SLAVE            STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
nodeB           myvol         /export/raid/myvol         slaveA::myvol    Passive    N/A                  N/A             0              0                0                0                  0
nodeA           myvol         /export/raid/myvol         slaveA::myvol    Active     N/A                  Hybrid Crawl    0              8191             0                0                  0
nodeC           myvol         /export/raid/myvol         slaveA::myvol    Passive    N/A                  N/A             0              0                0                0                  0

[root at nodeB]#  gluster vol geo-replication myvol slaveA::myvol status detail

MASTER NODE     MASTER VOL    MASTER BRICK               SLAVE            STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
nodeB           myvol         /export/raid/myvol         slaveA::myvol    Active     N/A                  Hybrid Crawl    0              8191             0                0                  0
nodeA           myvol         /export/raid/myvol         slaveA::myvol    Passive    N/A                  N/A             0              0                0                0                  0
nodeC           myvol         /export/raid/myvol         slaveA::myvol    Passive    N/A                  N/A             0              0                0                0                  0

So.

    1/ Why are there 3 master nodes? nodeB should be the only master node.
    2/ Why does it keep switching back and forth between Active and Passive?


Thanks


--
Cyril Peponnet

On Feb 2, 2015, at 10:40 AM, PEPONNET, Cyril N (Cyril) <cyril.peponnet at alcatel-lucent.com> wrote:

For the record, after adding

operating-version=2

on every node (A, B, and C) AND the slave node, the commands are working.
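
For reference, a minimal sketch of that change (assuming the op version is kept in the default /var/lib/glusterd/glusterd.info and that a glusterd restart on each node is acceptable):

```
# Run on every node (A, B, C and the slave): add operating-version if it
# is not already set, then restart glusterd so it is picked up.
grep -q '^operating-version=' /var/lib/glusterd/glusterd.info \
    || echo 'operating-version=2' >> /var/lib/glusterd/glusterd.info
service glusterd restart
```
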
--
Cyril Peponnet

On Feb 2, 2015, at 9:46 AM, PEPONNET, Cyril N (Cyril) <cyril.peponnet at alcatel-lucent.com> wrote:

More informations here:


I updated the state of the peer in the UUID file located in /v/l/g/peers from state=10 to state=3 (as it is on the other nodes), and now the node is in the cluster.
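
Roughly, the manual fix was the following (a sketch: /v/l/g is shorthand for /var/lib/glusterd, the peer file is named after the remote peer's UUID, and restarting glusterd to reload the peer store is an assumption from my setup):

```
# On the node that shows the wrong peer state; <peer-uuid> is the UUID
# of the remote peer as printed by "gluster peer status".
grep '^state=' /var/lib/glusterd/peers/<peer-uuid>     # showed state=10
sed -i 's/^state=10$/state=3/' /var/lib/glusterd/peers/<peer-uuid>
service glusterd restart
```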

gluster system:: execute gsec_create now creates a proper file on the master node with every node’s key in it.

Now from there I try to create my geo-replication session between master nodeB and slaveA:

gluster vol geo myvol slave::myvol create push-pem force

From slaveA I got these error messages in the logs:

[2015-02-02 17:19:04.754809] E [glusterd-geo-rep.c:1686:glusterd_op_stage_copy_file] 0-: Op Version not supported.
[2015-02-02 17:19:04.754890] E [glusterd-syncop.c:912:gd_stage_op_phase] 0-management: Staging of operation 'Volume Copy File' failed on localhost : One or more nodes do not support the required op version.
[2015-02-02 17:19:07.513547] E [glusterd-geo-rep.c:1620:glusterd_op_stage_sys_exec] 0-: Op Version not supported.
[2015-02-02 17:19:07.513632] E [glusterd-geo-rep.c:1658:glusterd_op_stage_sys_exec] 0-: One or more nodes do not support the required op version.
[2015-02-02 17:19:07.513660] E [glusterd-syncop.c:912:gd_stage_op_phase] 0-management: Staging of operation 'Volume Execute system commands' failed on localhost : One or more nodes do not support the required op version.

On slaveA I have the common pem file transferred into /v/l/g/geo/ with the keys of my 3 nodes from the source site.

But /root/.ssh/authorized_keys is not populated with the contents of this file.
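
As a manual workaround for the step push-pem should have performed (a sketch, assuming geo-replication connects as root on the slave and the default paths):

```
# On slaveA: append the collected master public keys to root's
# authorized_keys by hand, since the hook did not do it.
cat /var/lib/glusterd/geo-replication/common_secret.pem.pub >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
```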

From the log I saw that there is a call to a script:

/var/lib/glusterd/hooks/1/gsync-create/post/S56glusterd-geo-rep-create-post.sh --volname=myvol is_push_pem=1 pub_file=/var/lib/glusterd/geo-replication/common_secret.pem.pub slave_ip=slaveA

In this script the following is done:


```
    scp $pub_file $slave_ip:$pub_file_tmp
    ssh $slave_ip "mv $pub_file_tmp $pub_file"
    ssh $slave_ip "gluster system:: copy file /geo-replication/common_secret.pem.pub > /dev/null"
    ssh $slave_ip "gluster system:: execute add_secret_pub > /dev/null"
```

The first two lines pass, the third fails, so the fourth is never executed.

Third command on slaveA

#gluster system:: copy file /geo-replication/common_secret.pem.pub
One or more nodes do not support the required op version.

# gluster peer status
Number of Peers: 0

from logs:

==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
[2015-02-02 17:43:29.242524] E [glusterd-geo-rep.c:1686:glusterd_op_stage_copy_file] 0-: Op Version not supported.
[2015-02-02 17:43:29.242610] E [glusterd-syncop.c:912:gd_stage_op_phase] 0-management: Staging of operation 'Volume Copy File' failed on localhost : One or more nodes do not support the required op version.
One or more nodes do not support the required op version.

For now I have only one node on my remote site.

Anyway, as this step only copies the file across all the cluster members, I can do without it.

The fourth command is not working:

[root at slaveA geo-replication]# gluster system:: execute add_secret_pub
[2015-02-02 17:44:49.123326] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2015-02-02 17:44:49.123381] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2015-02-02 17:44:49.123568] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2015-02-02 17:44:49.123588] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2015-02-02 17:44:49.306482] I [socket.c:2238:socket_event_handler] 0-transport: disconnecting now

==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
[2015-02-02 17:44:49.307921] E [glusterd-geo-rep.c:1620:glusterd_op_stage_sys_exec] 0-: Op Version not supported.
[2015-02-02 17:44:49.308009] E [glusterd-geo-rep.c:1658:glusterd_op_stage_sys_exec] 0-: One or more nodes do not support the required op version.
[2015-02-02 17:44:49.308038] E [glusterd-syncop.c:912:gd_stage_op_phase] 0-management: Staging of operation 'Volume Execute system commands' failed on localhost : One or more nodes do not support the required op version.
One or more nodes do not support the required op version.

==> /var/log/glusterfs/cli.log <==
[2015-02-02 17:44:49.308493] I [input.c:36:cli_batch] 0-: Exiting with: -1


I have only one node… I don’t understand the meaning of the error: One or more nodes do not support the required op version.
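
As far as I know, the op version a node advertises can be checked in its glusterd.info file (an assumption; the path is the default glusterd working directory):

```
# On slaveA: show the op version glusterd advertises; nothing is printed
# if no operating-version line has ever been set.
grep operating-version /var/lib/glusterd/glusterd.info || echo "operating-version not set"
```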

 --
Cyril Peponnet

On Feb 2, 2015, at 8:49 AM, PEPONNET, Cyril N (Cyril) <cyril.peponnet at alcatel-lucent.com> wrote:

Every node is connected:

[root at nodeA geo-replication]# gluster peer status
Number of Peers: 2

Hostname: nodeB
Uuid: 6a9da7fc-70ec-4302-8152-0e61929a7c8b
State: Peer in Cluster (Connected)

Hostname: nodeC
Uuid: c12353b5-f41a-4911-9329-fee6a8d529de
State: Peer in Cluster (Connected)


[root at nodeB ~]# gluster peer status
Number of Peers: 2

Hostname: nodeC
Uuid: c12353b5-f41a-4911-9329-fee6a8d529de
State: Peer in Cluster (Connected)

Hostname: nodeA
Uuid: 2ac172bb-a2d0-44f1-9e09-6b054dbf8980
State: Peer is connected and Accepted (Connected)

[root at nodeC geo-replication]# gluster peer status
Number of Peers: 2

Hostname: nodeA
Uuid: 2ac172bb-a2d0-44f1-9e09-6b054dbf8980
State: Peer in Cluster (Connected)

Hostname: nodeB
Uuid: 6a9da7fc-70ec-4302-8152-0e61929a7c8b
State: Peer in Cluster (Connected)


The only difference is the state reported by nodeB about nodeA: Peer is connected and Accepted (Connected).

When I execute gluster system:: execute gsec_create from node A or C, I have the keys of all 3 nodes in the common pem file. But from nodeB, I only have the keys for nodeB and nodeC. This is unfortunate, as I am trying to launch the geo-replication job from nodeB (the master).
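
A quick way to see this per node (a sketch, assuming one public key line per master node and the default geo-replication directory):

```
# Run on each master node right after "gluster system:: execute
# gsec_create": count the collected public keys. Here nodeA and nodeC
# show 3 lines, nodeB only 2.
wc -l /var/lib/glusterd/geo-replication/common_secret.pem.pub
```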


--
Cyril Peponnet

On Feb 2, 2015, at 2:07 AM, Aravinda <avishwan at redhat.com> wrote:

Looks like node C is in a disconnected state. Please let us know the output of `gluster peer status` from all the master nodes and slave nodes.

--
regards
Aravinda

On 01/22/2015 12:27 AM, PEPONNET, Cyril N (Cyril) wrote:
So,

On the master node of my 3-node setup:

1) gluster system:: execute gsec_create

In /var/lib/glusterd/geo-replication/common_secret.pem.pub I have the pem pub keys from master node A and node B (not node C).

On node C I don’t have anything in /v/l/g/geo/ except the gsyncd template config.

So here I have an issue.

The only error I saw on node C is:

   [2015-01-21 18:36:41.179601] E [rpc-clnt.c:208:call_bail]
   0-management: bailing out frame type(Peer mgmt) op(—(2)) xid =
   0x23 sent = 2015-01-21 18:26:33.031937. timeout = 600 for
   xx.xx.xx.xx:24007


On node A, the cli.log looks like:

   [2015-01-21 18:49:49.878905] I [socket.c:3561:socket_init]
   0-glusterfs: SSL support is NOT enabled
   [2015-01-21 18:49:49.878947] I [socket.c:3576:socket_init]
   0-glusterfs: using system polling thread
   [2015-01-21 18:49:49.879085] I [socket.c:3561:socket_init]
   0-glusterfs: SSL support is NOT enabled
   [2015-01-21 18:49:49.879095] I [socket.c:3576:socket_init]
   0-glusterfs: using system polling thread
   [2015-01-21 18:49:49.951835] I
   [socket.c:2238:socket_event_handler] 0-transport: disconnecting now
   [2015-01-21 18:49:49.972143] I [input.c:36:cli_batch] 0-: Exiting
   with: 0


If I run gluster system:: execute gsec_create on node C or node B, the common pem key file contains the pem pub keys of all 3 nodes. So in some way node A is unable to get the key from node C.

So let’s try to fix this one before going further.

--
Cyril Peponnet

On Jan 20, 2015, at 9:38 PM, Aravinda <avishwan at redhat.com> wrote:

On 01/20/2015 11:01 PM, PEPONNET, Cyril N (Cyril) wrote:
Hi,

I’m ready for new testing. I deleted the geo-rep session between master and slave, and removed the lines from the authorized_keys file on the slave.
I also removed the common secret pem from the slave and from the master. There is only the gsyncd_template.conf in /var/lib/glusterd now.

Here is our setup:

Site A: gluster 3 nodes
Site B: gluster 1 node (for now, a second will come).

I can issue

gluster system:: execute gsec_create

what to check?
common_secret.pem.pub is created as /var/lib/glusterd/geo-replication/common_secret.pem.pub and should contain the public keys from all master nodes (Site A). It should match the contents of /var/lib/glusterd/geo-replication/secret.pem.pub and /var/lib/glusterd/geo-replication/tar_ssh.pem.pub on each node.
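
For example, a rough check on each master node (assuming the default geo-replication directory; the entries in the common file may carry an ssh command= prefix, hence the substring match):

```
# This node's own public keys should appear in the collected common file.
cd /var/lib/glusterd/geo-replication
grep -F -f secret.pem.pub  common_secret.pem.pub >/dev/null && echo "ssh key collected"
grep -F -f tar_ssh.pem.pub common_secret.pem.pub >/dev/null && echo "tar ssh key collected"
```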


then

gluster vol geo geo_test slave::geo_test create push-pem force (force is needed because the slave vol is smaller than the master vol).

What to check ?
Check for any errors in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log for an RPM installation, or in /var/log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log for a source installation. In case of any errors related to hook execution, run the hook command copied from the log directly. From your previous mail I understand there is some issue while executing the hook script. I will look into the issue in the hook script.
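
For example, a re-run of the hook would look roughly like this (a sketch; the hook path and arguments follow the log line quoted above on this page and should be replaced with the values from your own log):

```
# Re-run the gsync-create post hook by hand, with tracing, to see which
# step fails.
sh -x /var/lib/glusterd/hooks/1/gsync-create/post/S56glusterd-geo-rep-create-post.sh \
    --volname=myvol is_push_pem=1 \
    pub_file=/var/lib/glusterd/geo-replication/common_secret.pem.pub \
    slave_ip=slaveA
```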

I want to use change_detector changelog and not rsync btw.
change_detector is the crawling mechanism. The available options are changelog and xsync; xsync is FS crawl.
The available sync mechanisms are rsync and tarssh.
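
For instance, both can be set per session with geo-rep config (a sketch; I am assuming the option names change_detector and use_tarssh are available in your release):

```
# Use the changelog-based crawl instead of FS crawl (xsync).
gluster volume geo-replication geo_test slave::geo_test config change_detector changelog
# Switch the sync mechanism from rsync to tar over ssh.
gluster volume geo-replication geo_test slave::geo_test config use_tarssh true
```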

Can you guide me through setting this up, and also help debug why it’s not working out of the box?

If needed I can get in touch with you through IRC.
Sure. IRC nickname is aravindavk.

Thanks for your help.


--
regards
Aravinda
http://aravindavk.in






