[Gluster-users] Geo-replication

Tue Mar 3 05:08:31 UTC 2020

On March 3, 2020 4:13:38 AM GMT+02:00, David Cunningham <dcunningham at voisonics.com> wrote:
>Hello,
>
>Thanks for that. When we re-tried with push-pem from cafs10 (on the
>A/master cluster) it failed with "Unable to mount and fetch slave volume
>details." and in the logs we see:
>
>[2020-03-03 02:07:42.614911] E [name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-0: DNS resolution failed on host nvfs10.local
>[2020-03-03 02:07:42.638824] E [name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-1: DNS resolution failed on host nvfs20.local
>[2020-03-03 02:07:42.664493] E [name.c:258:af_inet_client_get_remote_sockaddr] 0-gvol0-client-2: DNS resolution failed on host nvfs30.local
>
>These .local addresses are the LAN addresses that B/slave nodes nvfs10,
>nvfs20, and nvfs30 replicate with. It seems that the A/master needs to be
>able to contact those addresses. Is that right? If it is then we'll need to
>re-do the B cluster to replicate using publicly accessible IP addresses
>instead of their LAN.
>
>Thank you.
>
>
>On Mon, 2 Mar 2020 at 20:53, Aravinda VK <aravinda at kadalu.io> wrote:
>
>> Looks like a setup issue to me. Copying SSH keys manually is not required.
>>
>> A command prefix is required when adding keys to the authorized_keys file
>> on each remote node. That prefix will not be present if the SSH keys are
>> added manually.
>>
>> Geo-rep specifies /nonexisting/gsyncd in the command to make sure it
>> connects via the actual command specified in the authorized_keys file. In
>> your case, Geo-replication is actually looking for the gsyncd command at
>> the /nonexisting/gsyncd path.
>>
>> Please try the push-pem option with the Geo-rep create command.
>>
>> —
>> regards
>> Aravinda Vishwanathapura
>> https://kadalu.io
>>
>>
>> On 02-Mar-2020, at 6:03 AM, David Cunningham <dcunningham at voisonics.com> wrote:
>>
>> Hello,
>>
>> We've set up geo-replication but it isn't actually syncing. Scenario is
>> that we have two GFS clusters. Cluster A has nodes cafs10, cafs20, and
>> cafs30, replicating with each other over a LAN. Cluster B has nodes nvfs10,
>> nvfs20, and nvfs30 also replicating with each other over a LAN. We are
>> geo-replicating data from the A cluster to the B cluster over the internet.
>> SSH key access is set up, allowing all the A nodes password-less access to
>> root on nvfs10.
>>
>> Geo-replication was set up using these commands, run on cafs10:
>>
>> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 create ssh-port 8822 no-verify
>> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 config remote-gsyncd /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd
>> gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 start
>>
>> However after a very short period of the status being "Initializing..."
>> the status then sits on "Passive":
>>
>> # gluster volume geo-replication gvol0 nvfs10.example.com::gvol0 status
>> MASTER NODE    MASTER VOL    MASTER BRICK                        SLAVE USER    SLAVE                        SLAVE NODE      STATUS     CRAWL STATUS    LAST_SYNCED
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> cafs10         gvol0         /nodirectwritedata/gluster/gvol0    root          nvfs10.example.com::gvol0    nvfs30.local    Passive    N/A             N/A
>> cafs30         gvol0         /nodirectwritedata/gluster/gvol0    root          nvfs10.example.com::gvol0    N/A             Created    N/A             N/A
>> cafs20         gvol0         /nodirectwritedata/gluster/gvol0    root          nvfs10.example.com::gvol0    N/A             Created    N/A             N/A
>>
>> So my questions are:
>> 1. Why does the status on cafs10 mention "nvfs30.local"? That's the LAN
>> address that nvfs10 replicates with nvfs30 using. It's not accessible from
>> the A cluster, and I didn't use it when configuring geo-replication.
>> 2. Why does geo-replication sit in Passive status?
>>
>> Thanks very much for any assistance.
>>
>>
>> On Tue, 25 Feb 2020 at 15:46, David Cunningham <dcunningham at voisonics.com> wrote:
>>
>>> Hi Aravinda and Sunny,
>>>
>>> Thank you for the replies. We have 3 replicating nodes on the master
>>> side, and want to geo-replicate their data to the remote slave side. As I
>>> understand it, if the master node on which the geo-replication create
>>> command was run goes down, then another node will take over pushing
>>> updates to the remote slave. Is that right?
>>>
>>> We have already taken care of adding all master nodes' SSH keys to the
>>> remote slave's authorized_keys externally, so won't include the push-pem
>>> part of the create command.
>>>
>>> Mostly I wanted to confirm the geo-replication behaviour on the
>>> replicating master nodes if one of them goes down.
>>>
>>> Thank you!
>>>
>>>
>>> On Tue, 25 Feb 2020 at 14:32, Aravinda VK <aravinda at kadalu.io> wrote:
>>>
>>>> Hi David,
>>>>
>>>>
>>>> On 25-Feb-2020, at 3:45 AM, David Cunningham <dcunningham at voisonics.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've a couple of questions on geo-replication that hopefully someone can
>>>> help with:
>>>>
>>>> 1. If there are multiple nodes in a cluster on the master side (pushing
>>>> updates to the geo-replication slave), which node actually does the
>>>> pushing? Does GlusterFS decide itself automatically?
>>>>
>>>>
>>>> Once a Geo-replication session is started, one worker is started for
>>>> each Master brick. Each worker identifies the changes that happened in
>>>> its brick and syncs those changes via a mount. This way the load is
>>>> distributed among the Master nodes. In the case of a Replica subvolume,
>>>> one worker in the Replica group becomes Active and participates in the
>>>> syncing; the other bricks in that Replica group remain Passive. A
>>>> Passive worker becomes Active if the previously Active brick goes down.
>>>> (Only one worker syncs because all Replica bricks have the same set of
>>>> changes, so syncing from each worker would be redundant.)
>>>>
>>>>
>>>> 2. With regard to copying SSH keys, presumably the SSH keys of all master
>>>> nodes should be authorized on the geo-replication client side?
>>>>
>>>>
>>>> A Geo-replication session is established between one master node and one
>>>> remote node. If the Geo-rep create command is successful, then:
>>>>
>>>> - SSH keys are generated on all master nodes
>>>> - Public keys from all master nodes are copied to the initiator Master node
>>>> - Public keys are copied to the Remote node specified in the create command
>>>> - Master public keys are distributed to all nodes of the remote Cluster and
>>>>   added to the respective ~/.ssh/authorized_keys
>>>>
>>>> After a successful Geo-rep create command, any Master node can connect to
>>>> any remote node via ssh.
>>>>
>>>> Security: A command prefix is added when adding the public key to the
>>>> remote node's authorized_keys file, so that anyone who gains access using
>>>> this key can access only the gsyncd command.
>>>>
>>>> ```
>>>> command=gsyncd ssh-key….
>>>> ```
>>>>
>>>>
>>>>
>>>> Thanks for your help.
>>>>
>>>> --
>>>> David Cunningham, Voisonics Limited
>>>> http://voisonics.com/
>>>> USA: +1 213 221 1092
>>>> New Zealand: +64 (0)28 2558 3782

Hey David,

Why don't you add the B cluster's hostnames to /etc/hosts on all A cluster nodes?

Maybe you won't need to rebuild the whole B cluster.
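
As a rough sketch of what those /etc/hosts entries could look like: the addresses below are placeholders from the 203.0.113.0/24 documentation range, not the real public IPs of nvfs10, nvfs20, and nvfs30, and the function names are made up for illustration. The script only prints the entries and, if given hostnames as arguments, reports whether they resolve; it does not modify /etc/hosts.

```shell
#!/bin/sh
# Hypothetical mapping of the B cluster's LAN names to publicly reachable IPs.
# 203.0.113.x are documentation-only placeholders; substitute the real addresses.
hosts_entries() {
cat <<'EOF'
203.0.113.10 nvfs10.local
203.0.113.20 nvfs20.local
203.0.113.30 nvfs30.local
EOF
}

# Report whether each hostname given as an argument resolves on this node.
check_resolution() {
  for h in "$@"; do
    if getent hosts "$h" >/dev/null 2>&1; then
      echo "$h resolves"
    else
      echo "$h does NOT resolve"
    fi
  done
}

# Print the candidate entries for review. Once checked, they could be
# appended on each A node, e.g.: hosts_entries | sudo tee -a /etc/hosts
hosts_entries

# Does nothing unless hostnames are passed on the command line.
check_resolution "$@"
```

Running it on cafs10, cafs20, and cafs30 with nvfs10.local nvfs20.local nvfs30.local as arguments, before and after editing /etc/hosts, should show whether the names from the "DNS resolution failed" log lines now resolve.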

I guess the A cluster nodes need to be able to reach all nodes in the B cluster, so you might need to change the firewall settings.
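
Something like the following could be used from each A node to see what is reachable. It is only a sketch under a few assumptions: 8822 is the ssh-port from your create command, 24007 is the usual glusterd management port, and check_port is a made-up helper that relies on bash's /dev/tcp and the coreutils timeout command being available. Brick ports are not in the list; `gluster volume status` on the B cluster shows which ports the bricks actually listen on.

```shell
#!/bin/sh
# Ports to probe: 8822 (SSH port from the geo-rep create command) and
# 24007 (glusterd). Add the brick ports reported on the B cluster as needed.
PORTS="8822 24007"

# Try a TCP connection to host:port with a 3 second limit.
# Anything that prevents the connection (closed port, unresolvable name,
# missing bash or timeout) is reported as unreachable.
check_port() {
  host="$1"
  port="$2"
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "$host:$port open"
  else
    echo "$host:$port unreachable"
  fi
}

# Probe every host given on the command line; does nothing without arguments.
for h in "$@"; do
  for p in $PORTS; do
    check_port "$h" "$p"
  done
done
```

For example, saved as a script and run with nvfs10.local nvfs20.local nvfs30.local as arguments on cafs10, cafs20, and cafs30.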

Best Regards,
Strahil Nikolov