[Gluster-users] geo replication, invalid slave name and gluster 3.5.1
Stefan Moravcik
smoravcik at newsweaver.com
Thu Jul 17 12:30:30 UTC 2014
Hello Aravinda,
I changed the configuration file to what you suggested, but the result
was the same... Interestingly, the test files in the gluster volume got
synced, but after that initial run, newly created or updated files no
longer get pushed to the slave, nor does the slave status change from
faulty.

MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE                    STATUS    CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1.1.1.1        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A             0              0                0                0                  0
1.1.1.2        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A             20004          0                0                0                  0
1.1.1.3        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A             0              0                0                0                  0

The second interesting part was the config file itself: remote_gsyncd
is mentioned twice there, and the entry nearer the top of the file is
the nonexistent one.

[peersrx . %5Essh%3A]
remote_gsyncd = /nonexistent/gsyncd

[__section_order__]
peersrx . %5essh%3a = 2
peersrx . = 3
peersrx . . = 0

[peersrx .]
gluster_log_file = /var/log/glusterfs/geo-replication-slaves/${session_owner}:${eSlave}.gluster.log
gluster_command_dir = /usr/sbin/
log_file = /var/log/glusterfs/geo-replication-slaves/${session_owner}:${eSlave}.log
gluster_params = xlator-option=*-dht.assert-no-child-down=true
gluster_command = /usr/sbin/glusterfs --xlator-option *-dht.assert-no-child-down=true

[__meta__]
version = 2.0

[peersrx . .]
gluster_log_file = /var/log/glusterfs/geo-replication/${mastervol}/${eSlave}.gluster.log
ssh_command = ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
session_owner = 13832fdb-c494-4081-9693-4f953c947fac
remote_gsyncd = /usr/libexec/glusterfs/gsyncd
state_file = /var/lib/glusterd/geo-replication/${mastervol}/${eSlave}.status
gluster_command_dir = /usr/sbin/
pid_file = /var/lib/glusterd/geo-replication/${mastervol}/${eSlave}.pid
log_file = /var/log/glusterfs/geo-replication/${mastervol}/${eSlave}.log
gluster_params = xlator-option=*-dht.assert-no-child-down=true
gluster_command = /usr/sbin/glusterfs --xlator-option *-dht.assert-no-child-down=true
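
For completeness: rather than editing the file by hand, the same key can
apparently also be set through the geo-rep config interface; a
hypothetical invocation using this session's names, untested on 3.5.1:

  gluster volume geo-replication myvol1 1.2.3.4::myvol1_slave config remote-gsyncd /usr/libexec/glusterfs/gsyncd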

On the slave, I can see this in the log file:

[2014-07-17 12:26:00.767368] I [glusterd-handler.c:2501:__glusterd_handle_getwd] 0-glusterd: Received getwd req
[2014-07-17 12:26:03.377238] I [glusterd-handler.c:2501:__glusterd_handle_getwd] 0-glusterd: Received getwd req
[2014-07-17 12:26:13.996132] I [glusterd-handler.c:2501:__glusterd_handle_getwd] 0-glusterd: Received getwd req
[2014-07-17 12:26:14.910653] I [glusterd-handler.c:2501:__glusterd_handle_getwd] 0-glusterd: Received getwd req
[2014-07-17 12:26:17.736239] I [glusterd-handler.c:2501:__glusterd_handle_getwd] 0-glusterd: Received getwd req

and on the master:

[2014-07-17 12:28:03.977266] I [glusterd-geo-rep.c:1767:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/repository_84.45.11.80_repo_replicator/gsyncd.conf).
[2014-07-17 12:28:05.875462] I [glusterd-handler.c:1169:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req

Thank you and best regards,
Stefan
On 17/07/14 13:18, Aravinda wrote:
> On 07/16/2014 02:20 PM, Stefan Moravcik wrote:
>> Hello Vishwanath,
>>
>> thanks for pointing me in the right direction... This was helpful. I
>> thought the password-less ssh connection was made by glusterfs using
>> the secret.pem on the initial run, but it wasn't. I had to create
>> id_rsa in the /root/.ssh/ directory to be able to ssh to the slave
>> without any -i option...
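>>
>> For reference, the distinction as I understand it: gsyncd itself
>> connects with the pem key (per the ssh_command in gsyncd.conf), while
>> the create command checks for a plain password-less login. A rough
>> sanity check from a master node, assuming root and slave 1.2.3.4:
>>
>>   ssh -i /var/lib/glusterd/geo-replication/secret.pem root@1.2.3.4   # key gsyncd uses
>>   ssh root@1.2.3.4                                                   # plain login, now works without -i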
>>
>> Great, thanks for that... However, I have an additional question,
>> again a little different from the previous ones... This seems like a
>> bug to me, but you will surely know better.
>>
>> After I created the geo-replication session and started it,
>> everything looked OK and successful. Then I looked at the status
>> command and got this:
>>
>> MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE                    STATUS    CHECKPOINT STATUS    CRAWL STATUS
>> --------------------------------------------------------------------------------------------------------------------
>> 1.1.1.1        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A
>> 1.1.1.2        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A
>> 1.1.1.3        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A
>>
>> when I checked the config file, it contains
>>
>> remote_gsyncd: /nonexistent/gsyncd
>>
>> I even tried to create symlinks for this, but the faulty status never
>> went away... I found a bug report on bugzilla:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1105283
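>>
>> For the record, the symlink attempt was roughly this on each master
>> node (pointing the bogus path at the real gsyncd):
>>
>>   mkdir -p /nonexistent
>>   ln -s /usr/libexec/glusterfs/gsyncd /nonexistent/gsyncd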
> Update the conf file manually as follows, then stop and start the
> geo-replication. (Conf file location:
> /var/lib/glusterd/geo-replication/<MASTER VOL>_<SLAVE IP>_<SLAVE
> VOL>/gsyncd.conf)
>
> remote_gsyncd = /usr/libexec/glusterfs/gsyncd
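>
> For example, with the session above (a sketch; adjust the directory
> name to match your master/slave pair):
>
>   gluster volume geo-replication myvol1 1.2.3.4::myvol1_slave stop
>   # edit /var/lib/glusterd/geo-replication/myvol1_1.2.3.4_myvol1_slave/gsyncd.conf
>   # and set: remote_gsyncd = /usr/libexec/glusterfs/gsyncd
>   gluster volume geo-replication myvol1 1.2.3.4::myvol1_slave start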
>
> Let us know if this resolves the issue.
>
>
> --
> regards
> Aravinda
> http://aravindavk.in
>
>>
>> [2014-07-16 07:14:34.718718] E [glusterd-geo-rep.c:2685:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file
>> [2014-07-16 07:14:34.718756] E [glusterd-geo-rep.c:2999:glusterd_read_status_file] 0-: Unable to read the statusfile for /shared/myvol1 brick for repository(master), 1.2.3.4::myvol1_slave(slave) session
>>
>> However, since the symlink is in place, the error message above no
>> longer shows in the log. Actually, there are no more error logs, just
>> the faulty status...
>>
>> Even more interesting: when I changed the configuration from rsync to
>> tar+ssh, it synced the files over, but it will not replicate any
>> changes or newly created files...
>>
>> MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE                    STATUS    CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> 1.1.1.1        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A             10001          0                0                0                  0
>> 1.1.1.2        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A             0              0                0                0                  0
>> 1.1.1.3        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A             0              0                0                0                  0
>>
>>
>> as you can see, 10001 files replicated... but if I create a new file
>> or edit existing ones, the session stays faulty and nothing
>> replicates anymore. This is true even if I change back from tar+ssh
>> to rsync, restart glusterd, or anything else...
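>>
>> (The rsync/tar+ssh switch was done via the geo-rep config interface;
>> roughly, assuming the option name from the dist-geo-rep docs:
>>
>>   gluster volume geo-replication myvol1 1.2.3.4::myvol1_slave config use_tarssh true
>>
>> and the same with "false" to go back to rsync.)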
>>
>> Thank you for all your help, much appreciated
>>
>> Regards,
>> Stefan
>>
>> On 15/07/14 17:15, M S Vishwanath Bhat wrote:
>>> On 15/07/14 18:13, Stefan Moravcik wrote:
>>>> Hello Vishwanath,
>>>>
>>>> thank you for your quick reply, but I have a follow-up question, if
>>>> that's OK... It may be a different issue and I should perhaps open a
>>>> new thread, but I will try to continue using this one...
>>>>
>>>> So I followed the new documentation... Let me show you what I have
>>>> done and what the final error message is...
>>>>
>>>>
>>>> I have 3 servers, node1, node2 and node3, with IPs 1.1.1.1, 1.1.1.2
>>>> and 1.1.1.3.
>>>>
>>>> I installed glusterfs-server and glusterfs-geo-replication on all 3
>>>> of them. I created a replica volume called myvol1 and ran the command
>>>>
>>>> gluster system:: execute gsec_create
>>>>
>>>> this created 4 files:
>>>> secret.pem
>>>> secret.pem.pub
>>>> tar_ssh.pem
>>>> tar_ssh.pem.pub
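>>>>
>>>> On each node these land in the glusterd working directory; a quick
>>>> check, assuming the default path:
>>>>
>>>>   ls /var/lib/glusterd/geo-replication/
>>>>   # secret.pem  secret.pem.pub  tar_ssh.pem  tar_ssh.pem.pub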
>>>>
>>>> The pub file is different on all 3 nodes, so I copied all 3
>>>> secret.pem.pub files to the slave's authorized_keys. I tried to ssh
>>>> directly to the slave server from all 3 nodes and got through with
>>>> no problem.
>>>>
>>>> So I connected to the slave server and installed glusterfs-server
>>>> and glusterfs-geo-replication there too.
>>>>
>>>> I started glusterd and created a volume called myvol1_slave.
>>>>
>>>> Then I peer probed one of the masters from the slave. This showed the
>>>> volume on my master, and the peer appeared in peer status.
>>>>
>>>> From here I ran the command from your documentation:
>>>>
>>>> volume geo-replication myvol1 1.2.3.4::myvol1_slave create push-pem
>>>> Passwordless ssh login has not been setup with 1.2.3.4.
>>>> geo-replication command failed
>>> A couple of things here.
>>>
>>> I believe it was not clear enough in the docs, and I apologise for
>>> that. But this is the prerequisite for dist-geo-rep:
>>>
>>> * /There should be a password-less ssh setup between at least one
>>> node in the master volume and one node in the slave volume. The
>>> geo-rep create command should be executed from this node, which has
>>> the password-less ssh setup to the slave./
>>>
>>> So in your case, you can set up password-less ssh from 1.1.1.1 (one
>>> master volume node) to 1.2.3.4 (one slave volume node). You can use
>>> "ssh-keygen" and "ssh-copy-id" to do that.
>>> After the above step is done, execute "gluster system:: execute
>>> gsec_create". You don't need to copy anything to the slave's
>>> authorized_keys; geo-rep create push-pem takes care of it for you.
>>>
>>> Now, you should execute "gluster volume geo-rep myvol1
>>> 1.2.3.4::myvol1_slave create push-pem" from 1.1.1.1 (because this
>>> node has password-less ssh to the 1.2.3.4 mentioned in the command).
>>>
>>> That should create a geo-rep session for you, which can be started
>>> later on.
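>>>
>>> A sketch of the whole sequence, run from 1.1.1.1 (assuming root on
>>> both ends; adjust names to your setup):
>>>
>>>   ssh-keygen                   # accept the defaults
>>>   ssh-copy-id root@1.2.3.4     # password-less ssh to one slave node
>>>   gluster system:: execute gsec_create
>>>   gluster volume geo-replication myvol1 1.2.3.4::myvol1_slave create push-pem
>>>   gluster volume geo-replication myvol1 1.2.3.4::myvol1_slave start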
>>>
>>> And you don't need to peer probe the slave from the master or vice
>>> versa. Logically, the master and slave volumes are in different
>>> clusters (in two different geographic locations).
>>>
>>> HTH,
>>> Vishwanath
>>>
>>>>
>>>> In the secure log file, though, I could see the connection:
>>>>
>>>> 2014-07-15T13:26:56.083445+01:00 1testlab sshd[23905]: Set /proc/self/oom_score_adj to 0
>>>> 2014-07-15T13:26:56.089423+01:00 1testlab sshd[23905]: Connection from 1.1.1.1 port 58351
>>>> 2014-07-15T13:26:56.248687+01:00 1testlab sshd[23906]: Connection closed by 1.1.1.1
>>>>
>>>> and in the logs of one of the masters
>>>>
>>>> [2014-07-15 12:26:56.247667] E [glusterd-geo-rep.c:1889:glusterd_verify_slave] 0-: Not a valid slave
>>>> [2014-07-15 12:26:56.247752] E [glusterd-geo-rep.c:2106:glusterd_op_stage_gsync_create] 0-: 1.2.3.4::myvol1_slave is not a valid slave volume. Error: Passwordless ssh login has not been setup with 1.2.3.4.
>>>> [2014-07-15 12:26:56.247772] E [glusterd-syncop.c:912:gd_stage_op_phase] 0-management: Staging of operation 'Volume Geo-replication Create' failed on localhost : Passwordless ssh login has not been setup with 1.2.3.4.
>>>>
>>>> There are no logs on the other masters in the cluster, nor on the slave.
>>>>
>>>> I even tried with the force option, but got the same result... I
>>>> disabled the firewall and selinux just to make sure those parts of
>>>> the system were not interfering. I googled for the same problem and
>>>> found one instance... http://irclog.perlgeek.de/gluster/2014-01-16
>>>> but again no answer or solution.
>>>>
>>>> Thank you for your time and help.
>>>>
>>>> Best regards,
>>>> Stefan
>>>>
>>>> On 15/07/14 12:26, M S Vishwanath Bhat wrote:
>>>>> On 15/07/14 15:08, Stefan Moravcik wrote:
>>>>>> Hello Guys,
>>>>>>
>>>>>> I have been trying to set up geo-replication in our glusterfs test
>>>>>> environment and ran into a problem with the message "invalid slave
>>>>>> name".
>>>>>>
>>>>>> So first things first...
>>>>>>
>>>>>> I have 3 nodes configured in a cluster, set up as a replica. On
>>>>>> this cluster I have created a volume, let's say named myvol1. So
>>>>>> far everything works and looks good...
>>>>>>
>>>>>> The next step was to create a geo-replication off-site, so I
>>>>>> followed this documentation:
>>>>>> http://www.gluster.org/community/documentation/index.php/HowTo:geo-replication
>>>>>>
>>>>> Those are old docs. I have edited that page to mention that it is
>>>>> the old geo-rep documentation.
>>>>>
>>>>> Please refer to
>>>>> https://github.com/gluster/glusterfs/blob/master/doc/admin-guide/en-US/markdown/admin_distributed_geo_rep.md
>>>>> or
>>>>> https://medium.com/@msvbhat/distributed-geo-replication-in-glusterfs-ec95f4393c50
>>>>> for latest distributed-geo-rep documentation.
>>>>>>
>>>>>> I had peered the slave server, created secret.pem, was able to ssh
>>>>>> without a password, and tried to create the geo-replication session
>>>>>> with the commands from the documentation, but got the following
>>>>>> error:
>>>>>>
>>>>>> On the master I ran:
>>>>>>
>>>>>> gluster volume geo-replication myvol1 1.2.3.4:/shared/myvol1_slave start
>>>>>>
>>>>>> and the master log shows:
>>>>>>
>>>>>> [2014-07-15 09:15:37.188701] E [glusterd-geo-rep.c:4083:glusterd_get_slave_info] 0-: Invalid slave name
>>>>>> [2014-07-15 09:15:37.188827] W [dict.c:778:str_to_data] (-->/usr/lib64/glusterfs/3.5.1/xlator/mgmt/glusterd.so(glusterd_op_stage_gsync_create+0x1e2) [0x7f979e20f1f2] (-->/usr/lib64/glusterfs/3.5.1/xlator/mgmt/glusterd.so(glusterd_get_slave_details_confpath+0x116) [0x7f979e20a306] (-->/usr/lib64/libglusterfs.so.0(dict_set_str+0x1c) [0x7f97a322045c]))) 0-dict: value is NULL
>>>>>> [2014-07-15 09:15:37.188837] E [glusterd-geo-rep.c:3995:glusterd_get_slave_details_confpath] 0-: Unable to store slave volume name.
>>>>>> [2014-07-15 09:15:37.188849] E [glusterd-geo-rep.c:2056:glusterd_op_stage_gsync_create] 0-: Unable to fetch slave or confpath details.
>>>>>> [2014-07-15 09:15:37.188861] E [glusterd-syncop.c:912:gd_stage_op_phase] 0-management: Staging of operation 'Volume Geo-replication Create' failed on localhost
>>>>>>
>>>>>> There are no logs on the slave whatsoever.
>>>>>> I also tried different documentation with "create push-pem" and got
>>>>>> the very same problem as above...
>>>>>>
>>>>>> I tried starting with the slave given as node:/path/to/dir, and
>>>>>> also created a volume on the slave and started with
>>>>>> node:/slave_volume_name, always with the same result...
>>>>>>
>>>>>> I tried to search for a solution and found this:
>>>>>> http://fpaste.org/114290/04117421/
>>>>>>
>>>>>> It was a different user with the very same problem... The issue was
>>>>>> raised on the IRC channel but never answered.
>>>>>>
>>>>>> This is a fresh install of 3.5.1, so no upgrade should be needed, I
>>>>>> guess... Any help solving this problem would be appreciated.
>>>>> From what you have described, it looks like your slave is not a
>>>>> gluster volume. In the latest geo-rep, the slave has to be a gluster
>>>>> volume; glusterfs no longer supports a plain directory as a slave.
>>>>>
>>>>> Please follow the new documentation and try once more.
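>>>>>
>>>>> To illustrate with your names (the double colon marks a slave
>>>>> volume, not a directory):
>>>>>
>>>>>   gluster volume geo-replication myvol1 1.2.3.4:/shared/myvol1_slave start    # directory slave: rejected as invalid
>>>>>   gluster volume geo-replication myvol1 1.2.3.4::myvol1_slave create push-pem # slave volume: correct form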
>>>>>
>>>>> HTH
>>>>>
>>>>> Best Regards,
>>>>> Vishwanath
>>>>>
>>>>>>
>>>>>> Thank you and best regards,
>>>>>> Stefan