[Gluster-users] Re: geo-replication status partial faulty

Kotresh Hiremath Ravishankar khiremat at redhat.com
Mon May 30 09:38:04 UTC 2016


Comments inline

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "vyyy杨雨阳" <yuyangyang at ctrip.com>
> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> Cc: "Saravanakumar Arumugam" <sarumuga at redhat.com>, Gluster-users at gluster.org, "Aravinda Vishwanathapura Krishna
> Murthy" <avishwan at redhat.com>
> Sent: Thursday, May 26, 2016 7:04:30 AM
> Subject: Re: [Gluster-users] : geo-replication status partial faulty
> 
> I retried many times and found that when I keep the slave volume's bricks or nodes
> below 6, the geo-replication status is OK.
> I am not sure if this is a bug.
     This should not be the case; it should not depend on the number of bricks on the slave volume side.
  I will try to reproduce and check.
> 
> 
> Whether I test from normal or faulty nodes, the result is the same.
> 
> [root at SVR8049HW2285 ~]#  bash -x /usr/libexec/glusterfs/gverify.sh filews
> root glusterfs02.sh3.ctripcorp.com filews_slave  "/tmp/gverify.log"
> + BUFFER_SIZE=104857600
> ++ gluster --print-logdir
> + slave_log_file=/var/log/glusterfs/geo-replication-slaves/slave.log
> + main filews root glusterfs02.sh3.ctripcorp.com filews_slave
> /tmp/gverify.log
> + log_file=/tmp/gverify.log
> + SSH_PORT=22
> + ping_host glusterfs02.sh3.ctripcorp.com 22
> + '[' 0 -ne 0 ']'
> + ssh -oNumberOfPasswordPrompts=0 root at glusterfs02.sh3.ctripcorp.com 'echo
> Testing_Passwordless_SSH'
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
> + '[' 255 -ne 0 ']'
> + echo 'FORCE_BLOCKER|Passwordless ssh login has not been setup with
> glusterfs02.sh3.ctripcorp.com for user root.'
> + exit 1
> [root at SVR8049HW2285 ~]#
> 

    This means you need to set up passwordless ssh to 'glusterfs02.sh3.ctripcorp.com'
from any one master node, and then follow the geo-rep creation steps described in the previous mail.
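
For reference, a minimal sketch of setting up passwordless SSH from one master node
to that slave node (assuming the standard root key pair; adjust if you use a
dedicated key):

    ssh-keygen -t rsa                                  # on the master node, if no key pair exists yet
    ssh-copy-id root@glusterfs02.sh3.ctripcorp.com     # copy the public key to the slave node
    ssh root@glusterfs02.sh3.ctripcorp.com 'echo Testing_Passwordless_SSH'   # must not prompt for a password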

With fewer than 6 nodes/bricks in the slave volume, is the geo-rep session running fine for you?
> 
> 
> Best Regards
> 杨雨阳 Yuyang Yang
> OPS
> Ctrip Infrastructure Service (CIS)
> Ctrip Computer Technology (Shanghai) Co., Ltd
> Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389
> Web: www.Ctrip.com
> 
> 
> -----Original Message-----
> From: Kotresh Hiremath Ravishankar [mailto:khiremat at redhat.com]
> Sent: Wednesday, May 25, 2016 4:58 PM
> To: vyyy杨雨阳 <yuyangyang at Ctrip.com>
> Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
> Subject: Re: [Gluster-users] : geo-replication status partial faulty
> 
> Answers inline
> 
> Thanks and Regards,
> Kotresh H R
> 
> ----- Original Message -----
> > From: "vyyy杨雨阳" <yuyangyang at ctrip.com>
> > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > Cc: "Saravanakumar Arumugam" <sarumuga at redhat.com>,
> > Gluster-users at gluster.org, "Aravinda Vishwanathapura Krishna Murthy"
> > <avishwan at redhat.com>
> > Sent: Wednesday, May 25, 2016 12:34:12 PM
> > Subject: [Gluster-users] : geo-replication status partial faulty
> > 
> > Hi,
> > 
> > 
> > Verify below before proceeding further.
> > 
> > 1. There is only one session directory in all master nodes.
> >    
> >     ls -l /var/lib/glusterd/geo-replication/
> > 
> > 2. I can find a "*.status" file on the nodes where the geo-replication
> > status shows Active or Passive, but there is no "*.status" file on the
> > nodes whose status is faulty.
> > 
> > Per your instruction to clean up the ssh keys and do a fresh setup, step
> > 3 failed:
> > 
> > 3. Create georep ssh keys again and do create force.
> >    gluster system:: exec gsec_create
> >    gluster vol geo-rep <master-vol> <slave-host>::<slave-vol1> create
> >    push-pem force
> > 
> > [root at SVR8048HW2285 glusterfs]# gluster volume geo-replication filews
> > glusterfs02.sh3.ctripcorp.com::filews_slave create push-pem force
> > Unable to fetch slave volume details. Please check the slave cluster
> > and slave volume.
> > geo-replication command failed
> 
>        Then please check that the slave cluster is healthy: glusterd should be
>   running on all slave nodes and the slave volume should be started. Fix any
>   issues on the slave cluster first.
>   Then check whether the below script runs fine.
> 
>   bash -x /usr/libexec/glusterfs/gverify.sh <master_vol_name> root
>   glusterfs02.sh3.ctripcorp.com <slave_vol> "/tmp/gverify.log"
>   
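>   For example, a rough health check on the slave side (to be run on any slave
>   node; the volume name is taken from this thread, adjust as needed):
> 
>     gluster peer status                  # all peers should show 'Peer in Cluster (Connected)'
>     gluster volume info filews_slave     # the slave volume should exist and be 'Started'
>     gluster volume status filews_slave   # all slave bricks should be online
>     service glusterd status              # glusterd must be running on every slave node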
> 
> > [root at SVR8048HW2285 glusterfs]#
> > [root at SVR8048HW2285 glusterfs]# ssh -i
> > /var/lib/glusterd/geo-replication/secret.pem
> > root at glusterfs02.sh3.ctripcorp.com
> > Last login: Wed May 25 14:33:15 2016 from 10.8.231.11 This is a
> > private network server, in monitoring state.
> > It is strictly prohibited to unauthorized access and used.
> > [root at SVR6520HW2285 ~]#
> > 
> > etc-glusterfs-glusterd.vol.log logged the following messages:
> > 
> > [2016-05-25 06:47:47.698364] E
> > [glusterd-geo-rep.c:2012:glusterd_verify_slave] 0-: Not a valid slave
> > [2016-05-25 06:47:47.698433] E
> > [glusterd-geo-rep.c:2240:glusterd_op_stage_gsync_create] 0-:
> > glusterfs02.sh3.ctripcorp.com::filews_slave is not a valid slave volume.
> > Error: Unable to fetch slave volume details. Please check the slave
> > cluster and slave volume.
> > [2016-05-25 06:47:47.698451] E
> > [glusterd-syncop.c:1201:gd_stage_op_phase]
> > 0-management: Staging of operation 'Volume Geo-replication Create'
> > failed on localhost : Unable to fetch slave volume details. Please
> > check the slave cluster and slave volume.
> > 
> > 
> > 
> > 
> > 
> > 
> > Best Regards
> > 杨雨阳 Yuyang Yang
> > 
> > 
> > -----Original Message-----
> > From: Kotresh Hiremath Ravishankar [mailto:khiremat at redhat.com]
> > Sent: Wednesday, May 25, 2016 2:06 PM
> > To: vyyy杨雨阳 <yuyangyang at Ctrip.com>
> > Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
> > Subject: Re: geo-replication status partial faulty
> > 
> > Hi,
> > 
> > Verify below before proceeding further.
> > 
> > 1. Run the following command on all master nodes. You should find only one
> >    directory (the session directory); the rest are files. If you find two
> >    directories, they need to be cleaned up so that all master nodes have
> >    the same single session directory.
> >    
> >     ls -l /var/lib/glusterd/geo-replication/
> > 
> > 2. Run the following command on all master nodes; you should find a
> >    "*.status" file on all of them.
> > 
> >     ls -l /var/lib/glusterd/geo-replication/<session_directory>
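> > 
> >    For example, a quick sketch to gather both listings from every master
> >    node at once (hostnames are the masters from this thread; it assumes
> >    you can ssh to them as root):
> > 
> >     for h in SVR8047HW2285 SVR8048HW2285 SVR8049HW2285 SVR8050HW2285 \
> >              SVR6993HW2285 SVR6994HW2285 SVR6995HW2285 SVR6996HW2285 \
> >              SH02SVR5951 SH02SVR5952 SH02SVR5953 SH02SVR5954; do
> >         echo "== $h =="
> >         ssh root@$h 'ls -l /var/lib/glusterd/geo-replication/ /var/lib/glusterd/geo-replication/*/'
> >     done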
> > 
> > 
> > Follow the steps below to clean up the ssh keys and do a fresh setup.
> > 
> > On all the slave nodes, clean up the ssh keys prefixed with
> > command=...gsyncd and command=tar.. in /root/.ssh/authorized_keys (a rough
> > cleanup sketch follows below). Also clean up id_rsa.pub if you had copied
> > it from secret.pem, and set up the usual passwordless ssh connection using
> > ssh-copy-id.
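> > 
> > A rough sketch of that cleanup on each slave node (take a backup first; the
> > sed patterns are an assumption based on the command= prefixes described above):
> > 
> >     cp /root/.ssh/authorized_keys /root/.ssh/authorized_keys.bak
> >     sed -i '/^command=.*gsyncd/d' /root/.ssh/authorized_keys
> >     sed -i '/^command=.*tar/d' /root/.ssh/authorized_keys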
> > 
> > 1. Establish passwordless SSH between one of the master nodes and one of
> >    the slave nodes.
> >    (copying secret.pem is not required; use the usual ssh-copy-id way)
> >    Remember to run all geo-rep commands from that same master node and to
> >    use the same slave node in the geo-rep commands.
> > 
> > 2. Stop and Delete geo-rep session as follows.
> >    gluster vol geo-rep <master-vol> <slave-host1>::<slave-vol> stop
> >    gluster vol geo-rep <master-vol> <slave-host1>::<slave-vol> delete
> > 
> > 3. Create georep ssh keys again and do create force.
> >    gluster system:: exec gsec_create
> >    gluster vol geo-rep <master-vol> <slave-host>::<slave-vol1> create
> >    push-pem force
> > 
> > 4. Verify the keys have been distributed properly. The command below should
> >    automatically run gsyncd without asking for a password, from any master
> >    node to any slave host.
> > 
> >    ssh -i /var/lib/glusterd/geo-replication/secret.pem
> > root@<slave-host>
> > 
> > 5. Start geo-rep.
> >    gluster vol geo-rep <master-vol> <slave-host>::<slave-vol1> start
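> > 
> > Putting steps 2-5 together with the concrete names used in this thread (all
> > run from the same master node; adjust the slave host if you chose a
> > different one):
> > 
> >    gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave stop
> >    gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave delete
> >    gluster system:: exec gsec_create
> >    gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
> >    ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com
> >    gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave start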
> > 
> > Let me know if you still face issues.
> > 
> > 
> > Thanks and Regards,
> > Kotresh H R
> > 
> > 
> > ----- Original Message -----
> > > From: "vyyy杨雨阳" <yuyangyang at ctrip.com>
> > > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > > Cc: "Saravanakumar Arumugam" <sarumuga at redhat.com>,
> > > Gluster-users at gluster.org, "Aravinda Vishwanathapura Krishna Murthy"
> > > <avishwan at redhat.com>
> > > Sent: Wednesday, May 25, 2016 7:11:08 AM
> > > Subject: Re: Re: Re: Re: Re: [Gluster-users] Re: geo-replication
> > > status partial faulty
> > > 
> > > The command output is as follows. Thanks.
> > > 
> > > [root at SVR8048HW2285 ~]# gluster volume geo-replication filews
> > > glusterfs01.sh3.ctripcorp.com::filews_slave status
> > >  
> > > MASTER NODE      MASTER VOL    MASTER BRICK          SLAVE                                          STATUS     CHECKPOINT STATUS    CRAWL STATUS
> > > -------------------------------------------------------------------------------------------------------------------------------------------------
> > > SVR8048HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SH02SVR5954      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SH02SVR5951      filews        /export/sdb/brick1    glusterfs06.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > SVR8050HW2285    filews        /export/sdb/filews    glusterfs03.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > SVR8049HW2285    filews        /export/sdb/filews    glusterfs05.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > > SVR8047HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > > SVR6995HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SVR6993HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SH02SVR5953      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SH02SVR5952      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SVR6996HW2285    filews        /export/sdb/filews    glusterfs04.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > SVR6994HW2285    filews        /export/sdb/filews    glusterfs02.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > >
> > > [root at SVR8048HW2285 ~]# ls -l /var/lib/glusterd/geo-replication/
> > > total 40
> > > -rw------- 1 root root 14140 May 20 16:00 common_secret.pem.pub
> > > drwxr-xr-x 2 root root  4096 May 25 09:35
> > > filews_glusterfs01.sh3.ctripcorp.com_filews_slave
> > > -rwxr-xr-x 1 root root  1845 May 17 15:04 gsyncd_template.conf
> > > -rw------- 1 root root  1675 May 20 11:03 secret.pem
> > > -rw-r--r-- 1 root root   400 May 20 11:03 secret.pem.pub
> > > -rw------- 1 root root  1675 May 20 16:00 tar_ssh.pem
> > > -rw-r--r-- 1 root root   400 May 20 16:00 tar_ssh.pem.pub
> > > [root at SVR8048HW2285 ~]#
> > > 
> > > 
> > > 
> > > Best Regards
> > > 杨雨阳 Yuyang Yang
> > > OPS
> > > Ctrip Infrastructure Service (CIS)
> > > Ctrip Computer Technology (Shanghai) Co., Ltd
> > > Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389
> > > Web: www.Ctrip.com
> > > 
> > > 
> > > -----Original Message-----
> > > From: Kotresh Hiremath Ravishankar [mailto:khiremat at redhat.com]
> > > Sent: Tuesday, May 24, 2016 6:41 PM
> > > To: vyyy杨雨阳 <yuyangyang at Ctrip.com>
> > > Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
> > > Subject: Re: Re: Re: Re: Re: [Gluster-users] Re: geo-replication status
> > > partial faulty
> > > 
> > > Ok, it looks like there is a problem with ssh key distribution.
> > > 
> > > Before I suggest cleaning those up and doing the setup again, could you
> > > share the output of the following commands?
> > > 
> > > 1. gluster vol geo-rep <master_vol> <slave_host>::<slave_vol> status
> > > 2. ls -l /var/lib/glusterd/geo-replication/
> > > 
> > > Are there multiple geo-rep sessions from this master volume, or only one?
> > > 
> > > Thanks and Regards,
> > > Kotresh H R
> > > 
> > > ----- Original Message -----
> > > > From: "vyyy杨雨阳" <yuyangyang at ctrip.com>
> > > > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > > > Cc: "Saravanakumar Arumugam" <sarumuga at redhat.com>,
> > > > Gluster-users at gluster.org, "Aravinda Vishwanathapura Krishna Murthy"
> > > > <avishwan at redhat.com>
> > > > Sent: Tuesday, May 24, 2016 3:19:55 PM
> > > > Subject: Re: Re: Re: Re: [Gluster-users] Re: geo-replication
> > > > status partial faulty
> > > > 
> > > > We can establish passwordless ssh directly with the 'ssh' command,
> > > > but when we run create push-pem it reports 'Passwordless ssh login has
> > > > not been setup' unless we copy secret.pem over id_rsa/id_rsa.pub.
> > > > 
> > > > [root at SVR8048HW2285 ~]#  ssh -i
> > > > /var/lib/glusterd/geo-replication/secret.pem
> > > > root at glusterfs01.sh3.ctripcorp.com
> > > > Last login: Tue May 24 17:23:53 2016 from 10.8.230.213 This is a
> > > > private network server, in monitoring state.
> > > > It is strictly prohibited to unauthorized access and used.
> > > > [root at SVR6519HW2285 ~]#
> > > > 
> > > > 
> > > > [root at SVR8048HW2285 filews]# gluster volume geo-replication filews
> > > > glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
> > > > Passwordless ssh login has not been setup with
> > > > glusterfs01.sh3.ctripcorp.com for user root.
> > > > geo-replication command failed
> > > > [root at SVR8048HW2285 filews]#
> > > > 
> > > > 
> > > > 
> > > > Best Regards
> > > > 杨雨阳 Yuyang Yang
> > > > 
> > > > 
> > > > -----Original Message-----
> > > > From: Kotresh Hiremath Ravishankar [mailto:khiremat at redhat.com]
> > > > Sent: Tuesday, May 24, 2016 3:22 PM
> > > > To: vyyy杨雨阳 <yuyangyang at Ctrip.com>
> > > > Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
> > > > Subject: Re: Re: Re: Re: [Gluster-users] Re: geo-replication status
> > > > partial faulty
> > > > 
> > > > Hi,
> > > > 
> > > > Could you try the following command from the corresponding masters to
> > > > the faulty slave nodes and share the output?
> > > > The command below should not ask for a password and should run gsyncd.py.
> > > > 
> > > > ssh -i /var/lib/glusterd/geo-replication/secret.pem root@<faulty host>
> > > > 
> > > > To establish passwordless ssh, it is not necessary to copy
> > > > secret.pem to *id_rsa.pub.
> > > > 
> > > > If the geo-rep session is already established, passwordless ssh
> > > > would already be there.
> > > > My suspicion is that when I asked you to do 'create force', you did it
> > > > against another slave host where passwordless ssh was not set up. That
> > > > would create another session directory in
> > > > '/var/lib/glusterd/geo-replication', i.e.
> > > > <master_vol>_<slave_host>_<slave_vol>.
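> > > > 
> > > > For example, with the names used in this thread the expected single
> > > > session directory would be:
> > > > 
> > > >    ls -d /var/lib/glusterd/geo-replication/*/
> > > >    # expected: filews_glusterfs01.sh3.ctripcorp.com_filews_slave
> > > >    # a second filews_<other-slave-host>_filews_slave directory would indicate a stale duplicate session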
> > > > 
> > > > Please check and let us know.
> > > > 
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > > 
> > > > ----- Original Message -----
> > > > > From: "vyyy杨雨阳" <yuyangyang at ctrip.com>
> > > > > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > > > > Cc: "Saravanakumar Arumugam" <sarumuga at redhat.com>,
> > > > > Gluster-users at gluster.org, "Aravinda Vishwanathapura Krishna Murthy"
> > > > > <avishwan at redhat.com>
> > > > > Sent: Friday, May 20, 2016 12:35:58 PM
> > > > > Subject: Re: Re: Re: [Gluster-users] Re: geo-replication status
> > > > > partial faulty
> > > > > 
> > > > > Hello, Kotresh
> > > > > 
> > > > > I ran 'create force', but still some nodes work and some nodes are faulty.
> > > > > 
> > > > > On the faulty nodes,
> > > > > etc-glusterfs-glusterd.vol.log shows:
> > > > > [2016-05-20 06:27:03.260870] I
> > > > > [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using
> > > > > passed config
> > > > > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > > > [2016-05-20 06:27:03.404544] E
> > > > > [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-:
> > > > > Unable to read gsyncd status file
> > > > > [2016-05-20 06:27:03.404583] E
> > > > > [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable
> > > > > to read the statusfile for /export/sdb/brick1 brick for
> > > > > filews(master),
> > > > > glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
> > > > > 
> > > > > 
> > > > > /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log
> > > > > shows:
> > > > > [2016-05-20 15:04:01.858340] I [monitor(monitor):215:monitor]
> > > > > Monitor:
> > > > > ------------------------------------------------------------
> > > > > [2016-05-20 15:04:01.858688] I [monitor(monitor):216:monitor]
> > > > > Monitor:
> > > > > starting gsyncd worker
> > > > > [2016-05-20 15:04:01.986754] D [gsyncd(agent):627:main_i] <top>:
> > > > > rpc_fd:
> > > > > '7,11,10,9'
> > > > > [2016-05-20 15:04:01.987505] I
> > > > > [changelogagent(agent):72:__init__]
> > > > > ChangelogAgent: Agent listining...
> > > > > [2016-05-20 15:04:01.988079] I [repce(agent):92:service_loop]
> > > > > RepceServer:
> > > > > terminating on reaching EOF.
> > > > > [2016-05-20 15:04:01.988238] I [syncdutils(agent):214:finalize]
> > > > > <top>:
> > > > > exiting.
> > > > > [2016-05-20 15:04:01.988250] I [monitor(monitor):267:monitor]
> > > > > Monitor:
> > > > > worker(/export/sdb/brick1) died before establishing connection
> > > > > 
> > > > > Can you help me?
> > > > > 
> > > > > 
> > > > > Best Regards
> > > > > 杨雨阳 Yuyang Yang
> > > > > 
> > > > > 
> > > > > 
> > > > > -----Original Message-----
> > > > > From: vyyy杨雨阳
> > > > > Sent: Thursday, May 19, 2016 7:45 PM
> > > > > To: 'Kotresh Hiremath Ravishankar' <khiremat at redhat.com>
> > > > > Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
> > > > > Subject: Re: Re: Re: [Gluster-users] Re: geo-replication status
> > > > > partial faulty
> > > > > 
> > > > > It still does not work.
> > > > > 
> > > > > I needed to copy /var/lib/glusterd/geo-replication/secret.* to
> > > > > /root/.ssh/id_rsa to make passwordless ssh work.
> > > > > 
> > > > > I generated the /var/lib/glusterd/geo-replication/secret.pem file on
> > > > > every master node.
> > > > > 
> > > > > I am not sure whether this is right.
> > > > > 
> > > > > 
> > > > > [root at sh02svr5956 ~]# gluster volume geo-replication filews
> > > > > glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem
> > > > > force Passwordless ssh login has not been setup with
> > > > > glusterfs01.sh3.ctripcorp.com for user root.
> > > > > geo-replication command failed
> > > > > 
> > > > > [root at sh02svr5956 .ssh]# cp
> > > > > /var/lib/glusterd/geo-replication/secret.pem
> > > > > ./id_rsa
> > > > > cp: overwrite `./id_rsa'? y
> > > > > [root at sh02svr5956 .ssh]# cp
> > > > > /var/lib/glusterd/geo-replication/secret.pem.pub
> > > > > ./id_rsa.pub
> > > > > cp: overwrite `./id_rsa.pub'?
> > > > > 
> > > > >  [root at sh02svr5956 ~]# gluster volume geo-replication filews
> > > > > glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem
> > > > > force Creating  geo-replication session between filews &
> > > > > glusterfs01.sh3.ctripcorp.com::filews_slave has been successful
> > > > > [root at sh02svr5956 ~]#
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Best Regards
> > > > > 杨雨阳 Yuyang Yang
> > > > > OPS
> > > > > Ctrip Infrastructure Service (CIS) Ctrip Computer Technology
> > > > > (Shanghai) Co., Ltd
> > > > > Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389
> > > > > Web: www.Ctrip.com
> > > > > 
> > > > > 
> > > > > -----Original Message-----
> > > > > From: Kotresh Hiremath Ravishankar [mailto:khiremat at redhat.com]
> > > > > Sent: Thursday, May 19, 2016 5:07 PM
> > > > > To: vyyy杨雨阳 <yuyangyang at Ctrip.com>
> > > > > Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
> > > > > Subject: Re: Re: Re: [Gluster-users] Re: geo-replication status
> > > > > partial faulty
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > Could you just try 'create force' once to fix those status file
> > > > > errors?
> > > > > 
> > > > > e.g., 'gluster volume geo-rep <master vol> <slave host>::<slave
> > > > > vol> create push-pem force'
> > > > > 
> > > > > Thanks and Regards,
> > > > > Kotresh H R
> > > > > 
> > > > > ----- Original Message -----
> > > > > > From: "vyyy杨雨阳" <yuyangyang at ctrip.com>
> > > > > > To: "Saravanakumar Arumugam" <sarumuga at redhat.com>,
> > > > > > Gluster-users at gluster.org, "Aravinda Vishwanathapura Krishna
> > > > > > Murthy"
> > > > > > <avishwan at redhat.com>, "Kotresh Hiremath Ravishankar"
> > > > > > <khiremat at redhat.com>
> > > > > > Sent: Thursday, May 19, 2016 2:15:34 PM
> > > > > > Subject: Re: Re: [Gluster-users] Re: geo-replication status
> > > > > > partial faulty
> > > > > > 
> > > > > > I have checked all the nodes, both masters and slaves; the
> > > > > > software versions are the same.
> > > > > > 
> > > > > > I am puzzled why half of the masters work and half are faulty.
> > > > > > 
> > > > > > 
> > > > > > [admin at SVR6996HW2285 ~]$ rpm -qa |grep gluster
> > > > > > glusterfs-api-3.6.3-1.el6.x86_64
> > > > > > glusterfs-fuse-3.6.3-1.el6.x86_64
> > > > > > glusterfs-geo-replication-3.6.3-1.el6.x86_64
> > > > > > glusterfs-3.6.3-1.el6.x86_64
> > > > > > glusterfs-cli-3.6.3-1.el6.x86_64
> > > > > > glusterfs-server-3.6.3-1.el6.x86_64
> > > > > > glusterfs-libs-3.6.3-1.el6.x86_64
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Best Regards
> > > > > > 杨雨阳 Yuyang Yang
> > > > > > 
> > > > > > OPS
> > > > > > Ctrip Infrastructure Service (CIS) Ctrip Computer Technology
> > > > > > (Shanghai) Co., Ltd
> > > > > > Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389
> > > > > > Web: www.Ctrip.com
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > From: Saravanakumar Arumugam [mailto:sarumuga at redhat.com]
> > > > > > Sent: Thursday, May 19, 2016 4:33 PM
> > > > > > To: vyyy杨雨阳 <yuyangyang at Ctrip.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>; Kotresh Hiremath Ravishankar <khiremat at redhat.com>
> > > > > > Subject: Re: [Gluster-users] Re: geo-replication status partial
> > > > > > faulty
> > > > > > 
> > > > > > Hi,
> > > > > > +geo-rep team.
> > > > > > 
> > > > > > Can you get the gluster version you are using?
> > > > > > 
> > > > > > # For example:
> > > > > > rpm -qa | grep gluster
> > > > > > 
> > > > > > I hope you have the same gluster version installed everywhere.
> > > > > > Please double-check and share the output.
> > > > > > 
> > > > > > Thanks,
> > > > > > Saravana
> > > > > > On 05/19/2016 01:37 PM, vyyy杨雨阳 wrote:
> > > > > > Hi, Saravana
> > > > > > 
> > > > > > I have changed the log level to DEBUG, then started geo-replication
> > > > > > with the log-file option; the log file is attached.
> > > > > > 
> > > > > > gluster volume geo-replication filews
> > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave start
> > > > > > --log-file=geo.log
> > > > > > 
> > > > > > I have checked /root/.ssh/authorized_keys on
> > > > > > glusterfs01.sh3.ctripcorp.com. It has the entries from
> > > > > > /var/lib/glusterd/geo-replication/common_secret.pem.pub,
> > > > > > and I have removed the lines that do not start with “command=”.
> > > > > > 
> > > > > > With ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com
> > > > > > I can see the gsyncd messages and no ssh errors.
> > > > > > 
> > > > > > 
> > > > > > Attached is etc-glusterfs-glusterd.vol.log from a faulty node; it shows:
> > > > > > 
> > > > > > [2016-05-19 06:39:23.405974] I
> > > > > > [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using
> > > > > > passed config
> > > > > > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > > > > [2016-05-19 06:39:23.541169] E
> > > > > > [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-:
> > > > > > Unable to read gsyncd status file
> > > > > > [2016-05-19 06:39:23.541210] E
> > > > > > [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable
> > > > > > to read the statusfile for /export/sdb/filews brick for
> > > > > > filews(master),
> > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
> > > > > > [2016-05-19 06:39:29.472047] I
> > > > > > [glusterd-geo-rep.c:1835:glusterd_get_statefile_name] 0-:
> > > > > > Using passed config
> > > > > > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > > > > [2016-05-19 06:39:34.939709] I
> > > > > > [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using
> > > > > > passed config
> > > > > > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > > > > [2016-05-19 06:39:35.058520] E
> > > > > > [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-:
> > > > > > Unable to read gsyncd status file
> > > > > > 
> > > > > > 
> > > > > > /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log
> > > > > > shows the following:
> > > > > > 
> > > > > > [2016-05-19 15:11:37.307755] I [monitor(monitor):215:monitor]
> > > > > > Monitor:
> > > > > > ------------------------------------------------------------
> > > > > > [2016-05-19 15:11:37.308059] I [monitor(monitor):216:monitor]
> > > > > > Monitor:
> > > > > > starting gsyncd worker
> > > > > > [2016-05-19 15:11:37.423320] D [gsyncd(agent):627:main_i] <top>:
> > > > > > rpc_fd:
> > > > > > '7,11,10,9'
> > > > > > [2016-05-19 15:11:37.423882] I
> > > > > > [changelogagent(agent):72:__init__]
> > > > > > ChangelogAgent: Agent listining...
> > > > > > [2016-05-19 15:11:37.423906] I [monitor(monitor):267:monitor]
> > > > > > Monitor:
> > > > > > worker(/export/sdb/filews) died before establishing connection
> > > > > > [2016-05-19 15:11:37.424151] I [repce(agent):92:service_loop]
> > > > > > RepceServer:
> > > > > > terminating on reaching EOF.
> > > > > > [2016-05-19 15:11:37.424335] I
> > > > > > [syncdutils(agent):214:finalize]
> > > > > > <top>:
> > > > > > exiting.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Best Regards
> > > > > > Yuyang Yang
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > From: Saravanakumar Arumugam [mailto:sarumuga at redhat.com]
> > > > > > Sent: Thursday, May 19, 2016 1:59 PM
> > > > > > To: vyyy杨雨阳 <yuyangyang at Ctrip.com>; Gluster-users at gluster.org
> > > > > > Subject: Re: [Gluster-users] Re: geo-replication status partial
> > > > > > faulty
> > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > There seems to be some issue on the glusterfs01.sh3.ctripcorp.com
> > > > > > slave node.
> > > > > > Can you share the complete logs?
> > > > > > 
> > > > > > You can increase the verbosity of the debug messages like this:
> > > > > > gluster volume geo-replication <master volume> <slave host>::<slave volume> config log-level DEBUG
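> > > > > > 
> > > > > > With the volume names used in this thread, that would be, for example:
> > > > > > 
> > > > > > gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave config log-level DEBUG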
> > > > > > 
> > > > > > 
> > > > > > Also, check /root/.ssh/authorized_keys on
> > > > > > glusterfs01.sh3.ctripcorp.com. It should contain the entries from
> > > > > > /var/lib/glusterd/geo-replication/common_secret.pem.pub
> > > > > > (present on the master node).
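> > > > > > 
> > > > > > A quick way to compare the two (a rough sketch; the grep pattern is
> > > > > > an assumption based on the command= prefix those entries carry):
> > > > > > 
> > > > > > wc -l /var/lib/glusterd/geo-replication/common_secret.pem.pub   # on the master node
> > > > > > grep -c '^command=' /root/.ssh/authorized_keys                  # on the slave node
> > > > > > # the two counts should match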
> > > > > > 
> > > > > > Have a look at this one for example:
> > > > > > https://www.gluster.org/pipermail/gluster-users/2015-August/023174.html
> > > > > > 
> > > > > > Thanks,
> > > > > > Saravana
> > > > > > On 05/19/2016 07:53 AM, vyyy杨雨阳 wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > I have tried to configure a geo-replication volume; the
> > > > > > configuration on all master nodes is the same. When I start this
> > > > > > volume, the status shows some nodes as faulty, as follows:
> > > > > > 
> > > > > > gluster volume geo-replication filews
> > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave status
> > > > > > 
> > > > > > MASTER NODE      MASTER VOL    MASTER BRICK          SLAVE                                          STATUS     CHECKPOINT STATUS    CRAWL STATUS
> > > > > > -------------------------------------------------------------------------------------------------------------------------------------------------
> > > > > > SVR8048HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > > > > SVR8050HW2285    filews        /export/sdb/filews    glusterfs03.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > > > > SVR8047HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > > > > > SVR8049HW2285    filews        /export/sdb/filews    glusterfs05.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > > > > > SH02SVR5951      filews        /export/sdb/brick1    glusterfs06.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > > > > SH02SVR5953      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > > > > SVR6995HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > > > > SH02SVR5954      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > > > > SVR6994HW2285    filews        /export/sdb/filews    glusterfs02.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > > > > SVR6993HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > > > > SH02SVR5952      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > > > > SVR6996HW2285    filews        /export/sdb/filews    glusterfs04.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > > > > 
> > > > > > On the faulty nodes, the log file under
> > > > > > /var/log/glusterfs/geo-replication/filews/
> > > > > > shows
> > > > > > "worker(/export/sdb/filews) died before establishing connection":
> > > > > > 
> > > > > > [2016-05-18 16:55:46.402622] I [monitor(monitor):215:monitor]
> > > > > > Monitor:
> > > > > > ------------------------------------------------------------
> > > > > > [2016-05-18 16:55:46.402930] I [monitor(monitor):216:monitor]
> > > > > > Monitor:
> > > > > > starting gsyncd worker
> > > > > > [2016-05-18 16:55:46.517460] I
> > > > > > [changelogagent(agent):72:__init__]
> > > > > > ChangelogAgent: Agent listining...
> > > > > > [2016-05-18 16:55:46.518066] I [repce(agent):92:service_loop]
> > > > > > RepceServer:
> > > > > > terminating on reaching EOF.
> > > > > > [2016-05-18 16:55:46.518279] I
> > > > > > [syncdutils(agent):214:finalize]
> > > > > > <top>:
> > > > > > exiting.
> > > > > > [2016-05-18 16:55:46.518194] I [monitor(monitor):267:monitor]
> > > > > > Monitor:
> > > > > > worker(/export/sdb/filews) died before establishing connection
> > > > > > [2016-05-18 16:55:56.697036] I [monitor(monitor):215:monitor]
> > > > > > Monitor:
> > > > > > ------------------------------------------------------------
> > > > > > 
> > > > > > Any advice and suggestions will be greatly appreciated.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Best Regards
> > > > > >        Yuyang Yang
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > _______________________________________________
> > > > > > 
> > > > > > Gluster-users mailing list
> > > > > > 
> > > > > > Gluster-users at gluster.org
> > > > > > 
> > > > > > http://www.gluster.org/mailman/listinfo/gluster-users
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

