[Gluster-users] question about sync replicate volume after rebooting one node

Atin Mukherjee amukherj at redhat.com
Wed Feb 17 06:53:48 UTC 2016



On 02/17/2016 12:08 PM, songxin wrote:
> 
> Hi,
> But I still don't know why glusterfsd can't be started by glusterd after the B
> node rebooted. The version of glusterfs on the A node and the B node is 3.7.6
> on both. Can you explain this for me, please?
It's because GlusterD on Node B has failed to come up cleanly, so it does not
start the bricks. I've already asked you in another mail to provide the delta
of gv0's info file so that we can get to the root cause.
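
If it helps, one way to produce that delta (a sketch, assuming you have ssh
access between the two nodes; paths and IPs are taken from your mails) is to
copy node A's file over to node B and diff the two:

  scp root@128.224.162.163:/var/lib/glusterd/vols/gv0/info /tmp/info.nodeA
  diff -u /tmp/info.nodeA /var/lib/glusterd/vols/gv0/info

As far as I know the checksum glusterd compares is computed over this file, so
even a small byte-level difference between the two copies matters.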
> 
> Thanks,
> Xin
> 
> At 2016-02-17 14:30:21, "Anuradha Talur" <atalur at redhat.com> wrote:
>>
>>
>>----- Original Message -----
>>> From: "songxin" <songxin_1980 at 126.com>
>>> To: "Atin Mukherjee" <amukherj at redhat.com>
>>> Cc: "Anuradha Talur" <atalur at redhat.com>, gluster-users at gluster.org
>>> Sent: Wednesday, February 17, 2016 11:44:14 AM
>>> Subject: Re:Re: [Gluster-users] question about sync replicate volume after rebooting one node
>>> 
>>> Hi,
>>> The version of glusterfs on the A node and the B node is 3.7.6 on both.
>>> The time on the B node is the same after every reboot because the B node has
>>> no RTC. Could that cause the problem?
>>> 
>>> 
>>> If I run "gluster volume start gv0 force", glusterfsd is started, but
>>> "gluster volume start gv0" doesn't work.
>>> 
>>Yes, there is a difference between volume start and volume start force.
>>When a volume is already in the "Started" state, "gluster volume start gv0" does
>>nothing (in particular, it does not bring up dead bricks). With start force, the
>>status of each glusterfsd is checked and any glusterfsd that is not running is
>>spawned, which is what happened in the setup you have.
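>>
>>For example, running the following on either node (a sketch, volume name as in
>>your setup) should spawn the missing glusterfsd and show the brick as Online
>>again:
>>
>>  gluster volume start gv0 force
>>  gluster volume status gv0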
>>> 
>>> The file /var/lib/glusterd/vols/gv0/info on the B node is as below.
>>> ...
>>> type=2
>>> count=2
>>> status=1
>>> sub_count=2
>>> stripe_count=1
>>> replica_count=2
>>> disperse_count=0
>>> redundancy_count=0
>>> version=2
>>> transport-type=0
>>> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
>>> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
>>> password=ef600dcd-42c5-48fc-8004-d13a3102616b
>>> op-version=3
>>> client-op-version=3
>>> quota-version=0
>>> parent_volname=N/A
>>> restored_from_snap=00000000-0000-0000-0000-000000000000
>>> snap-max-hard-limit=256
>>> performance.readdir-ahead=on
>>> brick-0=128.224.162.255:-data-brick-gv0
>>> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
>>> 
>>> 
>>> The file /var/lib/glusterd/vols/gv0/info on the A node is as below.
>>> 
>>> 
>>> wrsadmin at pek-song1-d1:~/work/tmp$ sudo cat /var/lib/glusterd/vols/gv0/info
>>> type=2
>>> count=2
>>> status=1
>>> sub_count=2
>>> stripe_count=1
>>> replica_count=2
>>> disperse_count=0
>>> redundancy_count=0
>>> version=2
>>> transport-type=0
>>> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
>>> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
>>> password=ef600dcd-42c5-48fc-8004-d13a3102616b
>>> op-version=3
>>> client-op-version=3
>>> quota-version=0
>>> parent_volname=N/A
>>> restored_from_snap=00000000-0000-0000-0000-000000000000
>>> snap-max-hard-limit=256
>>> performance.readdir-ahead=on
>>> brick-0=128.224.162.255:-data-brick-gv0
>>> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
>>> 
>>> 
>>> Thanks,
>>> Xin
>>> 
>>> At 2016-02-17 12:01:37, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>> >
>>> >
>>> >On 02/17/2016 08:23 AM, songxin wrote:
>>> >> Hi,
>>> >> Thank you for your immediate and detailed reply. I have a few more
>>> >> questions about glusterfs.
>>> >> A node IP is 128.224.162.163.
>>> >> B node IP is 128.224.162.250.
>>> >> 1. After rebooting the B node and starting the glusterd service, the
>>> >> glusterd log is as below.
>>> >> ...
>>> >> [2015-12-07 07:54:55.743966] I [MSGID: 101190]
>>> >> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> >> with index 2
>>> >> [2015-12-07 07:54:55.744026] I [MSGID: 101190]
>>> >> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> >> with index 1
>>> >> [2015-12-07 07:54:55.744280] I [MSGID: 106163]
>>> >> [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
>>> >> 0-management: using the op-version 30706
>>> >> [2015-12-07 07:54:55.773606] I [MSGID: 106490]
>>> >> [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req]
>>> >> 0-glusterd: Received probe from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>>> >> [2015-12-07 07:54:55.777994] E [MSGID: 101076]
>>> >> [common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could not
>>> >> lookup hostname of 128.224.162.163 : Temporary failure in name resolution
>>> >> [2015-12-07 07:54:55.778290] E [MSGID: 106010]
>>> >> [glusterd-utils.c:2717:glusterd_compare_friend_volume] 0-management:
>>> >> Version of Cksums gv0 differ. local cksum = 2492237955, remote cksum =
>>> >> 4087388312 on peer 128.224.162.163
>>> >The above log entry is the reason for the rejection of the peer; most
>>> >probably it is due to a compatibility issue. I believe the gluster versions
>>> >are different on the two nodes (please share the gluster versions from both
>>> >nodes) and you might have hit a bug.
>>> >
>>> >Can you share the delta of the /var/lib/glusterd/vols/gv0/info file from
>>> >both the nodes?
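>>> >
>>> >For the versions, running the following on each node should be enough (the
>>> >cksum file path is my assumption, based on the usual glusterd layout):
>>> >
>>> >  gluster --version
>>> >  cat /var/lib/glusterd/vols/gv0/cksum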
>>> >
>>> >
>>> >~Atin
>>> >> [2015-12-07 07:54:55.778384] I [MSGID: 106493]
>>> >> [glusterd-handler.c:3780:glusterd_xfer_friend_add_resp] 0-glusterd:
>>> >> Responded to 128.224.162.163 (0), ret: 0
>>> >> [2015-12-07 07:54:55.928774] I [MSGID: 106493]
>>> >> [glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received
>>> >> RJT from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44, host:
>>> >> 128.224.162.163, port: 0
>>> >> ...
>>> >> When I run "gluster peer status" on the B node, it shows as below.
>>> >> Number of Peers: 1
>>> >> 
>>> >> Hostname: 128.224.162.163
>>> >> Uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>>> >> State: Peer Rejected (Connected)
>>> >> 
>>> >> When I run "gluster volume status" on the A node, it shows as below.
>>> >>  
>>> >> Status of volume: gv0
>>> >> Gluster process                             TCP Port  RDMA Port  Online
>>> >> Pid
>>> >> ------------------------------------------------------------------------------
>>> >> Brick 128.224.162.163:/home/wrsadmin/work/t
>>> >> mp/data/brick/gv0                           49152     0          Y
>>> >> 13019
>>> >> NFS Server on localhost                     N/A       N/A        N
>>> >> N/A
>>> >> Self-heal Daemon on localhost               N/A       N/A        Y
>>> >> 13045
>>> >>  
>>> >> Task Status of Volume gv0
>>> >> ------------------------------------------------------------------------------
>>> >> There are no active volume tasks
>>> >> 
>>> >> It looks like the glusterfsd service is OK on the A node.
>>> >> 
>>> >> Is it because the peer state is Rejected that glusterd didn't start
>>> >> glusterfsd? What causes this problem?
>>> >> 
>>> >> 
>>> >> 2. Is glustershd (the self-heal daemon) the process shown below?
>>> >> root       497  0.8  0.0 432520 18104 ?        Ssl  08:07   0:00
>>> >> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>>> >> /var/lib/glusterd/glustershd/run/gluster ..
>>> >> 
>>> >> If it is, I want to know whether glustershd is also the glusterfsd binary,
>>> >> just like glusterd and glusterfs.
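>>> >> 
>>> >> One way I can think of to check this, assuming the usual /usr/sbin layout, is
>>> >> to look at whether the installed files are links to the same binary:
>>> >> 
>>> >>   ls -l /usr/sbin/glusterd /usr/sbin/glusterfs /usr/sbin/glusterfsd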
>>> >> 
>>> >> Thanks,
>>> >> Xin
>>> >> 
>>> >> 
>>> >> At 2016-02-16 18:53:03, "Anuradha Talur" <atalur at redhat.com> wrote:
>>> >>>
>>> >>>
>>> >>>----- Original Message -----
>>> >>>> From: "songxin" <songxin_1980 at 126.com>
>>> >>>> To: gluster-users at gluster.org
>>> >>>> Sent: Tuesday, February 16, 2016 3:59:50 PM
>>> >>>> Subject: [Gluster-users] question about sync replicate volume after
>>> >>>> 	rebooting one node
>>> >>>> 
>>> >>>> Hi,
>>> >>>> I have a question about how to sync a volume between two bricks after
>>> >>>> one node is rebooted.
>>> >>>> 
>>> >>>> There are two nodes, node A and node B. The A node IP is 128.124.10.1 and
>>> >>>> the B node IP is 128.124.10.2.
>>> >>>> 
>>> >>>> Operation steps on the A node are as below:
>>> >>>> 1. gluster peer probe 128.124.10.2
>>> >>>> 2. mkdir -p /data/brick/gv0
>>> >>>> 3. gluster volume create gv0 replica 2 128.124.10.1:/data/brick/gv0
>>> >>>> 128.124.10.2:/data/brick/gv1 force
>>> >>>> 4. gluster volume start gv0
>>> >>>> 5. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>> >>>> 
>>> >>>> Operation steps on the B node are as below:
>>> >>>> 1. mkdir -p /data/brick/gv0
>>> >>>> 2. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>> >>>> 
>>> >>>> After all the steps above, there are some gluster service processes,
>>> >>>> including glusterd, glusterfs and glusterfsd, running on both the A and B
>>> >>>> nodes. I can see these services with the command "ps aux | grep gluster"
>>> >>>> and with "gluster volume status".
>>> >>>> 
>>> >>>> Now I reboot the B node. After the B node reboots, there are no gluster
>>> >>>> services running on it.
>>> >>>> After I run "systemctl start glusterd", there is just the glusterd service,
>>> >>>> but not glusterfs or glusterfsd, on the B node.
>>> >>>> Because glusterfs and glusterfsd are not running, I can't run "gluster
>>> >>>> volume heal gv0 full".
>>> >>>> 
>>> >>>> I want to know why glusterd doesn't start glusterfs and glusterfsd.
>>> >>>
>>> >>>On starting glusterd, glusterfsd should have started by itself.
>>> >>>Could you share the glusterd and brick logs (on node B) so that we know why
>>> >>>glusterfsd didn't start?
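>>> >>>Assuming a default install, they should be under /var/log/glusterfs/, e.g.:
>>> >>>
>>> >>>  /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
>>> >>>  /var/log/glusterfs/bricks/data-brick-gv0.log  (name derived from the brick path)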
>>> >>>
>>> >>>Do you still see the glusterfsd service running on node A? You can try
>>> >>>running "gluster v start <VOLNAME> force" on one of the nodes and check
>>> >>>whether all the brick processes started.
>>> >>>
>>> >>>"gluster volume status <VOLNAME>" should give you the status of the gluster
>>> >>>processes.
>>> >>>
>>> >>>On restarting the node, the glusterfs client process for the mount won't
>>> >>>start by itself. You will have to run step 2 on node B again for it.
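>>> >>>That is, on node B, something like (reusing the mount point from your
>>> >>>steps):
>>> >>>
>>> >>>  mount -t glusterfs 128.124.10.1:/gv0 gluster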
>>> >>>
>>> >>>> How do I restart these services on B node?
>>> >>>> How do I sync the replicate volume after one node reboot?
>>> >>>
>>> >>>Once the glusterfsd process starts on node B too, glustershd -- the
>>> >>>self-heal daemon for replicate volumes -- should start healing/syncing the
>>> >>>files that need to be synced. This daemon does periodic syncing of files.
>>> >>>
>>> >>>If you want to trigger a heal explicitly, you can run gluster volume heal
>>> >>><VOLNAME> on one of the servers.
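>>> >>>For this volume that would be, for example:
>>> >>>
>>> >>>  gluster volume heal gv0
>>> >>>  gluster volume heal gv0 info   (lists entries that still need healing)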
>>> >>>> 
>>> >>>> Thanks,
>>> >>>> Xin
>>> >>>> 
>>> >>>> _______________________________________________
>>> >>>> Gluster-users mailing list
>>> >>>> Gluster-users at gluster.org
>>> >>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>> >>>
>>> >>>--
>>> >>>Thanks,
>>> >>>Anuradha.
>>> >> 
>>> >> _______________________________________________
>>> >> Gluster-users mailing list
>>> >> Gluster-users at gluster.org
>>> >> http://www.gluster.org/mailman/listinfo/gluster-users
>>> >> 
>>> 
>>
>>-- 
>>Thanks,
>>Anuradha.
> 

