[Gluster-users] question about sync replicate volume after rebooting one node

songxin songxin_1980 at 126.com
Wed Feb 17 06:38:51 UTC 2016



Hi,
But I also don't know why glusterfsd can't be started by glusterd after the B node rebooted. The glusterfs version on both the A node and the B node is 3.7.6. Can you explain this for me, please?


Thanks,
Xin

At 2016-02-17 14:30:21, "Anuradha Talur" <atalur at redhat.com> wrote:
>
>
>----- Original Message -----
>> From: "songxin" <songxin_1980 at 126.com>
>> To: "Atin Mukherjee" <amukherj at redhat.com>
>> Cc: "Anuradha Talur" <atalur at redhat.com>, gluster-users at gluster.org
>> Sent: Wednesday, February 17, 2016 11:44:14 AM
>> Subject: Re:Re: [Gluster-users] question about sync replicate volume after rebooting one node
>> 
>> Hi,
>> The version of glusterfs on both the A node and the B node is 3.7.6.
>> The time on the B node is the same after rebooting because the B node has no
>> RTC. Could that cause the problem?
>> 
>> 
>> If I run "gluster volume start gv0 force", glusterfsd can be started, but
>> "gluster volume start gv0" doesn't work.
>> 
>Yes, there is a difference between volume start and volume start force.
>When a volume is already in the "Started" state, gluster volume start gv0 won't do
>anything (meaning it doesn't bring up the dead bricks). When you use start force,
>the status of the glusterfsd processes is checked and any glusterfsd that is not
>running is spawned. That is the case here in the setup you have.
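>For example (just a sketch, using the volume name from your setup):
>
>    # force-start respawns any brick process (glusterfsd) that is not running
>    gluster volume start gv0 force
>
>    # confirm that both bricks now show "Y" in the Online column
>    gluster volume status gv0
>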
>> 
>> The file  /var/lib/glusterd/vols/gv0/info on B node as below.
>> ...
>> type=2
>> count=2
>> status=1
>> sub_count=2
>> stripe_count=1
>> replica_count=2
>> disperse_count=0
>> redundancy_count=0
>> version=2
>> transport-type=0
>> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
>> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
>> password=ef600dcd-42c5-48fc-8004-d13a3102616b
>> op-version=3
>> client-op-version=3
>> quota-version=0
>> parent_volname=N/A
>> restored_from_snap=00000000-0000-0000-0000-000000000000
>> snap-max-hard-limit=256
>> performance.readdir-ahead=on
>> brick-0=128.224.162.255:-data-brick-gv0
>> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
>> 
>> 
>> The file  /var/lib/glusterd/vols/gv0/info on A node as below.
>> 
>> 
>> wrsadmin at pek-song1-d1:~/work/tmp$ sudo cat /var/lib/glusterd/vols/gv0/info
>> type=2
>> count=2
>> status=1
>> sub_count=2
>> stripe_count=1
>> replica_count=2
>> disperse_count=0
>> redundancy_count=0
>> version=2
>> transport-type=0
>> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
>> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
>> password=ef600dcd-42c5-48fc-8004-d13a3102616b
>> op-version=3
>> client-op-version=3
>> quota-version=0
>> parent_volname=N/A
>> restored_from_snap=00000000-0000-0000-0000-000000000000
>> snap-max-hard-limit=256
>> performance.readdir-ahead=on
>> brick-0=128.224.162.255:-data-brick-gv0
>> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
>> 
>> 
>> Thanks,
>> Xin
>> 
>> At 2016-02-17 12:01:37, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>> >
>> >
>> >On 02/17/2016 08:23 AM, songxin wrote:
>> >> Hi,
>> >> Thank you for your immediate and detailed reply. I have a few more
>> >> questions about glusterfs.
>> >> The A node IP is 128.224.162.163.
>> >> The B node IP is 128.224.162.250.
>> >> 1. After rebooting the B node and starting the glusterd service, the
>> >> glusterd log is as below.
>> >> ...
>> >> [2015-12-07 07:54:55.743966] I [MSGID: 101190]
>> >> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>> >> with index 2
>> >> [2015-12-07 07:54:55.744026] I [MSGID: 101190]
>> >> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>> >> with index 1
>> >> [2015-12-07 07:54:55.744280] I [MSGID: 106163]
>> >> [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
>> >> 0-management: using the op-version 30706
>> >> [2015-12-07 07:54:55.773606] I [MSGID: 106490]
>> >> [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req]
>> >> 0-glusterd: Received probe from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>> >> [2015-12-07 07:54:55.777994] E [MSGID: 101076]
>> >> [common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could not
>> >> lookup hostname of 128.224.162.163 : Temporary failure in name resolution
>> >> [2015-12-07 07:54:55.778290] E [MSGID: 106010]
>> >> [glusterd-utils.c:2717:glusterd_compare_friend_volume] 0-management:
>> >> Version of Cksums gv0 differ. local cksum = 2492237955, remote cksum =
>> >> 4087388312 on peer 128.224.162.163
>> >The above log entry is the reason for the rejection of the peer; most
>> >probably it is due to a compatibility issue. I believe the gluster
>> >versions are different on the two nodes (please share the gluster versions
>> >from both nodes) and you might have hit a bug.
>> >
>> >Can you share the delta of the /var/lib/glusterd/vols/gv0/info file from
>> >both the nodes?
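>> >Something along these lines would capture both (a sketch; IPs taken from
>> >your mail, and it assumes ssh access between the nodes):
>> >
>> >    # on each node, note the installed gluster version
>> >    gluster --version
>> >
>> >    # on the A node, fetch the info file from the B node and diff them
>> >    scp root@128.224.162.250:/var/lib/glusterd/vols/gv0/info /tmp/info.nodeB
>> >    diff -u /var/lib/glusterd/vols/gv0/info /tmp/info.nodeB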
>> >
>> >
>> >~Atin
>> >> [2015-12-07 07:54:55.778384] I [MSGID: 106493]
>> >> [glusterd-handler.c:3780:glusterd_xfer_friend_add_resp] 0-glusterd:
>> >> Responded to 128.224.162.163 (0), ret: 0
>> >> [2015-12-07 07:54:55.928774] I [MSGID: 106493]
>> >> [glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received
>> >> RJT from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44, host:
>> >> 128.224.162.163, port: 0
>> >> ...
>> >> When I run gluster peer status on the B node, it shows as below.
>> >> Number of Peers: 1
>> >> 
>> >> Hostname: 128.224.162.163
>> >> Uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>> >> State: Peer Rejected (Connected)
>> >> 
>> >> When I run "gluster volume status" on A node  it show as below.
>> >>  
>> >> Status of volume: gv0
>> >> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> >> ------------------------------------------------------------------------------
>> >> Brick 128.224.162.163:/home/wrsadmin/work/t
>> >> mp/data/brick/gv0                           49152     0          Y       13019
>> >> NFS Server on localhost                     N/A       N/A        N       N/A
>> >> Self-heal Daemon on localhost               N/A       N/A        Y       13045
>> >>  
>> >> Task Status of Volume gv0
>> >> ------------------------------------------------------------------------------
>> >> There are no active volume tasks
>> >> 
>> >> It looks like the glusterfsd service is OK on the A node.
>> >> 
>> >> Is it because the peer state is Rejected that glusterd didn't start
>> >> glusterfsd? What causes this problem?
>> >> 
>> >> 
>> >> 2. Is glustershd (the self-heal daemon) the process shown below?
>> >> root       497  0.8  0.0 432520 18104 ?        Ssl  08:07   0:00
>> >> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>> >> /var/lib/glusterd/glustershd/run/gluster ..
>> >> 
>> >> If it is, I want to know whether glustershd is also the glusterfsd binary,
>> >> just like glusterd and glusterfs are.
>> >> 
>> >> Thanks,
>> >> Xin
>> >> 
>> >> 
>> >> At 2016-02-16 18:53:03, "Anuradha Talur" <atalur at redhat.com> wrote:
>> >>>
>> >>>
>> >>>----- Original Message -----
>> >>>> From: "songxin" <songxin_1980 at 126.com>
>> >>>> To: gluster-users at gluster.org
>> >>>> Sent: Tuesday, February 16, 2016 3:59:50 PM
>> >>>> Subject: [Gluster-users] question about sync replicate volume after
>> >>>> 	rebooting one node
>> >>>> 
>> >>>> Hi,
>> >>>> I have a question about how to sync the volume between the two bricks
>> >>>> after one node is rebooted.
>> >>>> 
>> >>>> There are two nodes, the A node and the B node. The A node IP is
>> >>>> 128.124.10.1 and the B node IP is 128.124.10.2.
>> >>>> 
>> >>>> Operation steps on the A node, as below:
>> >>>> 1. gluster peer probe 128.124.10.2
>> >>>> 2. mkdir -p /data/brick/gv0
>> >>>> 3. gluster volume create gv0 replica 2 128.124.10.1:/data/brick/gv0
>> >>>> 128.124.10.2:/data/brick/gv1 force
>> >>>> 4. gluster volume start gv0
>> >>>> 5. mount -t glusterfs 128.124.10.1:/gv0 gluster
>> >>>> 
>> >>>> Operation steps on the B node, as below:
>> >>>> 1. mkdir -p /data/brick/gv0
>> >>>> 2. mount -t glusterfs 128.124.10.1:/gv0 gluster
>> >>>> 
>> >>>> After all the steps above, there are some gluster service processes,
>> >>>> including glusterd, glusterfs and glusterfsd, running on both the A and
>> >>>> B nodes.
>> >>>> I can see these services with the command ps aux | grep gluster and the
>> >>>> command gluster volume status.
>> >>>> 
>> >>>> Now reboot the B node. After the B node reboots, there are no gluster
>> >>>> services running on the B node.
>> >>>> After I run systemctl start glusterd, there is just the glusterd service
>> >>>> but not glusterfs and glusterfsd on the B node.
>> >>>> Because glusterfs and glusterfsd are not running, I can't run gluster
>> >>>> volume heal gv0 full.
>> >>>> 
>> >>>> I want to know why glusterd doesn't start glusterfs and glusterfsd.
>> >>>
>> >>>On starting glusterd, glusterfsd should have started by itself.
>> >>>Could you share the glusterd and brick logs (on node B) so that we know
>> >>>why glusterfsd didn't start?
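>> >>>(They are usually under /var/log/glusterfs/ -- the glusterd log -- and
>> >>>/var/log/glusterfs/bricks/ for the per-brick logs; exact file names can
>> >>>vary with how glusterd is packaged and started.)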
>> >>>
>> >>>Do you still see the glusterfsd service running on node A? You can try
>> >>>running "gluster v start <VOLNAME> force" on one of the nodes and check
>> >>>whether all the brick processes started.
>> >>>
>> >>>gluster volume status <VOLNAME> should be able to provide you with the
>> >>>status of the gluster processes.
>> >>>
>> >>>On restarting the node, the glusterfs process for the mount won't start by
>> >>>itself. You will have to run step 2 on node B again for it (i.e., remount
>> >>>the volume); see the sketch below.
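>> >>>Putting that together, something like the following on node B should bring
>> >>>everything back (just a sketch; the volume name, IP and mount point are
>> >>>taken from your steps above):
>> >>>
>> >>>    systemctl start glusterd                       # management daemon
>> >>>    gluster volume start gv0 force                 # respawn missing brick processes
>> >>>    gluster volume status gv0                      # bricks should now be Online
>> >>>    mount -t glusterfs 128.124.10.1:/gv0 gluster   # redo the client mount (your step 2)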
>> >>>
>> >>>> How do I restart these services on B node?
>> >>>> How do I sync the replicate volume after one node reboot?
>> >>>
>> >>>Once the glusterfsd process starts on node B too, glustershd -- the
>> >>>self-heal daemon -- for the replicate volume should start healing/syncing
>> >>>the files that need to be synced. This daemon does periodic syncing of
>> >>>files.
>> >>>
>> >>>If you want to trigger a heal explicitly, you can run gluster volume heal
>> >>><VOLNAME> on one of the servers.
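>> >>>For example, on either server (volume name from your setup):
>> >>>
>> >>>    gluster volume heal gv0           # trigger a heal of pending entries
>> >>>    gluster volume heal gv0 info      # list files that still need healing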
>> >>>> 
>> >>>> Thanks,
>> >>>> Xin
>> >>>> 
>> >>>
>> >>>--
>> >>>Thanks,
>> >>>Anuradha.
>> 
>
>-- 
>Thanks,
>Anuradha.