[Gluster-users] State: Peer Rejected (Connected)

Sun Aug 6 10:08:29 UTC 2017

I have now also restarted the glusterd daemon on node2 and arbiternode and it seems to work again. It is now healing some files and I hope all goes well.
Thanks so far for your help. By the way, I identified the process which sucked up all the memory of my node1: it was that stupid mlocate script which runs in the early morning to index all files of a Linux server. I would recommend that anyone using GlusterFS uninstall the mlocate package to avoid this situation.
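
For anyone hitting the same "Peer Rejected" state, here is a minimal Python sketch for pulling the checksum-mismatch entries out of glusterd.log. The regular expression is inferred only from the log lines quoted further down in this thread, so treat it as an assumption that other GlusterFS versions word the message the same way. The names CksumMismatch, CKSUM_RE and find_cksum_mismatches are hypothetical helpers, not part of GlusterFS.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class CksumMismatch(NamedTuple):
    """One 'Cksums ... differ' entry from glusterd.log (hypothetical helper type)."""
    scope: str   # e.g. "quota configuration"; "volume" when the log has no qualifier
    volume: str
    local: int
    remote: int
    peer: str


# Assumption: pattern derived from the two sample log lines in this thread only.
CKSUM_RE = re.compile(
    r"Cksums of (?P<scope>.*?)volume (?P<volume>\S+) differ\. "
    r"local cksum = (?P<local>\d+), remote cksum = (?P<remote>\d+) "
    r"on peer (?P<peer>\S+)"
)


def find_cksum_mismatches(lines: Iterable[str]) -> Iterator[CksumMismatch]:
    """Yield one record per log line that reports differing checksums."""
    for line in lines:
        match = CKSUM_RE.search(line)
        if match is None:
            continue
        # "quota configuration of " -> "quota configuration"; "" -> "volume"
        scope = match.group("scope").strip()
        if scope.endswith(" of"):
            scope = scope[: -len(" of")]
        yield CksumMismatch(
            scope or "volume",
            match.group("volume"),
            int(match.group("local")),
            int(match.group("remote")),
            match.group("peer"),
        )


# Sample lines copied from the messages quoted in this thread.
SAMPLE = [
    "[2017-08-06 08:16:57.699131] E [MSGID: 106012] "
    "[glusterd-utils.c:2988:glusterd_compare_friend_volume] 0-management: "
    "Cksums of quota configuration of volume myvolume differ. "
    "local cksum = 3823389269, remote cksum = 733515336 on peer node2.domain.tld",
    "[2014-06-17 04:21:11.266485] E "
    "[glusterd-utils.c:2373:glusterd_compare_friend_volume] 0-management: "
    "Cksums of volume supportgfs differ. "
    "local cksum = 52468988, remote cksum = 2201279699 on peer 172.26.178.254",
]

if __name__ == "__main__":
    for hit in find_cksum_mismatches(SAMPLE):
        print(f"{hit.volume}: {hit.scope} differs from peer {hit.peer} "
              f"(local={hit.local}, remote={hit.remote})")
```

To run it against a real log, pass an open file object instead of SAMPLE, for example the /var/log/glusterfs/glusterd.log file on the rejected node. The scope field then tells you which file under /var/lib/glusterd/vols/<VOLNAME> to compare first.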

> -------- Original Message --------
> Subject: Re: [Gluster-users] State: Peer Rejected (Connected)
> Local Time: August 6, 2017 10:26 AM
> UTC Time: August 6, 2017 8:26 AM
> From: mabi at protonmail.ch
> To: Ji-Hyeon Gim <potatogim at potatogim.net>
> Gluster Users <gluster-users at gluster.org>
> Hi Ji-Hyeon,
> Thanks to your help I could find the problematic file. It is the quota file of my volume: it has a different checksum on node1, whereas node2 and arbiternode have the same checksum. This is expected, as I had issues with my quota file and had to fix it manually with a script (more details in a previous post on this mailing list), and I only did that on node1.
> So what I did now is copy the /var/lib/glusterd/vols/myvolume/quota.conf file from node1 to node2 and arbiternode and then restart the glusterd process on node1, but somehow this did not fix the issue. I suppose I am missing a step here; maybe you have an idea which one?
> Here is the relevant part of my glusterd.log file, taken from node1:
> [2017-08-06 08:16:57.699131] E [MSGID: 106012] [glusterd-utils.c:2988:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvolume differ. local cksum = 3823389269, remote cksum = 733515336 on peer node2.domain.tld
> [2017-08-06 08:16:57.275558] E [MSGID: 106012] [glusterd-utils.c:2988:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvolume differ. local cksum = 3823389269, remote cksum = 733515336 on peer arbiternode.intra.oriented.ch
> Best regards,
> Mabi
>
>> -------- Original Message --------
>> Subject: Re: [Gluster-users] State: Peer Rejected (Connected)
>> Local Time: August 6, 2017 9:31 AM
>> UTC Time: August 6, 2017 7:31 AM
>> From: potatogim at potatogim.net
>> To: mabi <mabi at protonmail.ch>
>> Gluster Users <gluster-users at gluster.org>
>> On 2017-08-06 15:59, mabi wrote:
>>> Hi,
>>>
>>> I have a 3-node replica (including arbiter) volume with GlusterFS
>>> 3.8.11, and last night one of my nodes (node1) ran out of memory for
>>> some unknown reason, so the Linux OOM killer killed the
>>> glusterd and glusterfs processes. I restarted the glusterd process, but
>>> now that node is in "Peer Rejected" state as seen from the other nodes,
>>> and it in turn rejects the two other nodes, as you can see below from
>>> the output of "gluster peer status":
>>>
>>> Number of Peers: 2
>>>
>>> Hostname: arbiternode.domain.tld
>>> Uuid: 60a03a81-ba92-4b84-90fe-7b6e35a10975
>>> State: Peer Rejected (Connected)
>>>
>>> Hostname: node2.domain.tld
>>> Uuid: 4834dceb-4356-4efb-ad8d-8baba44b967c
>>> State: Peer Rejected (Connected)
>>>
>>>
>>>
>>> I also rebooted my node1 just in case but that did not help.
>>>
>>> I read here http://www.spinics.net/lists/gluster-users/msg25803.html
>>> that the problem could have something to do with the volume info file.
>>> In my case I checked the file:
>>>
>>> /var/lib/glusterd/vols/myvolume/info
>>>
>>> and it is the same on node1 and arbiternode, but on node2 the order
>>> of the following volume parameters is different:
>>>
>>> features.quota-deem-statfs=on
>>> features.inode-quota=on
>>> nfs.disable=on
>>> performance.readdir-ahead=on
>>>
>>> Could that be the reason why the peer is in rejected status? Can I
>>> simply edit this file on node2 to re-order the parameters as on the
>>> other 2 nodes?
>>>
>>> What else should I do to investigate the reason for this rejected peer
>>> state?
>>>
>>> Thank you in advance for the help.
>>>
>>> Best,
>>> Mabi
>>>
>>>
>> Hi mabi.
>> In my opinion, this is caused by a volfile/checksum mismatch. Look at the
>> glusterd log file (/var/log/glusterfs/glusterd.log) on the REJECTED node and
>> find log entries like the ones below:
>> [2014-06-17 04:21:11.266398] I [glusterd-handler.c:2050:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 81857e74-a726-4f48-8d1b-c2a4bdbc094f
>> [2014-06-17 04:21:11.266485] E [glusterd-utils.c:2373:glusterd_compare_friend_volume] 0-management: Cksums of volume supportgfs differ. local cksum = 52468988, remote cksum = 2201279699 on peer 172.26.178.254
>> [2014-06-17 04:21:11.266542] I [glusterd-handler.c:3085:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 172.26.178.254 (0), ret: 0
>> [2014-06-17 04:21:11.272206] I [glusterd-rpc-ops.c:356:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 81857e74-a726-4f48-8d1b-c2a4bdbc094f, host: 172.26.178.254, port: 0
>> If so, you need to sync the volfile files/directories under
>> /var/lib/glusterd/vols/<VOLNAME> from one of the GOOD nodes.
>> For details on resolving this problem, please share more information, such
>> as the glusterd log :)
>> --
>> Best regards.
>> --
>> Ji-Hyeon Gim
>> Research Engineer, Gluesys
>> Address. Gluesys R&D Center, 5F, 11-31, Simin-daero 327beon-gil,
>> Dongan-gu, Anyang-si,
>> Gyeonggi-do, Korea
>> (14055)
>> Phone. +82-70-8787-1053
>> Fax. +82-31-388-3261
>> Mobile. +82-10-7293-8858
>> E-Mail. potatogim at potatogim.net
>> Website. www.potatogim.net
>> The time I wasted today is the tomorrow the dead man was eager to see yesterday.
>> - Sophocles