[Gluster-users] peer status rejected (connected)

Wed Aug 7 01:42:27 UTC 2013

The solution has been found, but it's kind of ugly:

On the old node, peer detach the new node.
Stop gluster on the new node and wipe its state directory
(/var/lib/glusterd on 3.3.x).
Restart gluster on the new node.

On the old node, run:
for Q in `gluster volume list`; do
   gluster volume reset $Q
done

Then peer probe the new node again from the old node.

After this it successfully connected the new node.
I have no idea why this was required.
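
For anyone repeating this, the old-node half of the steps above can be
sketched roughly as below. This is only a sketch under assumptions, not a
tested procedure: the helper names run and reset_and_probe and the DRY_RUN
and NEW_NODE variables are made up for illustration, and it uses the full
"gluster volume reset" spelling of the command. With DRY_RUN=1 (the
default here) it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the old-node steps: reset every volume, then re-probe the peer.
# Assumptions: NEW_NODE and DRY_RUN are hypothetical knobs for this example;
# the hostname default is the peer named earlier in this thread.

NEW_NODE="${NEW_NODE:-mel-storage04}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Arguments: the volume names to reset, e.g. from `gluster volume list`.
reset_and_probe() {
    for vol in "$@"; do
        run gluster volume reset "$vol"
    done
    run gluster peer probe "$NEW_NODE"
}
```

To review the commands first, source the file and call
reset_and_probe `gluster volume list` with DRY_RUN left at 1, then set
DRY_RUN=0 to actually run them. Note that a volume reset puts the volume's
reconfigured options back to their defaults, so record any custom settings
beforehand.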

We still can't remove, replace or add bricks, but I'll continue that in
another thread.

-T

On 07/08/13 10:51, Toby Corkindale wrote:
> On 06/08/13 21:25, Kaushal M wrote:
>> Toby,
>> What versions of gluster are on the peers? And does the cluster have
>> just two peers or more?
>
> Version 3.3.1.
> The cluster has/had two nodes; we're trying to replace one with another
> one.
>
>> On Tue, Aug 6, 2013 at 4:32 PM, Toby Corkindale
>> <toby.corkindale at strategicdata.com.au> wrote:
>>> ----- Original Message -----
>>>> From: "Toby Corkindale" <toby.corkindale at strategicdata.com.au>
>>>> To: gluster-users at gluster.org
>>>> Sent: Tuesday, 6 August, 2013 6:26:59 PM
>>>> Subject: Re: [Gluster-users] peer status rejected (connected)
>>>>
>>>> On 06/08/13 18:12, Toby Corkindale wrote:
>>>>> Hi,
>>>>> What does it mean when you use "peer probe" to add a new host, but then
>>>>> afterwards the "peer status" is reported as "Rejected" yet "Connected"?
>>>>> And of course -- how does one fix this?
>>>>>
>>>>> gluster> peer status
>>>>> Number of Peers: 1
>>>>>
>>>>> Hostname: 192.168.10.32
>>>>> Uuid: 32497846-6e02-4b68-b147-6f4b936b3373
>>>>> State: Peer Rejected (Connected)
>>>>
>>>> It's worth noting that the attempt to probe the peer was listed as
>>>> successful though:
>>>>
>>>> gluster> peer probe mel-storage04
>>>>
>>>> Probe successful
>>>> gluster> peer status
>>>> Number of Peers: 1
>>>>
>>>> Hostname: mel-storage04
>>>> Uuid: 6254c24d-29d4-4794-8159-3c2b03b34798
>>>> State: Peer Rejected (Connected)
>>>>
>>>
>>>
>>> After searching around some more, I saw that this issue is usually
>>> caused by two peers joining, when one has a very out of date volume
>>> list.
>>> And indeed, in the log files I see messages about checksums failing
>>> to agree on volumes being exchanged.
>>>
>>> The odd thing is, this is a fresh server, running the same version of
>>> glusterfs.
>>> I tried stopping the services entirely, rm -rf /var/lib/glusterfs/*,
>>> and then started up again and tried probing that peer -- and received
>>> the same Rejection.
>>> I'm confused as to how it could possibly be getting a different
>>> volume checksum, when it didn't even have its own copy.
>>>
>>> Does the community have any suggestions about resolving this?
>>>
>>> See also the inability to remove or replace bricks, described in a
>>> separate message. It might be related, although the errors occur even
>>> when run on the cluster without this problematic peer attached at all.
>>>
>>> -Toby