[Gluster-users] Gluster 3.6.4 peer rejected while doing probe

Atin Mukherjee amukherj at redhat.com
Mon Sep 14 09:12:27 UTC 2015



On 09/14/2015 02:33 PM, Davy Croonen wrote:
> Atin,
> 
> I performed a gluster volume set <volumename>
> performance.flush-behind off/on toggle on both volumes and after that
> the probe was successful.
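
The off/on toggle followed by a peer probe could be scripted roughly as follows. This is only a sketch: the helper names (toggle_commands, workaround_commands, run) are hypothetical, the volume name "public" and the peer hostname are taken from further down this thread, and the second volume's name is not given here so it is left out. By default it only prints the commands.

```python
import subprocess


def toggle_commands(volume, option="performance.flush-behind"):
    """Return the two `gluster volume set` invocations for an off/on toggle."""
    base = ["gluster", "volume", "set", volume, option]
    return [base + ["off"], base + ["on"]]


def workaround_commands(volumes, new_peer):
    """Toggle the option on every volume, then probe the new peer."""
    commands = []
    for volume in volumes:
        commands.extend(toggle_commands(volume))
    commands.append(["gluster", "peer", "probe", new_peer])
    return commands


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # Assumption: run as root on a node of the existing cluster.
            subprocess.run(argv, check=True)
```

Calling run(workaround_commands(["public"], "gfs02a-dcg.intnet.be")) prints the three commands in order; passing dry_run=False would actually execute them.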
> 
> So many thanks for your support.
> 
> Some additional info, in our lab I did some tests starting with gluster
> version 3.6.4 and was not able to reproduce the problem. After that I
> went looking for some differences with our production cluster and found
> out that we started there with version 3.5.x which we upgraded to
> version 3.6.4. So maybe the bug/incompatibility is introduced somewhere
> after an upgrade procedure?
You will *only* hit this issue if you have upgraded from 3.5 to 3.6, so
your observation is correct. However, the problem surfaces while bumping
up the cluster's op-version. Ideally we are expected to write all the
new default information to the info file, which seems to be missing. I
shall be working on a patch for this pretty soon and will look to
backport it to the 3.x series.
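
To see which entries are missing on one side, the two info files can be compared key by key. The following is a rough sketch, assuming the info file is a list of key=value lines; the function names (parse_info, diff_info) are hypothetical and the sample contents in the comment are made up for illustration, not taken from the attached files.

```python
def parse_info(text):
    """Parse key=value lines into a dict, skipping blank or malformed lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        entries[key] = value
    return entries


def diff_info(local_text, remote_text):
    """Return (keys only in local, keys only in remote, keys whose values differ)."""
    local = parse_info(local_text)
    remote = parse_info(remote_text)
    only_local = sorted(set(local) - set(remote))
    only_remote = sorted(set(remote) - set(local))
    changed = sorted(k for k in set(local) & set(remote) if local[k] != remote[k])
    return only_local, only_remote, changed


# Made-up example input, for illustration only:
#   local_text  = "type=2\ncount=2\n"
#   remote_text = "type=2\ncount=2\nperformance.flush-behind=on\n"
# diff_info(local_text, remote_text) would then report the extra key on the
# remote side.
```

Any key listed as present on only one node would be enough to make the computed checksums differ.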

Thanks,
Atin
> 
> Greetings
> Davy
> 
>> On 14 Sep 2015, at 07:43, Atin Mukherjee <amukherj at redhat.com> wrote:
>>
>> Davy,
>>
>> This seems to be an issue which we also faced a couple of months back
>> during upgrade testing, and a bugzilla [1] was raised for the same. At
>> the time we didn't have a workaround to make peer probe work, but
>> I managed to find one today.
>>
>> Could you do an explicit volume set on the existing cluster and then do
>> a peer probe? Let me know if that works.
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1248895
>>
>> Thanks,
>> Atin
>>
>> On 09/11/2015 05:41 PM, Davy Croonen wrote:
>>> Atin
>>>
>>> Please see the requested attachments.
>>>
>>> KR
>>> Davy
>>>
>>>> On 11 Sep 2015, at 14:03, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>
>>>> Could you attach the contents of the /var/lib/glusterd/vols/<volname>/info
>>>> file from both the nodes?
>>>>
>>>> ~Atin
>>>>
>>>> On 09/11/2015 04:50 PM, Davy Croonen wrote:
>>>>> Thanks for your quick response.
>>>>>
>>>>> As reported in the log the checksums are indeed not the same. On
>>>>> gfs01a-dcg it is 'info=1266454712' and on gfs02a-dcg it is
>>>>> 'info=2613085848'. Of course my next question is how can I fix this?
>>>>>
>>>>> I already tried by stopping the gluster daemon on gfs02a-dcg, deleting
>>>>> the entire vols directory and starting the gluster daemon again. On the
>>>>> gfs01a-dcg host I now did a gluster peer status which shows:
>>>>>
>>>>> Hostname: gfs02a-dcg.intnet.be
>>>>> Uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
>>>>> State: Peer in Cluster (Connected)
>>>>>
>>>>> But, the checksum of the public volume is still not the same on
>>>>> gfs01a-dcg and gfs02a-dcg and also running a gluster peer status on
>>>>> gfs01b-dcg (the replica of gfs01a-dcg) gives me:
>>>>>
>>>>> Hostname: gfs02a-dcg.intnet.be
>>>>> Uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
>>>>> State: Peer Rejected (Connected)
>>>>>
>>>>> So my question remains: is there any way to fix this?
>>>>>
>>>>> Kind regards
>>>>>
>>>>> Davy
>>>>>
>>>>>> On 11 Sep 2015, at 12:39, Mohammed Rafi K C <rkavunga at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>> Can you check the checksum of the volume "public" on both of the
>>>>>> current nodes? The checksum is stored in
>>>>>> /var/lib/glusterd/vols/public/cksum.
>>>>>>
>>>>>> Regards
>>>>>> Rafi KC
>>>>>>
>>>>>> On 09/11/2015 03:24 PM, Davy Croonen wrote:
>>>>>>> Hi all
>>>>>>>
>>>>>>> We have a production cluster with 2 nodes (gfs01a and gfs01b) in a
>>>>>>> distributed replicate setup with glusterfs 3.6.4. We want to expand
>>>>>>> the volume with 2 extra nodes (gfs02a and gfs02b) because we are
>>>>>>> running out of disk space. Therefore we deployed 2 extra nodes with
>>>>>>> glusterfs 3.6.4.
>>>>>>>
>>>>>>> Now, while probing the 2 new nodes from a node in the existing
>>>>>>> cluster we got the following error:
>>>>>>>
>>>>>>> root@gfs01a-dcg:~# gluster peer probe gfs02a-dcg.intnet.be
>>>>>>> peer probe: success.
>>>>>>> root@gfs01a-dcg:~# gluster peer status
>>>>>>> Number of Peers: 2
>>>>>>>
>>>>>>> Hostname: gfs01b-dcg.intnet.be
>>>>>>> Uuid: cfc83cf2-b719-40c7-afea-b23accc714c3
>>>>>>> State: Peer in Cluster (Connected)
>>>>>>>
>>>>>>> Hostname: gfs02a-dcg.intnet.be
>>>>>>> Uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
>>>>>>> *State: Peer Rejected (Connected)*
>>>>>>>
>>>>>>> In the log file /var/log/glusterfs/etc-glusterfs-glusterd.vol.log the
>>>>>>> following entries are written:
>>>>>>>
>>>>>>> [2015-09-11 09:37:49.405906] I
>>>>>>> [glusterd-handler.c:1031:__glusterd_handle_cli_probe] 0-glusterd:
>>>>>>> Received CLI probe req gfs02a-dcg.intnet.be 24007
>>>>>>> [2015-09-11 09:37:49.428630] I
>>>>>>> [glusterd-handler.c:3198:glusterd_probe_begin] 0-glusterd: Unable to
>>>>>>> find peerinfo for host: gfs02a-dcg.intnet.be (24007)
>>>>>>> [2015-09-11 09:37:49.438636] I
>>>>>>> [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting
>>>>>>> frame-timeout to 600
>>>>>>> [2015-09-11 09:37:49.440513] I
>>>>>>> [glusterd-handler.c:3131:glusterd_friend_add] 0-management: connect
>>>>>>> returned 0
>>>>>>> [2015-09-11 09:37:49.474316] I
>>>>>>> [glusterd-rpc-ops.c:245:__glusterd_probe_cbk] 0-management: Received
>>>>>>> probe resp from uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f, host:
>>>>>>> gfs02a-dcg.intnet.be
>>>>>>> [2015-09-11 09:37:49.481801] I
>>>>>>> [glusterd-rpc-ops.c:387:__glusterd_probe_cbk] 0-glusterd: Received
>>>>>>> resp to probe req
>>>>>>> [2015-09-11 09:37:51.650265] I
>>>>>>> [glusterd-rpc-ops.c:437:__glusterd_friend_add_cbk] 0-glusterd:
>>>>>>> Received ACC from uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f, host:
>>>>>>> gfs02a-dcg.intnet.be, port: 0
>>>>>>> [2015-09-11 09:37:51.665861] I
>>>>>>> [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack]
>>>>>>> 0-management: using the op-version 30603
>>>>>>> [2015-09-11 09:37:51.690170] I
>>>>>>> [glusterd-handler.c:2543:__glusterd_handle_probe_query] 0-glusterd:
>>>>>>> Received probe from uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
>>>>>>> [2015-09-11 09:37:51.692652] I
>>>>>>> [glusterd-handler.c:2595:__glusterd_handle_probe_query] 0-glusterd:
>>>>>>> Responded to gfs02a-dcg.intnet.be, op_ret: 0, op_errno: 0, ret: 0
>>>>>>> [2015-09-11 09:37:51.706203] I
>>>>>>> [glusterd-handler.c:2232:__glusterd_handle_incoming_friend_req]
>>>>>>> 0-glusterd: Received probe from uuid:
>>>>>>> 29592d5b-242b-43b5-afc5-5f9a1496d59f
>>>>>>> *[2015-09-11 09:37:51.708909] E [MSGID: 106010]
>>>>>>> [glusterd-utils.c:3297:glusterd_compare_friend_volume] 0-management:
>>>>>>> Version of Cksums public differ. local cksum = 1932535021, remote
>>>>>>> cksum = 2474653383 on peer gfs02a-dcg.intnet.be*
>>>>>>> [2015-09-11 09:37:51.709026] I
>>>>>>> [glusterd-handler.c:3367:glusterd_xfer_friend_add_resp] 0-glusterd:
>>>>>>> Responded to gfs02a-dcg.intnet.be (0), ret: 0
>>>>>>> [2015-09-11 09:37:55.537231] I
>>>>>>> [glusterd-handler.c:1241:__glusterd_handle_cli_list_friends]
>>>>>>> 0-glusterd: Received cli list req
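
The checksum-mismatch entry in the log above can be picked out mechanically. The following is a minimal sketch, assuming each entry sits on a single line in the actual glusterd log file (the wrapping above comes from the mail client) and that the message wording matches the entry quoted here; the names CKSUM_MISMATCH and find_cksum_mismatches are hypothetical.

```python
import re

# Pattern built from the error entry quoted in this thread; other glusterd
# versions may word the message differently.
CKSUM_MISMATCH = re.compile(
    r"Version of Cksums (?P<volume>\S+) differ\. "
    r"local cksum = (?P<local>\d+), "
    r"remote cksum = (?P<remote>\d+) on peer (?P<peer>\S+)"
)


def find_cksum_mismatches(log_text):
    """Return (volume, local cksum, remote cksum, peer) for each mismatch entry."""
    return [
        (m["volume"], int(m["local"]), int(m["remote"]), m["peer"])
        for m in CKSUM_MISMATCH.finditer(log_text)
    ]
```

Feeding it the contents of /var/lib/glusterd's log file named above (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) would list every volume whose checksum was rejected, together with the peer involved.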
>>>>>>>
>>>>>>> The exact same error appears while probing the second node (gfs02b).
>>>>>>>
>>>>>>> Does anyone have any idea how to solve this?
>>>>>>>
>>>>>>> Thanks in advance.
>>>>>>>
>>>>>>> Kind regards
>>>>>>> Davy
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>
>>>>>
>>>>>
> 

