[Gluster-users] Fixing a rejected peer

Atin Mukherjee amukherj at redhat.com
Wed Mar 7 12:39:49 UTC 2018


Please run 'gluster v get all cluster.max-op-version' and use whatever value
it returns to bump up the cluster.op-version (gluster v set all
cluster.op-version <value>). If you then restart the rejected peer, I believe
the problem should go away. If it doesn't, I'd need to investigate further
once you can pass along the glusterd and cmd_history log files and the
contents of /var/lib/glusterd from all the nodes.
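
A rough sketch of those two steps, in case it helps. It assumes the output of
'gluster v get all cluster.max-op-version' uses the same two-column
Option/Value table as the cluster.op-version output quoted further down. The
helper name extract_op_version is made up for illustration; only the two
gluster commands come from this mail.

```shell
#!/bin/sh
# Hypothetical helper: read the table printed by 'gluster v get all <option>'
# on stdin and print the value for the option named in $1.
# Assumes a two-column "Option  Value" layout.
extract_op_version() {
    awk -v opt="$1" '$1 == opt { print $2; exit }'
}

# Demo on the output quoted later in this thread; prints 30800.
printf '%s\n' \
    'Option                                  Value' \
    '------                                  -----' \
    'cluster.op-version                      30800' |
    extract_op_version cluster.op-version

# On a cluster node the two steps would look like this (not run here):
#   max=$(gluster v get all cluster.max-op-version |
#         extract_op_version cluster.max-op-version)
#   gluster v set all cluster.op-version "$max"
```

It is probably worth checking the extracted value by eye before running the
'set' command.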

On Wed, Mar 7, 2018 at 4:13 AM, Jamie Lawrence <jlawrence at squaretrade.com>
wrote:

>
> > On Mar 5, 2018, at 6:41 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
> > > I'm tempted to repeat - down things, copy the checksum the "good" ones
> > > agree on, start things; but given that this has turned into a
> > > balloon-squeezing exercise, I want to make sure I'm not doing this the
> > > wrong way.
> >
> > Yes, that's the way. Copy /var/lib/glusterd/vols/<volname>/ from the
> > good node to the rejected one and restart the glusterd service on the
> > rejected peer.
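
A sketch of that copy-and-restart step, meant to be run on the rejected peer.
The host name good-node and volume name myvol are placeholders, and using
rsync over SSH and systemctl is an assumption about the environment; adjust
to however files are moved and services are managed on these hosts. The
function only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Print (do not run) the commands for re-syncing one volume's configuration
# from a good peer onto the rejected peer.
#   $1 = host name of a good peer (placeholder)
#   $2 = volume name (placeholder)
print_resync_plan() {
    good=$1
    vol=$2
    dir=/var/lib/glusterd/vols/$vol
    echo "systemctl stop glusterd"
    # --delete removes local files that the good peer does not have
    echo "rsync -a --delete $good:$dir/ $dir/"
    echo "systemctl start glusterd"
}

print_resync_plan good-node myvol
```

Stopping glusterd before the copy and starting it afterwards is the same
"down things, copy, start things" order described above.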
>
>
> My apologies for the multiple messages - I'm having to work on this
> episodically.
>
> I've tried again to reset state on the bad peer, to no avail. This time I
> downed all of the peers, copied things over, ensuring that the tier-enabled
> line was absent, and started back up; the cksum immediately changed to a
> bad value, the two good nodes added that line in, and the bad node didn't
> have it.
>
> Just to have a clear view of this, I did it yet again, this time ensuring
> the tier-enabled line was present everywhere. Same result, except that it
> didn't add the tier-enabled line, which I suppose makes some sense.
>
> One oddity - I see:
>
> # gluster v get all cluster.op-version
> Option                                  Value
> ------                                  -----
> cluster.op-version                      30800
>
> but from one of the `info` files:
>
> op-version=30712
> client-op-version=30712
>
> I don't know what it means that the cluster is at one version but
> apparently the volume is set for another - I thought that was a
> cluster-level setting. (client-op-version theoretically makes more sense -
> I can see Ovirt wanting an older version.)
>
> I'm at a loss to fix this - copying /var/lib/glusterd/vols/<vol> over
> doesn't fix the problem. I'd be somewhat OK with trashing the volume and
> starting over, if it weren't for two things: (1) Ovirt was also a massive
> pain to set up, and it is configured on this volume; and (2), perhaps more
> importantly, I'm concerned about this happening again once this is in
> production, which would be Bad, especially if I don't have a fix.
>
> So at this point, I'm unclear on how to move forward or even where else to
> look for potential problems.
>
> -j
>
> - - - -
>
> [2018-03-06 22:30:32.421530] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req]
> 0-glusterd: Received probe from uuid: 77cdfbba-348c-43fe-ab3d-00621904ea9c
> [2018-03-06 22:30:32.422582] E [MSGID: 106010] [glusterd-utils.c:3374:glusterd_compare_friend_volume]
> 0-management: Version of Cksums sc5-ovirt_engine differ. local cksum =
> 3949237931, remote cksum = 2068896937 on peer sc5-gluster-10g-1.squaretrade.com
> [2018-03-06 22:30:32.422774] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp]
> 0-glusterd: Responded to sc5-gluster-10g-1.squaretrade.com (0), ret: 0,
> op_ret: -1
> [2018-03-06 22:30:32.424621] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk]
> 0-glusterd: Received RJT from uuid: 77cdfbba-348c-43fe-ab3d-00621904ea9c,
> host: sc5-gluster-10g-1.squaretrade.com, port: 0
> [2018-03-06 22:30:32.425563] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk]
> 0-glusterd: Received RJT from uuid: c1877e0d-ccb2-401e-83a6-e4a680af683a,
> host: sc5-gluster-2.squaretrade.com, port: 0
> [2018-03-06 22:30:32.426706] I [MSGID: 106163]
> [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack]
> 0-management: using the op-version 30800
> [2018-03-06 22:30:32.428075] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req]
> 0-glusterd: Received probe from uuid: c1877e0d-ccb2-401e-83a6-e4a680af683a
> [2018-03-06 22:30:32.428325] E [MSGID: 106010] [glusterd-utils.c:3374:glusterd_compare_friend_volume]
> 0-management: Version of Cksums sc5-ovirt_engine differ. local cksum =
> 3949237931, remote cksum = 2068896937 on peer sc5-gluster-2.squaretrade.com
> [2018-03-06 22:30:32.428468] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp]
> 0-glusterd: Responded to sc5-gluster-2.squaretrade.com (0), ret: 0,
> op_ret: -1
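
To pull the checksum mismatches out of a longer glusterd log, something like
the following may help. It is only a sketch: it assumes each log entry sits on
a single line in the log file itself (the wrapping above comes from the mail)
and that the message wording matches the "Version of Cksums ... differ" lines
quoted here. The function name cksum_mismatches is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read glusterd log lines on stdin and, for every
# "Version of Cksums <vol> differ" entry, print:
#   <volume> <local cksum> <remote cksum> <peer>
cksum_mismatches() {
    sed -n 's/.*Version of Cksums \([^ ]*\) differ\. local cksum = \([0-9]*\), remote cksum = \([0-9]*\) on peer \([^ ]*\).*/\1 \2 \3 \4/p'
}

# Usage (the log file path depends on the installation):
#   cksum_mismatches < /path/to/glusterd-log-file
```

Lines that do not match the pattern are dropped, so the output lists only the
volumes and peers whose checksums disagree.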
>
>