[Gluster-users] Fixing a rejected peer

Jamie Lawrence jlawrence at squaretrade.com
Tue Mar 6 19:48:10 UTC 2018


Just following up on the below after having some time to track down the differences.

On the bad peer, the `tier-enabled=0` line in .../vols/<volname>/info was removed after I copied it over, and, as mentioned, the cksum file changed to a value that doesn't match the others. The logs only complain about the cksum (appended below).
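
For anyone retracing this, here is a minimal sketch of how the two `info` files can be compared. The function name `diff_vol_info` and its arguments are illustrative and not part of Gluster. It assumes the rejected peer's copy of the file has already been fetched to a local scratch path (for example with scp).

```shell
#!/bin/sh
# Print the lines that differ between two copies of a volume's `info`
# file, ignoring line order.
# Usage: diff_vol_info GOOD_INFO BAD_INFO
diff_vol_info() {
    good_sorted=$(mktemp) || return 2
    bad_sorted=$(mktemp) || { rm -f "$good_sorted"; return 2; }
    sort "$1" > "$good_sorted"
    sort "$2" > "$bad_sorted"
    # "<" marks lines present only in the good copy,
    # ">" marks lines present only in the bad copy.
    diff "$good_sorted" "$bad_sorted" | grep '^[<>]'
    rm -f "$good_sorted" "$bad_sorted"
}
```

If the only difference is the missing line, this prints `< tier-enabled=0`.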

I haven't done anything with tiering; I suppose it is possible that oVirt did something goofy when I tried using it, but I am very confused by this whack-a-mole game and don't know how to resolve it.

-j


- - - - 


> On Mar 5, 2018, at 6:41 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

[...]

>> Started things back up and now gluster-3 is being rejected by the other two. The error is below.
>>
>> I'm tempted to repeat - down things, copy the checksum the "good" ones agree on, start things; but given that this has turned into a balloon-squeezing exercise, I want to make sure I'm not doing this the wrong way.
>
> Yes, that's the way. Copy /var/lib/glusterd/vols/<volname>/ from the good node to the rejected one and restart glusterd service on the rejected peer.

So I did this, and it immediately went back to rejected state. The `cksum` file immediately diverged.
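
For reference, the copy-and-restart procedure quoted above amounts to roughly the following, run on the rejected peer. This is a sketch under assumptions, not an official procedure: it assumes glusterd is managed by systemd and that rsync over ssh to the good node works. The function name, the `DRY_RUN` switch and both arguments are placeholders of mine. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: resync one volume's glusterd config onto a rejected peer.
# Usage: resync_volume_config GOOD_NODE VOLNAME
# With DRY_RUN=1 (the default) the commands are printed, not executed.
resync_volume_config() {
    good_node=$1
    volname=$2
    voldir="/var/lib/glusterd/vols/$volname"

    # Helper: echo the command in dry-run mode, otherwise run it.
    _run() {
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "+ $*"
        else
            "$@"
        fi
    }

    # Stop glusterd, copy the volume directory from the good node,
    # start glusterd again, then check how the peers see each other.
    _run systemctl stop glusterd &&
    _run rsync -a "$good_node:$voldir/" "$voldir/" &&
    _run systemctl start glusterd &&
    _run gluster peer status
}
```

Setting `DRY_RUN=0` before calling it runs the commands for real. In my case the peer went straight back to rejected after these steps.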



- - - - - - 
[2018-03-06 18:31:25.380546] I [MSGID: 106005] [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management: Brick 172.16.0.153:/gluster-bricks/sc5_ovirt_engine/sc5_ovirt_engine has disconnected from glusterd.
[2018-03-06 18:31:25.380913] I [MSGID: 106490] [glusterd-handler.c:2891:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: 77cdfbba-348c-43fe-ab3d-00621904ea9c
[2018-03-06 18:31:25.384259] I [MSGID: 106493] [glusterd-handler.c:2954:__glusterd_handle_probe_query] 0-glusterd: Responded to sc5-gluster-1.squaretrade.com, op_ret: 0, op_errno: 0, ret: 0
[2018-03-06 18:31:25.384411] I [MSGID: 106490] [glusterd-handler.c:2891:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: c1877e0d-ccb2-401e-83a6-e4a680af683a
[2018-03-06 18:31:25.384541] I [MSGID: 106493] [glusterd-handler.c:2954:__glusterd_handle_probe_query] 0-glusterd: Responded to sc5-gluster-10g-2, op_ret: 0, op_errno: 0, ret: 0
[2018-03-06 18:31:25.388144] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 77cdfbba-348c-43fe-ab3d-00621904ea9c
[2018-03-06 18:31:25.388795] E [MSGID: 106010] [glusterd-utils.c:3374:glusterd_compare_friend_volume] 0-management: Version of Cksums sc5-ovirt_engine differ. local cksum = 53769889, remote cksum = 2068896937 on peer sc5-gluster-10g-1.squaretrade.com
[2018-03-06 18:31:25.388978] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to sc5-gluster-10g-1.squaretrade.com (0), ret: 0, op_ret: -1
[2018-03-06 18:31:25.390976] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: c1877e0d-ccb2-401e-83a6-e4a680af683a
[2018-03-06 18:31:25.391241] E [MSGID: 106010] [glusterd-utils.c:3374:glusterd_compare_friend_volume] 0-management: Version of Cksums sc5-ovirt_engine differ. local cksum = 53769889, remote cksum = 2068896937 on peer sc5-gluster-2.squaretrade.com
[2018-03-06 18:31:25.391390] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to sc5-gluster-2.squaretrade.com (0), ret: 0, op_ret: -1
[2018-03-06 18:31:25.402669] I [MSGID: 106143] [glusterd-pmap.c:295:pmap_registry_bind] 0-pmap: adding brick /gluster-bricks/sc5_ovirt_engine/sc5_ovirt_engine on port 49152
[2018-03-06 18:31:37.422140] I [MSGID: 106487] [glusterd-handler.c:1485:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2018-03-06 18:32:06.551544] I [MSGID: 106487] [glusterd-handler.c:1485:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2018-03-06 18:32:32.414663] I [MSGID: 106487] [glusterd-handler.c:1243:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req sc5-gluster-10g-2 24007
[2018-03-06 18:32:36.957054] I [MSGID: 106487] [glusterd-handler.c:1485:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2018-03-06 18:32:43.067011] I [MSGID: 106499] [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management: Received status volume req for volume sc5-ovirt_engine
[2018-03-06 18:33:07.020062] I [MSGID: 106487] [glusterd-handler.c:1485:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2018-03-06 18:33:36.435916] I [MSGID: 106487] [glusterd-handler.c:1485:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2018-03-06 18:34:26.754494] I [MSGID: 106499] [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management: Received status volume req for volume sc5-ovirt_engine
[2018-03-06 18:35:05.206520] I [MSGID: 106488] [glusterd-handler.c:1548:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2018-03-06 18:35:05.524085] I [MSGID: 106499] [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management: Received status volume req for volume sc5-ovirt_engine
The message "I [MSGID: 106487] [glusterd-handler.c:1485:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req" repeated 4 times between [2018-03-06 18:33:36.435916] and [2018-03-06 18:35:06.421623]
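
The only lines in the log above that matter are the MSGID 106010 errors. As a rough sketch, they can be pulled out of a glusterd log and summarized like this. The function name is made up for illustration, the log path is passed in because it varies between installs, and the pattern is based only on the message format shown above.

```shell
#!/bin/sh
# Summarize checksum mismatches (MSGID 106010) from a glusterd log.
# Usage: cksum_mismatches LOGFILE
# Prints one line per mismatch: VOLUME local=N remote=N peer=HOST
cksum_mismatches() {
    sed -n 's/.*MSGID: 106010.*Version of Cksums \(.*\) differ\. local cksum = \([0-9]*\), remote cksum = \([0-9]*\) on peer \(.*\)$/\1 local=\2 remote=\3 peer=\4/p' "$1"
}
```

Run against the log above, this gives one line for each of the two peers, both showing local=53769889 and remote=2068896937.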

