Are you able to set the logs to debug level? It might provide a clue as to what is going on.

Best Regards,
Strahil Nikolov

On Thu, Jan 18, 2024 at 13:08, Diego Zuccato <diego.zuccato@unibo.it> wrote:

That's the same kind of errors I keep seeing on my 2 clusters,
regenerated some months ago. Seems a pseudo-split-brain that should be
impossible on a replica 3 cluster but keeps happening.
Sadly going to ditch Gluster ASAP.

Diego

On 18/01/2024 07:11, Hu Bert wrote:
> Good morning,
> heal still not running. Pending heals now sum up to 60K per brick.
> Heal was starting instantly, e.g. after a server reboot, with version
> 10.4, but doesn't with version 11. What could be wrong?
>
> I only see these errors on one of the "good" servers in glustershd.log:
>
> [2024-01-18 06:08:57.328480 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0:
> remote operation failed.
> [{path=<gfid:cb39a1e4-2a4c-4727-861d-3ed9ef00681b>},
> {gfid=cb39a1e4-2a4c-4727-861d-3ed9ef00681b},
> {errno=2}, {error=No such file or directory}]
> [2024-01-18 06:08:57.594051 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1:
> remote operation failed.
> [{path=<gfid:3e9b178c-ae1f-4d85-ae47-fc539d94dd11>},
> {gfid=3e9b178c-ae1f-4d85-ae47-fc539d94dd11},
> {errno=2}, {error=No such file or directory}]
>
> About 7K today. Any ideas? Someone?
>
>
> Best regards,
> Hubert
>
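For reference, regarding the debug-level suggestion at the top of this mail: a minimal sketch of the usual way to raise Gluster log verbosity. The volume name "workdata" is taken from this thread; please verify the option names against your installed version before applying them.

  # Raise log verbosity for the brick processes and for client-side
  # processes; glustershd runs the client translators, so this should
  # also make glustershd.log more detailed.
  gluster volume set workdata diagnostics.brick-log-level DEBUG
  gluster volume set workdata diagnostics.client-log-level DEBUG

  # glusterd itself also accepts a --log-level DEBUG argument on its
  # command line if the peer-handshake side needs tracing.

  # Remember to switch back afterwards, DEBUG is very noisy:
  gluster volume set workdata diagnostics.brick-log-level INFO
  gluster volume set workdata diagnostics.client-log-level INFO
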
> On Wed, 17 Jan 2024 at 11:24, Hu Bert <revirii@googlemail.com> wrote:
>>
>> OK, finally managed to get all servers, volumes etc. running, but it took
>> a couple of restarts, cksum checks etc.
>>
>> One problem: a volume doesn't heal automatically, or doesn't heal at all.
>>
>> gluster volume status
>> Status of volume: workdata
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick glusterpub1:/gluster/md3/workdata     58832     0          Y       3436
>> Brick glusterpub2:/gluster/md3/workdata     59315     0          Y       1526
>> Brick glusterpub3:/gluster/md3/workdata     56917     0          Y       1952
>> Brick glusterpub1:/gluster/md4/workdata     59688     0          Y       3755
>> Brick glusterpub2:/gluster/md4/workdata     60271     0          Y       2271
>> Brick glusterpub3:/gluster/md4/workdata     49461     0          Y       2399
>> Brick glusterpub1:/gluster/md5/workdata     54651     0          Y       4208
>> Brick glusterpub2:/gluster/md5/workdata     49685     0          Y       2751
>> Brick glusterpub3:/gluster/md5/workdata     59202     0          Y       2803
>> Brick glusterpub1:/gluster/md6/workdata     55829     0          Y       4583
>> Brick glusterpub2:/gluster/md6/workdata     50455     0          Y       3296
>> Brick glusterpub3:/gluster/md6/workdata     50262     0          Y       3237
>> Brick glusterpub1:/gluster/md7/workdata     52238     0          Y       5014
>> Brick glusterpub2:/gluster/md7/workdata     52474     0          Y       3673
>> Brick glusterpub3:/gluster/md7/workdata     57966     0          Y       3653
>> Self-heal Daemon on localhost               N/A       N/A        Y       4141
>> Self-heal Daemon on glusterpub1             N/A       N/A        Y       5570
>> Self-heal Daemon on glusterpub2             N/A       N/A        Y       4139
>>
>> "gluster volume heal workdata info" lists a lot of files per brick.
>> "gluster volume heal workdata statistics heal-count" shows thousands
>> of files per brick.
>> "gluster volume heal workdata enable" has no effect.
>>
>> gluster volume heal workdata full
>> Launching heal operation to perform full self heal on volume workdata
>> has been successful
>> Use heal info commands to check status.
>>
>> -> not doing anything at all. And nothing is happening on the 2 "good"
>> servers in e.g. glustershd.log. Heal was working as expected on
>> version 10.4, but here... silence. Does someone have an idea?
>>
>>
>> Best regards,
>> Hubert
>>
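A minimal sketch of checks that could help narrow down why the heal stays idle. "workdata" is the volume name from this thread, and the exact output of these commands varies a bit between releases, so treat this as a sketch rather than a recipe.

  # Condensed per-brick heal counters instead of the full file list
  gluster volume heal workdata info summary

  # Is the self-heal daemon up and connected on every node?
  gluster volume status workdata shd

  # Which clients (FUSE mounts, glustershd, ...) are connected to the
  # bricks? Newer releases also print each client's op-version here.
  gluster volume status workdata clients

  # Any connection or option errors on the nodes that should be healing?
  grep -iE "error|disconnect" /var/log/glusterfs/glustershd.log | tail -n 50
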
>> On Tue, 16 Jan 2024 at 13:44, Gilberto Ferreira
>> <gilberto.nunes32@gmail.com> wrote:
>>>
>>> Ah! Indeed! You need to perform an upgrade on the clients as well.
>>>
>>>
>>> On Tue, 16 Jan 2024 at 03:12, Hu Bert <revirii@googlemail.com> wrote:
>>>>
>>>> Morning to those still reading :-)
>>>>
>>>> I found this: https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
>>>>
>>>> There's a paragraph about "peer rejected" with the same error message,
>>>> telling me: "Update the cluster.op-version" - I had only updated the
>>>> server nodes, but not the clients. So upgrading the cluster.op-version
>>>> wasn't possible at this time. So... upgrading the clients to version
>>>> 11.1 and then the op-version should solve the problem?
>>>>
>>>>
>>>> Thx,
>>>> Hubert
>>>>
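For the op-version part, a minimal sketch of how it can be checked and, once every server and client runs 11.x, raised. The value 110000 is an assumption based on the "op-version 100000" line for 10.x further down in this thread; the authoritative number is whatever cluster.max-op-version reports.

  # Current operating version of the cluster
  gluster volume get all cluster.op-version

  # Highest op-version the cluster supports (only meaningful once all
  # servers and clients have been upgraded)
  gluster volume get all cluster.max-op-version

  # Raise the cluster op-version, e.g. for 11.x (use the value reported
  # by cluster.max-op-version)
  gluster volume set all cluster.op-version 110000
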
>>>> On Mon, 15 Jan 2024 at 09:16, Hu Bert <revirii@googlemail.com> wrote:
>>>>>
>>>>> Hi,
>>>>> just upgraded some gluster servers from version 10.4 to version 11.1.
>>>>> Debian bullseye & bookworm. When only installing the packages: good,
>>>>> servers, volumes etc. work as expected.
>>>>>
>>>>> But one needs to test whether the systems work after a daemon and/or
>>>>> server restart. Well, did a reboot, and after that the rebooted/restarted
>>>>> system is "out". Log message from a working node:
>>>>>
>>>>> [2024-01-15 08:02:21.585694 +0000] I [MSGID: 106163]
>>>>> [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]
>>>>> 0-management: using the op-version 100000
>>>>> [2024-01-15 08:02:21.589601 +0000] I [MSGID: 106490]
>>>>> [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]
>>>>> 0-glusterd: Received probe from uuid:
>>>>> b71401c3-512a-47cb-ac18-473c4ba7776e
>>>>> [2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010]
>>>>> [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:
>>>>> Version of Cksums sourceimages differ. local cksum = 2204642525,
>>>>> remote cksum = 1931483801 on peer gluster190
>>>>> [2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493]
>>>>> [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:
>>>>> Responded to gluster190 (0), ret: 0, op_ret: -1
>>>>> [2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493]
>>>>> [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd:
>>>>> Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host:
>>>>> gluster190, port: 0
>>>>>
>>>>> peer status from the rebooted node:
>>>>>
>>>>> root@gluster190 ~ # gluster peer status
>>>>> Number of Peers: 2
>>>>>
>>>>> Hostname: gluster189
>>>>> Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
>>>>> State: Peer Rejected (Connected)
>>>>>
>>>>> Hostname: gluster188
>>>>> Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
>>>>> State: Peer Rejected (Connected)
>>>>>
>>>>> So the rebooted gluster190 is not accepted anymore, and thus does not
>>>>> appear in "gluster volume status". I then followed this guide:
>>>>>
>>>>> https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
>>>>>
>>>>> Remove everything under /var/lib/glusterd/ (except glusterd.info) and
>>>>> restart the glusterd service etc. Data gets copied from the other nodes,
>>>>> 'gluster peer status' is ok again - but the volume info is missing,
>>>>> /var/lib/glusterd/vols is empty. When syncing this dir from another
>>>>> node, the volume is available again, heals start etc.
>>>>>
>>>>> Well, and just to be sure that everything's working as it should, I
>>>>> rebooted that node again - the rebooted node is kicked out again, and
>>>>> you have to go through the whole procedure to bring it back again.
>>>>>
>>>>> Sorry, but did I miss anything? Has someone experienced similar
>>>>> problems? I'll probably downgrade to 10.4 again, that version was
>>>>> working...
>>>>>
>>>>>
>>>>> Thx,
>>>>> Hubert
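A rough shell transcription of the recovery steps described above (and in the linked "Resolving Peer Rejected" guide), to be run on the rejected node. The host names gluster190 (rejected) and gluster189 (known good) are taken from this thread; this wipes local glusterd state, so double-check paths and hostnames before running anything like it.

  # on the rejected node (gluster190)
  systemctl stop glusterd

  # keep the node's identity (glusterd.info), drop the rest of the state
  cd /var/lib/glusterd
  find . -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +

  systemctl start glusterd

  # re-probe a known-good peer so the configuration is pulled over again
  gluster peer probe gluster189
  systemctl restart glusterd
  gluster peer status

  # workaround mentioned in this thread if /var/lib/glusterd/vols stays
  # empty: copy it from a good node and restart glusterd once more
  rsync -a gluster189:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
  systemctl restart glusterd
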
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

________


Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users