Just ignore the previous email.<div id="yMail_cursorElementTracker_1615923124664"><br></div><div id="yMail_cursorElementTracker_1615923124861">Best Regards,</div><div id="yMail_cursorElementTracker_1615923130327">Strahil Nikolov<br> <br> <blockquote style="margin: 0 0 20px 0;"> <div style="font-family:Roboto, sans-serif; color:#6D00F6;"> <div>On Tue, Mar 16, 2021 at 21:27, Strahil Nikolov</div><div>&lt;hunter86_bg@yahoo.com&gt; wrote:</div> </div> <div style="padding: 10px 0 0 20px; margin: 10px 0 0 0; border-left: 1px solid #6D00F6;"> <div id="yiv2663115222"><div>According to&nbsp;https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/server-quorum/ :<div id="yiv2663115222yMail_cursorElementTracker_1615922795260"><br clear="none"></div><div id="yiv2663115222yMail_cursorElementTracker_1615922795440"><p style="line-height:24px;margin:0px 0px 24px;font-size:16px;color:rgb(64, 64, 64);font-family:Lato, proxima-nova, Arial, sans-serif;background-color:rgb(252, 252, 252);">Server quorum is controlled by two parameters:</p><ul style="margin:0px 0px 24px;padding:0px;list-style-position:initial;list-style-image:initial;line-height:24px;color:rgb(64, 64, 64);font-family:Lato, proxima-nova, Arial, sans-serif;font-size:14.4px;background-color:rgb(252, 252, 252);"><li style="list-style:disc;margin-left:24px;"><strong>cluster.server-quorum-type</strong></li></ul><p id="yiv2663115222yMail_cursorElementTracker_1615922819614" style="line-height:24px;margin:0px 0px 24px;font-size:16px;color:rgb(64, 64, 64);font-family:Lato, proxima-nova, Arial, sans-serif;background-color:rgb(252, 252, 252);">This value may be "server" to indicate that server quorum is enabled, or "none" to mean it's disabled.</p><p id="yiv2663115222yMail_cursorElementTracker_1615922821745" style="line-height:24px;margin:0px 0px 24px;font-size:16px;color:rgb(64, 64, 64);font-family:Lato, proxima-nova, Arial, sans-serif;background-color:rgb(252, 252, 252);"><br clear="none"></p><p 
id="yiv2663115222yMail_cursorElementTracker_1615922821950" style="line-height:24px;margin:0px 0px 24px;font-size:16px;color:rgb(64, 64, 64);font-family:Lato, proxima-nova, Arial, sans-serif;background-color:rgb(252, 252, 252);">So, try with 'none'.</p><p id="yiv2663115222yMail_cursorElementTracker_1615922839676" style="line-height:24px;margin:0px 0px 24px;font-size:16px;color:rgb(64, 64, 64);font-family:Lato, proxima-nova, Arial, sans-serif;background-color:rgb(252, 252, 252);">Best Regards,</p><p id="yiv2663115222yMail_cursorElementTracker_1615922854858" style="line-height:24px;margin:0px 0px 24px;font-size:16px;color:rgb(64, 64, 64);font-family:Lato, proxima-nova, Arial, sans-serif;background-color:rgb(252, 252, 252);">Strahil Nikolov</p> <br clear="none"> <div class="yiv2663115222yqt8421418923" id="yiv2663115222yqt89180"><blockquote style="margin:0 0 20px 0;"> <div style="font-family:Roboto, sans-serif;color:#6D00F6;"> <div>On Tue, Mar 16, 2021 at 20:16, Zenon Panoussis</div><div>&lt;oracle@provocation.net&gt; wrote:</div> </div> <div style="padding:10px 0 0 20px;margin:10px 0 0 0;border-left:1px solid #6D00F6;"> <br clear="none">&gt; Yes, if the dataset is small, you can try rm -rf of the dir <br clear="none">&gt; from the mount (assuming no other application is accessing <br clear="none">&gt; them on the volume), launch heal once so that the heal info <br clear="none">&gt; becomes zero and then copy it over again.<br clear="none"><br clear="none">I did approximately so; the rm -rf took its sweet time and the<br clear="none">number of entries to be healed kept diminishing as the deletion<br clear="none">progressed. 
At the end I was left with<br clear="none"><br clear="none">Mon Mar 15 22:57:09 CET 2021<br clear="none">Gathering count of entries to be healed on volume gv0 has been successful<br clear="none"><br clear="none">Brick node01:/gfs/gv0<br clear="none">Number of entries: 3<br clear="none"><br clear="none">Brick mikrivouli:/gfs/gv0<br clear="none">Number of entries: 2<br clear="none"><br clear="none">Brick nanosaurus:/gfs/gv0<br clear="none">Number of entries: 3<br clear="none">--------------<br clear="none"><br clear="none">and that's where I've been ever since, for the past 20 hours.<br clear="none">SHD has kept trying to heal them all along and the log brings<br clear="none">us back to square one:<br clear="none"><br clear="none">[2021-03-16 14:51:35.059593 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1053:afr_selfheal_entry_do] 0-gv0-replicate-0: performing entry selfheal on 94aefa13-9828-49e5-9bac-6f70453c100f<br clear="none">[2021-03-16 15:39:43.680380 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gv0-client-0: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]<br clear="none">[2021-03-16 15:39:43.769604 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gv0-client-2: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]<br clear="none">[2021-03-16 15:39:43.908425 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gv0-client-1: remote operation failed. 
[{path=(null)}, {errno=22}, {error=Invalid argument}]<br clear="none">[...]<br clear="none"><br clear="none">In other words, deleting and recreating the unhealable files<br clear="none">and directories was a workaround, but the underlying problem<br clear="none">persists and I can't even begin to look for it when I have no<br clear="none">clue what errno 22 means in plain English.<br clear="none"><br clear="none">In any case, glusterd.log is full of messages like<br clear="none"><br clear="none">[2021-03-16 15:37:03.398619 +0000] I [MSGID: 106533] [glusterd-volume-ops.c:717:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume gv0<br clear="none">[2021-03-16 15:37:03.791452 +0000] E [MSGID: 106061] [glusterd-server-quorum.c:260:glusterd_is_volume_in_server_quorum] 0-management: Dict get failed [{Key=cluster.server-quorum-type}]<br clear="none"><br clear="none">Every single "received heal vol req" message is immediately followed<br clear="none">by a "dict get failed", always for server-quorum-type, for hours on<br clear="none">end. And I begin to smell a bug. 
The CLI can query the value OK:<br clear="none"><br clear="none"># gluster volume get gv0 cluster.server-quorum-type<br clear="none">Option&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Value<br clear="none">------&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -----<br clear="none">cluster.server-quorum-type&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; off<br clear="none"><br clear="none"><br clear="none">Checking all quorum-related settings, I get<br clear="none"><br clear="none"># gluster volume get gv0 all |grep quorum<br clear="none">cluster.quorum-type&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;  auto<br clear="none">cluster.quorum-count&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (null) (DEFAULT)<br clear="none">cluster.server-quorum-type&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; off<br clear="none">cluster.server-quorum-ratio&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;  51<br clear="none">cluster.quorum-reads&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; no (DEFAULT)<br clear="none">disperse.quorum-count&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;  0 (DEFAULT)<br clear="none"><br clear="none">I never touched any of them and none of them appear in volume info<br clear="none">under "Options Reconfigured", so I don't know why three of them are<br clear="none">not marked as defaults.<br clear="none"><br clear="none">Next, I tried setting server-quorum-type=server. 
The server-quorum-type<br clear="none">problem went away and I got a new kind of dict get failure:<br clear="none"><br clear="none">The message "E [MSGID: 106061] [glusterd-volgen.c:2564:brick_graph_add_pump] 0-management: Dict get failed [{Key=enable-pump}]" repeated 2 times between [2021-03-16 17:12:18.677594 +0000] and [2021-03-16 17:12:18.779859 +0000]<br clear="none"><br clear="none">I tried rolling back server-quorum-type=server and got this error:<br clear="none"><br clear="none"># gluster volume set gv0 cluster.server-quorum-type off<br clear="none">volume set: failed: option server-quorum-type off: 'off' is not valid (possible options are none, server.)<br clear="none"><br clear="none">Aha, but previously and by default it was clearly "off", not "none".<br clear="none">That's a bug somewhere, and that is what was causing the dict get failures<br clear="none">on server-quorum-type. The missing dict key enable-pump that's required<br clear="none">by server-quorum-type=server also looks like a bug because there is<br clear="none">no such setting:<br clear="none"><br clear="none"># gluster volume get gv0 all |grep pump<br clear="none">#<br clear="none"><br clear="none">There are more similarly strange complaints in the glusterd log:<br clear="none"><br clear="none">[2021-03-16 17:25:43.134207 +0000] E [MSGID: 106434] [glusterd-utils.c:13379:glusterd_get_value_for_vme_entry] 0-management: xlator_volopt_dynload error (-1)<br clear="none">[2021-03-16 17:25:43.141816 +0000] W [MSGID: 106332] [glusterd-utils.c:13390:glusterd_get_value_for_vme_entry] 0-management: Failed to get option for localtime-logging key<br clear="none">[2021-03-16 17:25:43.143185 +0000] W [MSGID: 106332] [glusterd-utils.c:13390:glusterd_get_value_for_vme_entry] 0-management: Failed to get option for s3plugin-seckey key<br clear="none">[2021-03-16 17:25:43.143340 +0000] W [MSGID: 106332] [glusterd-utils.c:13390:glusterd_get_value_for_vme_entry] 0-management: Failed to get option for s3plugin-keyid 
key<br clear="none">[2021-03-16 17:25:43.143484 +0000] W [MSGID: 106332] [glusterd-utils.c:13390:glusterd_get_value_for_vme_entry] 0-management: Failed to get option for s3plugin-bucketid key<br clear="none">[2021-03-16 17:25:43.143621 +0000] W [MSGID: 106332] [glusterd-utils.c:13390:glusterd_get_value_for_vme_entry] 0-management: Failed to get option for s3plugin-hostname key<br clear="none"><br clear="none">If none of this stuff is used in the first place, it should not<br clear="none">be triggering errors and warnings. If the S3 plugin is not enabled,<br clear="none">the S3 keys should not even be checked. Both the checking of the<br clear="none">keys and the error logging are bugs.<br clear="none"><br clear="none">Cool, I'm discovering more and more stuff that needs fixing, but<br clear="none">I'm making zero progress with my healing problem. I'm still stuck<br clear="none">with errno=22. </div> </blockquote></div></div></div></div> </div> </blockquote></div>