Hi,

I think the best way is to go through the logs on the affected arbiter brick (maybe even temporarily increase the log level).

What is the output of:

find /var/bricks/arb_0/brick -not -user 33 -print
find /var/bricks/arb_0/brick -not -group 33 -print

Maybe there are some files/dirs with wrong ownership.

Best Regards,
Strahil Nikolov


Thanks Strahil,

unfortunately I cannot connect, as the mount is denied (see the mount.log provided).
IPs > n.n.n.100 are clients and simply cannot mount the volume. When I kill the arb pids on node2, new clients can mount the volume. When I bring them up again, I get the same problem.

I wonder why the root dir on the arb bricks has the wrong UID:GID.
I added regular data bricks on node2 before without any problems.

Also, when executing "watch df" I see:

/dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
..
/dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0
..
/dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0

So the heal daemon might be trying to do something that isn't working. I therefore chowned UID:GID of ../arb_0/brick manually to match, but that did not work either.

As I added all 6 arbs at once and 4 are working as expected, I really don't get what's wrong with these...

A.
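
To make the suggestions above concrete, here is a minimal sketch, assuming the volume name gv0 and the brick path /var/bricks/arb_0/brick from the volume info quoted below, and the expected owner 33:33; adjust for the node that actually misbehaves:

  # temporarily raise the brick log level, watch the brick log, then set it back
  gluster volume set gv0 diagnostics.brick-log-level DEBUG
  tail -f /var/log/glusterfs/bricks/var-bricks-arb_0-brick.log   # log file name follows the brick path
  gluster volume reset gv0 diagnostics.brick-log-level

  # check ownership and xattrs on the arbiter brick root
  stat -c '%u:%g %A %n' /var/bricks/arb_0/brick
  getfattr -d -m . -e hex /var/bricks/arb_0/brick
  # list anything below the brick that is not owned by 33:33
  find /var/bricks/arb_0/brick \( -not -user 33 -o -not -group 33 \) -ls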
"Strahil Nikolov" hunter86_bg@yahoo.com – 31 May 2021 11:12
> For arb_0 I see only 8 clients, while there should be 12 clients:
> Brick : 192.168.0.40:/var/bricks/0/brick
> Clients connected : 12
>
> Brick : 192.168.0.41:/var/bricks/0/brick
> Clients connected : 12
>
> Brick : 192.168.0.80:/var/bricks/arb_0/brick
> Clients connected : 8
>
> Can you try to reconnect them? The simplest way is to kill the arbiter process and run 'gluster volume start force' (a sketch of this follows the quoted output further below), but always verify that you have both data bricks up and running.
>
> Yet, this doesn't explain why the heal daemon is not able to replicate properly.
>
> Best Regards,
> Strahil Nikolov
> >
> > Meanwhile I tried reset-brick on one of the failing arbiters on node2, but with the same results. The behaviour is reproducible; the arbiter stays empty.
> >
> > node0: 192.168.0.40
> > node1: 192.168.0.41
> > node2: 192.168.0.80
> >
> > volume info:
> >
> > Volume Name: gv0
> > Type: Distributed-Replicate
> > Volume ID: 9bafc4d2-d9b6-4b6d-a631-1cf42d1d2559
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 6 x (2 + 1) = 18
> > Transport-type: tcp
> > Bricks:
> > Brick1: 192.168.0.40:/var/bricks/0/brick
> > Brick2: 192.168.0.41:/var/bricks/0/brick
> > Brick3: 192.168.0.80:/var/bricks/arb_0/brick (arbiter)
> > Brick4: 192.168.0.40:/var/bricks/2/brick
> > Brick5: 192.168.0.80:/var/bricks/2/brick
> > Brick6: 192.168.0.41:/var/bricks/arb_1/brick (arbiter)
> > Brick7: 192.168.0.40:/var/bricks/1/brick
> > Brick8: 192.168.0.41:/var/bricks/1/brick
> > Brick9: 192.168.0.80:/var/bricks/arb_1/brick (arbiter)
> > Brick10: 192.168.0.40:/var/bricks/3/brick
> > Brick11: 192.168.0.80:/var/bricks/3/brick
> > Brick12: 192.168.0.41:/var/bricks/arb_0/brick (arbiter)
> > Brick13: 192.168.0.41:/var/bricks/3/brick
> > Brick14: 192.168.0.80:/var/bricks/4/brick
> > Brick15: 192.168.0.40:/var/bricks/arb_0/brick (arbiter)
> > Brick16: 192.168.0.41:/var/bricks/2/brick
> > Brick17: 192.168.0.80:/var/bricks/5/brick
> > Brick18: 192.168.0.40:/var/bricks/arb_1/brick (arbiter)
> > Options Reconfigured:
> > cluster.min-free-inodes: 6%
> > cluster.min-free-disk: 2%
> > performance.md-cache-timeout: 600
> > cluster.rebal-throttle: lazy
> > features.scrub-freq: monthly
> > features.scrub-throttle: lazy
> > features.scrub: Inactive
> > features.bitrot: off
> > cluster.server-quorum-type: none
> > performance.cache-refresh-timeout: 10
> > performance.cache-max-file-size: 64MB
> > performance.cache-size: 781901824
> > auth.allow: /(192.168.0.*),/usr/andreas(192.168.0.120),/usr/otis(192.168.0.168),/usr/otis(192.168.0.111),/usr/otis(192.168.0.249),/media(192.168.0.*),/virt(192.168.0.*),/cloud(192.168.0.247),/zm(192.168.0.136)
> > performance.cache-invalidation: on
> > performance.stat-prefetch: on
> > features.cache-invalidation-timeout: 600
> > cluster.quorum-type: auto
> > features.cache-invalidation: on
> > nfs.disable: on
> > transport.address-family: inet
> > cluster.self-heal-daemon: on
> > cluster.server-quorum-ratio: 51%
> >
> > volume status:
> >
> > Status of volume: gv0
> > Gluster process                               TCP Port  RDMA Port  Online  Pid
> > ------------------------------------------------------------------------------
> > Brick 192.168.0.40:/var/bricks/0/brick        49155     0          Y       713066
> > Brick 192.168.0.41:/var/bricks/0/brick        49152     0          Y       2082
> > Brick 192.168.0.80:/var/bricks/arb_0/brick    49152     0          Y       26186
> > Brick 192.168.0.40:/var/bricks/2/brick        49156     0          Y       713075
> > Brick 192.168.0.80:/var/bricks/2/brick        49154     0          Y       325
> > Brick 192.168.0.41:/var/bricks/arb_1/brick    49157     0          Y       1746903
> > Brick 192.168.0.40:/var/bricks/1/brick        49157     0          Y       713084
> > Brick 192.168.0.41:/var/bricks/1/brick        49153     0          Y       14104
> > Brick 192.168.0.80:/var/bricks/arb_1/brick    49159     0          Y       2314
> > Brick 192.168.0.40:/var/bricks/3/brick        49153     0          Y       2978692
> > Brick 192.168.0.80:/var/bricks/3/brick        49155     0          Y       23269
> > Brick 192.168.0.41:/var/bricks/arb_0/brick    49158     0          Y       1746942
> > Brick 192.168.0.41:/var/bricks/3/brick        49155     0          Y       897058
> > Brick 192.168.0.80:/var/bricks/4/brick        49156     0          Y       27433
> > Brick 192.168.0.40:/var/bricks/arb_0/brick    49152     0          Y       3561115
> > Brick 192.168.0.41:/var/bricks/2/brick        49156     0          Y       902602
> > Brick 192.168.0.80:/var/bricks/5/brick        49157     0          Y       29522
> > Brick 192.168.0.40:/var/bricks/arb_1/brick    49154     0          Y       3561159
> > Self-heal Daemon on localhost                 N/A       N/A        Y       26199
> > Self-heal Daemon on 192.168.0.41              N/A       N/A        Y       2240635
> > Self-heal Daemon on 192.168.0.40              N/A       N/A        Y       3912810
> >
> > Task Status of Volume gv0
> > ------------------------------------------------------------------------------
> > There are no active volume tasks
> >
> > volume heal info summary:
> >
> > Brick 192.168.0.40:/var/bricks/0/brick   <--- contains 100177 files in 25015 dirs
> > Status: Connected
> > Total Number of entries: 1006
> > Number of entries in heal pending: 1006
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.41:/var/bricks/0/brick
> > Status: Connected
> > Total Number of entries: 1006
> > Number of entries in heal pending: 1006
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.80:/var/bricks/arb_0/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.40:/var/bricks/2/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.80:/var/bricks/2/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.41:/var/bricks/arb_1/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.40:/var/bricks/1/brick
> > Status: Connected
> > Total Number of entries: 1006
> > Number of entries in heal pending: 1006
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.41:/var/bricks/1/brick
> > Status: Connected
> > Total Number of entries: 1006
> > Number of entries in heal pending: 1006
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.80:/var/bricks/arb_1/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.40:/var/bricks/3/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.80:/var/bricks/3/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.41:/var/bricks/arb_0/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.41:/var/bricks/3/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.80:/var/bricks/4/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.40:/var/bricks/arb_0/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.41:/var/bricks/2/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.80:/var/bricks/5/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > Brick 192.168.0.40:/var/bricks/arb_1/brick
> > Status: Connected
> > Total Number of entries: 0
> > Number of entries in heal pending: 0
> > Number of entries in split-brain: 0
> > Number of entries possibly healing: 0
> >
> > client-list:
> >
> > Client connections for volume gv0
> > Name                  count
> > -----                 ------
> > fuse                  5
> > gfapi.ganesha.nfsd    3
> > glustershd            3
> >
> > total clients for volume gv0 : 11
> > -----------------------------------------------------------------
> >
> > all clients: https://pro.hostit.de/nextcloud/index.php/s/tWdHox3aqb3qqbG
> >
> > failing mnt.log https://pro.hostit.de/nextcloud/index.php/s/2E2NLnXNsTy7EQe
> >
> > Thank you.
> >
> > A.
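
A minimal sketch of the reconnect procedure suggested further up (kill the arbiter brick process, then 'gluster volume start force'), using 192.168.0.80:/var/bricks/arb_0/brick from the output above as the example brick; the PID placeholder is whatever the status command reports:

  # on the node hosting the affected arbiter brick
  gluster volume status gv0 192.168.0.80:/var/bricks/arb_0/brick   # note the brick PID
  gluster volume status gv0                                        # confirm both data bricks of that subvolume are online
  kill <brick PID>                                                 # stop only the arbiter brick process
  gluster volume start gv0 force                                   # respawns the killed brick process
  gluster volume status gv0 clients                                # per-brick client counts should climb back to 12
  gluster volume heal gv0                                          # trigger a heal, then monitor with:
  gluster volume heal gv0 info summary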
> > "Strahil Nikolov" hunter86_bg@yahoo.com – 31 May 2021 05:23
> > > Can you provide gluster volume info, gluster volume status, gluster volume heal info summary and, most probably, gluster volume status all clients/client-list?
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > > On Sun, May 30, 2021 at 15:17, a.schwibbe@gmx.net wrote:
> > > >
> > > > I am seeking help here after looking for solutions on the web for my distributed-replicated volume.
> > > >
> > > > My volume has been in operation since v3.10 and I have upgraded through to 7.9, replaced nodes and replaced bricks without a problem. I love it.
> > > >
> > > > Finally I wanted to extend my 6x2 distributed-replicated volume with arbiters for better split-brain protection.
> > > >
> > > > So I ran add-brick with replica 3 arbiter 1 (as I had a 6x2, I obviously added 6 arb bricks), it successfully converted to 6 x (2 + 1), and self-heal immediately started. Looking good.
> > > >
> > > > Version: 7.9
> > > > Number of Bricks: 6 x (2 + 1) = 18
> > > > cluster.max-op-version: 70200
> > > > Peers: 3 (node[0..2])
> > > >
> > > > Layout
> > > >
> > > > |node0  |node1  |node2
> > > > |brick0 |brick0 |arbit0
> > > > |arbit1 |brick1 |brick1
> > > > ....
> > > >
> > > > I then recognized that the arbiter volumes on node0 & node1 have been healed successfully.
> > > > Unfortunately, all arbiter volumes on node2 have not been healed!
> > > >
> > > > I realized that the main dir on my arb mount point has been added (mount point /var/bricks/arb_0 now contains the dir "brick"); however, this dir on _all_ other bricks has numeric owner ID 33, while on this one it is 0. The brick dir on the faulty arb volumes does contain ".glusterfs", however it has only very few entries. Other than that, "brick" is empty.
> > > >
> > > > At that point I changed the brick dir owner with chown to 33:33 and hoped for self-heal to work. It did not.
> > > > I hoped a rebalance fix-layout would fix things. It did not.
> > > > I hoped a glusterd restart on node2 (as this is happening to both arb volumes on this node exclusively) would help. It did not.
> > > >
> > > > Active mount points via nfs-ganesha or fuse continue to work.
> > > > Existing clients cause errors in the arb-brick logs on node2 for missing files or dirs, but clients seem not affected; r/w operations work.
> > > >
> > > > New clients are not able to fuse mount the volume due to an "authentication error".
> > > >
> > > > heal statistics heal-count shows that several hundred files need healing, and this count is rising. Watching df on the arb-brick mount point on node2 shows every now and then a few bytes written, but then removed immediately after that.
> > > >
> > > > Any help/recommendation from you is highly appreciated.
> > > >
> > > > Thank you!
> > > >
> > > > A.
> > > >
> > > > ________
> > > >
> > > > Community Meeting Calendar:
> > > >
> > > > Schedule -
> > > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > > > Bridge: https://meet.google.com/cpu-eiue-hvk
> > > >
> > > > Gluster-users mailing list
> > > > Gluster-users@gluster.org
> > > > https://lists.gluster.org/mailman/listinfo/gluster-users