Theoretically it might help.
If possible, try to resolve any pending heals.

Best Regards,
Strahil Nikolov

On Thu, Mar 16, 2023 at 15:29, Diego Zuccato <diego.zuccato@unibo.it> wrote:
> On Debian, stopping glusterd does not stop the brick processes: to stop
> everything (and free the memory) I have to run
>   systemctl stop glusterd
>   killall glusterfs{,d}
>   killall glfsheal
>   systemctl start glusterd
> [this behaviour hangs a simple reboot of a machine running glusterd...
> not nice]
>
> For now I just restarted glusterd w/o killing the bricks:
>
> root@str957-clustor00:~# ps aux|grep glfsheal|wc -l ; systemctl restart glusterd ; ps aux|grep glfsheal|wc -l
> 618
> 618
>
> No change in either the number of glfsheal processes or the free memory :(
> Should I "killall glfsheal" before the OOM killer kicks in?
>
> Diego
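A minimal script version of the stop/start sequence above (assuming the
standard Debian unit and process names; the names are spelled out because
the brace expansion in "glusterfs{,d}" needs bash, not plain sh):

  #!/bin/sh
  # stop the management daemon, then the processes it leaves behind on Debian
  systemctl stop glusterd
  killall glusterfsd glusterfs 2>/dev/null   # brick and client processes
  killall glfsheal 2>/dev/null               # leftover heal-info helpers
  sleep 5                                    # let them exit before restarting
  systemctl start glusterd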
> On 16/03/2023 12:37, Strahil Nikolov wrote:
>> Can you restart the glusterd service (first check that it was not
>> modified to kill the bricks)?
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Thu, Mar 16, 2023 at 8:26, Diego Zuccato <diego.zuccato@unibo.it> wrote:
>>> An OOM kill is just a matter of time.
>>>
>>> Today mem use is up to 177G/187G and:
>>> # ps aux|grep glfsheal|wc -l
>>> 551
>>>
>>> (well, one is actually the grep process, so "only" 550 glfsheal
>>> processes.)
>>>
>>> I'll take the last 5:
>>> root 3266352 0.5 0.0 600292 93044 ? Sl 06:55 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
>>> root 3267220 0.7 0.0 600292 91964 ? Sl 07:00 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
>>> root 3268076 1.0 0.0 600160 88216 ? Sl 07:05 0:08 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
>>> root 3269492 1.6 0.0 600292 91248 ? Sl 07:10 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
>>> root 3270354 4.4 0.0 600292 93260 ? Sl 07:15 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
>>>
>>> -8<--
>>> root@str957-clustor00:~# ps -o ppid= 3266352
>>> 3266345
>>> root@str957-clustor00:~# ps -o ppid= 3267220
>>> 3267213
>>> root@str957-clustor00:~# ps -o ppid= 3268076
>>> 3268069
>>> root@str957-clustor00:~# ps -o ppid= 3269492
>>> 3269485
>>> root@str957-clustor00:~# ps -o ppid= 3270354
>>> 3270347
>>> root@str957-clustor00:~# ps aux|grep 3266345
>>> root 3266345 0.0 0.0 430536 10764 ? Sl 06:55 0:00 gluster volume heal cluster_data info summary --xml
>>> root 3271532 0.0 0.0 6260 2500 pts/1 S+ 07:21 0:00 grep 3266345
>>> root@str957-clustor00:~# ps aux|grep 3267213
>>> root 3267213 0.0 0.0 430536 10644 ? Sl 07:00 0:00 gluster volume heal cluster_data info summary --xml
>>> root 3271599 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3267213
>>> root@str957-clustor00:~# ps aux|grep 3268069
>>> root 3268069 0.0 0.0 430536 10704 ? Sl 07:05 0:00 gluster volume heal cluster_data info summary --xml
>>> root 3271626 0.0 0.0 6260 2516 pts/1 S+ 07:22 0:00 grep 3268069
>>> root@str957-clustor00:~# ps aux|grep 3269485
>>> root 3269485 0.0 0.0 430536 10756 ? Sl 07:10 0:00 gluster volume heal cluster_data info summary --xml
>>> root 3271647 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3269485
>>> root@str957-clustor00:~# ps aux|grep 3270347
>>> root 3270347 0.0 0.0 430536 10672 ? Sl 07:15 0:00 gluster volume heal cluster_data info summary --xml
>>> root 3271666 0.0 0.0 6260 2568 pts/1 S+ 07:22 0:00 grep 3270347
>>> -8<--
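The listing above already shows the pattern: each glfsheal child hangs off a
separate "gluster volume heal ... info summary --xml" parent. An aggregate
view of the same relationship (a sketch using standard procps tools) could be:

  # count glfsheal children per parent PID, busiest parents first
  pgrep -f glfsheal | xargs -r -n1 ps -o ppid= -p | sort | uniq -c | sort -rn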
>>>
>>> Seems glfsheal is spawning more processes.
>>> I can't rule out a metadata corruption (or at least a desync), but it
>>> shouldn't happen...
>>>
>>> Diego
>>>
>>> On 15/03/2023 20:11, Strahil Nikolov wrote:
>>>> If you don't experience any OOM, you can focus on the heals.
>>>>
>>>> 284 glfsheal processes seems odd.
>>>>
>>>> Can you check the ppid of 2-3 randomly picked ones?
>>>> ps -o ppid= <pid>
>>>>
>>>> Best Regards,
>>>> Strahil Nikolov
>>>>
>>>> On Wed, Mar 15, 2023 at 9:54, Diego Zuccato <diego.zuccato@unibo.it> wrote:
>>>>> I enabled it yesterday and that greatly reduced memory pressure.
>>>>> Current volume info:
>>>>> -8<--
>>>>> Volume Name: cluster_data
>>>>> Type: Distributed-Replicate
>>>>> Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 45 x (2 + 1) = 135
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: clustor00:/srv/bricks/00/d
>>>>> Brick2: clustor01:/srv/bricks/00/d
>>>>> Brick3: clustor02:/srv/bricks/00/q (arbiter)
>>>>> [...]
>>>>> Brick133: clustor01:/srv/bricks/29/d
>>>>> Brick134: clustor02:/srv/bricks/29/d
>>>>> Brick135: clustor00:/srv/bricks/14/q (arbiter)
>>>>> Options Reconfigured:
>>>>> performance.quick-read: off
>>>>> cluster.entry-self-heal: on
>>>>> cluster.data-self-heal-algorithm: full
>>>>> cluster.metadata-self-heal: on
>>>>> cluster.shd-max-threads: 2
>>>>> network.inode-lru-limit: 500000
>>>>> performance.md-cache-timeout: 600
>>>>> performance.cache-invalidation: on
>>>>> features.cache-invalidation-timeout: 600
>>>>> features.cache-invalidation: on
>>>>> features.quota-deem-statfs: on
>>>>> performance.readdir-ahead: on
>>>>> cluster.granular-entry-heal: enable
>>>>> features.scrub: Active
>>>>> features.bitrot: on
>>>>> cluster.lookup-optimize: on
>>>>> performance.stat-prefetch: on
>>>>> performance.cache-refresh-timeout: 60
>>>>> performance.parallel-readdir: on
>>>>> performance.write-behind-window-size: 128MB
>>>>> cluster.self-heal-daemon: enable
>>>>> features.inode-quota: on
>>>>> features.quota: on
>>>>> transport.address-family: inet
>>>>> nfs.disable: on
>>>>> performance.client-io-threads: off
>>>>> client.event-threads: 1
>>>>> features.scrub-throttle: normal
>>>>> diagnostics.brick-log-level: ERROR
>>>>> diagnostics.client-log-level: ERROR
>>>>> config.brick-threads: 0
>>>>> cluster.lookup-unhashed: on
>>>>> config.client-threads: 1
>>>>> cluster.use-anonymous-inode: off
>>>>> diagnostics.brick-sys-log-level: CRITICAL
>>>>> features.scrub-freq: monthly
>>>>> cluster.data-self-heal: on
>>>>> cluster.brick-multiplex: on
>>>>> cluster.daemon-log-level: ERROR
>>>>> -8<--
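Brick multiplexing (the option Strahil asked about earlier in the thread)
shows up above as cluster.brick-multiplex: on. Since it is a cluster-wide
option, one way to double-check the value actually in effect (assuming
"volume get all" is available in this 9.x build) is:

  gluster volume get all cluster.brick-multiplex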
>>>>>
>>>>> htop reports that memory usage is up to 143G; there are 602 tasks and
>>>>> 5232 threads (~20 running) on clustor00, 117G/49 tasks/1565 threads
>>>>> on clustor01 and 126G/45 tasks/1574 threads on clustor02.
>>>>> I see quite a lot (284!) of glfsheal processes running on clustor00
>>>>> (a "gluster v heal cluster_data info summary" has been running on
>>>>> clustor02 since yesterday, still with no output). Shouldn't there be
>>>>> just one per brick?
>>>>>
>>>>> Diego
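The parent/child listing earlier in the thread suggests these are stacked-up
"heal info" invocations, each of which spawns its own glfsheal helper. If
they come from a cron job or monitoring check, a lock guard like this sketch
(the lock file path is arbitrary) would keep a new invocation from piling up
behind a stuck one:

  # skip this run entirely if a previous heal-info still holds the lock
  flock -n /run/lock/heal-info.lock \
      gluster volume heal cluster_data info summary --xml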
>>>>>
>>>>> On 15/03/2023 08:30, Strahil Nikolov wrote:
>>>>>> Do you use brick multiplexing?
>>>>>>
>>>>>> Best Regards,
>>>>>> Strahil Nikolov
>>>>>>
>>>>>> On Tue, Mar 14, 2023 at 16:44, Diego Zuccato <diego.zuccato@unibo.it> wrote:
>>>>>>> Hello all.
>>>>>>>
>>>>>>> Our Gluster 9.6 cluster is showing increasing problems.
>>>>>>> Currently it's composed of 3 servers (2x Intel Xeon 4210 [20 cores,
>>>>>>> dual thread, 40 threads total], 192GB RAM, 30x HGST HUH721212AL5200
>>>>>>> [12TB] disks), configured as replica 3 arbiter 1, using Debian
>>>>>>> packages from the latest Gluster 9.x repository.
>>>>>>>
>>>>>>> It seems 192G of RAM is not enough to handle 30 data bricks + 15
>>>>>>> arbiters, and I often had to reload glusterfsd because the glusterfs
>>>>>>> processes got killed by the OOM killer.
>>>>>>> On top of that, performance has been quite bad, especially since we
>>>>>>> reached about 20M files, and one of the servers has had mobo issues
>>>>>>> that resulted in memory errors corrupting some bricks' filesystems
>>>>>>> (XFS; it required "xfs_repair -L" to fix).
>>>>>>> Now I'm getting lots of "stale file handle" errors and other errors
>>>>>>> (like directories that seem empty from the client but still contain
>>>>>>> files in some bricks), and auto healing seems unable to complete.
>>>>>>>
>>>>>>> Since I can't keep up with manually fixing all the issues, I'm
>>>>>>> thinking about a backup+destroy+recreate strategy.
>>>>>>>
>>>>>>> I think that if I reduce the number of bricks per server to just 5
>>>>>>> (each a RAID1 of 6x12TB disks) I might resolve the RAM issues - at
>>>>>>> the cost of longer heal times in case a disk fails. Am I right, or
>>>>>>> is it useless? Other recommendations?
>>>>>>> The servers have space for another 6 disks. Maybe those could be
>>>>>>> used for some SSDs to speed up access?
>>>>>>>
>>>>>>> TIA.
>>>>>>>
>>>>>>> --
>>>>>>> Diego Zuccato
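On the memory-sizing question: a rough upper bound for what the brick
processes alone consume (a sketch; RSS double-counts pages shared between
processes) can be had with:

  # total resident memory of all brick processes, in GiB
  ps -C glusterfsd -o rss= | awk '{sum+=$1} END {printf "%.1f GiB\n", sum/1048576}'

Dividing that by the brick count gives a per-brick figure to sanity-check
the proposed 5-bricks-per-server layout against.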
>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786