[Gluster-users] How to configure?

Thu Mar 16 13:28:58 UTC 2023

On Debian, stopping glusterd does not stop the brick processes: to stop
everything (and free the memory) I have to run
   systemctl stop glusterd
   killall glusterfs{,d}
   killall glfsheal
   systemctl start glusterd
[this behaviour hangs a simple reboot of a machine running glusterd...
not nice]
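
A minimal sketch of the sequence above wrapped in a shell function, assuming bash and root privileges. The names `run`, `restart_gluster_stack` and the `DRY_RUN` variable are hypothetical helpers introduced here for illustration; by default the commands are only printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the four manual steps from the message above.
# DRY_RUN is a hypothetical switch (default 1 = print commands, do nothing).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Dry run: show what would be executed
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

restart_gluster_stack() {
    run systemctl stop glusterd
    # Same targets as "killall glusterfs{,d}"; killall returns non-zero
    # when nothing matched, which is not an error for this purpose.
    run killall glusterfs glusterfsd || true
    run killall glfsheal || true
    run systemctl start glusterd
}
```

Calling `restart_gluster_stack` prints the four commands; `DRY_RUN=0 restart_gluster_stack` would actually run them, which takes all bricks on that node offline until glusterd is started again.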

For now I just restarted glusterd without killing the bricks:

root@str957-clustor00:~# ps aux|grep glfsheal|wc -l ; systemctl restart glusterd ; ps aux|grep glfsheal|wc -l
618
618

No change in either the glfsheal processes or free memory :(
Should I "killall glfsheal" before OOM kicks in?
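
Note that `ps aux|grep glfsheal|wc -l` also counts the grep process itself. A small sketch, assuming a POSIX shell with awk, that counts only real glfsheal processes and lists their distinct parent PIDs; the function names `count_glfsheal` and `glfsheal_parents` are hypothetical, and both expect "PID PPID COMMAND..." lines on stdin.

```shell
# Count lines whose executable (third field) is glfsheal, with or without
# a leading path. A "grep glfsheal" line has "grep" as third field, so it
# is not counted.
count_glfsheal() {
    awk '$3 ~ "(^|/)glfsheal$" { n++ } END { print n + 0 }'
}

# Print the distinct parent PIDs of the glfsheal processes, one per line.
glfsheal_parents() {
    awk '$3 ~ "(^|/)glfsheal$" { print $2 }' | sort -u
}

# Usage on a live system (header-less ps output):
#   ps -e -o pid= -o ppid= -o args= | count_glfsheal
#   ps -e -o pid= -o ppid= -o args= | glfsheal_parents
```

If every glfsheal turns out to have a different parent, as in the quoted output below, the processes are being started by repeated heal-info invocations rather than by one runaway process.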

Diego

On 16/03/2023 12:37, Strahil Nikolov wrote:
> Can you restart the glusterd service (first check that it was not modified
> to kill the bricks)?
> 
> Best Regards,
> Strahil Nikolov
> 
>     On Thu, Mar 16, 2023 at 8:26, Diego Zuccato
>     <diego.zuccato at unibo.it> wrote:
>     OOM is just a matter of time.
> 
>     Today memory use is up to 177G/187G and:
>     # ps aux|grep glfsheal|wc -l
>     551
> 
>     (well, one is actually the grep process, so "only" 550 glfsheal
>     processes).
> 
>     I'll take the last 5:
>     root    3266352  0.5  0.0 600292 93044 ?        Sl  06:55  0:07
>     /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
>     root    3267220  0.7  0.0 600292 91964 ?        Sl  07:00  0:07
>     /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
>     root    3268076  1.0  0.0 600160 88216 ?        Sl  07:05  0:08
>     /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
>     root    3269492  1.6  0.0 600292 91248 ?        Sl  07:10  0:07
>     /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
>     root    3270354  4.4  0.0 600292 93260 ?        Sl  07:15  0:07
>     /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> 
>     -8<--
>     root@str957-clustor00:~# ps -o ppid= 3266352
>     3266345
>     root@str957-clustor00:~# ps -o ppid= 3267220
>     3267213
>     root@str957-clustor00:~# ps -o ppid= 3268076
>     3268069
>     root@str957-clustor00:~# ps -o ppid= 3269492
>     3269485
>     root@str957-clustor00:~# ps -o ppid= 3270354
>     3270347
>     root@str957-clustor00:~# ps aux|grep 3266345
>     root    3266345  0.0  0.0 430536 10764 ?        Sl  06:55  0:00
>     gluster volume heal cluster_data info summary --xml
>     root    3271532  0.0  0.0  6260  2500 pts/1    S+  07:21  0:00 grep
>     3266345
>     root@str957-clustor00:~# ps aux|grep 3267213
>     root    3267213  0.0  0.0 430536 10644 ?        Sl  07:00  0:00
>     gluster volume heal cluster_data info summary --xml
>     root    3271599  0.0  0.0  6260  2480 pts/1    S+  07:22  0:00 grep
>     3267213
>     root@str957-clustor00:~# ps aux|grep 3268069
>     root    3268069  0.0  0.0 430536 10704 ?        Sl  07:05  0:00
>     gluster volume heal cluster_data info summary --xml
>     root    3271626  0.0  0.0  6260  2516 pts/1    S+  07:22  0:00 grep
>     3268069
>     root@str957-clustor00:~# ps aux|grep 3269485
>     root    3269485  0.0  0.0 430536 10756 ?        Sl  07:10  0:00
>     gluster volume heal cluster_data info summary --xml
>     root    3271647  0.0  0.0  6260  2480 pts/1    S+  07:22  0:00 grep
>     3269485
>     root@str957-clustor00:~# ps aux|grep 3270347
>     root    3270347  0.0  0.0 430536 10672 ?        Sl  07:15  0:00
>     gluster volume heal cluster_data info summary --xml
>     root    3271666  0.0  0.0  6260  2568 pts/1    S+  07:22  0:00 grep
>     3270347
>     -8<--
> 
>     It seems new glfsheal processes keep being spawned, each by its own
>     "gluster volume heal cluster_data info summary --xml" parent.
>     I can't rule out metadata corruption (or at least a desync), but it
>     shouldn't happen...
> 
>     Diego
> 
>     On 15/03/2023 20:11, Strahil Nikolov wrote:
>      > If you don't experience any OOM, you can focus on the heals.
>      >
>      > 284 processes of glfsheal seems odd.
>      >
>      > Can you check the ppid of 2-3 randomly picked ones?
>      > ps -o ppid= <pid>
>      >
>      > Best Regards,
>      > Strahil Nikolov
>      >
>      >    On Wed, Mar 15, 2023 at 9:54, Diego Zuccato
>      >    <diego.zuccato at unibo.it> wrote:
>      >    I enabled it yesterday and that greatly reduced memory pressure.
>      >    Current volume info:
>      >    -8<--
>      >    Volume Name: cluster_data
>      >    Type: Distributed-Replicate
>      >    Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a
>      >    Status: Started
>      >    Snapshot Count: 0
>      >    Number of Bricks: 45 x (2 + 1) = 135
>      >    Transport-type: tcp
>      >    Bricks:
>      >    Brick1: clustor00:/srv/bricks/00/d
>      >    Brick2: clustor01:/srv/bricks/00/d
>      >    Brick3: clustor02:/srv/bricks/00/q (arbiter)
>      >    [...]
>      >    Brick133: clustor01:/srv/bricks/29/d
>      >    Brick134: clustor02:/srv/bricks/29/d
>      >    Brick135: clustor00:/srv/bricks/14/q (arbiter)
>      >    Options Reconfigured:
>      >    performance.quick-read: off
>      >    cluster.entry-self-heal: on
>      >    cluster.data-self-heal-algorithm: full
>      >    cluster.metadata-self-heal: on
>      >    cluster.shd-max-threads: 2
>      >    network.inode-lru-limit: 500000
>      >    performance.md-cache-timeout: 600
>      >    performance.cache-invalidation: on
>      >    features.cache-invalidation-timeout: 600
>      >    features.cache-invalidation: on
>      >    features.quota-deem-statfs: on
>      >    performance.readdir-ahead: on
>      >    cluster.granular-entry-heal: enable
>      >    features.scrub: Active
>      >    features.bitrot: on
>      >    cluster.lookup-optimize: on
>      >    performance.stat-prefetch: on
>      >    performance.cache-refresh-timeout: 60
>      >    performance.parallel-readdir: on
>      >    performance.write-behind-window-size: 128MB
>      >    cluster.self-heal-daemon: enable
>      >    features.inode-quota: on
>      >    features.quota: on
>      >    transport.address-family: inet
>      >    nfs.disable: on
>      >    performance.client-io-threads: off
>      >    client.event-threads: 1
>      >    features.scrub-throttle: normal
>      >    diagnostics.brick-log-level: ERROR
>      >    diagnostics.client-log-level: ERROR
>      >    config.brick-threads: 0
>      >    cluster.lookup-unhashed: on
>      >    config.client-threads: 1
>      >    cluster.use-anonymous-inode: off
>      >    diagnostics.brick-sys-log-level: CRITICAL
>      >    features.scrub-freq: monthly
>      >    cluster.data-self-heal: on
>      >    cluster.brick-multiplex: on
>      >    cluster.daemon-log-level: ERROR
>      >    -8<--
>      >
>      >    htop reports that memory usage is up to 143G, there are 602 tasks
>      >    and 5232 threads (~20 running) on clustor00, 117G/49 tasks/1565
>      >    threads on clustor01 and 126G/45 tasks/1574 threads on clustor02.
>      >    I see quite a lot (284!) of glfsheal processes running on clustor00
>      >    (a "gluster v heal cluster_data info summary" has been running on
>      >    clustor02 since yesterday, still no output). Shouldn't there be just
>      >    one per brick?
>      >
>      >    Diego
>      >
>      >    On 15/03/2023 08:30, Strahil Nikolov wrote:
>      >      > Do you use brick multiplexing?
>      >      >
>      >      > Best Regards,
>      >      > Strahil Nikolov
>      >      >
>      >      >    On Tue, Mar 14, 2023 at 16:44, Diego Zuccato
>      >      >    <diego.zuccato at unibo.it> wrote:
>      >      >    Hello all.
>      >      >
>      >      >    Our Gluster 9.6 cluster is showing increasing problems.
>      >      >    Currently it's composed of 3 servers (2x Intel Xeon 4210 [20
>      >      >    cores dual thread, total 40 threads], 192GB RAM, 30x HGST
>      >      >    HUH721212AL5200 [12TB]), configured in replica 3 arbiter 1.
>      >      >    Using Debian packages from the Gluster 9.x latest repository.
>      >      >
>      >      >    It seems 192G of RAM is not enough to handle 30 data bricks +
>      >      >    15 arbiters, and I often had to reload glusterfsd because
>      >      >    glusterfs processes got killed by the OOM killer.
>      >      >    On top of that, performance has been quite bad, especially
>      >      >    when we reached about 20M files. In addition, one of the
>      >      >    servers has had motherboard issues that resulted in memory
>      >      >    errors that corrupted the filesystem on some bricks (XFS, it
>      >      >    required "xfs_repair -L" to fix).
>      >      >    Now I'm getting lots of "stale file handle" errors and other
>      >      >    errors (like directories that seem empty from the client but
>      >      >    still contain files in some bricks) and auto healing seems
>      >      >    unable to complete.
>      >      >
>      >      >    Since I can't keep up with manually fixing all the issues,
>      >      >    I'm thinking about a backup+destroy+recreate strategy.
>      >      >
>      >      >    I think that if I reduce the number of bricks per server to
>      >      >    just 5 (RAID1 of 6x12TB disks) I might resolve the RAM issues,
>      >      >    at the cost of longer heal times in case a disk fails. Am I
>      >      >    right, or is it useless?
>      >      >    Other recommendations?
>      >      >    Servers have space for another 6 disks. Maybe those could be
>      >      >    used for some SSDs to speed up access?
>      >      >
>      >      >    TIA.
>      >      >
>      >      >    --
>      >      >    Diego Zuccato
>      >      >    DIFA - Dip. di Fisica e Astronomia
>      >      >    Servizi Informatici
>      >      >    Alma Mater Studiorum - Università di Bologna
>      >      >    V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>      >      >    tel.: +39 051 20 95786
>      >      >    ________
>      >      >
>      >      >    Community Meeting Calendar:
>      >      >
>      >      >    Schedule -
>      >      >    Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>      >      >    Bridge: https://meet.google.com/cpu-eiue-hvk
>      >      >    Gluster-users mailing list
>      >      >    Gluster-users at gluster.org
>      >      >    https://lists.gluster.org/mailman/listinfo/gluster-users
> 

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786