[Gluster-users] How to configure?
hunter86_bg at yahoo.com
Wed Mar 15 19:11:02 UTC 2023
If you don't experience any OOM, you can focus on the heals.
284 processes of glfsheal seems odd.
Can you check the PPID for 2-3 randomly picked ones? ps -o ppid= <pid>
Best Regards,
Strahil Nikolov
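That check can be applied to every glfsheal process in one pass; a sketch using standard procps tools (pgrep, ps), run on the affected node:

```shell
# For each running glfsheal process, print its PID, its parent's PID
# and the parent's command name, to see what keeps spawning them.
# Prints nothing if no glfsheal process is currently running.
for pid in $(pgrep -x glfsheal); do
    ppid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    pcmd=$(ps -o comm= -p "$ppid")
    echo "glfsheal pid=$pid ppid=$ppid parent=$pcmd"
done
```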
On Wed, Mar 15, 2023 at 9:54, Diego Zuccato <diego.zuccato at unibo.it> wrote:
I enabled it yesterday and that greatly reduced memory pressure.
Current volume info:
Volume Name: cluster_data
Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a
Snapshot Count: 0
Number of Bricks: 45 x (2 + 1) = 135
Brick3: clustor02:/srv/bricks/00/q (arbiter)
Brick135: clustor00:/srv/bricks/14/q (arbiter)
htop reports that memory usage is up to 143G, there are 602 tasks and
5232 threads (~20 running) on clustor00, 117G/49 tasks/1565 threads on
clustor01 and 126G/45 tasks/1574 threads on clustor02.
I see quite a lot (284!) of glfsheal processes running on clustor00 (a
"gluster v heal cluster_data info summary" has been running on clustor02
since yesterday, still with no output). Shouldn't there be just one per brick?
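To see whether those 284 helpers all hang off one parent (e.g. the single stuck "heal info summary" run) or come from many separate invocations, the glfsheal processes can be grouped by parent PID in one pass; a sketch using ps and awk:

```shell
# Group running glfsheal processes by parent PID and count each group.
# One PPID with many children suggests a single stuck invocation;
# many PPIDs suggest repeated independent invocations.
ps -eo ppid=,comm= | awk '
    $2 == "glfsheal" { count[$1]++ }
    END { for (p in count) printf "ppid %s: %d glfsheal\n", p, count[p] }'
```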
Il 15/03/2023 08:30, Strahil Nikolov ha scritto:
> Do you use brick multiplexing ?
> Best Regards,
> Strahil Nikolov
> On Tue, Mar 14, 2023 at 16:44, Diego Zuccato
> <diego.zuccato at unibo.it> wrote:
> Hello all.
> Our Gluster 9.6 cluster is showing increasing problems.
> Currently it's composed of 3 servers (2x Intel Xeon 4210 [20 cores dual
> thread, total 40 threads], 192GB RAM, 30x HGST HUH721212AL5200 [12TB]),
> configured in replica 3 arbiter 1. Using Debian packages from Gluster
> 9.x latest repository.
> It seems 192GB of RAM is not enough to handle 30 data bricks + 15 arbiters:
> I often had to reload glusterfsd because glusterfs processes got killed
> by the OOM killer.
> On top of that, performance has been quite bad, especially since we
> reached about 20M files. Moreover, one of the servers has had
> motherboard issues that resulted in memory errors corrupting some
> brick filesystems (XFS; they required "xfs_repair -L" to fix).
> Now I'm getting lots of "stale file handle" errors and other errors
> (like directories that seem empty from the client but still containing
> files in some bricks) and auto healing seems unable to complete.
> Since I can't keep manually fixing all the issues, I'm considering a
> backup+destroy+recreate strategy.
> I think that if I reduce the number of bricks per server to just 5
> (RAID1 of 6x12TB disks) I might resolve the RAM issues - at the cost of
> longer heal times if a disk fails. Am I right, or is it useless?
> Other recommendations?
> Servers have space for another 6 disks. Maybe those could be used for
> some SSDs to speed up access?
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
> Community Meeting Calendar:
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>