[Gluster-users] Performance: lots of small files, hdd, nvme etc.

Hu Bert revirii at googlemail.com
Mon May 1 07:28:55 UTC 2023


Hi there,
well... I know that you can read from the bricks themselves, but when
there are 7 bricks, each holding 1/7 of the data - which one do you
choose? ;-) Maybe ONE raid1 or raid10 per server with a plain replica 3
volume performs better than a "Number of Bricks: 5 x 3 = 15"
Distributed-Replicate...

The systems are under heavy load. I did some reading regarding tuning;
most of the recommendations for the small-file scenario were already in
place, and I added some more (just by guessing, as the documentation is
a bit... poor) - applied as sketched right after this list:

performance.io-cache on
performance.io-cache-size 6GB
performance.quick-read-cache-size 6GB
group nl-cache
network.inode-lru-limit 400000
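
For completeness, these were set the usual way via the CLI; the volume
name below is just a placeholder, not my real one:

# "myvol" is a placeholder for the actual volume name
gluster volume set myvol performance.io-cache on
gluster volume set myvol performance.io-cache-size 6GB
gluster volume set myvol performance.quick-read-cache-size 6GB
# applies the whole nl-cache (negative lookup cache) option group
gluster volume set myvol group nl-cache
gluster volume set myvol network.inode-lru-limit 400000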

Well, it doesn't really help. Here are some graphs from one server
(serving reads, about 100K per server per day) and one client (doing
frequent operations on the volume):

server:
cpu: https://abload.de/img/server-cpub3cf8.png
diskstats: https://abload.de/img/server-diskstats_iopsbleui.png
throughput: https://abload.de/img/server-diskstats_thro0kfma.png
disk util: https://abload.de/img/server-diskstats_utili2isl.png
network: https://abload.de/img/server-if_ens5-dayeac51.png
interrupts: https://abload.de/img/server-interruptsxycwd.png
load: https://abload.de/img/server-loadp7cid.png

client:
cpu: https://abload.de/img/client-cpuiuc0h.png
load: https://abload.de/img/client-loadsadod.png

Well, I hope you can see why I keep asking about this stuff so
frequently. I see 4 options:

1) find some good tuning
2) check whether a raid10 (with 10-14 huge HDDs) per server performs
better - see the sketch after this list
3) migrate to NVMe (JBOD or raid10)
4) or, if none of the above is feasible or reasonable, migrate to a
different solution (like Ceph, MinIO, ...)
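
For option 2, the layout I have in mind would be one big raid10 per
server with a single brick on top, i.e. a plain replica 3 volume -
roughly like this (hostnames, volume name and brick paths are made up):

# one brick per server, each backed by a local raid10
gluster volume create bigvol replica 3 \
    server1:/gluster/raid10/brick \
    server2:/gluster/raid10/brick \
    server3:/gluster/raid10/brick
gluster volume start bigvol

instead of the current 5 x 3 = 15 distributed-replicate layout.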


Thx for reading && best regards,

Hubert

On Mon, 3 Apr 2023 at 19:10, <gluster-users at jahu.sk> wrote:
>
> hello,
> you can read files from the underlying filesystem (ext4, xfs, ...) first,
> for example: /srv/glusterfs/wwww/brick.
>
> as a fallback you can check the mounted glusterfs path, which heals
> missing local node entries, e.g.: /mnt/shared/www/...
>
> you only need to write to the mount.glusterfs mount point.
>
>
> On 3/30/2023 11:26 AM, Hu Bert wrote:
> > - workload: the (un)famous "lots of small files" setting
> > - currently 70% of the volume is used: ~32TB
> > - file size: few KB up to 1MB
> > - so there are hundreds of millions of files (and millions of directories)
> > - each image has an ID
> > - under the base dir the IDs are split into 3-digit chunks
> > - dir structure: /basedir/(000-999)/(000-999)/ID/[lotsoffileshere]
> > - example for ID 123456789: /basedir/123/456/123456789/default.jpg
> > - maybe this structure isn't good and e.g. this would be better:
> > /basedir/IDs/[here the files] - so millions of ID-dirs directly under
> > /basedir/
> > - frequent access to the files by webservers (nginx, tomcat): lookups
> > whether a file exists, reading/writing images etc.
> > - Strahil mentioned: "Keep in mind that negative searches (searches of
> > non-existing/deleted objects) has highest penalty." <--- that happens
> > very often...
> > - server load on high traffic days: > 100 (mostly iowait)
> > - server reboots are bad (reading filesystem info etc.)
> > - a sw raid rebuild/resync is really bad
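
To make that directory scheme concrete - purely as an illustration, in
shell terms the ID-to-path mapping boils down to:

id=123456789
# the first two 3-digit groups of the ID become the directory prefix
path="/basedir/${id:0:3}/${id:3:3}/${id}/default.jpg"
# -> /basedir/123/456/123456789/default.jpg
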
>
>
> --
> S pozdravom / Yours sincerely
> Ing. Jan Hudoba
>
> http://www.jahu.sk
> ________
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users


More information about the Gluster-users mailing list