[Gluster-users] Performance: lots of small files, hdd, nvme etc.
revirii at googlemail.com
Mon May 1 07:28:55 UTC 2023
well... I know that you can read from the bricks themselves, but when
there are 7 bricks, each holding 1/7 of the data: which one do you
choose? ;-) Maybe ONE raid1 or raid10 with a replica 3 volume performs
better than a "Number of Bricks: 5 x 3 = 15" Distributed-Replicate...
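(For what it's worth, on a distributed volume you don't have to guess which brick holds a file: Gluster exposes the backing brick path through a virtual xattr on the mount. A sketch; the mount point and file name are placeholders:)

```shell
# Query the trusted.glusterfs.pathinfo virtual xattr on the FUSE mount;
# it prints the brick path(s) that actually hold this file.
getfattr -n trusted.glusterfs.pathinfo /mnt/glusterfs/some/file.jpg
```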
The systems are under heavy load. I did some reading on tuning; most of
the usual small-file recommendations were already applied, and I tried
some more (just by guessing, as the documentation is a bit... poor):
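(For context, these are the kind of small-file tunables meant here; illustrative only, with example values, not necessarily the exact set applied:)

```shell
# Common Gluster small-file tunables (example values, not a recommendation):
gluster volume set myvol performance.parallel-readdir on
gluster volume set myvol performance.nl-cache on           # cache negative lookups
gluster volume set myvol network.inode-lru-limit 200000
gluster volume set myvol features.cache-invalidation on
gluster volume set myvol features.cache-invalidation-timeout 600
gluster volume set myvol performance.md-cache-timeout 600
```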
Well, it doesn't really help. Here are some graphs from one server
(serving reads, ~100K per server per day) and one client (doing
frequent operations on the volume):
disk util: https://abload.de/img/server-diskstats_utili2isl.png
Well, I hope you can see why I keep asking about this stuff
frequently. As I see it, there are 4 options:
1) find some good tuning
2) check whether a raid10 (with 10-14 huge hdds) performs better
3) migrate to nvme (JBOD or raid10)
4) or, if none of the above is feasible or reasonable, migrate to a
different solution (like ceph, minio, ...)
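(The brick-first read pattern suggested in the quoted reply below can be sketched like this; the paths are the examples from that reply, and `read_file` is a hypothetical helper, not production code:)

```python
import os

def read_file(rel_path, brick_root="/srv/glusterfs/wwww/brick",
              mount_root="/mnt/shared/www"):
    """Try the local brick's backing filesystem first (cheap local read);
    fall back to the glusterfs mount, whose lookup also triggers healing
    of entries missing on this node. Writes must always go through the
    mount, never the brick."""
    brick_path = os.path.join(brick_root, rel_path)
    if os.path.exists(brick_path):
        with open(brick_path, "rb") as f:
            return f.read()
    with open(os.path.join(mount_root, rel_path), "rb") as f:
        return f.read()
```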
Thx for reading && best regards,
On Mon, Apr 3, 2023 at 19:10, <gluster-users at jahu.sk> wrote:
> You can read files from the underlying filesystem (ext4, xfs, ...) first, for
> ex: /srv/glusterfs/wwww/brick.
> As a fallback you can check the mounted glusterfs path, which heals missing
> local node entries, for ex: /mnt/shared/www/...
> You only need to write through the mount.glusterfs mount point.
> On 3/30/2023 11:26 AM, Hu Bert wrote:
> > - workload: the (un)famous "lots of small files" setting
> > - currently 70% of the volume is used: ~32TB
> > - file size: few KB up to 1MB
> > - so there are hundreds of millions of files (and millions of directories)
> > - each image has an ID
> > - under the base dir the IDs are split into 3 digits
> > - dir structure: /basedir/(000-999)/(000-999)/ID/[lotsoffileshere]
> > - example for ID 123456789: /basedir/123/456/123456789/default.jpg
> > - maybe this structure isn't good and e.g. this would be better:
> > /basedir/IDs/[the files here] - i.e. millions of ID dirs directly under
> > /basedir/
> > - frequent access to the files by webservers (nginx, tomcat): lookup
> > if file exists, read/write images etc.
> > - Strahil mentioned: "Keep in mind that negative searches (searches of
> > non-existing/deleted objects) has highest penalty." <--- that happens
> > very often...
> > - server load on high traffic days: > 100 (mostly iowait)
> > - bad are server reboots (read filesystem info etc.)
> > - really bad is a sw raid rebuild/resync
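(The ID-to-path layout described in the quoted message above can be sketched as follows; `image_path` is a hypothetical helper name, not code from the thread:)

```python
def image_path(image_id, basedir="/basedir"):
    """Map an ID to its directory in the described layout:
    /basedir/(000-999)/(000-999)/ID/ . The two intermediate levels
    are the first two 3-digit groups of the zero-padded ID."""
    s = f"{image_id:09d}"                     # e.g. "123456789"
    return f"{basedir}/{s[:3]}/{s[3:6]}/{image_id}"

# image_path(123456789) == "/basedir/123/456/123456789"
```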
> S pozdravom / Yours sincerely
> Ing. Jan Hudoba
> Community Meeting Calendar:
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users at gluster.org