[Gluster-users] gluster heal performance (was: Fwd: New GlusterFS deployment, doubts on 1 brick per host vs 1 brick per drive.)

Fri Sep 11 03:27:12 UTC 2020

Excerpts from Gionatan Danti's message of 2020-09-11 00:35:52 +0200:
> The main point was the potentially long heal time 

could you (or anyone else) please elaborate on what long heal times are
to be expected?

we have a 3-node replica cluster running version 3.12.9 (we are building
a new cluster now) with 32TiB of space. each node has a single brick on
top of a 7-disk raid5 (linux softraid)

at one point we had one node unavailable for one month (gluster failed
to start up properly on that node and we didn't have monitoring in place
to notice) and the accumulated changes of one month of operation took 4
months to heal. i would have expected this ideally to take 2 weeks or
less, one month at the worst (i.e. faster than, or at least as fast as,
it took to create the data, but not slower, and especially not 4 times
slower).

the initial heal count was about 6 million files for one node and
5.4 million for the other.

the healing speed was not constant. at first the heal count increased,
that is, files were seemingly being healed more slowly than new files
were added. then it started to speed up: the first million on each node
took about 46 days to heal, while the last million took 4 days.
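
to put those two figures side by side, here is a small sketch of the
arithmetic. it only uses the numbers quoted above (1 million files in
46 days versus 1 million files in 4 days); the function name heal_rate
is made up for illustration, and the rates are averages over each
period, not measured instantaneous speeds.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def heal_rate(files, days):
    """Return the average healing rate as (files per day, files per second)."""
    per_day = files / days
    return per_day, per_day / SECONDS_PER_DAY


# figures from the paragraph above: first and last million files healed
first_per_day, first_per_sec = heal_rate(1_000_000, 46)
last_per_day, last_per_sec = heal_rate(1_000_000, 4)

print(f"first million: {first_per_day:.0f} files/day ({first_per_sec:.2f} files/s)")
print(f"last million:  {last_per_day:.0f} files/day ({last_per_sec:.2f} files/s)")
print(f"speed-up:      {last_per_day / first_per_day:.1f}x")
```

that works out to roughly 0.25 files per second at the start and about
2.9 files per second at the end, a factor of 11.5.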

i logged the output of 
   "gluster volume heal gluster-volume statistics heal-count" 
every hour to monitor the healing process.
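
for anyone who wants to turn such hourly logs into numbers, a minimal
sketch of a parser follows. it assumes the command prints one
"Brick <host>:<path>" line followed by a "Number of entries: <n>" line
per brick; that layout is an assumption and may differ between gluster
versions. the function name parse_heal_count, the host names and the
brick path in the sample are hypothetical.

```python
def parse_heal_count(output):
    """Map each brick to its pending heal count.

    Assumes each brick is reported as a 'Brick <host>:<path>' line
    followed by a 'Number of entries: <n>' line (assumed format).
    """
    counts = {}
    brick = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif line.startswith("Number of entries:") and brick is not None:
            value = line.split(":", 1)[1].strip()
            # skip bricks that report no numeric count (e.g. brick offline)
            if value.isdigit():
                counts[brick] = int(value)
            brick = None
    return counts


# hypothetical sample in the assumed format; hosts and paths are made up
sample = """Gathering count of entries to be healed on volume gluster-volume has been successful

Brick node1:/data/brick
Number of entries: 6000000

Brick node2:/data/brick
Number of entries: 5400000

Brick node3:/data/brick
Number of entries: -
"""

counts = parse_heal_count(sample)
print(counts)
print("total pending:", sum(counts.values()))
```

comparing the totals of two consecutive hourly samples then gives the
net change in the heal count, which can be negative progress while new
files arrive faster than they are healed.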

what makes healing so slow?

almost all files are newly added and not changed, so they were missing
on the node that was offline. the files are backups of user devices, so
almost all files are written once and rarely, if ever, read.

we do have a few huge directories with 250000, 88000, 60000 and 29000
subdirectories each. in total there are 26TiB of small files, but no
more than a few thousand files per directory. (it's user data, some
have more, some have less)

could those huge directories be responsible for the slow healing? 

the filesystem is ext4 on top of a 7 disk raid5.

after this ordeal was over we discovered that the readdir-ahead setting
was on. we turned it off based on other discussions on performance
that suggested an improvement from this change, but we haven't had the
opportunity to do a large healing test since, so we can't tell if it
makes a difference for us.

any insights would be appreciated.

greetings, martin.

--
general manager                                                    realss.com
student mentor                                                   fossasia.org
community mentor     blug.sh                                  beijinglug.club
pike programmer      pike.lysator.liu.se    caudium.net     societyserver.org
Martin Bähr          working in china        http://societyserver.org/mbaehr/