[Gluster-users] Replication logic

Sun Jan 3 03:48:35 UTC 2021

>> Just take the slow brick offline during the initial sync 
>> and then bring it online. The heal will go in background, 
>> while the volume stays operational.

> Yes, but the heal will then take three weeks. 

I meant this as an obvious exaggeration, but it seems it was
not.

I removed the arbiter and created a new full brick. This
resulted in two bricks in sync with each other, holding just
over 100,000 small (average 10 KiB) files, and one empty
brick. The healing process started populating the empty
brick at a really slow rate of something like two to five
files per minute.
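
As a rough sanity check on the "three weeks" figure, here is a
minimal Python sketch. It assumes the heal rate stays constant and
rounds the file count down to 100,000; both are simplifications of
mine, not measurements.

```python
# Back-of-the-envelope heal-time estimate from the figures above.
# Assumptions: constant heal rate, file count rounded to 100,000.
FILES_TO_HEAL = 100_000
MINUTES_PER_DAY = 24 * 60


def heal_days(files: int, files_per_minute: float) -> float:
    """Days needed to heal `files` at a constant rate."""
    return files / files_per_minute / MINUTES_PER_DAY


for rate in (2, 5):  # observed range, in files per minute
    print(f"{rate} files/min -> {heal_days(FILES_TO_HEAL, rate):.1f} days")
```

At the observed rates that works out to roughly two to five weeks
for the full heal.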

I would have expected at least one of three things to become
saturated on at least one of the participating machines: the
network, disk I/O, or CPU. But far from it: nothing is even close
to saturated. On all three machines the CPUs (top) are running
almost idle, disk I/O (iotop) is negligible, and network traffic
is on the order of 100 Kbps. It looks like 'nice -n 700 glusterd',
on a nice scale from 1 to 19.

Any ideas where I should look for the bottleneck? I can't find
anything even remotely relevant in any of the logs.

Z