[Gluster-users] Unreasonably high load after rebooting a brick
Ray Gibson
booray at gmail.com
Mon Jul 13 19:03:22 UTC 2015
Hello,
I'm testing an environment on AWS right now and running into a strange
issue. In summary, my setup is like this:
2 x c4.large instances (2 visible CPUs, 4 GB RAM) running a 500 GB
(magnetic backend) replicated Gluster volume, so each instance has 100%
of the data. Running 3.7.2 on the servers and multiple clients. Bricks
are XFS mounted with the noatime flag. Server and client threads are set
to 4 right now (they were 2 before, same issue).
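To make that concrete, the setup above looks roughly like the following.
This is just a sketch: gv0, /dev/xvdf and /export/brick1 are placeholder
names, and I'm taking "server and client threads" to mean the event-thread
volume options.

  # brick filesystem mounted with noatime (fstab entry on each server)
  /dev/xvdf  /export/brick1  xfs  noatime  0  0

  # event threads on the volume, currently 4 (were 2 before, same issue)
  gluster volume set gv0 client.event-threads 4
  gluster volume set gv0 server.event-threads 4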
Currently there are ~800,000 small files (JPEGs, 300 KB to 3 MB) on the
volume, and one of the clients is constantly writing new files to it, on
average about 2-3 per second.
Under normal operation there is practically no load; I could run these on
micro instances. But if I happen to reboot one of them, I run into some
serious trouble. Both boxes max out on CPU, the load average goes into
the 4-6 range, and my client can no longer write to the volume. About 18
minutes later, log entries finally appear in the glustershd.log file and
a self-heal begins on the added files. The load calms down about 5-10
minutes after that, and other clients can do reads and writes again.
However, my original client that was trying to write the small files ends
up stuck: I can't even run an ls on a folder without it taking 30+
seconds. Ultimately, if I kill off everything that was trying to write
and unmount and remount the volume, I can get it functional again.
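For what it's worth, the way I've been watching the heal is roughly this
(gv0 again standing in for the real volume name):

  # list entries still pending self-heal on each brick
  gluster volume heal gv0 info

  # follow the self-heal daemon log (the one that stays quiet for ~18 minutes)
  tail -f /var/log/glusterfs/glustershd.log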
Do I just have too many small files? Would this not happen with gp2
(SSD) bricks? Is there a way to throttle whatever is eating up all the
CPU so that services can continue using the fully functional brick?
I appreciate any insight. Thank you for your bandwidth.
Ray