[Bugs] [Bug 1166063] New: diff heal makes the system unusable

bugzilla at redhat.com bugzilla at redhat.com
Thu Nov 20 10:58:53 UTC 2014


https://bugzilla.redhat.com/show_bug.cgi?id=1166063

            Bug ID: 1166063
           Summary: diff heal makes the system unusable
           Product: GlusterFS
           Version: mainline
         Component: replicate
          Assignee: bugs at gluster.org
          Reporter: pkarampu at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem:
Here is the mail sent by Lindsay Mathieson:

2 Node replicate setup,

Everything had been stable for days until I had occasion to reboot
one of the nodes. Since then (for the past hour) glusterfsd has been pegging
the CPU(s), with utilization ranging from 1% to 1000%!

On average it's around 500%.

This is a VM server, so there are only 27 VM images, for a total of
800GB. It's an Intel E5-2620 (12 Cores) with 32GB ECC RAM.

- What does glusterfsd do?

- What can I do to fix this?

thanks,
------------------------

We found that the root cause is that the mount started self-heals of all the
VM images at once. These are diff self-heals, so computing checksums consumed
a lot of CPU on the bricks, which led to the issue. We need a way to throttle
the number of parallel self-heals.
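
To illustrate the kind of throttling asked for here, the following is a minimal
sketch, not GlusterFS code. It assumes a hypothetical heal_fn callable that
heals one file, and caps how many heals run at once with a semaphore. The names
HealThrottle, heal_all and max_parallel are made up for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class HealThrottle:
    """Hypothetical helper: let at most max_parallel heals run at once."""

    def __init__(self, max_parallel):
        self._slots = threading.BoundedSemaphore(max_parallel)
        self._lock = threading.Lock()
        self.running = 0  # heals currently in progress
        self.peak = 0     # highest concurrency observed

    def run(self, heal_fn, *args):
        # Block until a slot is free, then run the heal.
        with self._slots:
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return heal_fn(*args)
            finally:
                with self._lock:
                    self.running -= 1


def heal_all(files, heal_fn, max_parallel=2):
    """Trigger a heal for every file, but run only max_parallel at a time."""
    throttle = HealThrottle(max_parallel)
    # One thread per file mimics every heal being triggered at once;
    # the semaphore is what actually bounds the concurrency.
    with ThreadPoolExecutor(max_workers=max(1, len(files))) as pool:
        results = list(pool.map(lambda f: throttle.run(heal_fn, f), files))
    return results, throttle.peak
```

With 27 images and max_parallel=2, all 27 heals are still requested, but no
more than two checksum-heavy heals would be active at any moment.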


Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Have a lot of VMs on a replicated volume.
2. Bring one brick down and do some write activity on all the VMs.
3. Bring the brick back up while the VM operations are in progress.
4. This leads to the mount self-healing all the VM images.
5. That causes high CPU usage on the bricks because of the checksums.
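
Why the checksums in step 5 are expensive can be seen from a rough sketch of a
diff-style heal. This is an illustration only, not the replicate translator's
implementation: the block size, the use of SHA-256, and the function name
diff_heal are all assumptions. The point is that every block is checksummed on
both the source and the sink, even when only a few blocks actually differ.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumed block size, for illustration only


def diff_heal(source, sink, block_size=BLOCK_SIZE):
    """Bring sink in line with source, copying only blocks that differ.

    Returns the healed bytes and the number of blocks copied.
    """
    # Start from the sink's data, truncated or zero-padded to the source length.
    healed = bytearray(sink[:len(source)])
    healed.extend(b"\0" * (len(source) - len(healed)))

    copied = 0
    for off in range(0, len(source), block_size):
        src_block = source[off:off + block_size]
        sink_block = bytes(healed[off:off + block_size])
        # Both sides are checksummed for every block; this is the CPU cost.
        if hashlib.sha256(src_block).digest() != hashlib.sha256(sink_block).digest():
            healed[off:off + block_size] = src_block
            copied += 1
    return bytes(healed), copied
```

For large images that differ in only a few blocks, little data is copied, but
the checksum work still scales with the full file size, and it is multiplied by
the number of files being healed in parallel.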

Expected results:
Bricks should not use so much CPU. There should be some kind of throttling.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

