[Bugs] [Bug 1636631] New: Issuing a "heal ... full" on a disperse volume causes permanent high CPU utilization.

bugzilla at redhat.com
Sat Oct 6 01:04:10 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1636631

            Bug ID: 1636631
           Summary: Issuing a "heal ... full" on a disperse volume causes
                    permanent high CPU utilization.
           Product: GlusterFS
           Version: 3.12
         Component: disperse
          Severity: medium
          Assignee: bugs at gluster.org
          Reporter: jbyers at stonefly.com
                CC: bugs at gluster.org



Issuing a "heal ... full" on a disperse volume causes permanent
high CPU utilization. 

This occurs even when the volume is completely empty. The CPU usage
is not due to healing I/O activity.

This only happens on disperse volumes, not on replica volumes. 

It happens in GlusterFS version 3.12.14, but does not happen
in version 3.7.18.

The high CPU utilization comes from the 'glusterfs' SHD
(self-heal daemon) process and is easily seen in 'top'.

The 'glustershd.log' file shows that the disperse volume full
sweep keeps restarting and running forever:

[2018-10-06 00:56:11.245106] I [MSGID: 122059]
[ec-heald.c:415:ec_shd_full_healer] 0-disperse-vol-disperse-0: finished full
sweep on subvol disperse-vol-client-0
The message "I [MSGID: 122059] [ec-heald.c:406:ec_shd_full_healer]
0-disperse-vol-disperse-0: starting full sweep on subvol disperse-vol-client-0"
repeated 2 times between [2018-10-06 00:56:11.243637] and [2018-10-06
00:56:11.246885]
[2018-10-06 00:56:11.247966] I [MSGID: 122059]
[ec-heald.c:415:ec_shd_full_healer] 0-disperse-vol-disperse-0: finished full
sweep on subvol disperse-vol-client-2
The message "I [MSGID: 122059] [ec-heald.c:406:ec_shd_full_healer]
0-disperse-vol-disperse-0: starting full sweep on subvol disperse-vol-client-1"
repeated 3 times between [2018-10-06 00:56:11.239731] and [2018-10-06
00:56:11.248470]
[2018-10-06 00:56:11.248553] I [MSGID: 122059]
[ec-heald.c:406:ec_shd_full_healer] 0-disperse-vol-disperse-0: starting full
sweep on subvol disperse-vol-client-0
The message "I [MSGID: 122059] [ec-heald.c:406:ec_shd_full_healer]
0-disperse-vol-disperse-0: starting full sweep on subvol disperse-vol-client-2"
repeated 3 times between [2018-10-06 00:56:11.242392] and [2018-10-06
00:56:11.251262]
[2018-10-06 00:56:11.251330] I [MSGID: 122059]
[ec-heald.c:406:ec_shd_full_healer] 0-disperse-vol-disperse-0: starting full
sweep on subvol disperse-vol-client-1
The message "I [MSGID: 122059] [ec-heald.c:415:ec_shd_full_healer]
0-disperse-vol-disperse-0: finished full sweep on subvol disperse-vol-client-2"
repeated 2 times between [2018-10-06 00:56:11.247966] and [2018-10-06
00:56:11.253675]
[2018-10-06 00:56:11.253916] I [MSGID: 122059]
[ec-heald.c:406:ec_shd_full_healer] 0-disperse-vol-disperse-0: starting full
sweep on subvol disperse-vol-client-2
The message "I [MSGID: 122059] [ec-heald.c:406:ec_shd_full_healer]
0-disperse-vol-disperse-0: starting full sweep on subvol disperse-vol-client-0"
repeated 5 times between [2018-10-06 00:56:11.248553] and [2018-10-06
00:56:11.256142]
[2018-10-06 00:56:11.256490] I [MSGID: 122059]
[ec-heald.c:415:ec_shd_full_healer] 0-disperse-vol-disperse-0: finished full
sweep on subvol disperse-vol-client-2
The message "I [MSGID: 122059] [ec-heald.c:415:ec_shd_full_healer]
0-disperse-vol-disperse-0: finished full sweep on subvol disperse-vol-client-0"
repeated 8 times between [2018-10-06 00:56:11.245106] and [2018-10-06
00:56:11.257386]
[2018-10-06 00:56:11.257585] I [MSGID: 122059]
[ec-heald.c:406:ec_shd_full_healer] 0-disperse-vol-disperse-0: starting full
sweep on subvol disperse-vol-client-0
[2018-10-06 00:56:11.258907] I [MSGID: 122059]
[ec-heald.c:415:ec_shd_full_healer] 0-disperse-vol-disperse-0: finished full
sweep on subvol disperse-vol-client-0
[2018-10-06 00:56:11.259098] I [MSGID: 122059]
[ec-heald.c:406:ec_shd_full_healer] 0-disperse-vol-disperse-0: starting full
sweep on subvol disperse-vol-client-0
The message "I [MSGID: 122059] [ec-heald.c:406:ec_shd_full_healer]
0-disperse-vol-disperse-0: starting full sweep on subvol disperse-vol-client-1"
repeated 3 times between [2018-10-06 00:56:11.251330] and [2018-10-06
00:56:11.259751]
[2018-10-06 00:56:11.261599] I [MSGID: 122059]
[ec-heald.c:415:ec_shd_full_healer] 0-disperse-vol-disperse-0: finished full
sweep on subvol disperse-vol-client-0
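
To put a number on how often the sweep restarts, the messages
above can be tallied per subvolume. The following is only a minimal
sketch and not part of GlusterFS: the function name count_sweeps is
hypothetical, the pattern is written against the message text quoted
in this report, and it assumes that a "repeated N times" summary
stands for N occurrences of the quoted message.

```python
import re
from collections import Counter

# Matches the ec_shd_full_healer sweep messages quoted in this report,
# including the summarized form:
#   The message "..." repeated N times between [...] and [...]
SWEEP_RE = re.compile(
    r'(starting|finished) full sweep on subvol ([\w.-]+)'
    r'(?:"\s+repeated\s+(\d+)\s+times)?'
)


def count_sweeps(log_text):
    """Return a Counter keyed by (event, subvol) for full-sweep messages."""
    # Collapse all whitespace so entries that were wrapped across lines
    # (as they are in this mail) still match the pattern.
    flat = " ".join(log_text.split())
    counts = Counter()
    for event, subvol, repeats in SWEEP_RE.findall(flat):
        # Assumption: a "repeated N times" summary counts as N occurrences;
        # a plain log entry counts as one.
        counts[(event, subvol)] += int(repeats) if repeats else 1
    return counts
```

Passing the contents of 'glustershd.log' to count_sweeps() gives one
count per (event, subvol) pair. On an idle, empty volume, counts
that keep growing between two runs would match the behavior
described here.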

The only way to bring the CPU utilization of the SHD 'glusterfs'
process back down is to kill the process and restart it. It then
behaves normally until the next "heal ... full" is issued on a
disperse volume.
