[Bugs] [Bug 1417050] [Stress] : SHD Logs flooded with "Heal Failed" messages, filling up "/" quickly
bugzilla at redhat.com
Fri Jan 27 05:56:14 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1417050
--- Comment #1 from Ashish Pandey <aspandey at redhat.com> ---
Description of problem:
-----------------------
4-node cluster, 1 EC volume exported via Ganesha.
Replaced 4 bricks in different subvols, which triggered a heal.
Data was being populated from multiple Ganesha mounts while the heal was in progress.
The shd logs and Ganesha-gfapi logs were flooded with heal failures:
The message "W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-testvol-disperse-4: Heal failed [Invalid argument]" repeated 4 times between
[2017-01-22 09:58:51.836831] and [2017-01-22 09:58:52.109527]
[2017-01-22 09:58:52.225738] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-testvol-disperse-4: Heal failed [Invalid argument]
[2017-01-22 09:58:52.228903] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-testvol-disperse-4: Heal failed [Invalid argument]
The message "W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-testvol-disperse-0: Heal failed [Invalid argument]" repeated 8 times between
[2017-01-22 09:58:52.066587] and [2017-01-22 09:58:52.328220]
[2017-01-22 09:58:52.340050] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-testvol-disperse-0: Heal failed [Invalid argument]
[2017-01-22 09:58:52.379059] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-testvol-disperse-6: Heal failed [Invalid argument]
The message "W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-testvol-disperse-0: Heal failed [Invalid argument]" repeated 4 times between
[2017-01-22 09:58:52.340050] and [2017-01-22 09:58:52.453352]
[2017-01-22 09:58:52.461780] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-testvol-disperse-6: Heal failed [Invalid argument]
[2017-01-22 09:58:52.495767] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-testvol-disperse-0: Heal failed [Invalid argument]
[2017-01-22 09:58:52.544095] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-testvol-disperse-0: Heal failed [Invalid argument]
[2017-01-22 09:58:52.584513] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-testvol-disperse-0: Heal failed [Invalid argument]
The log message was recorded more than 1,000,000 times per node in the last 24
hours:
[root at gqas015 ~]# cat /var/log/ganesha-gfapi.log |grep -i "Heal fail"|wc -l
1433241
[root at gqas009 ~]# cat /var/log/glusterfs/glustershd.log|grep -i "Heal fail"|wc -l
1075672
[root at gqas009 ~]#
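
The grep counts above give one total per log file. To see which disperse subvolumes account for the failures, the same logs could be tallied per subvolume. The following is only a rough sketch, not part of the original report: the function name and regular expressions are illustrative, it assumes each log entry sits on a single line in the actual file (the excerpts above are wrapped by the mailer), and it assumes each "repeated N times" summary stands for N occurrences beyond the individually logged lines.

```python
import re
from collections import Counter

# One individually logged failure, e.g.
# [2017-01-22 09:58:52.225738] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-testvol-disperse-4: Heal failed [Invalid argument]
SINGLE = re.compile(
    r"^\[[^\]]+\]\s+W\s+\[MSGID: 122002\].*?\s(\S+):\s+Heal failed"
)

# Summary line emitted when duplicate messages are suppressed, e.g.
# The message "W [MSGID: 122002] ... 0-testvol-disperse-4: Heal failed [Invalid argument]" repeated 4 times between ...
REPEATED = re.compile(
    r'^The message "W \[MSGID: 122002\].*?\s(\S+):\s+Heal failed.*?" repeated (\d+) times'
)


def count_heal_failures(lines):
    """Return a Counter mapping subvolume name -> number of heal failures.

    Assumption: a "repeated N times" summary adds N occurrences on top of
    the lines that were logged individually.
    """
    counts = Counter()
    for line in lines:
        match = REPEATED.match(line)
        if match:
            counts[match.group(1)] += int(match.group(2))
            continue
        match = SINGLE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Path taken from the report; adjust for the ganesha-gfapi log as needed.
    with open("/var/log/glusterfs/glustershd.log", errors="replace") as log:
        for subvol, total in count_heal_failures(log).most_common():
            print(subvol, total)
```

Because the file is read line by line, this should stay usable on logs of several hundred megabytes such as the ones shown here.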
It's filling up / quickly:
[root at gqas015 ~]# ll -h /var/log/glusterfs/glustershd.log
-rw------- 1 root root 403M Jan 23 01:57 /var/log/glusterfs/glustershd.log
[root at gqas015 ~]#
[root at gqas015 ~]# ll -h /var/log/ganesha-gfapi.log
-rw------- 1 root root 538M Jan 23 00:59 /var/log/ganesha-gfapi.log
[root at gqas015 ~]#
*********************************
EXACT WORKLOAD on Ganesha mounts
*********************************
Smallfile appends, ll -R, tarball untar
Version-Release number of selected component (if applicable):
-------------------------------------------------------------
nfs-ganesha-2.4.1-6.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-12.el7rhgs.x86_64
How reproducible:
-----------------
Reporting the first occurrence.
Actual results:
---------------
Logs spammed with "Heal Failed" errors
Expected results:
-----------------
No log spamming.