[Bugs] [Bug 1427419] Warning messages thrown when an offline EC volume brick comes up are difficult for end users to understand.

bugzilla at redhat.com bugzilla at redhat.com
Tue Feb 28 07:55:23 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1427419



--- Comment #2 from Ashish Pandey <aspandey at redhat.com> ---
Description of problem:
=======================
When any brick of an EC volume goes down and comes back up while IO is in
progress, the warning messages below appear in the self-heal daemon log (shd
log). The end user cannot tell which subvolumes have the problem, because the
subvolumes are printed as hexadecimal bitmasks and the end user has to do a
lot of math to work out which subvolumes are affected.

We have to improve these warning messages so that the end user can understand them.



[2016-12-23 04:52:00.658995] W [MSGID: 122053]
[ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on
some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2016-12-23 04:52:00.659085] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-Disperse1-disperse-0: Heal failed [Invalid argument]
[2016-12-23 04:52:00.812666] W [MSGID: 122053]
[ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on
some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2016-12-23 04:52:00.812709] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-Disperse1-disperse-0: Heal failed [Invalid argument]
[2016-12-23 04:52:01.053575] W [MSGID: 122053]
[ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on
some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2016-12-23 04:52:01.053651] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-Disperse1-disperse-0: Heal failed [Invalid argument]
[2016-12-23 04:52:01.059907] W [MSGID: 122053]
[ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on
some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2016-12-23 04:52:01.059983] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-Disperse1-disperse-0: Heal failed [Invalid argument]
[2016-12-23 04:52:01.085491] W [MSGID: 122053]
[ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on
some subvolumes


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-9.el6rhs.x86_64.


How reproducible:
=================
Always


Steps to Reproduce
===================
1. Have a basic recommended EC volume setup.
2. Fuse-mount the volume.
3. Bring one brick down and start IO on the mount point.
4. After IO has run for some time, bring the offline brick back up using
volume start force.
5. Check the self-heal daemon logs for the warning messages mentioned above.

Actual results:
===============
Warning messages thrown when an offline EC volume brick comes back up are
difficult for the end user to understand.

Expected results:
=================
Improve the warning messages thrown when an offline EC volume brick comes up
so that the end user can understand them.
