[Bugs] [Bug 1641344] New: Spurious failures in bug-1637802-arbiter-stale-data-heal-lock.t

bugzilla at redhat.com bugzilla at redhat.com
Sun Oct 21 12:15:11 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1641344

            Bug ID: 1641344
           Summary: Spurious failures in
                    bug-1637802-arbiter-stale-data-heal-lock.t
           Product: GlusterFS
           Version: mainline
         Component: tests
          Assignee: bugs at gluster.org
          Reporter: ravishankar at redhat.com
                CC: bugs at gluster.org



Problem:
    https://review.gluster.org/#/c/glusterfs/+/21427/ seems to be failing
    this .t spuriously. On checking one of the failure logs, I see:

    22:05:44 Launching heal operation to perform index self heal on volume
patchy has been unsuccessful:
    22:05:44 Self-heal daemon is not running. Check self-heal daemon log file.
    22:05:44 not ok 20 , LINENUM:38

    In glusterd log:
    [2018-10-18 22:05:44.298832] E [MSGID: 106301]
[glusterd-syncop.c:1352:gd_stage_op_phase] 0-management: Staging of operation
'Volume Heal' failed on localhost : Self-heal daemon is not running. Check
self-heal daemon log file

    But the tests that precede this one check, via a statedump, whether the shd
    is connected to the bricks; those checks succeeded, and healing had even
    started. From glustershd.log:

    [2018-10-18 22:05:40.975268] I [MSGID: 108026]
[afr-self-heal-common.c:1732:afr_log_selfheal] 0-patchy-replicate-0: Completed
data selfheal on 3b83d2dd-4cf2-4ea3-a33e-4275be40f440. sources=[0] 1  sinks=2

    So the only explanation I can see for the heal launch failing is a race:
    shd has been spawned, but glusterd has not yet updated its in-memory state
    to mark it as up, and hence fails the CLI.

    Fix:
    Check that glusterd sees the shd as up before launching heal via the CLI.
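    A minimal sketch of the polling idiom the fix would use in the .t,
    in the spirit of the test framework's EXPECT_WITHIN macro. The function
    name wait_for_status and the usage lines below are illustrative
    assumptions, not the actual helpers from tests/volume.rc:

    ```shell
    #!/bin/sh
    # Poll a status command until it prints the expected value or a timeout
    # (in seconds) expires; mirrors the EXPECT_WITHIN idiom used by
    # GlusterFS .t tests. Generic sketch, not the real test helper.
    wait_for_status() {
        expected=$1
        timeout=$2
        shift 2
        i=0
        while [ "$i" -lt "$timeout" ]; do
            if [ "$("$@")" = "$expected" ]; then
                return 0
            fi
            sleep 1
            i=$((i + 1))
        done
        return 1
    }

    # Hypothetical usage in the .t: wait until glusterd reports the shd as
    # up before issuing the heal command, instead of racing against it:
    #   wait_for_status "Y" "$PROCESS_UP_TIMEOUT" shd_up_status
    #   gluster volume heal patchy
    ```

    Waiting on glusterd's own view of the shd (rather than only the
    statedump-based brick-connection checks) closes the window where the
    daemon is running but the CLI staging still reports it as down.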

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
