[Bugs] [Bug 1608564] New: line coverage tests failing consistently over a week

bugzilla at redhat.com bugzilla at redhat.com
Wed Jul 25 20:07:25 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1608564

            Bug ID: 1608564
           Summary: line coverage tests failing consistently over a week
           Product: GlusterFS
           Version: mainline
         Component: tests
          Keywords: Triaged
          Assignee: bugs at gluster.org
          Reporter: srangana at redhat.com
                CC: bugs at gluster.org



The nightly line coverage tests have been failing consistently for over a week.
The failures are as follows:

2 test(s) failed 
./tests/basic/sdfs-sanity.t
./tests/bugs/core/bug-1432542-mpx-restart-crash.t

1 test(s) generated core 
./tests/basic/sdfs-sanity.t

a) ./tests/bugs/core/bug-1432542-mpx-restart-crash.t

This test is timing out. My suggestion is to increase the timeout for this
test, as the line coverage runs consistently take longer (presumably because
the lcov instrumentation slows things down).

For example, the times taken for the following tests in the centos7 regression
builds are:
./tests/bugs/index/bug-1559004-EMLINK-handling.t  -  896 seconds
./tests/bugs/core/bug-1432542-mpx-restart-crash.t  -  309 seconds
./tests/basic/afr/lk-quorum.t  -  225 seconds

In the lcov runs the same tests take:
./tests/bugs/index/bug-1559004-EMLINK-handling.t  -  1063 seconds
./tests/bugs/core/bug-1432542-mpx-restart-crash.t  -  400 seconds (timed out)
./tests/basic/afr/lk-quorum.t  -  267 seconds

As can be seen, each test takes roughly 20-30% longer under lcov, i.e. about 25
extra seconds for every 100 seconds of a normal run (the mpx number is only a
lower bound, since that run was killed at the timeout).
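
As a quick sanity check on that figure, here is a throwaway C snippet
(hypothetical, not part of the test harness) that recomputes the overhead from
the three data points above:

#include <stdio.h>

int main(void)
{
    /* run times from the two tables above, in seconds */
    const char *name[] = { "EMLINK-handling.t", "mpx-restart-crash.t",
                           "lk-quorum.t" };
    int normal[] = { 896, 309, 225 };
    int lcov[] = { 1063, 400, 267 };

    for (int i = 0; i < 3; i++) {
        double pct = 100.0 * (lcov[i] - normal[i]) / normal[i];
        printf("%-22s +%.0f%%\n", name[i], pct);
    }
    /* prints +19%, +29%, +19%; the mpx figure is only a lower bound,
       since that run was killed at the 400 second mark */
    return 0;
}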

We need to reproduce this locally and check whether increasing the timeout for
the mpx test (comfortably above the current 400 seconds it is being killed at)
resolves (a).

b) ./tests/basic/sdfs-sanity.t

This test results in a glusterd core, and as a result the test fails. The core
is consistent across runs and looks as follows:

See: https://build.gluster.org/job/line-coverage/391/consoleFull
(gdb) t 1
#0  0x00007f241d1f0c11 in __strnlen_sse2 () from ./lib64/libc.so.6

(gdb) f 2
#2  0x00007f241ec82d66 in xlator_volume_option_get_list
(vol_list=0x7f2404203570, key=0x7f241366fee0 "features") at options.c:933

(gdb) p opt[0]
$7 = {key = {0x7f240de27c6d "pass-through", 0x0, 0x0, 0x0}, type =
GF_OPTION_TYPE_BOOL, min = 0, max = 0, value = {0x0 <repeats 64 times>},
default_value = 0x7f240de27c7a "false", 
  description = 0x7f240de27c80 "Enable/Disable dentry serialize functionality",
validate = GF_OPT_VALIDATE_BOTH, op_version = {40100, 0, 0, 0}, deprecated =
{0, 0, 0, 0}, flags = 35, tags = {
    0x7f240de27cae "sdfs", 0x0 <repeats 63 times>}, setkey = 0x0, level =
OPT_STATUS_ADVANCED}

(gdb) p opt[1]
$8 = {key = {0x7f240e02a600 "", 0xc0a2c6690000007b <error: Cannot access memory
at address 0xc0a2c6690000007b>, 0x7cb34af9 <error: Cannot access memory at
address 0x7cb34af9>, 
    0x2 <error: Cannot access memory at address 0x2>}, type = 235060032, min =
0, max = 0, value = {0x0, 0x7f240e02a600 "", 0x392413ac0000007a <error: Cannot
access memory at address 0x392413ac0000007a>, ...

(gdb) p index
$11 = 1

(gdb) p cmp_key 
$9 = 0xc0a2c6690000007b <error: Cannot access memory at address
0xc0a2c6690000007b>

The above needs further debugging to get to the root cause of (b). What the
dump already shows: opt[0] is the valid sdfs "pass-through" option, index is 1,
and cmp_key matches the wild key[1] pointer of the garbage opt[1] entry, whose
key[0] is an empty string rather than NULL. That suggests the option walk runs
past the last valid entry because the terminating sentinel is missing or has
been clobbered; a minimal sketch of that failure mode follows.
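
To illustrate (with simplified stand-ins: opt_t, opt_lookup and MAX_KEYS are
hypothetical, not the real volume_option_t machinery in options.c), the walk
stops only at an entry whose first key is NULL, so a missing or clobbered
sentinel sends the key comparison off into garbage pointers:

#include <stdio.h>
#include <string.h>

#define MAX_KEYS 4

/* simplified stand-in for gluster's volume_option_t; the real struct
   carries many more fields (type, value, tags, op_version, ...) */
typedef struct {
    char *key[MAX_KEYS];
} opt_t;

/* walk in the style of xlator_volume_option_get_list(): the outer loop
   terminates only when key[0] of an entry is NULL */
static opt_t *
opt_lookup(opt_t *opt, const char *key)
{
    for (int index = 0; opt[index].key[0]; index++) {
        for (int i = 0; i < MAX_KEYS; i++) {
            char *cmp_key = opt[index].key[i];
            if (!cmp_key)
                break;
            /* strcmp() walks cmp_key; a garbage pointer such as
               0xc0a2c6690000007b would fault here, analogous to the
               __strnlen_sse2 frame in the core above */
            if (strcmp(cmp_key, key) == 0)
                return &opt[index];
        }
    }
    return NULL;
}

int main(void)
{
    opt_t good[] = {
        { .key = { "pass-through" } },
        { .key = { NULL } },   /* sentinel: the walk stops here */
    };
    printf("found: %s\n", opt_lookup(good, "pass-through")->key[0]);
    /* in the core, the entry after "pass-through" has key[0] == ""
       (non-NULL), so the walk does not stop and picks up a wild
       key[1] pointer */
    return 0;
}

If that theory holds, the next step is to confirm from the core whether the
sdfs option table's sentinel is genuinely absent or is being overwritten at
runtime.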

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

