[Bugs] [Bug 1801623] New: spurious self-heald.t failure
bugzilla at redhat.com
Tue Feb 11 11:37:47 UTC 2020
https://bugzilla.redhat.com/show_bug.cgi?id=1801623
Bug ID: 1801623
Summary: spurious self-heald.t failure
Product: GlusterFS
Version: mainline
Status: NEW
Component: replicate
Assignee: bugs at gluster.org
Reporter: pkarampu at redhat.com
CC: bugs at gluster.org
Target Milestone: ---
Classification: Community
20:10:58 ./tests/basic/afr/self-heald.t ..
20:10:58 1..84
20:10:58 ok 1 [ 160/ 1061] < 46> 'glusterd'
20:10:58 ok 2 [ 9/ 17] < 47> 'pidof glusterd'
20:10:58 ok 3 [ 9/ 97] < 48> 'gluster --mode=script --wignore
volume create patchy replica 2
builder203.int.aws.gluster.org:/d/backends/patchy0
builder203.int.aws.gluster.org:/d/backends/patchy1
builder203.int.aws.gluster.org:/d/backends/patchy2
builder203.int.aws.gluster.org:/d/backends/patchy3
builder203.int.aws.gluster.org:/d/backends/patchy4
builder203.int.aws.gluster.org:/d/backends/patchy5'
20:10:58 ok 4 [ 12/ 147] < 49> 'gluster --mode=script --wignore
volume set patchy cluster.background-self-heal-count 0'
20:10:58 ok 5 [ 12/ 154] < 50> 'gluster --mode=script --wignore
volume set patchy cluster.eager-lock off'
20:10:58 ok 6 [ 15/ 159] < 51> 'gluster --mode=script --wignore
volume set patchy performance.flush-behind off'
20:10:58 ok 7 [ 14/ 1407] < 52> 'gluster --mode=script --wignore
volume start patchy'
20:10:58 ok 8 [ 10/ 217] < 53> '_GFS --attribute-timeout=0
--entry-timeout=0 --volfile-id=/patchy
--volfile-server=builder203.int.aws.gluster.org /mnt/glusterfs/0'
20:10:58 [2020-02-10 14:40:07.086270] E [rpc-clnt.c:346:saved_frames_unwind]
(-->
/build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x1bc)[0x7fb36875b6ee]
(--> /build/install/lib/libgfrpc.so.0(+0x1176f)[0x7fb3680c076f] (-->
/build/install/lib/libgfrpc.so.0(+0x11856)[0x7fb3680c0856] (-->
/build/install/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x16c)[0x7fb3680c0de4]
(--> /build/install/lib/libgfrpc.so.0(+0x12731)[0x7fb3680c1731] )))))
0-gf-attach-rpc: forced unwinding frame type(brick operations) op(--(1)) called
at 2020-02-10 14:40:07.077738 (xid=0x2)
20:10:58 got error -1 on RPC
20:10:58 ok 9 [ 14/ 2096] < 21> 'kill_brick patchy
builder203.int.aws.gluster.org /d/backends/patchy1'
20:10:58 ok 10 [ 15/ 2101] < 21> 'kill_brick patchy
builder203.int.aws.gluster.org /d/backends/patchy3'
20:10:58 [2020-02-10 14:40:11.315789] E [rpc-clnt.c:346:saved_frames_unwind]
(-->
/build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x1bc)[0x7f75e7b8a6ee]
(--> /build/install/lib/libgfrpc.so.0(+0x1176f)[0x7f75e74ef76f] (-->
/build/install/lib/libgfrpc.so.0(+0x11856)[0x7f75e74ef856] (-->
/build/install/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x16c)[0x7f75e74efde4]
(--> /build/install/lib/libgfrpc.so.0(+0x12731)[0x7f75e74f0731] )))))
0-gf-attach-rpc: forced unwinding frame type(brick operations) op(--(1)) called
at 2020-02-10 14:40:11.302544 (xid=0x2)
20:10:58 got error -1 on RPC
20:10:58 ok 11 [ 11/ 2103] < 21> 'kill_brick patchy
builder203.int.aws.gluster.org /d/backends/patchy5'
20:10:58 ok 12 [ 2386/ 194] < 72> '43 get_pending_heal_count patchy'
20:10:58 ok 13 [ 16/ 183] < 75> '3 disconnected_brick_count patchy'
20:10:58 ok 14 [ 112/ 175] < 82> '53 get_pending_heal_count patchy'
20:10:58 ok 15 [ 81/ 208] < 87> '43 get_pending_heal_count patchy'
20:10:58 ok 16 [ 16/ 77] < 90> '! gluster --mode=script --wignore
volume heal patchy'
20:10:58 ok 17 [ 11/ 155] < 91> 'gluster --mode=script --wignore
volume set patchy cluster.self-heal-daemon off'
20:10:58 ok 18 [ 11/ 71] < 92> '! gluster --mode=script --wignore
volume heal patchy'
20:10:58 ok 19 [ 24/ 72] < 93> '! gluster --mode=script --wignore
volume heal patchy full'
20:10:58 ok 20 [ 11/ 163] < 94> 'gluster --mode=script --wignore
volume start patchy force'
20:10:58 ok 21 [ 14/ 158] < 95> 'gluster --mode=script --wignore
volume set patchy cluster.self-heal-daemon on'
20:10:58 ok 22 [ 10/ 79] < 96> 'Y glustershd_up_status'
20:10:58 ok 23 [ 10/ 3342] < 34> '1 afr_child_up_status_in_shd patchy
1'
20:10:58 ok 24 [ 11/ 329] < 34> '1 afr_child_up_status_in_shd patchy
3'
20:10:58 ok 25 [ 10/ 93] < 34> '1 afr_child_up_status_in_shd patchy
5'
20:10:58 ok 26 [ 10/ 268] < 100> 'gluster --mode=script --wignore
volume heal patchy'
20:10:58 ok 27 [ 5164/ 1] < 103> '[ 43 -gt 0 ]'
20:10:58 ok 28 [ 9/ 57] < 105> 'gluster --mode=script --wignore
volume heal patchy full'
20:10:58 ok 29 [ 68/ 154] < 106> '^0$ get_pending_heal_count patchy'
20:10:58 not ok 30 [ 25/ 511] < 119> '0 get_pending_heal_count patchy'
-> 'Got "1" instead of "0"'
20:10:58 ok 31 [ 20/ 5] < 129> 'touch /mnt/glusterfs/0/f'
20:10:58 ok 32 [ 8/ 14] < 130> 'mkdir /mnt/glusterfs/0/d'
20:10:58 ok 33 [ 9/ 113] < 132> 'gluster --mode=script --wignore
volume set patchy cluster.data-self-heal off'
20:10:58 ok 34 [ 15/ 54] < 133> 'off volume_option patchy
cluster.data-self-heal'
20:10:58 ok 35 [ 9/ 1069] < 21> 'kill_brick patchy
builder203.int.aws.gluster.org /d/backends/patchy1'
20:10:58 ok 36 [ 9/ 1072] < 21> 'kill_brick patchy
builder203.int.aws.gluster.org /d/backends/patchy3'
20:10:58 ok 37 [ 9/ 1083] < 21> 'kill_brick patchy
builder203.int.aws.gluster.org /d/backends/patchy5'
20:10:58 ok 38 [ 18/ 116] < 136> '1 get_pending_heal_count patchy'
20:10:58 ok 39 [ 9/ 146] < 137> 'gluster --mode=script --wignore
volume start patchy force'
20:10:58 ok 40 [ 16/ 62] < 138> 'Y glustershd_up_status'
20:10:58 ok 41 [ 9/ 2572] < 34> '1 afr_child_up_status_in_shd patchy
1'
20:10:58 ok 42 [ 9/ 62] < 34> '1 afr_child_up_status_in_shd patchy
3'
20:10:58 ok 43 [ 8/ 62] < 34> '1 afr_child_up_status_in_shd patchy
5'
20:10:58 ok 44 [ 9/ 63] < 141> 'gluster --mode=script --wignore
volume heal patchy'
20:10:58 ok 45 [ 9/ 119] < 142> '^0$ get_pending_heal_count patchy'
20:10:58 ok 46 [ 9/ 109] < 143> 'gluster --mode=script --wignore
volume set patchy cluster.data-self-heal on'
20:10:58 ok 47 [ 18/ 114] < 146> 'gluster --mode=script --wignore
volume set patchy cluster.metadata-self-heal off'
20:10:58 ok 48 [ 10/ 53] < 147> 'off volume_option patchy
cluster.metadata-self-heal'
20:10:58 ok 49 [ 9/ 1068] < 21> 'kill_brick patchy
builder203.int.aws.gluster.org /d/backends/patchy1'
20:10:58 ok 50 [ 9/ 1072] < 21> 'kill_brick patchy
builder203.int.aws.gluster.org /d/backends/patchy3'
20:10:58 ok 51 [ 9/ 1076] < 21> 'kill_brick patchy
builder203.int.aws.gluster.org /d/backends/patchy5'
20:10:58 ok 52 [ 10/ 8] < 150> 'chmod 777 /mnt/glusterfs/0/f'
20:10:58 ok 53 [ 9/ 108] < 151> '1 get_pending_heal_count patchy'
20:10:58 ok 54 [ 9/ 153] < 152> 'gluster --mode=script --wignore
volume start patchy force'
20:10:58 ok 55 [ 13/ 63] < 153> 'Y glustershd_up_status'
20:10:58 ok 56 [ 9/ 2571] < 34> '1 afr_child_up_status_in_shd patchy
1'
20:10:58 ok 57 [ 9/ 64] < 34> '1 afr_child_up_status_in_shd patchy
3'
20:10:58 ok 58 [ 9/ 62] < 34> '1 afr_child_up_status_in_shd patchy
5'
20:10:58 ok 59 [ 8/ 62] < 156> 'gluster --mode=script --wignore
volume heal patchy'
20:10:58 ok 60 [ 10/ 121] < 157> '^0$ get_pending_heal_count patchy'
20:10:58 ok 61 [ 9/ 116] < 158> 'gluster --mode=script --wignore
volume set patchy cluster.metadata-self-heal on'
20:10:58 ok 62 [ 12/ 111] < 161> 'gluster --mode=script --wignore
volume set patchy cluster.entry-self-heal off'
20:10:58 ok 63 [ 11/ 55] < 162> 'off volume_option patchy
cluster.entry-self-heal'
20:10:58 ok 64 [ 9/ 1067] < 21> 'kill_brick patchy
builder203.int.aws.gluster.org /d/backends/patchy1'
20:10:58 ok 65 [ 9/ 1072] < 21> 'kill_brick patchy
builder203.int.aws.gluster.org /d/backends/patchy3'
20:10:58 ok 66 [ 9/ 1073] < 21> 'kill_brick patchy
builder203.int.aws.gluster.org /d/backends/patchy5'
20:10:58 ok 67 [ 10/ 10] < 164> 'touch /mnt/glusterfs/0/d/a'
20:10:58 ok 68 [ 129/ 1] < 168> 'test 2 -eq 2 -o 2 -eq 4'
20:10:58 ok 69 [ 9/ 124] < 169> 'gluster --mode=script --wignore
volume start patchy force'
20:10:58 ok 70 [ 11/ 64] < 170> 'Y glustershd_up_status'
20:10:58 ok 71 [ 9/ 2580] < 34> '1 afr_child_up_status_in_shd patchy
1'
20:10:58 ok 72 [ 9/ 64] < 34> '1 afr_child_up_status_in_shd patchy
3'
20:10:58 ok 73 [ 9/ 73] < 34> '1 afr_child_up_status_in_shd patchy
5'
20:10:58 ok 74 [ 9/ 60] < 172> 'gluster --mode=script --wignore
volume heal patchy'
20:10:58 ok 75 [ 11/ 120] < 173> '^0$ get_pending_heal_count patchy'
20:10:58 ok 76 [ 9/ 116] < 174> 'gluster --mode=script --wignore
volume set patchy cluster.entry-self-heal on'
20:10:58 ok 77 [ 11/ 57] < 178> '! gluster --mode=script --wignore
volume heal fail info'
20:10:58 ok 78 [ 9/ 7088] < 181> 'gluster --mode=script --wignore
volume stop patchy'
20:10:58 ok 79 [ 9/ 83] < 182> '! gluster --mode=script --wignore
volume heal patchy info'
20:10:58 ok 80 [ 9/ 3089] < 185> 'gluster --mode=script --wignore
volume delete patchy'
20:10:58 ok 81 [ 12/ 112] < 186> 'gluster --mode=script --wignore
volume create patchy builder203.int.aws.gluster.org:/d/backends/patchy{6}'
20:10:58 ok 82 [ 11/ 110] < 187> 'gluster --mode=script --wignore
volume start patchy'
20:10:58 ok 83 [ 11/ 95] < 188> '! gluster --mode=script --wignore
volume heal patchy info'
20:10:58 ok 84 [ 11/ 202] < 191> '! log_newer 1581345602 offset reused
from another DIR'
20:10:58 Failed 1/84 subtests
Problem:
The heal-info code assumes that every entry in the xattrop index directory
definitely needs heal. There is one corner case: the very first xattrop on a
file adds its gfid to the 'xattrop' index in the fop path, and the entry is
removed again in the _cbk path because, in the success case, the fop is a
zero-xattr xattrop. In the window between the two, heal-info can read these
gfids and report them as needing heal.
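Here is a minimal, self-contained sketch of that window (hypothetical types
and names, not the actual afr/index xlator code); the comment in main() marks
where the heal-info crawl can observe the transient entry:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    const char *gfid;
    bool in_xattrop_index;  /* linked under indices/xattrop on the brick */
    int pending[3];         /* per-brick pending counts; all-zero = clean */
} entry_t;

/* fop path: the first xattrop on the file links its gfid into the index
 * before the operation itself completes */
static void xattrop_fop(entry_t *e) {
    e->in_xattrop_index = true;
}

/* _cbk path: a zero-xattr xattrop that succeeds leaves nothing pending,
 * so the entry is unlinked again */
static void xattrop_cbk(entry_t *e, int op_ret) {
    bool clean = true;
    for (size_t i = 0; i < 3; i++)
        if (e->pending[i] != 0)
            clean = false;
    if (op_ret == 0 && clean)
        e->in_xattrop_index = false;
}

int main(void) {
    entry_t e = { "aaaa-0001", false, { 0, 0, 0 } };
    xattrop_fop(&e);
    /* a heal-info crawl running here sees e in the index and, with the
     * old presence-based check, wrongly reports it as needing heal */
    printf("mid-window, in index: %d\n", e.in_xattrop_index);
    xattrop_cbk(&e, 0);
    printf("after cbk,  in index: %d\n", e.in_xattrop_index);
    return 0;
}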
Fix:
Check the pending flag to decide whether the file definitely needs heal,
instead of relying on which index is being crawled at the moment.
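A sketch of the changed decision, again with hypothetical names rather than
the real afr functions:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* old check: being in the crawled xattrop index alone decides */
static bool needs_heal_old(bool in_xattrop_index) {
    return in_xattrop_index; /* also counts the transient first-xattrop entry */
}

/* new check: only a non-zero pending count means the file needs heal */
static bool needs_heal_new(const int *pending, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (pending[i] != 0)
            return true;
    return false;
}

int main(void) {
    int transient[3] = { 0, 0, 0 };  /* first xattrop, nothing pending */
    int dirty[3]     = { 0, 2, 0 };  /* real pending heal against brick 1 */

    printf("transient: old=%d new=%d\n",
           needs_heal_old(true), needs_heal_new(transient, 3));
    printf("dirty:     old=%d new=%d\n",
           needs_heal_old(true), needs_heal_new(dirty, 3));
    return 0;
}

With the old check both entries are reported; with the new one only the
entry that actually has a non-zero pending count is, so the transient
first-xattrop gfid no longer shows up in heal-info.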