[Bugs] [Bug 1808688] New: Data corruption with asynchronous writes (please try to reproduce!)

Sat Feb 29 11:56:19 UTC 2020

https://bugzilla.redhat.com/show_bug.cgi?id=1808688

            Bug ID: 1808688
           Summary: Data corruption with asynchronous writes (please try
                    to reproduce!)
           Product: GlusterFS
           Version: 7
          Hardware: x86_64
                OS: Linux
            Status: NEW
         Component: libgfapi
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: stefanrin at gmail.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community

Created attachment 1666585
  --> https://bugzilla.redhat.com/attachment.cgi?id=1666585&action=edit
Reproducer

Description of problem:

I recently noticed data corruption with a ZFS-on-Linux VM running in qemu-kvm
with its storage in qcow2 on a gluster cluster. Since then I have been trying
frantically to create a reproducer that does not involve running a guest
machine with ZoL and streaming gigabytes of data into it. I think I have finally
succeeded. By now I just hope that no one can point out a bug in my reproducer
code that leads to the corruption. For the original discussion, see
<https://lists.gluster.org/pipermail/integration/2020-February/000257.html>.

Version-Release number of selected component (if applicable):

I tested mostly with a Fedora 31 client, using the distro libgfapi as well as
git master. On the server side I used both CentOS 7 and Fedora 31 with the 7.3
releases. The production cluster where I originally witnessed this problem is
running rather older versions of everything, so my impression is that the same
thing happens with basically "any" version of the glusterfs code.

I'm not sure that libgfapi is the right component; the problem might also be
caused by glusterd.

How reproducible:

Run real.c and check the resulting data file.

Actual results:

Verifier complains

Expected results:

Verifier does not complain

Additional info:

Information about the reproducer:

It writes a specific pattern into a data file repeatedly, spaced 2097152
(0x1000*512) bytes apart. Some of these always turn out wrong. Details about
the pattern here:
<https://lists.gluster.org/pipermail/integration/2020-February/000263.html>.
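To make the spacing concrete, here is a small sketch (Python 3, not part of the attached reproducer) that computes where each repetition would start. It assumes that repetitions are numbered from 0 and that the first one starts at offset 0; the name repetition_offset is made up for this illustration.

```python
# Spacing between repetitions, as stated above: 0x1000 * 512 bytes.
STRIDE = 0x1000 * 512

def repetition_offset(index):
    """Byte offset of repetition `index`.

    Assumption: 0-based numbering, first repetition at offset 0.
    """
    return index * STRIDE

print(STRIDE)                  # 2097152
print(repetition_offset(31))   # 65011712
print(repetition_offset(256))  # 536870912
```

If the data file really holds 256 repetitions, the last line gives its total size under these assumptions: 512 MiB.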

I compile on Fedora 31 like this:

$ gcc -O2 -g -pthread -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE \
    -std=c11 real.c -lgfapi

The code might look a bit unwieldy and is not super clean, but should be rather
straightforward after a few words of explanation. I also mix all kinds of
integer types freely, but no huge quantities of anything are used, so
everything should be smooth. It is intended to employ two writer threads, so it
keeps two sets of data for the pwritev calls and marks them busy individually
after dispatching. It then waits for them to become available again (by the
completion function clearing the busy bits) in order to dispatch the next data
item. The get_worker function handles this waiting and the selection of the
worker set. It can wait for both of them to become idle (IDLE, nothing in
flight), for any one of them to become idle (ANY), or for ANY with the
additional restriction that a specific sequence number must not be in flight
(required for the 8704 request, which overwrites data from 8703).
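The waiting logic can be sketched as follows. This is a Python 3 illustration of the three wait modes described above, not a translation of real.c: the class WorkerSets, the methods dispatch and complete, and the avoid_seq parameter are names invented here, and a condition variable stands in for whatever mechanism the reproducer uses to watch the busy bits.

```python
import threading

IDLE, ANY = "IDLE", "ANY"

class WorkerSets:
    """Two slots, each either idle or busy with one in-flight sequence number."""

    def __init__(self, count=2):
        self.cond = threading.Condition()
        self.in_flight = [None] * count  # None means the slot is idle

    def get_worker(self, mode, avoid_seq=None):
        """Block until the requested condition holds, then return an idle slot."""
        def ready():
            # Extra restriction: a given sequence number must not be in flight.
            if avoid_seq is not None and avoid_seq in self.in_flight:
                return False
            idle = [seq is None for seq in self.in_flight]
            return all(idle) if mode == IDLE else any(idle)

        with self.cond:
            self.cond.wait_for(ready)
            return self.in_flight.index(None)

    def dispatch(self, slot, seq):
        """Mark a slot busy after its write has been submitted."""
        with self.cond:
            self.in_flight[slot] = seq

    def complete(self, slot):
        """Completion callback: clear the busy mark and wake any waiter."""
        with self.cond:
            self.in_flight[slot] = None
            self.cond.notify_all()

# Demo: request 8704 must not be dispatched while 8703 is still in flight.
sets = WorkerSets()
first = sets.get_worker(ANY)
sets.dispatch(first, 8703)
# Simulate the asynchronous completion of 8703 a little later.
threading.Timer(0.05, sets.complete, args=(first,)).start()
second = sets.get_worker(ANY, avoid_seq=8703)  # blocks until 8703 is done
sets.dispatch(second, 8704)
```

In the demo the second slot is idle the whole time, so a plain ANY wait would return immediately; only the avoid_seq restriction makes the dispatcher wait for 8703 to finish.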

As this kind of asynchronous code is always a little tricky to write, I tested
it on a very simple fake aio interface in order to gain confidence (fake.c).
This version writes to a local file and always produces the correct output.

All the file, volume and host names are hard-coded, and the data file
("testfile0") needs to exist – it will be overwritten.

I also add a crude verifier (Python 2). For a correct file, it should just
output "256". From my last run, I get this output:

('bad', ([('\x04', 41472)], 31))
('bad', ([('\x04', 41472)], 68))
('bad', ([('\x04', 41472)], 91))
('bad', ([('\x04', 41472)], 92))
('bad', ([('\x04', 41472)], 93))
('bad', ([('\x04', 41472)], 94))
('bad', ([('\x04', 41472)], 103))
('bad', ([('\x04', 41472)], 118))
('bad', ([('\x04', 41472)], 151))
('bad', ([('\x04', 41472)], 169))
('bad', ([('\x04', 41472)], 175))
('bad', ([('\x04', 41472)], 207))
('bad', ([('\x04', 41472)], 214))
('bad', ([('\x04', 41472)], 228))
256

This means that repetitions 31, 68, and so on of the pattern are wrong, each
containing a block of 41472 fours instead of fives.
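The pairs in that output look like run-length encodings (byte value, run length). The following Python 3 sketch shows how such a pair could be produced for one block; it is a guess at the format, not the crude verifier mentioned above, and the function name run_lengths is made up. The full expected pattern is described only in the linked post and is not reproduced here.

```python
from itertools import groupby

def run_lengths(data):
    """Collapse a bytes object into (byte value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(data)]

good = b"\x05" * 41472  # what the block should contain, per the report
bad = b"\x04" * 41472   # what the failing repetitions contain instead

print(run_lengths(good))  # [(5, 41472)]
print(run_lengths(bad))   # [(4, 41472)]
```

Python 3 yields integers when iterating over bytes, so the pair prints as (4, 41472); under Python 2, iterating over a str yields one-character strings, which would match the '\x04' form in the output above.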
