[GEDI] Strange data corruption issue with gluster (libgfapi) and ZFS

Stefan Ring stefanrin at gmail.com
Mon Feb 24 15:50:30 UTC 2020


On Mon, Feb 24, 2020 at 2:27 PM Kevin Wolf <kwolf at redhat.com> wrote:
> > > There are quite a few machines running on this host, and we have not
> > > experienced other problems so far. So right now, only ZFS is able to
> > > trigger this for some reason. The guest has 8 virtual cores. I also
> > > tried writing directly to the affected device from user space in
> > > patterns mimicking what I see in blktrace, but so far have been unable
> > > to trigger the same issue that way. Of the many ZFS knobs, I know at
> > > least one that makes a huge difference: When I set
> > > zfs_vdev_async_write_max_active to 1 (as opposed to 2 or 10), the
> > > error count goes through the roof (11,000).
>
> Wait, what does this setting actually do? Does it mean that QEMU
> should never see more than a single active write request at the same
> time?
> So if this makes the error a lot more prominent, does this mean that
> async I/O actually makes the problem _less_ likely to occur?
>
> This sounds weird, so probably I'm misunderstanding the setting?

Yes, this is strange, and I will not follow up on it, as I cannot
reproduce it on my home setup. Let’s just hope that it’s some kind of
anomaly that will go away once the real issue has been eliminated ;).
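
(For anyone who wants to poke at this knob: on ZFS on Linux it is a
module parameter, so it can be flipped at runtime under
/sys/module/zfs/parameters. A quick sketch, needs root:)

    #!/usr/bin/env python3
    # Sketch: flip zfs_vdev_async_write_max_active at runtime
    # (ZFS on Linux). Needs root; prints the old value so it can
    # be restored afterwards.
    PARAM = "/sys/module/zfs/parameters/zfs_vdev_async_write_max_active"

    with open(PARAM) as f:
        print("old value:", f.read().strip())
    with open(PARAM, "w") as f:
        f.write("1\n")  # 1 made the error count explode in my tests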

> > I can actually reproduce this on my Fedora 31 home machine with 3 VMs.
>
> This means QEMU 4.1.1, right?

Yes, qemu-system-x86-4.1.1-1.fc31.x86_64.

> > All 3 running CentOS 7.7. Two for glusterd, one for ZFS. Briefly, I
> > also got rid of the 2 glusterd VMs altogether, i.e. I ran glusterd
> > (the Fedora version) directly on the host, and it would still occur.
> > So my impression is that the server side of GlusterFS does not matter
> > much – I’ve seen it happen on 4.x, 6.x, 7.2 and 7.3. Also, as it
> > happens in the same way on a Fedora 31 qemu as well as a CentOS 7 one,
> > the qemu version is equally irrelevant.
> >
> > The main conclusion so far is that it has to do with growing the qcow2
> > image. With a fully pre-populated image, I cannot trigger it.
>
> Ok, that's a good data point.
>
> Is the corruption that you're seeing only in the guest data or is qcow2
> metadata also affected (does 'qemu-img check' report errors)?

"No errors were found on the image."

I don’t entirely rule out the possibility of qcow2 metadata corruption,
but at least it seems to be very unlikely compared to guest data
corruption.
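
(For what it’s worth, content corruption becomes easy to locate when
the test data is self-describing. A rough sketch of that idea – the
device path and sizes below are made-up example values – stamps each
block with its own offset and verifies on read-back:)

    #!/usr/bin/env python3
    # Sketch: stamp each block of a test device with its own offset,
    # then read everything back and report blocks that do not match.
    # PATH, BLOCK_SIZE and BLOCKS are made-up example values.
    import os
    import struct

    PATH = "/dev/vdb"    # hypothetical scratch device inside the guest
    BLOCK_SIZE = 4096
    BLOCKS = 1 << 16     # 256 MiB of test data

    def block(offset):
        # 8-byte little-endian offset, padded to a full block
        return struct.pack("<Q", offset).ljust(BLOCK_SIZE, b"\xab")

    fd = os.open(PATH, os.O_RDWR)
    for i in range(BLOCKS):
        os.pwrite(fd, block(i * BLOCK_SIZE), i * BLOCK_SIZE)
    os.fsync(fd)
    for i in range(BLOCKS):
        if os.pread(fd, BLOCK_SIZE, i * BLOCK_SIZE) != block(i * BLOCK_SIZE):
            print("corruption at byte offset", i * BLOCK_SIZE)
    os.close(fd)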

> > What I plan to do next is look at the block ranges being written in
> > the hope of finding overlaps there.
>
> Either that, or other interesting patterns.
>
> Did you try to remove the guest from the equation? If you say that the
> problem is with multiple parallel requests, maybe 'qemu-img bench' can
> cause the same kind of corruption? (Though if it's only corruption in
> the content rather than qcow2 metadata, it may be hard to detect.
> Giving qemu-io an explicit list of requests could still be an option
> once we have a suspicion what pattern creates the problem.)

I did not know about qemu-img bench, but narrowing this down as much as
possible – and that entails removing the guest VM – is my number one
priority here.
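
As a first stab at the overlap analysis, something like this over the
blkparse output should flag write requests whose sector ranges overlap
while both are in flight (the field positions are an assumption about
blkparse’s default output format, so take it as a sketch):

    #!/usr/bin/env python3
    # Sketch: flag write requests whose sector ranges overlap while
    # both are in flight, from default-format blkparse output on stdin.
    import sys

    in_flight = {}  # (dev, start_sector) -> (start, end)

    for line in sys.stdin:
        f = line.split()
        # dev cpu seq time pid action rwbs sector + nsectors [process]
        if len(f) < 10 or f[8] != "+" or "W" not in f[6]:
            continue
        dev, action = f[0], f[5]
        start = int(f[7])
        end = start + int(f[9])
        if action == "D":  # request issued to the driver
            for (d, s), (ostart, oend) in in_flight.items():
                if d == dev and start < oend and ostart < end:
                    print("overlap: [%d,%d) vs in-flight [%d,%d)"
                          % (start, end, ostart, oend))
            in_flight[(dev, start)] = (start, end)
        elif action == "C":  # request completed
            in_flight.pop((dev, start), None)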

