[GEDI] Strange data corruption issue with gluster (libgfapi) and ZFS
Stefan Ring
stefanrin at gmail.com
Fri Feb 28 13:05:35 UTC 2020
On Fri, Feb 28, 2020 at 12:10 PM Kevin Wolf <kwolf at redhat.com> wrote:
> >
> > 8700
> > grows the file to a certain size (2134144 blocks)
> >
> > <8700 retires, nothing in flight>
> >
> > 8701
> > writes 55 blocks inside currently allocated file range, close to the
> > end (7 blocks short)
> >
> > 8702
> > writes 54 blocks from the end of 8701, growing the file by 47 blocks
> >
> > <8702 retires, 8701 remains in flight>
> >
> > 8703
> > writes from the end of 8702, growing the file by 81 blocks
> >
> > <8703 retires, 8701 remains in flight>
> >
> > 8704
> > writes 1623 blocks also from the end of 8702, growing the file by 1542 blocks
> >
> > <8701 retires>
> > <8704 retires>
> >
> > The exact range covered by 8703 ends up zeroed out.
> >
> > If 8701 retires earlier (before 8702 is issued), everything is fine.
>
> This sounds almost like two other bugs we got fixed recently (in the
> QEMU file-posix driver and in the XFS kernel driver) where two writes
> extending the file size were in flight in parallel, but if the shorter
> one completed last, instead of extending the file, it would end up
> truncating it.
>
> I'm not sure, though, why 8701 would try to change the file size because
> it's entirely inside the already allocated file range. But maybe adding
> the current file size at the start and completion of each request to
> your debug output could give us more data points?
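To make the mechanism described above concrete, here is a toy model in Python. It is a hypothetical sketch, not the actual QEMU file-posix, XFS or libgfapi code: the function names and both completion rules are assumptions for illustration only. Each simulated request also logs the file size at issue and at completion, in the spirit of the suggested debug output.

```python
# Toy model (hypothetical; not the QEMU file-posix, XFS or libgfapi code) of
# two writes that both extend the file while in flight at the same time.
# Each request logs the file size when issued and when it completes.

def complete_naive(size, end):
    # Buggy rule (assumed): a completing write sets the size to its own end.
    return end

def complete_safe(size, end):
    # Correct rule: a completion may grow the file but never shrink it.
    return max(size, end)

def simulate(requests, completion_order, complete, initial_size):
    """requests maps id -> end offset; all are issued before any completes."""
    size = initial_size
    log = [(rid, "issue", size) for rid in requests]
    for rid in completion_order:
        size = complete(size, requests[rid])
        log.append((rid, "done", size))
    return size, log

# Two writes past EOF=100: a short one (A) and a long one (B).
# The long one completes first, the short one last.
reqs = {"A": 150, "B": 300}
for rule in (complete_naive, complete_safe):
    size, log = simulate(reqs, ["B", "A"], rule, initial_size=100)
    print(rule.__name__, size, log)
```

With the naive rule the short write completing last shrinks the file from 300 to 150; with the safe rule the size stays at 300. In real debug output, a file size that drops at a completion entry would hint at this class of bug.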

Something I did not notice initially: both 8700 and 8701 write to the
same starting offset. That does not change the fact that 8701 should
not change the size of the file.
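The figures in this thread can be cross-checked with a bit of arithmetic. The sketch below is an illustration, not part of the original report, and it assumes that all numbers use the same block unit, that ranges are half-open, that 8700 ends exactly at the EOF it creates, and that 8703 starts at the EOF left by the already retired 8702, so its 81 blocks of growth are its full length. The helper names are hypothetical.

```python
# Reconstruct the block ranges of requests 8700-8704 from the figures in
# this thread. Assumptions (not stated verbatim): one block unit throughout,
# half-open ranges [start, end), 8700 ends exactly at the EOF it creates,
# and 8703 starts at the EOF left by 8702 (8702 had already retired).

EOF_AFTER_8700 = 2134144

def reconstruct():
    end_8701 = EOF_AFTER_8700 - 7          # 8701 ends 7 blocks short of EOF
    start = end_8701 - 55                  # 8701 writes 55 blocks
    end_8702 = end_8701 + 54               # 8702: 54 blocks from the end of 8701
    return {
        8700: (start, EOF_AFTER_8700),     # same start offset as 8701
        8701: (start, end_8701),
        8702: (end_8701, end_8702),
        8703: (end_8702, end_8702 + 81),   # grows the file by 81 blocks
        8704: (end_8702, end_8702 + 1623), # 1623 blocks, also from end of 8702
    }

def extends(req_range, file_size):
    """A write can only legitimately change the size if it ends past EOF."""
    return req_range[1] > file_size

ranges = reconstruct()
for req, (start, end) in sorted(ranges.items()):
    print(f"{req}: [{start}, {end}) {end - start} blocks")

# Cross-checks against the growth figures given in the report.
assert ranges[8702][1] - EOF_AFTER_8700 == 47
assert ranges[8704][1] - ranges[8703][1] == 1542
assert not extends(ranges[8701], EOF_AFTER_8700)
```

Under these assumptions the reported numbers are mutually consistent: 8701 stays inside the EOF left by 8700, and the range written by 8703 lies entirely inside the range written by 8704.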
More information about the integration mailing list