[Bugs] [Bug 1751722] Gluster fuse mount crashed during truncate

bugzilla at redhat.com
Wed Sep 18 15:55:59 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1751722



--- Comment #5 from Krutika Dhananjay <kdhananj at redhat.com> ---
RCA:
The size goes negative when two consecutive truncates on the
__DIRECT_IO_TEST__ file (coming from open with O_TRUNC) happen in the
following sequence:

1. Size of the file at the beginning: 512 bytes.
2. The first truncate on a given mount truncates the file to size 0. Delta size =
final size - initial size = 0 - 512 = -512.
3. An xattrop is now sent with -512. The on-disk file size had been 512, so
512 + (-512) = 0, and the final on-disk size at the end of this truncate is 0.
However, the shard translator's truncate fop callback continues to cache 512 as
the file size.
4. A second truncate (again to size 0) is then sent, without a lookup or stat
preceding it, so the cached size is trusted as-is.
Delta size = final size - initial size = 0 - 512 = -512 (here the initial size
should have been 0, but it is wrongly assumed to be 512).
So an xattrop is sent with -512 again, and the on-disk size becomes
0 - 512 = -512, which is stored as 0xfffffffffffffe00.
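
The sequence above can be replayed with a small model. This is only a sketch
under stated assumptions, not the shard translator's code: Brick,
xattrop_add, truncate and the refresh_cache flag are hypothetical names, and
the on-disk size is assumed to be a 64-bit field that wraps around.

```python
MASK64 = (1 << 64) - 1  # assumption: on-disk size is a 64-bit field


class Brick:
    """Hypothetical stand-in for the on-disk size xattr."""

    def __init__(self, size):
        self.size = size

    def xattrop_add(self, delta):
        # Add a signed delta; the result wraps to 64 bits.
        self.size = (self.size + delta) & MASK64
        return self.size


def truncate(brick, cached_size, new_size, refresh_cache):
    # Delta size = final size - initial size, where the initial size
    # is taken from the client-side cache, not from disk.
    delta = new_size - cached_size
    brick.xattrop_add(delta)
    # With refresh_cache=False the stale size stays cached (step 3).
    return new_size if refresh_cache else cached_size


def run(refresh_cache):
    brick = Brick(512)   # step 1: file starts at 512 bytes
    cached = 512
    cached = truncate(brick, cached, 0, refresh_cache)  # steps 2-3
    cached = truncate(brick, cached, 0, refresh_cache)  # step 4
    return brick.size


buggy = run(refresh_cache=False)
fixed = run(refresh_cache=True)
print(hex(buggy))  # 0xfffffffffffffe00
print(fixed)       # 0
```

With the stale cache, the second truncate subtracts 512 from a size that is
already 0, which yields 0xfffffffffffffe00. The refresh_cache=True run only
illustrates what a cache that tracks the new size would produce; it is not a
description of the actual fix.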

And this is what we see in the getfattr output of the file:

[root@rhsqa-grafton8 ~]# getfattr -d -m . -e hex /gluster_bricks/data/data/__DIRECT_IO_TEST__
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/data/data/__DIRECT_IO_TEST__
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xe8235ee2ea374f0a9e54853db2781f93
trusted.gfid2path.69693d0e1876710b=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f5f5f4449524543545f494f5f544553545f5f
trusted.glusterfs.shard.block-size=0x0000000004000000
trusted.glusterfs.shard.file-size=0xfffffffffffffe00000000000000000000000000000000010000000000000000
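
As a cross-check, the leading 8 bytes of the trusted.glusterfs.shard.file-size
value can be decoded by hand. The sketch below assumes those bytes are a
big-endian 64-bit integer holding the file size, which is consistent with the
-512 derived in the RCA; the remaining bytes of the xattr are not interpreted
here.

```python
import struct

# Leading 8 bytes of trusted.glusterfs.shard.file-size from the output above.
size_field = bytes.fromhex("fffffffffffffe00")

# Assumption: the field is big-endian. Read it both ways.
size_unsigned = struct.unpack(">Q", size_field)[0]
size_signed = struct.unpack(">q", size_field)[0]

# trusted.glusterfs.shard.block-size from the same output.
block_size = int("0000000004000000", 16)

print(size_unsigned == 2**64 - 512)  # True
print(size_signed)                   # -512
print(block_size // (1024 * 1024))   # 64 (MiB)
```

Read as a signed value, the stored size is exactly the -512 produced by the
second xattrop.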

More evidence is in the Wireshark output. I will paste it here tomorrow for the
sake of completeness (my laptop freezes at some point when I try to load the
pcap file, which is huge, in Wireshark).

NOTE:
+++++

Part of the reason we are hitting this crash now is that the truncate codepath
in shard had largely remained untested. With the newly introduced block-size
detection code in RHV/oVirt, which primarily opens the file __DIRECT_IO_TEST__
with O_TRUNC and performs I/O on it, the truncate fop path is now getting
exercised frequently.

-Krutika
