[Bugs] [Bug 1802016] New: read() returns more than file size when using direct I/O
bugzilla at redhat.com
Wed Feb 12 07:46:59 UTC 2020
https://bugzilla.redhat.com/show_bug.cgi?id=1802016
Bug ID: 1802016
Summary: read() returns more than file size when using direct
I/O
Product: Red Hat Gluster Storage
Version: rhhiv-1.6
Hardware: x86_64
OS: Linux
Status: NEW
Component: sharding
Severity: high
Priority: high
Assignee: kdhananj at redhat.com
Reporter: sasundar at redhat.com
QA Contact: sasundar at redhat.com
CC: atumball at redhat.com, bugs at gluster.org,
csaba at redhat.com, kdhananj at redhat.com,
khiremat at redhat.com, kwolf at redhat.com,
nsoffer at redhat.com, pkarampu at redhat.com,
rabhat at redhat.com, rgowdapp at redhat.com,
rhs-bugs at redhat.com, rkavunga at redhat.com,
sabose at redhat.com, sasundar at redhat.com,
storage-qa-internal at redhat.com, teigland at redhat.com,
tnisan at redhat.com, vjuranek at redhat.com
Depends On: 1802013
Target Milestone: ---
Classification: Red Hat
Description of problem:
When using direct I/O, reading from a file returns more data than the file
contains, padding the file data with zeroes.
Here is an example.
## On a host mounting gluster using fuse
$ pwd
/rhev/data-center/mnt/glusterSD/voodoo4.tlv.redhat.com:_gv0/de566475-5b67-4987-abf3-3dc98083b44c/dom_md
$ mount | grep glusterfs
voodoo4.tlv.redhat.com:/gv0 on
/rhev/data-center/mnt/glusterSD/voodoo4.tlv.redhat.com:_gv0 type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
$ stat metadata
File: metadata
Size: 501 Blocks: 1 IO Block: 131072 regular file
Device: 31h/49d Inode: 13313776956941938127 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 36/ vdsm) Gid: ( 36/ kvm)
Context: system_u:object_r:fusefs_t:s0
Access: 2019-08-01 22:21:49.186381528 +0300
Modify: 2019-08-01 22:21:49.427404135 +0300
Change: 2019-08-01 22:21:49.969739575 +0300
Birth: -
$ cat metadata
ALIGNMENT=1048576
BLOCK_SIZE=4096
CLASS=Data
DESCRIPTION=gv0
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
MASTER_VERSION=1
POOL_DESCRIPTION=4k-gluster
POOL_DOMAINS=de566475-5b67-4987-abf3-3dc98083b44c:Active
POOL_SPM_ID=-1
POOL_SPM_LVER=-1
POOL_UUID=44cfb532-3144-48bd-a08c-83065a5a1032
REMOTE_PATH=voodoo4.tlv.redhat.com:/gv0
ROLE=Master
SDUUID=de566475-5b67-4987-abf3-3dc98083b44c
TYPE=GLUSTERFS
VERSION=5
_SHA_CKSUM=3d1cb836f4c93679fc5a4e7218425afe473e3cfa
$ dd if=metadata bs=4096 count=1 of=/dev/null
0+1 records in
0+1 records out
501 bytes copied, 0.000340298 s, 1.5 MB/s
$ dd if=metadata bs=4096 count=1 of=/dev/null iflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00398529 s, 1.0 MB/s
Checking the copied data shows that the actual file content is padded with
zeros to 4096 bytes.
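For reference, here is a minimal C sketch of the same check at the syscall
level. The file name and the 4096-byte aligned buffer are placeholders
matching the example above; on the affected fuse mount the read is expected
to return 4096 with the tail beyond 501 bytes zero-filled:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "metadata";          /* placeholder path */
    struct stat st;
    void *buf;
    ssize_t n;
    int fd = open(path, O_RDONLY | O_DIRECT);

    if (fd < 0 || fstat(fd, &st) < 0) { perror(path); return 1; }
    if (posix_memalign(&buf, 4096, 4096)) { perror("posix_memalign"); return 1; }

    n = read(fd, buf, 4096);                /* expected 501, observed 4096 */
    printf("st_size=%lld read=%zd\n", (long long)st.st_size, n);

    if (n > st.st_size) {
        /* Check whether the extra bytes are zero padding. */
        char zeros[4096] = {0};
        int padded = memcmp((char *)buf + st.st_size, zeros, n - st.st_size) == 0;
        printf("bytes beyond EOF are %s\n", padded ? "zero-filled" : "not zero-filled");
    }
    close(fd);
    free(buf);
    return 0;
}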
## On one of the gluster nodes
$ pwd
/export/vdo0/brick/de566475-5b67-4987-abf3-3dc98083b44c/dom_md
$ stat metadata
File: metadata
Size: 501 Blocks: 16 IO Block: 4096 regular file
Device: fd02h/64770d Inode: 149 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 36/ UNKNOWN) Gid: ( 36/ kvm)
Context: system_u:object_r:usr_t:s0
Access: 2019-08-01 22:21:50.380425478 +0300
Modify: 2019-08-01 22:21:49.427397589 +0300
Change: 2019-08-01 22:21:50.374425302 +0300
Birth: -
$ dd if=metadata bs=4096 count=1 of=/dev/null
0+1 records in
0+1 records out
501 bytes copied, 0.000991636 s, 505 kB/s
$ dd if=metadata bs=4096 count=1 of=/dev/null iflag=direct
0+1 records in
0+1 records out
501 bytes copied, 0.0011381 s, 440 kB/s
Since the direct read from the brick returns the correct size (501 bytes), the
issue is in gluster rather than in the underlying filesystem.
# gluster volume info gv0
Volume Name: gv0
Type: Replicate
Volume ID: cbc5a2ad-7246-42fc-a78f-70175fb7bf22
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: voodoo4.tlv.redhat.com:/export/vdo0/brick
Brick2: voodoo5.tlv.redhat.com:/export/vdo0/brick
Brick3: voodoo8.tlv.redhat.com:/export/vdo0/brick (arbiter)
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: disable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
$ xfs_info /export/vdo0
meta-data=/dev/mapper/vdo0 isize=512 agcount=4, agsize=6553600 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=0
data = bsize=4096 blocks=26214400, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=12800, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Version-Release number of selected component (if applicable):
Server:
$ rpm -qa | grep glusterfs
glusterfs-libs-6.4-1.fc29.x86_64
glusterfs-api-6.4-1.fc29.x86_64
glusterfs-client-xlators-6.4-1.fc29.x86_64
glusterfs-fuse-6.4-1.fc29.x86_64
glusterfs-6.4-1.fc29.x86_64
glusterfs-cli-6.4-1.fc29.x86_64
glusterfs-server-6.4-1.fc29.x86_64
Client:
$ rpm -qa | grep glusterfs
glusterfs-client-xlators-6.4-1.fc29.x86_64
glusterfs-6.4-1.fc29.x86_64
glusterfs-rdma-6.4-1.fc29.x86_64
glusterfs-cli-6.4-1.fc29.x86_64
glusterfs-libs-6.4-1.fc29.x86_64
glusterfs-fuse-6.4-1.fc29.x86_64
glusterfs-api-6.4-1.fc29.x86_64
How reproducible:
Always.
Steps to Reproduce:
1. Provision a gluster volume over VDO (not checked without VDO)
2. Create a file of 501 bytes
3. Read the file using direct I/O (see the C sketch after the expected results below)
Actual results:
read() returns 4096 bytes, padding the file data with zeroes
Expected results:
read() returns actual file data (501 bytes)
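The dd commands above reproduce this; the following self-contained C sketch
covers steps 2 and 3 in one program (the file name is a placeholder, and it
should be run from inside the fuse mount). On a correct filesystem the direct
read returns 501; on the affected gluster mount it returns 4096:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *path = "testfile";                 /* placeholder name */
    char data[501];
    void *buf;
    ssize_t n;
    int fd;

    memset(data, 'x', sizeof(data));

    /* Step 2: create a file of 501 bytes. */
    fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0 || write(fd, data, sizeof(data)) != (ssize_t)sizeof(data)) {
        perror("write");
        return 1;
    }
    close(fd);

    /* Step 3: read the file back using direct I/O. */
    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open O_DIRECT"); return 1; }
    if (posix_memalign(&buf, 4096, 4096)) { perror("posix_memalign"); return 1; }

    n = read(fd, buf, 4096);
    printf("read() returned %zd bytes (expected 501)\n", n);

    close(fd);
    free(buf);
    return 0;
}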
--- Additional comment from Nir Soffer on 2019-08-02 19:21:20 UTC ---
David, do you think this can affect sanlock?
--- Additional comment from Nir Soffer on 2019-08-02 19:25:02 UTC ---
Kevin, do you think this can affect qemu/qemu-img?
--- Additional comment from Amar Tumballi on 2019-08-05 05:33:57 UTC ---
@Nir, thanks for the report. We will look into this.
--- Additional comment from Kevin Wolf on 2019-08-05 09:16:16 UTC ---
(In reply to Nir Soffer from comment #2)
> Kevin, do you think this can affect qemu/qemu-img?
This is not a problem for QEMU as long as the file size is correct. If gluster
didn't do the zero padding, QEMU would do it internally.
In fact, fixing this in gluster may break the case of unaligned image sizes
with QEMU because the image size is rounded up to sector (512 byte) granularity
and the gluster driver turns short reads into errors. This would actually
affect non-O_DIRECT, too, which already seems to behave this way, so can you
just give this a quick test?
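For illustration only (this is not QEMU code): a minimal sketch of what
padding a short read "internally" on the caller's side could look like,
assuming buf, len and offset already satisfy the O_DIRECT alignment
requirements:

#include <string.h>
#include <unistd.h>

/* Read 'len' bytes at 'offset'; if the read comes up short at end of file,
 * zero-fill the remainder instead of treating it as an error. */
static int read_padded(int fd, void *buf, size_t len, off_t offset)
{
    ssize_t n = pread(fd, buf, len, offset);
    if (n < 0)
        return -1;                      /* genuine I/O error */
    memset((char *)buf + n, 0, len - (size_t)n);
    return 0;
}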
--- Additional comment from David Teigland on 2019-08-05 15:08:32 UTC ---
(In reply to Nir Soffer from comment #1)
> David, do you think this can affect sanlock?
I don't think so. sanlock doesn't use any space that it didn't first write to
initialize.
--- Additional comment from Worker Ant on 2019-08-08 05:56:04 UTC ---
REVIEW: https://review.gluster.org/23175 (features/shard: Send correct size
when reads are sent beyond file size) posted (#1) for review on master by
Krutika Dhananjay
--- Additional comment from Worker Ant on 2019-08-12 13:30:56 UTC ---
REVIEW: https://review.gluster.org/23175 (features/shard: Send correct size
when reads are sent beyond file size) merged (#3) on master by Krutika
Dhananjay
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1802013
[Bug 1802013] read() returns more than file size when using direct I/O