[Bugs] [Bug 1724754] fallocate of a file larger than brick size leads to increased brick usage despite failure
bugzilla at redhat.com
Thu Jun 27 20:48:07 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1724754
--- Comment #3 from Raghavendra Bhat <rabhat at redhat.com> ---
I think the problem is with du -sh (or stat) on the fallocated file saying zero
usage on a glusterfs client.
1) Volume info
1x3 replicate volume
Volume Name: mirror
Type: Replicate
Volume ID: 68535a1f-48c3-4e7b-86fc-ecc0143c2cfe
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: server1:/export1/tmp/mirror
Brick2: server2:/export1/tmp/mirror
Brick3: server3:/export1/tmp/mirror
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
2) Bricks
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 9.1M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/server1-root 50G 22G 29G 44% /
/dev/sda1 1014M 149M 866M 15% /boot
/dev/mapper/server1-home 500G 33M 500G 1% /home
tmpfs 1.6G 0 1.6G 0% /run/user/0
/dev/mapper/group-thin_vol 9.0G 34M 9.0G 1% /export1/tmp
=======> Used as brick for the volume
/dev/mapper/new-thin_vol 9.0G 33M 9.0G 1% /export2/tmp
i.e. /export1/tmp is used as the brick on all 3 nodes (same size as seen in the df output above)
3) Mounted the client
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 9.1M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/server3-root 50G 22G 29G 44% /
/dev/mapper/server3-home 1.8T 33M 1.8T 1% /home
/dev/sda1 1014M 157M 858M 16% /boot
tmpfs 1.6G 0 1.6G 0% /run/user/0
/dev/mapper/group-thin_vol 9.0G 34M 9.0G 1% /export1/tmp
/dev/mapper/new-thin_vol 9.0G 33M 9.0G 1% /export2/tmp
dell-per320-12.gsslab.rdu2.redhat.com:/mirror 9.0G 126M 8.9G 2% /mnt/glusterfs ======> freshly mounted client
4) Ran the TEST
[root at server3 glusterfs]# fallocate -l 22GB repro
fallocate: fallocate failed: No space left on device
[root at server3 glusterfs]# du -sh repro
0 repro
=============> du -sh says the file size is 0
[root at server3 glusterfs]# stat repro
File: ‘repro’
Size: 0 Blocks: 0 IO Block: 131072 regular empty file
=====> stat showing 0 size and 0 blocks
Device: 28h/40d Inode: 12956667450403493410 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Context: system_u:object_r:fusefs_t:s0
Access: 2019-06-19 15:29:25.712546158 -0400
Modify: 2019-06-19 15:29:25.712546158 -0400
Change: 2019-06-19 15:29:25.712546158 -0400
Birth: -
[root at server3 glusterfs]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 9.1M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/server3-root 50G 22G 29G 44% /
/dev/mapper/server3-home 1.8T 33M 1.8T 1% /home
/dev/sda1 1014M 157M 858M 16% /boot
tmpfs 1.6G 0 1.6G 0% /run/user/0
/dev/mapper/group-thin_vol 9.0G 1.2G 7.9G 13% /export1/tmp
/dev/mapper/new-thin_vol 9.0G 33M 9.0G 1% /export2/tmp
server1:/mirror 9.0G 1.3G 7.8G 14% /mnt/glusterfs =========> Increased consumption
5) Ran a similar test on an XFS filesystem (i.e. no glusterfs, only the XFS filesystem)
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 9.1M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/server1-root 50G 22G 29G 44% /
/dev/sda1 1014M 149M 866M 15% /boot
/dev/mapper/server1-home 500G 33M 500G 1% /home
tmpfs 1.6G 0 1.6G 0% /run/user/0
/dev/mapper/group-thin_vol 9.0G 1.2G 7.9G 13% /export1/tmp
/dev/mapper/new-thin_vol 9.0G 33M 9.0G 1% /export2/tmp
===========> a separate XFS filesystem used in this test.
[root at server1 dir]# pwd
/export2/tmp/dir
[root at server1 dir]# fallocate -l 22GB repro
fallocate: fallocate failed: No space left on device
[root at server1 dir]# du -sh repro
1.2G repro
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 9.1M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/server1-root 50G 22G 29G 44% /
/dev/sda1 1014M 149M 866M 15% /boot
/dev/mapper/server1-home 500G 33M 500G 1% /home
tmpfs 1.6G 0 1.6G 0% /run/user/0
/dev/mapper/group-thin_vol 9.0G 1.2G 7.9G 13% /export1/tmp
/dev/mapper/new-thin_vol 9.0G 1.2G 7.9G 13% /export2/tmp
==================> Increased usage after the fallocate test
stat repro
File: ‘repro’
Size: 0 Blocks: 2359088 IO Block: 4096 regular empty file
======> zero size but non-zero blocks
Device: fd0ch/64780d Inode: 260 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2019-06-19 16:15:57.072885431 -0400
Modify: 2019-06-19 16:15:57.072885431 -0400
Change: 2019-06-19 16:15:57.072885431 -0400
Birth: -
CONCLUSION:
============
* So from the above tests, the XFS filesystem showing non-zero usage for the file is not the problem, IIUC. The problem is GlusterFS reporting zero usage for du -sh <file> and zero blocks in the stat output.
* What happens is this: as part of the operation (stat, du etc. issue a stat() system call), the gluster brick process receives the request, performs the on-disk stat() call, and gets the response back. This is the stat response received just after the brick does the on-disk stat() (captured from a gdb attachment):
p lstatbuf
$31 = {st_dev = 64775, st_ino = 260, st_nlink = 2, st_mode = 33188, st_uid = 0,
st_gid = 0, __pad0 = 0, st_rdev = 0,
st_size = 0, st_blksize = 4096, st_blocks = 2358824, st_atim = {tv_sec =
1560972565, tv_nsec = 713592657},
st_mtim = {tv_sec = 1560972565, tv_nsec = 713592657}, st_ctim = {tv_sec =
1560972565, tv_nsec = 716592631},
__unused = {0, 0, 0}}
NOTE the non-zero st_blocks in the response just received.
* The gluster brick process now converts the 'struct stat' structure (where the stat information is present) into its own internal 'struct iatt' structure by calling the iatt_from_stat() function.
* And in the iatt_from_stat() function, the number of blocks is handled specially, to account for blocks allocated beyond EOF:
    iatt->ia_size = stat->st_size;
    iatt->ia_blksize = stat->st_blksize;
    iatt->ia_blocks = stat->st_blocks;

    /* There is a possibility that the backend FS (like XFS) can
       allocate blocks beyond EOF for better performance reasons, which
       results in 'st_blocks' with higher values than what is consumed by
       the file descriptor. This would break few logic inside GlusterFS,
       like quota behavior etc, thus we need the exact number of blocks
       which are consumed by the file to the higher layers inside GlusterFS.
       Currently, this logic won't work for sparse files (ie, file with
       holes) */
    {
        uint64_t maxblocks;

        maxblocks = (iatt->ia_size + 511) / 512;
        if (iatt->ia_blocks > maxblocks)
            iatt->ia_blocks = maxblocks;
    }
For the fallocated file, stat->st_size (and hence iatt->ia_size) will be zero, so this clamp reduces the number of blocks, which in this case becomes zero.
* This same block count is what the du command uses to compute the file's disk usage.
As mentioned in the first comment of this bug, one way to handle this would be to ensure that in posix, if fallocate fails, the file is truncated back to its last known size.