[Bugs] [Bug 1724754] fallocate of a file larger than brick size leads to increased brick usage despite failure

bugzilla at redhat.com bugzilla at redhat.com
Thu Jun 27 20:48:07 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1724754



--- Comment #3 from Raghavendra Bhat <rabhat at redhat.com> ---

I think the problem is that du -sh (or stat) on the fallocated file reports zero
usage on a glusterfs client.

1) Volume info

1x3 replicate volume

Volume Name: mirror
Type: Replicate
Volume ID: 68535a1f-48c3-4e7b-86fc-ecc0143c2cfe
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: server1:/export1/tmp/mirror
Brick2: server2:/export1/tmp/mirror
Brick3: server3:/export1/tmp/mirror
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off


2) Bricks

 df -h
Filesystem                              Size  Used Avail Use% Mounted on
devtmpfs                                7.8G     0  7.8G   0% /dev
tmpfs                                   7.8G     0  7.8G   0% /dev/shm
tmpfs                                   7.8G  9.1M  7.8G   1% /run
tmpfs                                   7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/server1-root                 50G   22G   29G  44% /
/dev/sda1                              1014M  149M  866M  15% /boot
/dev/mapper/server1-home                500G   33M  500G   1% /home
tmpfs                                   1.6G     0  1.6G   0% /run/user/0
/dev/mapper/group-thin_vol              9.0G   34M  9.0G   1% /export1/tmp
=======> Used as brick for the volume 
/dev/mapper/new-thin_vol                9.0G   33M  9.0G   1% /export2/tmp

i.e. /export1/tmp is used as the brick on all 3 nodes (same size as seen in the
df output above)

3) Mounted the client

df -h
Filesystem                                     Size  Used Avail Use% Mounted on
devtmpfs                                       7.8G     0  7.8G   0% /dev
tmpfs                                          7.8G     0  7.8G   0% /dev/shm
tmpfs                                          7.8G  9.1M  7.8G   1% /run
tmpfs                                          7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/server3-root                       50G   22G   29G  44% /
/dev/mapper/server3-home                       1.8T   33M  1.8T   1% /home
/dev/sda1                                     1014M  157M  858M  16% /boot
tmpfs                                          1.6G     0  1.6G   0% /run/user/0
/dev/mapper/group-thin_vol                     9.0G   34M  9.0G   1% /export1/tmp
/dev/mapper/new-thin_vol                       9.0G   33M  9.0G   1% /export2/tmp
dell-per320-12.gsslab.rdu2.redhat.com:/mirror  9.0G  126M  8.9G   2% /mnt/glusterfs  ======> freshly mounted client

4) Ran the TEST

[root at server3 glusterfs]# fallocate -l 22GB repro
fallocate: fallocate failed: No space left on device
[root at server3 glusterfs]# du -sh repro
0       repro    
=============> du -sh reports 0 usage for the file

[root at server3 glusterfs]# stat repro
  File: ‘repro’
  Size: 0               Blocks: 0          IO Block: 131072 regular empty file 
 =====> stat showing 0 size and 0 blocks
Device: 28h/40d Inode: 12956667450403493410  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:fusefs_t:s0
Access: 2019-06-19 15:29:25.712546158 -0400
Modify: 2019-06-19 15:29:25.712546158 -0400
Change: 2019-06-19 15:29:25.712546158 -0400
 Birth: -


[root at server3 glusterfs]# df -h
Filesystem                                     Size  Used Avail Use% Mounted on
devtmpfs                                       7.8G     0  7.8G   0% /dev
tmpfs                                          7.8G     0  7.8G   0% /dev/shm
tmpfs                                          7.8G  9.1M  7.8G   1% /run
tmpfs                                          7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/server3-root                       50G   22G   29G  44% /
/dev/mapper/server3-home                       1.8T   33M  1.8T   1% /home
/dev/sda1                                     1014M  157M  858M  16% /boot
tmpfs                                          1.6G     0  1.6G   0% /run/user/0
/dev/mapper/group-thin_vol                     9.0G  1.2G  7.9G  13% /export1/tmp
/dev/mapper/new-thin_vol                       9.0G   33M  9.0G   1% /export2/tmp
server1:/mirror                                9.0G  1.3G  7.8G  14% /mnt/glusterfs  =========> Increased consumption


5) Ran a similar test directly on an XFS filesystem (i.e. no glusterfs, just
plain XFS)

df -h
Filesystem                              Size  Used Avail Use% Mounted on
devtmpfs                                7.8G     0  7.8G   0% /dev
tmpfs                                   7.8G     0  7.8G   0% /dev/shm
tmpfs                                   7.8G  9.1M  7.8G   1% /run
tmpfs                                   7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/server1-root                50G   22G   29G  44% /
/dev/sda1                              1014M  149M  866M  15% /boot
/dev/mapper/server1-home                500G   33M  500G   1% /home
tmpfs                                   1.6G     0  1.6G   0% /run/user/0
/dev/mapper/group-thin_vol              9.0G  1.2G  7.9G  13% /export1/tmp
/dev/mapper/new-thin_vol                9.0G   33M  9.0G   1% /export2/tmp  
===========> a separate XFS filesystem used in this test.

[root at server1 dir]# pwd
/export2/tmp/dir

[root at server1 dir]# fallocate -l 22GB repro
fallocate: fallocate failed: No space left on device
[root at server1 dir]# du -sh  repro
1.2G    repro

df -h
Filesystem                              Size  Used Avail Use% Mounted on
devtmpfs                                7.8G     0  7.8G   0% /dev
tmpfs                                   7.8G     0  7.8G   0% /dev/shm
tmpfs                                   7.8G  9.1M  7.8G   1% /run
tmpfs                                   7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/server1-root                50G   22G   29G  44% /
/dev/sda1                              1014M  149M  866M  15% /boot
/dev/mapper/server1-home                500G   33M  500G   1% /home
tmpfs                                   1.6G     0  1.6G   0% /run/user/0
/dev/mapper/group-thin_vol              9.0G  1.2G  7.9G  13% /export1/tmp
/dev/mapper/new-thin_vol                9.0G  1.2G  7.9G  13% /export2/tmp
==================> Increased usage after the fallocate test

stat repro
  File: ‘repro’
  Size: 0               Blocks: 2359088    IO Block: 4096   regular empty file
======> zero size but non-zero blocks
Device: fd0ch/64780d    Inode: 260         Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2019-06-19 16:15:57.072885431 -0400
Modify: 2019-06-19 16:15:57.072885431 -0400
Change: 2019-06-19 16:15:57.072885431 -0400
 Birth: -



CONCLUSION:
============

* From the above tests, XFS reporting non-zero usage for the file is not the
problem, IIUC. The problem is GlusterFS reporting zero for du -sh <file> and
zero blocks in the stat output.

* What happens is this: as part of the operation (stat, du etc. issue a stat()
system call), the request reaches the brick, which does a stat() on the on-disk
file, and the backend filesystem returns the response to the gluster brick
process. This is the stat response received just after the brick does the
on-disk stat() (captured by attaching gdb):

 p lstatbuf
$31 = {st_dev = 64775, st_ino = 260, st_nlink = 2, st_mode = 33188, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0,
  st_size = 0, st_blksize = 4096, st_blocks = 2358824, st_atim = {tv_sec = 1560972565, tv_nsec = 713592657},
  st_mtim = {tv_sec = 1560972565, tv_nsec = 713592657}, st_ctim = {tv_sec = 1560972565, tv_nsec = 716592631},
  __unused = {0, 0, 0}}

NOTE the non-zero st_blocks in the response from the backend filesystem.
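
(The same thing can be checked directly against the brick's backend filesystem
with a few lines of C; a standalone sketch, with the backend path assumed from
the volume info above.)

#include <stdio.h>
#include <sys/stat.h>

int
main(void)
{
    struct stat st;

    /* Assumed backend path of the fallocated file on the brick. */
    if (lstat("/export1/tmp/mirror/repro", &st) != 0) {
        perror("lstat");
        return 1;
    }

    /* Expected to match the gdb capture: st_size == 0, st_blocks large. */
    printf("st_size = %lld, st_blocks = %lld\n",
           (long long)st.st_size, (long long)st.st_blocks);
    return 0;
}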

* The gluster brick process then converts the 'struct stat' (where the stat
  information is present) into its own internal 'struct iatt' by calling the
  iatt_from_stat() function.

* And in iatt_from_stat(), the number of blocks is handled specially to strip
blocks preallocated beyond EOF:

    iatt->ia_size = stat->st_size;
    iatt->ia_blksize = stat->st_blksize;
    iatt->ia_blocks = stat->st_blocks;

    /* There is a possibility that the backend FS (like XFS) can                
       allocate blocks beyond EOF for better performance reasons, which         
       results in 'st_blocks' with higher values than what is consumed by       
       the file descriptor. This would break few logic inside GlusterFS,        
       like quota behavior etc, thus we need the exact number of blocks         
       which are consumed by the file to the higher layers inside GlusterFS.    
       Currently, this logic won't work for sparse files (ie, file with         
       holes)                                                                   
    */
    {
        uint64_t maxblocks;

        maxblocks = (iatt->ia_size + 511) / 512;

        if (iatt->ia_blocks > maxblocks)
            iatt->ia_blocks = maxblocks;
    }

 For the fallocated file, stat->st_size (and hence iatt->ia_size) is zero, so
 maxblocks works out to zero and ia_blocks gets clamped down to zero.

* That same (clamped) block count is what the du command uses to compute the
usage it reports (see the arithmetic below).
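
du derives usage from the block count as st_blocks * 512 bytes. With the raw
backend value that is roughly

    2358824 * 512 = 1,207,717,888 bytes  ~= 1.2G

which matches the ~1.2G increase seen on the brick and the du result on plain
XFS in step 5, whereas the clamped value of 0 blocks makes du on the client
report 0.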

As mentioned in the 1st comment of this bug, one way to handle this would be
to ensure that in posix, if fallocate fails, the file is truncated back to its
last known size.
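
A minimal sketch of that idea (hypothetical helper, not the actual posix
xlator code):

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch only: record the file size before the preallocation and, if
   fallocate fails (e.g. with ENOSPC), truncate back to that size so
   partially allocated blocks are not left behind. */
int
fallocate_with_rollback(int fd, int mode, off_t offset, off_t len)
{
    struct stat prebuf;
    int ret;

    /* Remember the last known size before attempting the allocation. */
    if (fstat(fd, &prebuf) != 0)
        return -errno;

    ret = fallocate(fd, mode, offset, len);
    if (ret != 0) {
        ret = -errno;
        /* Roll back to the recorded size, as proposed above. */
        ftruncate(fd, prebuf.st_size);
        return ret;
    }

    return 0;
}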
