[Bugs] [Bug 1751722] Gluster fuse mount crashed during truncate

bugzilla@redhat.com
Fri Sep 20 08:56:49 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1751722



--- Comment #13 from Krutika Dhananjay <kdhananj@redhat.com> ---
I'm able to recreate the bug now with some simple steps.



[root@dhcp35-215 ~]# gluster v info

Volume Name: rep
Type: Replicate
Volume ID: 8cad61f0-4770-4e75-b97c-7bab6cb0fa67
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: kdhananjay:/bricks/1
Brick2: kdhananjay:/bricks/2
Brick3: kdhananjay:/bricks/3
Options Reconfigured:
performance.strict-o-direct: on
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on


SET md-cache-timeout TO A SLIGHTLY HIGHER VALUE SO THE STALE SIZE STAYS CACHED
LONG ENOUGH TO HIT THE RACE:
# gluster volume set rep md-cache-timeout 60

CREATE TWO MOUNTS:
# mount -t glusterfs kdhananjay:/rep /mnt
# mount -t glusterfs kdhananjay:/rep /mnt1

FILL A FILE WITH 512 BYTES OF DATA:
# dd if=/dev/urandom of=/mnt/__DIRECT_IO_TEST__ bs=512 count=1 oflag=direct

STAT THE FILE FROM THE FIRST MOUNT TO ENSURE 512 BYTES IS CACHED AS THE FILE
SIZE IN MEMORY; THEN TRUNCATE THE FILE TO SIZE 0 FROM THE SECOND MOUNT; THEN
TRUNCATE THE FILE TO 0 FROM THE FIRST MOUNT:
# stat /mnt/__DIRECT_IO_TEST__; truncate -s 0 /mnt1/__DIRECT_IO_TEST__;
truncate -s 0 /mnt/__DIRECT_IO_TEST__


THE RESULT?
[root@dhcp35-215 ~]# getfattr -d -m . -e hex /bricks/1/__DIRECT_IO_TEST__
getfattr: Removing leading '/' from absolute path names                         
# file: bricks/1/__DIRECT_IO_TEST__                                             
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000 
trusted.afr.dirty=0x000000000000000000000000                                    
trusted.gfid=0x3d79268080c241d9bf70039c7ce52c54                                 
trusted.gfid2path.69693d0e1876710b=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f5f5f4449524543545f494f5f544553545f5f
trusted.glusterfs.shard.block-size=0x0000000004000000
trusted.glusterfs.shard.file-size=0xfffffffffffffe00000000000000000000000000000000000000000000000000

^^ Size is now negative.
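
For reference, the leading 8 bytes of that file-size xattr
(0xfffffffffffffe00), read as a big-endian 64-bit integer, come out to -512
when interpreted as signed. A minimal C sketch of the decoding (my own
illustration; the assumption that the first 8 bytes of
trusted.glusterfs.shard.file-size hold the file size is mine, not stated in
the dump):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    /* First 8 bytes of the trusted.glusterfs.shard.file-size value
     * dumped above. */
    uint64_t raw = 0xfffffffffffffe00ULL;

    printf("unsigned: %" PRIu64 "\n", raw);          /* 18446744073709551104 */
    printf("signed:   %" PRId64 "\n", (int64_t)raw); /* -512 */
    return 0;
}

That is, 0 minus the 512 bytes that were written, which matches the sequence
described below.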

Here's what could have happened from VDSM's perspective in the actual RHHI
setup:

1. HOST 1 would have successfully performed a 512-byte write (perhaps followed
by a lookup or a stat) and cached 512 bytes as the file size.
2. Now HOST 2 would do the same open + write, of which only the open() part,
with O_TRUNC, has completed so far. So the file size on the backend has become
0 due to the truncate.
3. Now HOST 1 would try the same open + write for the next block size, but the
size has already changed to 0, unbeknownst to this mount. It does the truncate
(because of O_TRUNC) and uses the cached value to compute the delta. So zero
minus 512 bytes ends up as negative 512 in the backend.

Basically, there is an interleaving of the sub-operations of detect_block_size
across multiple mount points. By sub-operations I mean open + truncate + write.
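
To make the interleaving concrete, here is a hedged sketch of the size-delta
arithmetic (illustrative names, not the actual shard.c symbols): the requested
truncate size is compared against the size the client has cached, and only
that delta is applied to the file-size xattr on the backend.

#include <stdio.h>
#include <inttypes.h>

/* Illustrative only: the delta is computed against the size the client has
 * cached, not the size currently on the brick. */
static int64_t truncate_size_delta(uint64_t requested_size,
                                   uint64_t cached_size)
{
    return (int64_t)requested_size - (int64_t)cached_size;
}

int main(void)
{
    int64_t xattr_size = 512;  /* on-disk size after the 512-byte write */

    /* Second mount: cache in sync with the brick, truncate to 0. */
    xattr_size += truncate_size_delta(0, 512);  /* 512 + (-512) = 0 */

    /* First mount: cache still says 512, truncate to 0 again. */
    xattr_size += truncate_size_delta(0, 512);  /* 0 + (-512) = -512 */

    printf("final file-size xattr: %" PRId64 "\n", xattr_size);
    return 0;
}

Because both truncates compute their delta from a cached size of 512 bytes,
the second subtraction drives the recorded size to -512, matching the xattr
dump above.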


