[Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file

bugzilla at redhat.com bugzilla at redhat.com
Thu Dec 5 19:11:10 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1732961



--- Comment #30 from Olaf Buitelaar <olaf.buitelaar at gmail.com> ---
Created attachment 1642453
  --> https://bugzilla.redhat.com/attachment.cgi?id=1642453&action=edit
different stale file

Today I found a completely different scenario in which stale files can occur,
while investigating why the gitlab docker image
(https://hub.docker.com/r/gitlab/gitlab-ce) couldn't start on a regular mount:
it fails on the postgresql initialisation. I've attempted this several times
before, all the way back on gluster 3.12.15 (the latest tests were performed
on gluster 6.6). This time I decided to take a different route and create a
new volume with the options:
network.remote-dio off
performance.strict-o-direct on
cluster.choose-local off
Interestingly enough, this allowed the postgresql initialisation to complete
and the container to fully start up and be functional. To test further, I also
wanted to see if it would work with sharding, since none of my other docker
containers work with a sharded gluster volume. So I created another volume,
with sharding, and this also worked well, except that both volumes report the
same stale file on:
/gitlab/data/postgresql/data/pg_stat_tmp/global.stat

([fuse-bridge.c:1509:fuse_fd_cbk] 0-glusterfs-fuse: 13945691: OPEN()
/gitlab/data/postgresql/data/pg_stat_tmp/global.stat => -1 (Stale file handle))
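Since this bug is about shard files carrying different gfids on different
subvolumes, it may be worth comparing the trusted.gfid xattr of this file
directly on each brick. A minimal sketch, assuming passwordless ssh to the
brick hosts of the docker2-dio-shrd volume below; the path under the brick
root is my assumption of where the base file lands:

```shell
#!/bin/sh
# Hedged sketch: read the trusted.gfid xattr of the reported file on each
# brick of docker2-dio-shrd; differing hex values across the bricks would
# match the different-gfid symptom this bug describes.
FILE=gitlab/data/postgresql/data/pg_stat_tmp/global.stat
for host in 10.201.0.1 10.201.0.5 10.201.0.6; do
  echo "== $host =="
  # brick* matches both the data bricks (brick1) and the arbiter (bricka)
  ssh "$host" "getfattr -n trusted.gfid -e hex \
    /data0/gfs/bricks/brick*/docker2-dio-shrd/$FILE"
done
```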

I couldn't directly find anything in the logs, but I've attached them all,
including the docker stack file used to deploy gitlab (which is one of the
most bloated containers I've seen, running many services inside).

For the record:
============================
On this volume gitlab wouldn't start, since it wouldn't get past the
postgresql initialisation:

gluster v info docker2

Volume Name: docker2
Type: Distributed-Replicate
Volume ID: 4e0670a0-3d00-4360-98bd-3da844cedae5
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: 10.201.0.5:/data0/gfs/bricks/brick1/docker2
Brick2: 10.201.0.9:/data0/gfs/bricks/brick1/docker2
Brick3: 10.201.0.3:/data0/gfs/bricks/bricka/docker2 (arbiter)
Brick4: 10.201.0.6:/data0/gfs/bricks/brick1/docker2
Brick5: 10.201.0.7:/data0/gfs/bricks/brick1/docker2
Brick6: 10.201.0.4:/data0/gfs/bricks/bricka/docker2 (arbiter)
Brick7: 10.201.0.1:/data0/gfs/bricks/brick1/docker2
Brick8: 10.201.0.8:/data0/gfs/bricks/brick1/docker2
Brick9: 10.201.0.2:/data0/gfs/bricks/bricka/docker2 (arbiter)
Options Reconfigured:
performance.cache-size: 128MB
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: on
============================  
Commands used to create the non-sharded volume:

gluster volume create docker2-dio replica 3 arbiter 1
10.201.0.1:/data0/gfs/bricks/brick1/docker2-dio
10.201.0.5:/data0/gfs/bricks/brick1/docker2-dio
10.201.0.6:/data0/gfs/bricks/bricka/docker2-dio
gluster volume set docker2-dio network.remote-dio off
gluster volume set docker2-dio performance.strict-o-direct on
gluster volume set docker2-dio cluster.choose-local off
gluster v start docker2-dio

gluster v info docker2-dio

Volume Name: docker2-dio
Type: Replicate
Volume ID: 54b3a0dc-20a9-4d29-a7ea-bd7cd8500b91
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.201.0.1:/data0/gfs/bricks/brick1/docker2-dio
Brick2: 10.201.0.5:/data0/gfs/bricks/brick1/docker2-dio
Brick3: 10.201.0.6:/data0/gfs/bricks/bricka/docker2-dio (arbiter)
Options Reconfigured:
cluster.choose-local: off
performance.strict-o-direct: on
network.remote-dio: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: on
============================    

Commands used to create the sharded volume:

gluster volume create docker2-dio-shrd replica 3 arbiter 1
10.201.0.1:/data0/gfs/bricks/brick1/docker2-dio-shrd
10.201.0.5:/data0/gfs/bricks/brick1/docker2-dio-shrd
10.201.0.6:/data0/gfs/bricks/bricka/docker2-dio-shrd
gluster volume set docker2-dio-shrd network.remote-dio off
gluster volume set docker2-dio-shrd performance.strict-o-direct on
gluster volume set docker2-dio-shrd cluster.choose-local off
gluster volume set docker2-dio-shrd features.shard on
gluster v start docker2-dio-shrd

gluster v info docker2-dio-shrd

Volume Name: docker2-dio-shrd
Type: Replicate
Volume ID: 50300e86-56d6-4e2d-be61-944f562e58f1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.201.0.1:/data0/gfs/bricks/brick1/docker2-dio-shrd
Brick2: 10.201.0.5:/data0/gfs/bricks/brick1/docker2-dio-shrd
Brick3: 10.201.0.6:/data0/gfs/bricks/bricka/docker2-dio-shrd (arbiter)
Options Reconfigured:
features.shard: on
cluster.choose-local: off
performance.strict-o-direct: on
network.remote-dio: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: on  
============================     
Available space:

df -h|grep docker2-dio
10.201.0.5:/docker2-dio        5.4T  2.6T  2.9T  47% /mnt/docker2-dio
10.201.0.5:/docker2-dio-shrd   5.4T  2.6T  2.9T  47% /mnt/docker2-dio-shrd
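As a quick client-side check, a plain stat through each fuse mount should
reproduce the error from the client log above (mount points taken from the df
output; a sketch):

```shell
# Hedged sketch: stat the reported path through both fuse mounts; a
# "Stale file handle" error here matches the failed OPEN() in the client log.
for mnt in /mnt/docker2-dio /mnt/docker2-dio-shrd; do
  stat "$mnt/gitlab/data/postgresql/data/pg_stat_tmp/global.stat"
done
```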

-- 
You are receiving this mail because:
You are on the CC list for the bug.

