[Gluster-users] file not found on DHT, but it exists
Arend-Jan Wijtzes
ajwytzes at wise-guys.nl
Wed Mar 31 11:49:58 UTC 2010
Hi gluster developers,
I have encountered a situation where a file cannot be found,
even though it exists and is on the correct node. The file can
be stat()-ed but not opened. After a Gluster restart the file
is accessible again.
GlusterFS version: 3.0.3, with the hashing function altered by me.
== On the Gluster mounted volume:
archive@cgmarchive0:~/archive/incoming$ ls -l www.funkyfish.nl#59493#cgmspider0
-rw-rw-r-- 1 archive archive 599065 Mar 30 15:16 www.funkyfish.nl#59493#cgmspider0
archive@cgmarchive0:~/archive/incoming$ wc -l www.funkyfish.nl#59493#cgmspider0
wc: www.funkyfish.nl#59493#cgmspider0: No such file or directory
== On the local (node0) volume
archive@cgmarchive0:/local.mnt/md0/glfs-data/incoming$ ls -l www.funkyfish.nl#59493#cgmspider0
-rw-rw-r-- 1 vagabond vagabondo 599065 Mar 30 15:16 www.funkyfish.nl#59493#cgmspider0
archive@cgmarchive0:/local.mnt/md0/glfs-data/incoming$ wc -l www.funkyfish.nl#59493#cgmspider0
10767 www.funkyfish.nl#59493#cgmspider0
== Error log
[2010-03-31 12:02:47] D [dht-common.c:1590:dht_fd_cbk] dht: subvolume node0 returned -1 (No such file or directory)
[2010-03-31 12:02:47] W [fuse-bridge.c:858:fuse_fd_cbk] glusterfs-fuse: 10346982: OPEN() /incoming/www.funkyfish.nl#59493#cgmspider0 => -1 (No such file or directory)
Then after a gluster restart (umount/mount sequence):
== On the Gluster mounted volume:
archive@cgmarchive0:~/archive/incoming$ ls -l www.funkyfish.nl#59493#cgmspider0
-rw-rw-r-- 1 archive archive 599065 Mar 30 15:16 www.funkyfish.nl#59493#cgmspider0
archive@cgmarchive0:~/archive/incoming$ wc -l www.funkyfish.nl#59493#cgmspider0
10767 www.funkyfish.nl#59493#cgmspider0
The application access pattern for these files is:
* a file is copied onto the filesystem with a temporary name
* the file is renamed to its final name
* the file is read once, then deleted
* the filename is normally not used again or at least not any time soon
All file operations went through the gluster fs (no direct local access).
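The access pattern above can be sketched roughly as follows. This is only an illustration, not the actual application code: the function name `ingest_once` and the `.tmp` suffix for the temporary name are assumptions made for the example, and `dest_dir` would be a directory on the Gluster mount.

```python
import os
import shutil


def ingest_once(src_path, dest_dir, final_name):
    """Copy, rename, read once, delete; returns the number of lines read."""
    # Hypothetical temporary naming scheme; the real one is not given in the post.
    tmp_path = os.path.join(dest_dir, final_name + ".tmp")
    final_path = os.path.join(dest_dir, final_name)

    shutil.copyfile(src_path, tmp_path)   # 1. copy under a temporary name
    os.rename(tmp_path, final_path)       # 2. rename to the final name
    with open(final_path, "rb") as fh:    # 3. read the file once
        line_count = sum(1 for _ in fh)
    os.remove(final_path)                 # 4. delete it
    return line_count
```

In the failing case it is step 3 that would raise "No such file or directory", even though step 2 succeeded and the file is present on the backend.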
The hashing function has been replaced by one that implements a 'consistent
hashing' scheme and adapted so that the temporary filename and the final
filename always go to the same node.
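To illustrate the idea (this is not the modified GlusterFS code, which lives in the C sources of the DHT translator), here is a minimal Python sketch of a consistent-hash ring in which the temporary and final names map to the same node. The `.tmp` suffix, the MD5-based hash, the number of ring points, and the names `HashRing` and `placement_key` are all assumptions made for the example.

```python
import bisect
import hashlib

TMP_SUFFIX = ".tmp"  # hypothetical temporary-name convention, not from the post


def placement_key(filename):
    """Strip the assumed temporary suffix so both names hash identically."""
    if filename.endswith(TMP_SUFFIX):
        return filename[:-len(TMP_SUFFIX)]
    return filename


def _hash(text):
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, points_per_node=64):
        # Each node owns several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, filename):
        """Return the node owning the first ring point after the key's hash."""
        h = _hash(placement_key(filename))
        idx = bisect.bisect_right(self._points, h) % len(self._ring)
        return self._ring[idx][1]
```

With `ring = HashRing(["node0", "node1"])`, `ring.node_for(name)` and `ring.node_for(name + ".tmp")` return the same node, so the rename never has to move data between bricks.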
The problem is not isolated to a single case, but it takes
a long time (days) to occur. Given enough time it can be reproduced,
so if you need more debugging info I can try to extract it for you.
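Because the failure takes days to appear, a periodic check run against the mounted volume might help catch it close to the moment it happens. A minimal sketch, assuming it is pointed at a directory on the Gluster mount; `find_unopenable` is a hypothetical helper name, not part of GlusterFS.

```python
import errno
import os


def find_unopenable(directory):
    """Return names that stat() successfully but fail open() with ENOENT."""
    suspects = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            os.stat(path)
        except OSError:
            continue  # stat failed, so this is not the symptom described here
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError as exc:
            if exc.errno == errno.ENOENT:
                suspects.append(name)  # stat worked, open says "no such file"
            continue
        os.close(fd)
    return suspects
```

Note that a file deleted by the application between the stat() and the open() would also be reported, so any hit should be confirmed by hand.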
Any ideas?
== Volume file
volume posix
  type storage/posix
  option directory /local.mnt/md0/glfs-data
end-volume

volume locks
  type features/posix-locks
  subvolumes posix
end-volume

volume fixed-id
  type features/filter
  option fixed-uid 2224
  option fixed-gid 224
  subvolumes locks
end-volume

volume brick
  type performance/io-threads
  subvolumes fixed-id
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow 10.0.0.*,10.1.0.*
  subvolumes brick
end-volume

volume node0
  type protocol/client
  option transport-type tcp
  option remote-host cgmarchive0
  option remote-subvolume brick
end-volume

volume node1
  type protocol/client
  option transport-type tcp
  option remote-host cgmarchive1
  option remote-subvolume brick
end-volume

volume dht
  type cluster/dht
  subvolumes node0 node1
end-volume
--
Arend-Jan Wijtzes -- Wiseguys -- www.wise-guys.nl