[Gluster-users] Fwd: Self-healing fails to create missing directories (2.0.1)

Ville Tuulos tuulos at gmail.com
Wed Jun 10 07:51:08 UTC 2009


Hi,

First, thanks for an awesome project! I've been evaluating how Gluster
could work as a storage layer for Disco, an open-source map/reduce
framework (http://discoproject.org).

I'm running a snapshot from the Git tree at commit 5c1d9108c1 (2.0.1).
I have a client-side distributed and replicated setup similar to

http://www.gluster.org/docs/index.php/Mixing_DHT_and_AFR

I wanted to test how Gluster survives a disk crash. I created a
directory hierarchy with some files which were replicated to nodes as
expected. Then I unmounted gluster on one of the servers and deleted
the data directory.

I re-created a new, empty data directory and re-mounted the gluster
node. I assumed that self-healing would recover the directory
hierarchy after stat'ing the files. However, gluster seems to be
unable to create missing directories. This is what I get in my logs:


[2009-06-01 00:13:33] D [server-dentry.c:395:do_path_lookup] vol1:
resolved path(/juus2/b/juus2) to (nil)(0)/(nil)(0)
[2009-06-01 00:13:33] D [server-dentry.c:321:__do_path_resolve] vol1:
resolved path(/juus2/b/juus2) till 712711(/juus2). sending lookup for
remaining path
[2009-06-01 00:13:33] D [server-protocol.c:3353:server_stub_resume]
server: 8378: INODELK (/juus2/b/juus2) on vol1 returning error: -1 (2)
[2009-06-01 00:13:33] D
[afr-self-heal-data.c:181:afr_sh_data_unlck_cbk] nx10-vol1-repl:
locking inode of /juus2/b/juus2 on child 1 failed: No such file or
directory
[2009-06-01 00:13:43] D [server-dentry.c:395:do_path_lookup] vol1:
resolved path(/juus2/b/juus2) to (nil)(0)/(nil)(0)
[2009-06-01 00:13:43] D [server-dentry.c:321:__do_path_resolve] vol1:
resolved path(/juus2/b/juus2) till 712711(/juus2). sending lookup for
remaining path
[2009-06-01 00:13:43] D [server-protocol.c:2633:server_stub_resume]
server: 823: LOOKUP (/juus2/b/juus2) on vol1 returning error: -1 (2)
[2009-06-01 00:13:43] D [client-protocol.c:2980:client_entrylk]
nx12-vol1: ENTRYLK 42763093 (/juus2/b): failed to get remote inode
number
[2009-06-01 00:13:43] D [server-dentry.c:395:do_path_lookup] vol1:
resolved path(/juus2/b) to 0x7fd79403ad30(712711)/(nil)(0)
[2009-06-01 00:13:43] D [server-protocol.c:3325:server_stub_resume]
server: 825: ENTRYLK (/juus2/b) on vol1 for key juus2 returning error:
-1 (2)
[2009-06-01 00:13:43] D
[afr-self-heal-common.c:1252:sh_missing_entries_lk_cbk]
nx10-vol1-repl: locking inode of /juus2/b/juus2 on child -1945757088
failed: No such file or directory
[2009-06-01 00:13:43] D [client-protocol.c:2980:client_entrylk]
nx12-vol1: ENTRYLK 42763093 (/juus2/b): failed to get remote inode
number
[2009-06-01 00:13:43] D [server-dentry.c:395:do_path_lookup] vol1:
resolved path(/juus2/b) to 0x7fd79403ad30(712711)/(nil)(0)
[2009-06-01 00:13:43] D [server-protocol.c:3325:server_stub_resume]
server: 826: ENTRYLK (/juus2/b) on vol1 for key juus2 returning error:
-1 (2)
[2009-06-01 00:13:43] D [client-protocol.c:2808:client_inodelk]
nx12-vol1: INODELK 96583929 (/juus2/b/juus2): failed to get remote
inode number


(Above, I'm trying to read the file /juus2/b/juus2, which was
replicated on the node before the reset.)

Not a single file is synced to the recovered node, but I can still
read the files from the other replicas. However, if I create the
missing directories manually in the data directory, the files get
synced correctly.
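
For reference, the manual workaround amounts to something like the
sketch below. It is only an illustration, not anything Gluster ships:
the function name mirror_directories is made up, and it assumes that
the data directory of a healthy replica and the empty data directory
of the recovered node are both reachable as local paths (for example
over a temporary mount). It creates directories only and leaves the
files for self-healing to copy.

```python
import os

def mirror_directories(healthy_root, recovered_root):
    """Create under recovered_root every directory found under healthy_root.

    Hypothetical helper: both arguments are assumed to be backend data
    directories reachable as local paths. Files are not copied.
    Returns the relative paths of the directories that were created.
    """
    created = []
    # os.walk is top-down by default, so parents are visited before children.
    for dirpath, dirnames, _filenames in os.walk(healthy_root):
        rel = os.path.relpath(dirpath, healthy_root)
        for name in dirnames:
            # Skip symlinks to directories instead of turning them into real ones.
            if os.path.islink(os.path.join(dirpath, name)):
                continue
            rel_dir = os.path.normpath(os.path.join(rel, name))
            target = os.path.join(recovered_root, rel_dir)
            if not os.path.isdir(target):
                os.makedirs(target)
                created.append(rel_dir)
    return created
```

Calling it a second time on the same pair of directories creates
nothing and returns an empty list, so it is safe to re-run.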

A nasty implication of this bug is that when I create a new file in
the directory hierarchy, its replication fails on the recovered node
due to the missing directories, and I end up with fewer replicas of
the file than I have specified in the volume config. The failure is
silent, since I can still read the file from the other replicas, so I
could lose replicas one by one without any sign until the last one
fails.
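
One way to notice the missing replicas would be to count, for each
file, how many backend data directories actually contain it. The
sketch below is a rough check under my own assumptions: the function
name under_replicated is made up, and backend_roots is assumed to
hold the data directories of a single replicate group, all reachable
as local paths.

```python
import os

def under_replicated(relative_paths, backend_roots, expected_replicas):
    """Return {path: replica_count} for files found on too few backends.

    Hypothetical helper: backend_roots are assumed to be the data
    directories of one replicate group, reachable as local paths.
    """
    short = {}
    for rel in relative_paths:
        # Count the backends whose data directory holds this file.
        count = sum(
            os.path.isfile(os.path.join(root, rel)) for root in backend_roots
        )
        if count < expected_replicas:
            short[rel] = count
    return short
```

An empty result means every listed file was found on at least the
expected number of backends.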

Is this a known issue and are there any workarounds?

Thanks a lot for your help,

Ville



