[Gluster-devel] dht: selfheal of missing directories on nameless (by GFID) LOOKUP

Niels de Vos ndevos at redhat.com
Sun May 4 16:22:13 UTC 2014


Hi,

Bug 1093324 has been opened, and we have identified the following cause:

1. an NFS-client does a LOOKUP of a directory on a volume
2. the NFS-client receives a filehandle (contains volume-id + GFID)
3. add-brick is executed, but the new brick does not have any 
   directories yet
4. the NFS-client creates a new file in the directory; this request is 
   in the format of <filehandle>/<filename>, where <filehandle> was 
   received in step 2
5. the NFS-server does a LOOKUP on the parent directory identified by 
   the filehandle - nameless LOOKUP, only GFID is known
6. the old brick(s) return successfully
7. the new brick returns ESTALE
8. the NFS-server returns ESTALE to the NFS-client

In this case, the NFS-client should not receive an ESTALE. There is also 
no ESTALE error passed to the client when this procedure is done over 
FUSE or samba/libgfapi.

Selfhealing a directory entry based only on a GFID is not always 
possible. A file does not necessarily have a unique filename 
(hardlinks), so it is not trivial to find a filename for a GFID (it is 
an expensive operation, and the result could be a list). However, for 
a directory this is simpler. Directories are not hardlinked in the 
.glusterfs directory; they are maintained as symbolic links. This makes 
it possible to find the name of a directory when only the GFID is known.
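
As a rough illustration of that lookup, here is a minimal sketch in 
Python (not the posix-xlator code). It assumes that the handle of 
a directory lives at .glusterfs/<first two hex chars>/<next two hex 
chars>/<gfid> on the brick, and that the symlink target has the form 
../../<pp>/<qq>/<parent-gfid>/<basename>. Both layouts are assumptions 
of this sketch, and the function names are hypothetical.

```python
import os


def gfid_handle_path(brick_root, gfid):
    """Build the assumed handle path for a GFID in canonical UUID form."""
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def dir_name_from_gfid(brick_root, gfid):
    """Return (parent_gfid, basename) for a directory GFID.

    Returns None when the handle is missing or is not a symlink, which
    in this sketch means "not a directory known to this brick".
    """
    handle = gfid_handle_path(brick_root, gfid)
    if not os.path.islink(handle):
        return None
    # One readlink() call; assumed target layout:
    #   ../../<pp>/<qq>/<parent-gfid>/<basename>
    target = os.readlink(handle)
    parent_gfid = os.path.basename(os.path.dirname(target))
    basename = os.path.basename(target)
    return parent_gfid, basename
```

The result is a parent GFID plus a name, not a full path. Resolving the 
full path would mean repeating the lookup for each parent until the root 
is reached.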

Currently DHT is not able to selfheal directories on a nameless LOOKUP.  
I think that it should be possible to change this, and to fix the ESTALE 
returned by the NFS-server.

At least two changes would be needed, and this is where I would like to 
hear opinions from others:

- The posix-xlator should be able to return the directory name when 
  a GFID is given. This can be part of the LOOKUP-reply (dict), and that 
  would add a readlink() syscall for each nameless LOOKUP that finds 
  a directory. Or (suggested by Pranith) add a virtual xattr and handle 
  this specific request with an additional FGETXATTR call.

- DHT should selfheal the directory when at least one ESTALE is returned 
  by the bricks. When all bricks return ESTALE, the ESTALE is valid and 
  should be passed on to the upper layers (NFS-server -> NFS-client).
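
To make the second point concrete, here is a small sketch in Python of 
the decision DHT would take once all bricks have replied to a nameless 
LOOKUP. It is not DHT code: the function name, the returned strings and 
the representation of the replies as a list of errno values (0 meaning 
success) are all hypothetical. How errors other than ESTALE should be 
handled is not covered by this proposal, so the sketch only reports them.

```python
import errno


def nameless_lookup_action(brick_errnos):
    """Decide what to do with per-brick replies to a nameless LOOKUP.

    brick_errnos: one errno value per brick, 0 meaning success.
    """
    if not brick_errnos:
        raise ValueError("need a reply from at least one brick")

    stale = [e == errno.ESTALE for e in brick_errnos]
    ok = [e == 0 for e in brick_errnos]

    if all(stale):
        # Every brick says ESTALE: the error is valid, pass it upwards.
        return "return-estale"
    if any(stale) and any(ok):
        # The directory exists on some bricks but not on others,
        # e.g. after add-brick: heal it on the bricks that lack it.
        return "selfheal"
    if all(ok):
        return "ok"
    # Other errors are outside the scope of this sketch.
    return "other-error"
```

For the scenario described above (old brick succeeds, new brick returns 
ESTALE), nameless_lookup_action([0, errno.ESTALE]) gives "selfheal", so 
the NFS-client would not see the ESTALE.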


I've added Venkatesh on CC; his patch http://review.gluster.org/74930 is 
in the review queue and seems to be somewhat related to this, although 
that change does not address the problem in this email, as Susant (+CC) 
pointed out earlier today.

Thanks,
Niels


