[Gluster-devel] Questions about DHT

Anand Avati avati at zresearch.com
Wed Dec 10 19:14:08 UTC 2008


Kevan,
 I'll try to answer your questions and eventually include these points
in the wiki.

> 1) What happens when a node is added to an existing DHT volume with files
> present?
>
> I would think that if you have 4 subvolumes in a DHT, and add a fifth
> volume, approximately 20% of the requests for existing files would be mapped
> to that new sub volume, and fail on request.

Each directory remembers the keyspace layout across the servers in its
own extended attributes. So when you add a fifth volume, lookups in
existing directories keep hitting the right (old) subvolumes. Newly
created directories will assign a lot of hash weight to the fifth
subvolume.
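
As a rough illustration of the per-directory layout idea (a sketch, not
GlusterFS code): the hash function (CRC32), the even split of the hash
space, and the names make_layout and locate are all assumptions made for
the example. The real DHT uses its own hash and its own xattr format,
and the extra weighting given to a new subvolume is not modelled here.

```python
import zlib

HASH_SPACE = 2 ** 32  # assumed 32-bit hash space


def make_layout(subvolumes):
    """Split the hash space into one contiguous range per subvolume.

    Returns a list of (start, end, subvolume_name) tuples. This stands in
    for the layout a directory would record at mkdir time.
    """
    step = HASH_SPACE // len(subvolumes)
    layout = []
    for i, name in enumerate(subvolumes):
        start = i * step
        # The last range absorbs the remainder so the whole space is covered.
        is_last = i == len(subvolumes) - 1
        end = HASH_SPACE - 1 if is_last else (i + 1) * step - 1
        layout.append((start, end, name))
    return layout


def locate(layout, filename):
    """Return the subvolume whose range contains the hash of filename."""
    h = zlib.crc32(filename.encode()) & 0xFFFFFFFF  # stand-in hash
    for start, end, name in layout:
        if start <= h <= end:
            return name
    raise LookupError("layout does not cover hash %d" % h)


# A directory created while only four subvolumes existed keeps this layout,
# so its files never resolve to a subvolume added later.
old_dir_layout = make_layout(["sv1", "sv2", "sv3", "sv4"])

# A directory created after the fifth subvolume is added gets a new layout.
new_dir_layout = make_layout(["sv1", "sv2", "sv3", "sv4", "sv5"])
```

Calling locate(old_dir_layout, "testfile2.dat") only ever returns one of
the four original subvolumes, because the lookup consults the layout
stored with the directory rather than the current list of subvolumes.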


>
> 2) If a file exists on a DHT sub-volume that it shouldn't be mapped to (such
> as in question #1 after adding a volume), would an ls on the container
> directory (if it was correctly mapped) return data for that file?
>
> e.g.
> Glusterfs mount of /mnt/gluster/
> DHT sub volume 1 contains /mnt/gluster/testdir/,
> /mnt/gluster/testdir/testfile1.dat, /mnt/gluster/testdir/testfile2.dat
> of which /mnt/gluster/testdir/testfile2.dat should be on DHT sub volume #2,
> but is not (DHT sub volume #2 added after files created on sub volume #1).
>  What does "ls /mnt/gluster/testdir/" show?  What does "ls
> /mnt/gluster/testdir/testfile2.dat" return?

As I explained above, /mnt/gluster/testdir 'remembers' the layout at
the time of its mkdir in its extended attributes, so testfile2.dat
will be looked up on subvolume #1.


>
> 3) It seems some sort of fallback to unify like file access (for read) would
> fix most these issues.  Does that happen?  Has it been considered or is it
> in planning?  I imagine a single sub volume request utilizing DHT info
> followed up by a unify style request on failure (that is, request info from
> all sub-volumes) would allow for quick DHT access in correctly distributed
> systems with correct (but slower fallback behavior).  Caching of secondary
> requests (unify type) at the client level for both success and failure with
> info on which sub volume to access could speed up this as well.
>


There is a 'unify-like' fallback mode in DHT if you set 'option
lookup-unhashed on'. In this mode, a file is first looked up in the
subvolume where it is supposed to be. If it is found, everything is
fine. If the file does not exist there, DHT broadcasts a search to all
servers and sets up 'pointer files' (like a symlink across
subvolumes, which DHT understands) so that the file is looked up
in the right place next time.
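
The lookup flow in this mode can be sketched roughly as follows. This is
an in-memory model, not GlusterFS code: the Subvolume class, the
hashed_subvolume and lookup functions, the CRC32-modulo hash, and the
probe counter are all hypothetical names and simplifications introduced
for the example.

```python
import zlib


class Subvolume:
    """Toy stand-in for one DHT subvolume."""

    def __init__(self, name):
        self.name = name
        self.files = set()   # names of files whose data lives here
        self.pointers = {}   # filename -> name of subvolume holding the data


def hashed_subvolume(subvols, filename):
    """Pick the subvolume where the file is 'supposed to be' (stand-in hash)."""
    return subvols[zlib.crc32(filename.encode()) % len(subvols)]


def lookup(subvols, filename, lookup_unhashed=True):
    """Return (subvolume_name or None, number_of_subvolume_probes)."""
    home = hashed_subvolume(subvols, filename)
    probes = 1
    if filename in home.files:
        return home.name, probes          # found where the hash says
    if filename in home.pointers:
        # A pointer left by an earlier broadcast: one extra hop.
        return home.pointers[filename], probes + 1
    if not lookup_unhashed:
        return None, probes               # no fallback, report a miss
    # Fallback: search every other subvolume.
    for sv in subvols:
        if sv is home:
            continue
        probes += 1
        if filename in sv.files:
            # Record a pointer so the next lookup skips the broadcast.
            home.pointers[filename] = sv.name
            return sv.name, probes
    return None, probes
```

In this model, a file sitting on the 'wrong' subvolume costs a broadcast
once and two probes afterwards, while a lookup for a name that exists
nowhere probes every subvolume each time.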

The disadvantages of 'option lookup-unhashed on' are -
1. the perf hit on looking up nonexistent files is a lot higher
(imagine rsync'ing a tree)
2. the 'unhashed' files are not 'listed' in an ls command; you somehow
would have to stat/lookup the filenames which are not listed. Once
looked up, further ls calls will list the entry.

This mode is useful for writing 'migration scripts' from unify to dht.
There will be a section on migrating from unify to dht in the wiki
which will cover this point.

Apologies for the slow pace at which documentation is being updated.
All the developers are totally involved in QA at the moment. We will
finish the documentation before making the 1.4.0 release.

avati




