[Gluster-users] pre-existing data

NovA av.nova at gmail.com
Sun Feb 22 13:37:19 UTC 2009


I'm also interested in the problem you mentioned. It stops me from
migrating to DHT...

There was a discussion about it some time ago. Here is a quote:
Anand Avati <avati at zresearch.com>
10 Dec 2008

There is a 'unify-like' fallback mode in DHT if you set 'option
lookup-unhashed on'. In this mode, a file is first looked up on the
subvolume where it is supposed to be. If it is found there, everything is
fine. If the file does not exist there, DHT broadcasts a search to all
servers and sets up 'pointer files' (something like a symlink across
subvolumes which DHT understands) so that the file is looked up
correctly the next time.
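To make that lookup order concrete, here is a toy Python model of it. This is not GlusterFS code, only a sketch under stated assumptions. Each subvolume is a plain dict. The "hash" is crc32 of the name modulo the number of subvolumes, while real DHT uses its own hash and per-directory layout ranges. `ToyDHT` and `Pointer` are hypothetical names, and `Pointer` stands in for a pointer file.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Pointer:
    """Hypothetical stand-in for a DHT pointer file."""
    target: int  # index of the subvolume that really holds the file


class ToyDHT:
    def __init__(self, n_subvols, lookup_unhashed=True):
        # Each subvolume is modelled as a dict: name -> file data or Pointer.
        self.subvols = [{} for _ in range(n_subvols)]
        self.lookup_unhashed = lookup_unhashed

    def hashed_subvol(self, name):
        # Assumption: crc32 mod N stands in for DHT's real hash/layout scheme.
        return zlib.crc32(name.encode()) % len(self.subvols)

    def lookup(self, name):
        """Return the index of the subvolume holding `name`, or None."""
        h = self.hashed_subvol(name)
        entry = self.subvols[h].get(name)
        if isinstance(entry, Pointer):
            return entry.target  # follow the pointer file
        if entry is not None:
            return h  # found where it is supposed to be
        if not self.lookup_unhashed:
            return None  # plain DHT: a file placed elsewhere looks missing
        # Fallback: search every other subvolume. This is the step that makes
        # lookups of truly non-existent files expensive.
        for i, sv in enumerate(self.subvols):
            if i != h and name in sv and not isinstance(sv[name], Pointer):
                self.subvols[h][name] = Pointer(i)  # next lookup goes direct
                return i
        return None


# Pre-existing file copied straight onto a backend that is NOT its hashed one.
dht = ToyDHT(3)
name = "events.root"
home = dht.hashed_subvol(name)
elsewhere = (home + 1) % 3
dht.subvols[elsewhere][name] = b"pre-existing data"
assert dht.lookup(name) == elsewhere                 # found by the broadcast
assert isinstance(dht.subvols[home][name], Pointer)  # pointer left behind
```

With `lookup_unhashed=False`, the same file would simply look missing, which matches the behaviour described for pre-existing data under plain DHT.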

The disadvantages of 'option lookup-unhashed on' are:
1. the performance hit when looking up non-existent files is a lot higher
(imagine rsync'ing a tree, where most destination files do not exist yet);
2. the 'unhashed' files are not listed by an ls command; you would somehow
have to stat/look up the filenames which are not listed. Once
looked up, further ls calls will list the entry. This mode is useful
for writing 'migration scripts' from unify to DHT. There will be a
section on migrating from unify to DHT in the wiki which will cover
this point.
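Because unhashed files are not listed, running `find` through the mount cannot reach them. A migration script has to get the names from somewhere else, for example the backend export directory on each server. Below is a minimal sketch of that idea. It assumes the volume is mounted with 'option lookup-unhashed on' set in the DHT volume. The paths `/data/export` and `/mnt/glusterfs` and the function name `lookup_all` are hypothetical. The script walks the backend and stats each relative path through the mount, so DHT performs a lookup for every pre-existing entry.

```python
import os
import sys


def lookup_all(backend_root, mount_root):
    """Stat, via mount_root, every entry found under backend_root.

    Returns (number of entries looked up, list of relative paths that
    were not visible through the mount).
    """
    looked_up = 0
    missing = []
    for dirpath, dirnames, filenames in os.walk(backend_root):
        rel_dir = os.path.relpath(dirpath, backend_root)
        for name in dirnames + filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            try:
                # lstat triggers a lookup of the entry itself (no symlink follow)
                os.lstat(os.path.join(mount_root, rel))
                looked_up += 1
            except FileNotFoundError:
                missing.append(rel)
    return looked_up, missing


if __name__ == "__main__":
    # Hypothetical defaults: one server's export directory and the client mount.
    backend = sys.argv[1] if len(sys.argv) > 1 else "/data/export"
    mount = sys.argv[2] if len(sys.argv) > 2 else "/mnt/glusterfs"
    count, missing = lookup_all(backend, mount)
    print(f"looked up {count} entries, {len(missing)} not visible via the mount")
    for rel in missing:
        print("missing:", rel)
```

It would be run once per server, for example `python lookup_all.py /data/export /mnt/glusterfs`. After that, the entries should show up in later ls calls through the mount.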

Since then I have asked a couple of times about the migration, but the
messages went unnoticed on the mailing list.

Best regards,

2009/2/12 Zynovyev, Mykhaylo <M.Zynovyev at gsi.de>:
> In GlusterFS version 1.3 I was using the Unify translator with the NUFA
> scheduler to set up an HPC cluster, where each node exported some data
> and at the same time was able to access the aggregated data from the
> other nodes via a mounted GlusterFS partition.
> With such a setup I was able to dynamically start GlusterFS on different
> nodes with pre-existing data, run "find . > /dev/null", and see the data
> on the GlusterFS partition. If the number of files was huge (9 nodes
> sharing 100,000 files), the ls command took a long time to complete.
> As far as I can see, from version 2.0 on I am not able to use the DHT or
> NUFA translators, which are more advanced from a performance perspective,
> for such a scenario, because pre-existing data is not visible. The only
> data that will be visible is the data written through the GlusterFS
> interface. Please confirm my observations.
> Do I have to stick with the unify translator and the NUFA scheduler from
> the previous version from now on?
> Will it continue to be supported?
> Is any development expected in this direction?
