[Gluster-devel] xattrs and bug 9
tuulos at gmail.com
Sun Aug 16 04:12:30 UTC 2009
On Fri, 14 Aug 2009, Anand Avati wrote:
>> I second that question.
>> Extended attributes are pretty much critical for Disco. It uses them to
>> decide where to execute tasks, to optimize data locality:
>> If the extended attributes are really removed (I haven't upgraded yet to
>> 2.0.6), what's the official way of finding out where files are physically
>> located?
> The reason we removed listing of Replicate's internal extended
> attribute records was that we found commands like 'rsync -X' would
> mess up and overwrite the extended attributes, taking the filesystem
> into an inconsistent state.
> Ville, thanks for pointing that out. We were not aware that these extended
> attributes had found a new purpose for themselves this way :-) They
> were not intended to be used this way at all. But for the same purpose
> you are talking about, we have introduced the virtual extended
> attribute "trusted.glusterfs.location", which returns the hostname of
> the storage/posix volume on which the file resides. However, this
> feature is currently available only in mainline.
Great! I'll update our systems to the latest git snapshot.
> In fact the above patch was brought in with the intention of making
> GlusterFS fit into map/reduce frameworks nicely in the future. Now
> that you mention that this "feature" was already being used and got
> broken in 2.0.6 (which we were not aware of), we'll get the "official
> way" of getting the hostname backported in 2.0.7. Note that the new
> method will return the server's hostname and not any volume name. So
> the gluster.py in disco.git might have to be modified to first look
> for this "official" xattr and then fall back to the old style.
Hostname is even better for us than the volume name. Currently the user
has to provide Disco with a separate mapping from volume names to
hostnames.
> We also want feedback from you guys about if/how you want the location
> of file on multiple servers (for example Replicate could return
> multiple locations, and stripe has the content distributed across
> servers, possibly replicated as well). How and to what extent do the
> map/reduce frameworks make use of such information? Does record-level
> location make sense at all?
Yes, we need the locations of all replicas for each file. The current
mechanism lists all replicas for each input, so Disco can fall back to
other replicas if the master copy fails.
It would be great if trusted.glusterfs.location could return a list of
hostnames. The list should be ordered according to Gluster's preference
for accessing the file, i.e. the second item should be the one that
Gluster uses if the master copy fails, and so on. This ensures that
Disco can preserve data locality even if individual volumes fail.
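If the xattr ever did return several hostnames, a flat separator-delimited value would be trivial for frameworks to consume. The encoding below is purely an assumption for illustration (the thread only proposes returning multiple hostnames; no format exists yet), with comma chosen as the separator.

```python
def parse_locations(raw):
    """Split an assumed comma-separated xattr value into an ordered
    replica list: first entry is the preferred copy, later entries
    give the failover order. The encoding is hypothetical."""
    return [host for host in raw.decode().split(",") if host]
```

For example, a value of `b"node1,node2"` would mean: schedule on node1, fail over to node2.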
Striped files are not supported by Disco directly, so it doesn't do
anything clever with them (yet). In general, being able to query as much
information as possible about files is beneficial.
It has been a deliberate choice to keep the storage layer separate from
Disco. An upside of this design decision is that you're free to choose the
best storage layer for your problem domain. For instance, I'm positive
that Gluster is a good match for many ad hoc data analysis tasks and rapid
development in general. A downside is that coordination between the
storage layer and the computation layer isn't always optimal.
I became interested in Gluster because a custom translator seemed like a
reasonable way to bridge this gap. I was happy to notice that 95% of the
benefits could be achieved with default translators, without the burden of
maintaining a custom one.
I'm sure it'd benefit everybody if Gluster could continue supporting
systems built on top of it with minimal hassle by exposing other ways to
interact(*) with the system than custom translators. In this respect,
extended attributes and interfaces like libglusterfs are very welcome.
(*) In addition to querying the status of GlusterFS (e.g. via extended
attributes), it would be useful to _give_ information to Gluster as well.
For instance, now I have to run two GlusterFS in parallel (inputfs and
resultsfs in http://discoproject.org/doc/start/dfs.html), since only some
directories need to be replicated (input data) whereas others are used
over NUFA without replication (intermediate results). Disco could tag the
latter temporary files with a special extended attribute, or by making a
call to libglusterfs, so Gluster would know that replication is not
needed for them.