[Gluster-devel] xattrs and bug 9

Anand Avati avati at gluster.com
Fri Aug 14 20:08:16 UTC 2009


> I second that question.
>
> Extended attributes are pretty much critical for Disco. It uses them to
> decide where to execute tasks, to optimize data locality:
>
> http://github.com/tuulos/disco/blob/c1d4ffadeba40af8a8547dd6afce562d267e464e/pydisco/disco/dfs/gluster.py#L36
>
> If the extended attributes are really removed (I haven't upgraded yet to
> 2.0.6), what's the official way of finding out where files are physically
> stored?

We removed the listing of Replicate's internal extended attribute
records because commands like 'rsync -X' would overwrite these
extended attributes, leaving the filesystem in an inconsistent state.

Ville, thanks for pointing that out. We were not aware that these
extended attributes had found a new purpose for themselves this way :-)
They were not intended to be used this way at all. For the purpose you
describe, we have introduced the virtual extended attribute
"trusted.glusterfs.location", which returns the hostname of the
storage/posix volume on which the file resides. However, this feature
is currently available only in mainline.

http://git.gluster.com/?p=glusterfs.git;a=commit;h=5be3c142978257032bd11ad420382859fc204702

In fact, the above patch was brought in with the intention of making
GlusterFS fit nicely into map/reduce frameworks in the future. Now
that you mention that this "feature" was already being used and got
broken in 2.0.6 (which we were not aware of), we'll get the "official
way" of getting the hostname backported to 2.0.7. Note that the new
method returns the server's hostname, not a volume name. So the
gluster.py in disco.git might have to be modified to look for this
"official" xattr first and then fall back to the old style.
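
A rough sketch of that lookup order, in Python since gluster.py is
Python. This is not the actual disco code. Only the attribute name
"trusted.glusterfs.location" comes from the patch above; the function
name, its parameters, and whatever old-style attribute names the caller
passes in are assumptions for illustration. It uses os.getxattr, which
is available on Linux with Python 3.3 or later.

```python
import errno
import os

# The "official" virtual xattr introduced by the patch above.
OFFICIAL_XATTR = "trusted.glusterfs.location"


def file_location(path, legacy_xattrs=(), getxattr=None):
    """Return (xattr_name, value) for the first readable attribute, else None.

    legacy_xattrs holds the old-style attribute names to try afterwards.
    They are supplied by the caller; none are assumed here.
    getxattr can be replaced for testing; it defaults to os.getxattr.
    """
    if getxattr is None:
        getxattr = os.getxattr  # Linux only
    for name in (OFFICIAL_XATTR,) + tuple(legacy_xattrs):
        try:
            raw = getxattr(path, name)
        except OSError as exc:
            # Attribute missing or xattrs unsupported: try the next name.
            if exc.errno in (errno.ENODATA, errno.EOPNOTSUPP):
                continue
            raise
        return name, raw.decode("utf-8", "replace").rstrip("\0")
    return None
```

Returning the attribute name along with the value lets the caller tell
a hostname (new method) apart from whatever the old-style attribute
encodes.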

We would also like feedback from you on whether and how you want the
location reported for files that live on multiple servers (for
example, Replicate could return multiple locations, and Stripe has the
content distributed across servers, possibly replicated as well). How,
and to what extent, do map/reduce frameworks make use of such
information? Does record-level location make sense at all?

Thanks,
Avati




