[Gluster-devel] Local vs unify

Mon Apr 28 10:20:33 UTC 2008

On Mon, 28 Apr 2008, Paul Arch wrote:

>>> Thanks for supporting our design. We are working towards fixing those few
>>> glitches!
>>> But true that things are not changing as fast as we have wished. Each new
>>> idea needs time to get converted into code, get tested. Hence bit delay
>>> in things.
>>
>> No problem and thank you for this email, it has answered a major issue
>> for me .. I am of course going to ask;
>> a. Any timescale on the metadata changes?
>> b. How much of a difference will it make.. will we be approaching
>> local(ish) speeds .. or are we just talking x2 of current?
>
>> I imagine that would depend on the metadata expiry timeouts. If it's set
>> to 100ms, the chances are that you won't see much improvement. If it's set
>> for 100 seconds, it'll go as fast as local FS for cached data but you'll
>> be working on FS state that might as well be imaginary in some cases. No
>> doubt someone will then complain about the fact that posix semantics no
>> longer work.
>
> <snip>
>
> I have been following this thread and the metadata stuff does interest me -
> we have millions and millions of small files.
>
> In the above situation though, I would have thought that knowing all of the
> inputs into the system (i.e. gluster knows the state everything is in, as
> long as no-one enters and changes things from outside of the mechanism in
> the background) could see some fair potential for caching the metadata. If
> the system is in a degraded state, sure, you wouldn't and shouldn't trust
> this cache, but all things being equal and happy, why can't we trust a
> good-sized metadata cache if AFR/unify/whatever is reporting the system is
> happy and operational?

This relates to the point I made a few days ago on the other thread. You 
_could_ do this, but in order to do that, you'd have to change the 
sync-on-read paradigm and couple the systems much more tightly. This would 
likely involve things like mandatory fencing requirements which are 
currently avoided.

A file that is read-locked cannot be write-locked, so you could 
potentially trade write-lock performance for read-lock performance: 
read-locks would always be granted locally, without checking against 
other nodes, unless a write-lock is in place (and a write-lock would need 
to be broadcast to, and acknowledged by, _all_ nodes in the cluster).
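
To make that trade-off concrete, here is a minimal sketch in Python. It is 
not GlusterFS code; the Node class, the "reachable" flag and the function 
names are all hypothetical, and the "network" is just direct method calls. 
Read-locks only look at local state, while a write-lock has to be 
acknowledged by every node before it is granted.

```python
class Node:
    """One cluster member (hypothetical model, not a GlusterFS structure)."""

    def __init__(self, name):
        self.name = name
        self.write_locked = {}   # path -> name of the node holding the write-lock
        self.local_readers = {}  # path -> number of read-locks held on this node
        self.reachable = True    # False simulates a dead or partitioned node

    def acquire_read(self, path):
        # Fast path: no traffic to other nodes, only a look at local state.
        if path in self.write_locked:
            return False
        self.local_readers[path] = self.local_readers.get(path, 0) + 1
        return True

    def release_read(self, path):
        self.local_readers[path] -= 1

    def handle_write_request(self, path, owner):
        # Acknowledge only if reachable and no local reader or writer exists.
        if not self.reachable:
            return False
        if self.local_readers.get(path, 0) > 0 or path in self.write_locked:
            return False
        self.write_locked[path] = owner
        return True

    def handle_write_release(self, path, owner):
        if self.write_locked.get(path) == owner:
            del self.write_locked[path]


def acquire_write(requester, cluster, path):
    """Broadcast a write-lock request; succeed only if all nodes acknowledge."""
    acked = []
    for node in cluster:
        if node.handle_write_request(path, requester.name):
            acked.append(node)
        else:
            # One refusal or silent node is enough: undo the partial grants.
            for granted in acked:
                granted.handle_write_release(path, requester.name)
            return False
    return True


def release_write(requester, cluster, path):
    for node in cluster:
        node.handle_write_release(path, requester.name)
```

In this model a single unreachable node makes acquire_write() fail for 
everybody, which is roughly why a scheme like this would push you towards 
fencing: something has to declare the silent node dead before writes can 
proceed.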

This is also made more difficult with unify or striping because the data 
is remote in the first place, so you have to retrieve the metadata at 
least from the server - unless you want to cache it locally, which would 
again break POSIX semantics.
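
As an illustration of why a local metadata cache with an expiry timeout 
does that, here is a small Python sketch. Everything in it is an 
assumption made for the example: MetadataCache, the fetch callback and the 
injectable clock are hypothetical names, not anything GlusterFS provides.

```python
import time


class MetadataCache:
    """Client-side metadata cache with a fixed expiry (illustrative only)."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self.fetch = fetch      # callable(path) -> metadata fetched from the server
        self.ttl = ttl_seconds  # how long a cached entry is trusted
        self.clock = clock      # injectable so the behaviour can be tested
        self.entries = {}       # path -> (expiry_time, metadata)

    def stat(self, path):
        now = self.clock()
        hit = self.entries.get(path)
        if hit is not None and now < hit[0]:
            # Served locally: fast, but possibly out of date.
            return hit[1]
        # Expired or never seen: pay for the round trip to the server.
        meta = self.fetch(path)
        self.entries[path] = (now + self.ttl, meta)
        return meta
```

With a long timeout, a stat() issued after another client has changed the 
file keeps returning the old metadata until the entry expires. With a very 
short timeout nearly every call goes back to the server, so there is 
little to gain.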

Note - NFS is not POSIX-compliant. You can set metadata cache expiry on NFS. NFS 
also has the advantage that the data is on _one_ server, so even if 
there was some form of locking that reliably works over NFS available 
(there isn't, but for the sake of the argument, if there was) there would 
still be no concept of chasing locks across the cluster to make sure the 
mirrors are consistent before granting them.

In short - comparing NFS to GlusterFS isn't really meaningful.

Gordan