[Gluster-devel] [RFC] A new caching/synchronization mechanism to speed up gluster
Xavier Hernandez
xhernandez at datalab.es
Thu Feb 6 08:54:03 UTC 2014
On 06/02/14 08:51, Vijay Bellur wrote:
> On 02/06/2014 05:46 AM, Ira Cooper wrote:
>> Yep... this is an area I am very interested in, going forwards.
>>
>> Especially sending messages back, we'll need that for any
>> caching/leasing/oplock/whatever we call it type protocols.
>
> +1. I am very interested in seeing this implemented.
>
> Xavi: Would you be able to attend next week's IRC meeting so that we
> can discuss this further? Of course, we can have out of band
> conversations but the IRC meeting might be a common ground for all
> interested folks to get together.
Of course, I'll be there.
Xavi
>
> -Vijay
>
>>
>> Keep me in the loop, and I'll keep tracking the list. (I'm already on
>> the list.)
>>
>> I'm also ira on freenode if you want to find me.
>>
>> Thanks,
>>
>> -Ira / ira@(samba.org|redhat.com)
>>
>>
>> On Wed, Feb 5, 2014 at 6:24 PM, Anand Avati <avati at gluster.org> wrote:
>>
>> Xavi,
>> Getting such a caching mechanism in place has several aspects. First
>> of all we need the framework pieces implemented (particularly
>> server-originated messages to the client for invalidation and
>> revokes) in a well designed way - in particular, how we address a
>> specific translator in a message originating from the server. Some
>> of the recent changes to client_t allow server-side translators to
>> get a handle (the client_t object) on which messages can be
>> submitted back to the client.
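>>
>> (Purely as a rough illustration of the shape this could take, here
>> is a minimal C sketch. client_t is simplified to a stub, and
>> server_submit_upcall() is a hypothetical helper, not an existing
>> gluster API:
>>
>> #include <stdint.h>
>> #include <stdio.h>
>>
>> /* Simplified stand-in for gluster's client_t; the real object
>>  * carries the transport on which the client can be reached. */
>> typedef struct client { uint64_t id; } client_t;
>>
>> typedef enum { UPCALL_CACHE_INVALIDATE, UPCALL_LOCK_REVOKE } upcall_event_t;
>>
>> typedef struct {
>>     upcall_event_t event;
>>     uint64_t       gfid[2];  /* identity of the affected inode */
>>     const char    *xlator;   /* which client-side translator to address */
>> } upcall_msg_t;
>>
>> /* Hypothetical: encode msg and submit it on the connection bound
>>  * to this client_t, i.e. a server-originated message. */
>> static int
>> server_submit_upcall(client_t *client, const upcall_msg_t *msg)
>> {
>>     printf("upcall to client %llu (xlator %s): event=%d\n",
>>            (unsigned long long)client->id, msg->xlator, msg->event);
>>     return 0;
>> }
>>
>> int main(void)
>> {
>>     client_t c = { .id = 42 };
>>     upcall_msg_t m = { .event = UPCALL_CACHE_INVALIDATE,
>>                        .gfid = { 0x1111, 0x2222 },
>>                        .xlator = "md-cache" };
>>     return server_submit_upcall(&c, &m);
>> }
>>
>> The point is just to make the addressing question concrete: the
>> message has to name both the client connection and the translator
>> within it.)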
>>
>> Such a framework (of having server-originated messages) is also
>> necessary for implementing oplocks (and possibly leases) -
>> particularly interesting for the Samba integration.
>>
>> As Jeff already mentioned, this is an area that gluster has not
>> focused on, given the targeted use case. However, extending this to
>> internal use cases has clear benefits (avoiding per-operation
>> inodelks can help many modules - encryption/crypt, afr, etc.). It
>> seems possible to have a common framework for delegating locks to
>> clients, and to build caching coherency protocols / oplocks /
>> inodelk avoidance on top of it.
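>>
>> (To make "delegating locks to clients" concrete, a toy sketch of the
>> per-inode record a brick might keep - the states and names are
>> illustrative only, not a proposed design:
>>
>> #include <stdbool.h>
>> #include <stdint.h>
>> #include <stdio.h>
>>
>> typedef enum {
>>     DELEG_NONE,       /* no client holds the inode */
>>     DELEG_READ,       /* shared: holders may cache reads */
>>     DELEG_WRITE,      /* exclusive: one holder may cache writes */
>>     DELEG_RECALLING,  /* server asked holder(s) to flush and return it */
>> } deleg_state_t;
>>
>> typedef struct {
>>     uint64_t      gfid;   /* inode identity */
>>     deleg_state_t state;
>>     uint64_t      owner;  /* client id, meaningful for DELEG_WRITE */
>> } deleg_t;
>>
>> /* A conflicting request triggers a recall (a server-originated
>>  * message) instead of a per-operation inodelk. */
>> static bool
>> deleg_request_write(deleg_t *d, uint64_t client)
>> {
>>     if (d->state == DELEG_NONE) {
>>         d->state = DELEG_WRITE;
>>         d->owner = client;
>>         return true;                /* granted immediately */
>>     }
>>     if (d->state == DELEG_WRITE && d->owner == client)
>>         return true;                /* already the holder */
>>     d->state = DELEG_RECALLING;     /* upcall the current holder(s) */
>>     return false;                   /* caller retries after the recall */
>> }
>>
>> int main(void)
>> {
>>     deleg_t d = { .gfid = 7, .state = DELEG_NONE, .owner = 0 };
>>     printf("client 1: %s\n", deleg_request_write(&d, 1) ? "granted" : "recall");
>>     printf("client 2: %s\n", deleg_request_write(&d, 2) ? "granted" : "recall");
>>     return 0;
>> }
>>
>> Oplocks, leases and inodelk avoidance would then just be different
>> policies layered over the same recall machinery.)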
>>
>> Feel free to share a more detailed proposal if you have one planned -
>> I'm sure the Samba folks (Ira copied) would be interested too.
>>
>> Thanks!
>> Avati
>>
>>
>> On Wed, Feb 5, 2014 at 11:27 AM, Xavier Hernandez
>> <xhernandez at datalab.es> wrote:
>>
>> On 04.02.2014 17:18, Jeff Darcy wrote:
>>
>> The only synchronization point needed is to make sure that all
>> bricks agree on the inode state and which client owns it. This can
>> be achieved without locking using a method similar to what I
>> implemented in the DFC translator. Besides the lock-less
>> architecture, the main advantage is that much more aggressive
>> caching strategies can be implemented very near to the final user,
>> considerably increasing the throughput of the file system. Special
>> care has to be taken with things that can fail on background writes
>> (basically brick space and user access rights). Those should be
>> handled appropriately on the client side to guarantee future
>> success of writes. Of course this is only a high level overview. A
>> deeper analysis should be done to see what to do on each special
>> case. What do you think?
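>>
>> (As a toy illustration of the lock-less part: each brick keeps a
>> per-inode owner word and clients claim it with a compare-and-swap,
>> so the normal path takes no inodelk. Sketch only; this is not the
>> actual DFC code:
>>
>> #include <stdatomic.h>
>> #include <stdint.h>
>> #include <stdio.h>
>>
>> #define NO_OWNER 0
>>
>> typedef struct {
>>     /* client id currently owning the inode, or NO_OWNER */
>>     _Atomic uint64_t owner;
>> } inode_state_t;
>>
>> /* Returns the owner after the call: either us (the claim succeeded
>>  * on this brick) or the competing client we have to defer to. */
>> static uint64_t
>> inode_try_claim(inode_state_t *st, uint64_t client)
>> {
>>     uint64_t expected = NO_OWNER;
>>     if (atomic_compare_exchange_strong(&st->owner, &expected, client))
>>         return client;   /* we own it: safe to cache aggressively */
>>     return expected;     /* lost the race: defer to this client */
>> }
>>
>> int main(void)
>> {
>>     inode_state_t st = { .owner = NO_OWNER };
>>     printf("claim by 1 -> owner %llu\n",
>>            (unsigned long long)inode_try_claim(&st, 1));
>>     printf("claim by 2 -> owner %llu\n",
>>            (unsigned long long)inode_try_claim(&st, 2));
>>     return 0;
>> }
>>
>> A real implementation would still need to resolve the case where
>> different bricks see competing claims in a different order.)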
>>
>>
>> I think this is a great idea for where we can go - and need to go -
>> in the long term. However, it's important to recognize that it *is*
>> the long term. We had to solve almost exactly the same problems in
>> MPFS long ago. Whether the synchronization uses locks or not
>> *locally* is meaningless, because all of the difficult problems have
>> to do with recovering the *distributed* state. What happens when a
>> brick fails while holding an inode in any state but I? How do we
>> recognize it, what do we do about it, how do we handle the case
>> where it comes back and needs to re-acquire its previous state? How
>> do we make sure that a brick can successfully flush everything it
>> needs to before it yields a lock/lease/whatever? That's going to
>> require some kind of flow control, which is itself a pretty big
>> project. It's not impossible, but it took multiple people some
>> years for MPFS, and ditto for every other project (e.g. Ceph or
>> XtreemFS) which adopted similar approaches. GlusterFS's historical
>> avoidance of this complexity certainly has some drawbacks, but it
>> has also been key to us making far more progress in other areas.
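>>
>> (To make the flush-before-yield question concrete, a sketch of
>> recall handling on the holder's side - hypothetical names, and it
>> deliberately ignores the actual flow control:
>>
>> #include <stdbool.h>
>> #include <stdio.h>
>>
>> typedef struct {
>>     bool write_delegated;  /* may we still generate cached writes? */
>>     int  dirty_pages;      /* cached writes not yet on the brick */
>> } cached_inode_t;
>>
>> /* Hypothetical: push one dirty page to the brick. */
>> static void flush_one(cached_inode_t *ci) { ci->dirty_pages--; }
>>
>> /* On a recall upcall: first stop producing new dirty data, then
>>  * drain what exists, and only then acknowledge the recall. */
>> static void
>> handle_recall(cached_inode_t *ci)
>> {
>>     ci->write_delegated = false;    /* no new cached writes */
>>     while (ci->dirty_pages > 0)
>>         flush_one(ci);              /* drain before yielding */
>>     printf("recall acked; delegation returned\n");
>> }
>>
>> int main(void)
>> {
>>     cached_inode_t ci = { .write_delegated = true, .dirty_pages = 3 };
>>     handle_recall(&ci);
>>     return 0;
>> }
>>
>> The hard part is what bounds that drain loop - how much dirty data
>> a holder may accumulate and how long a recall may take - which is
>> where the flow control comes in.)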
>>
>> Well, it's true that there will be a lot of tricky cases that will
>> need to be handled to make sure that data integrity and system
>> responsiveness are guaranteed. However, I think they are not more
>> difficult than what can happen currently if a client dies or loses
>> communication while it holds a lock on a file.
>>
>> Anyway, I think this mechanism has great potential because it
>> allows the implementation of powerful caches, even SSD-based ones,
>> which could improve performance a lot.
>>
>> Of course there is a lot of work in solving all potential failures
>> and designing the right thing. An important consideration is that
>> all these methods try to solve a problem that is seldom found
>> (i.e. having more than one client modifying the same file at the
>> same time). So a solution that has almost zero overhead for the
>> normal case and allows the implementation of aggressive caching
>> mechanisms seems a big win.
>>
>>
>> To move forward on this, I think we need a *much* more detailed
>> idea of how we're going to handle the nasty cases. Would some sort
>> of online collaboration - e.g. Hangouts - make more sense than
>> continuing via email?
>>
>> Of course, we can talk on IRC or somewhere else if you prefer.
>>
>> Xavi
>>
>>