[Gluster-devel] [RFC] A new caching/synchronization mechanism to speed up gluster

Vijay Bellur vbellur at redhat.com
Thu Feb 6 07:51:18 UTC 2014


On 02/06/2014 05:46 AM, Ira Cooper wrote:
> Yep... this is an area I am very interested in, going forwards.
>
> Especially sending messages back - we'll need that for any
> caching/leasing/oplock (whatever we call it) type of protocol.

+1. I am very interested in seeing this implemented.

Xavi: Would you be able to attend next week's IRC meeting so that we can 
discuss this further? Of course, we can have out of band conversations 
but the IRC meeting might be a common ground for all interested folks to 
get together.

-Vijay

>
> Keep me in the loop, and I'll keep tracking the list.  (I'm already on
> the list.)
>
> I'm also ira on freenode if you want to find me.
>
> Thanks,
>
> -Ira / ira@(samba.org|redhat.com)
>
>
> On Wed, Feb 5, 2014 at 6:24 PM, Anand Avati <avati at gluster.org> wrote:
>
>     Xavi,
>     Getting such a caching mechanism working has several aspects. First
>     of all we need the framework pieces implemented in a well designed
>     way, particularly server-originated messages to the client for
>     invalidation and revokes, and a way to address a specific
>     translator in a message originating from the server. Some of the
>     recent changes to client_t allow server-side translators to get a
>     handle (the client_t object) on which messages can be submitted
>     back to the client.
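>
>     As a rough sketch (the names below are illustrative, not the
>     actual client_t API), a server-side translator could then push an
>     invalidation to the client that cached an inode along these lines:
>
>         /* Hypothetical sketch; assumes the usual glusterfs headers
>          * (xlator.h, client_t.h). submit_upcall() and upcall_args
>          * are made-up names for the framework pieces discussed here. */
>         int
>         cache_invalidate (xlator_t *this, client_t *client, uuid_t gfid)
>         {
>                 struct upcall_args args = {
>                         .event = UPCALL_CACHE_INVALIDATE,
>                 };
>
>                 uuid_copy (args.gfid, gfid);
>
>                 /* Route the message back over the existing RPC
>                  * transport to the right translator on the client. */
>                 return submit_upcall (this, client, &args);
>         }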
>
>     Such a framework (of having server-originated messages) is also
>     necessary for implementing oplocks (and possibly leases) -
>     particularly interesting for the Samba integration.
>
>     As Jeff already mentioned, this is an area gluster has not focused
>     on, given the targeted use case. However, extending this to
>     internal use cases (avoiding per-operation inodelks) can benefit
>     many modules - encryption/crypt, afr, etc. It seems possible to
>     have a common framework for delegating locks to clients, and to
>     build cache coherency protocols / oplocks / inodelk avoidance on
>     top of it.
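>
>     One possible shape for such a common layer (all names here are
>     hypothetical, just to make the idea concrete): the server tracks a
>     per-inode delegation and recalls it when a conflicting request
>     arrives, while client-side translators (md-cache, afr, crypt, ...)
>     register a recall callback instead of issuing per-operation
>     inodelks:
>
>         /* Hypothetical sketch of a shared delegation primitive. */
>         typedef enum {
>                 DELEG_NONE = 0,  /* no client holds the inode */
>                 DELEG_READ,      /* shared: many clients may cache reads */
>                 DELEG_WRITE,     /* exclusive: one client may cache writes */
>         } deleg_type_t;
>
>         /* Called by the framework when the server needs the
>          * delegation back; the holder flushes and then releases. */
>         typedef int (*deleg_recall_cbk_t) (inode_t *inode,
>                                            deleg_type_t held);
>
>         int deleg_acquire (xlator_t *this, inode_t *inode,
>                            deleg_type_t want, deleg_recall_cbk_t recall);
>         int deleg_release (xlator_t *this, inode_t *inode);
>
>     Oplock breaks for Samba and inodelk avoidance for afr/crypt would
>     then map onto the same recall path.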
>
>     Feel free to share a more detailed proposal if you have one -
>     I'm sure the Samba folks (Ira copied) would be interested too.
>
>     Thanks!
>     Avati
>
>
>     On Wed, Feb 5, 2014 at 11:27 AM, Xavier Hernandez
>     <xhernandez at datalab.es> wrote:
>
>         On 04.02.2014 17:18, Jeff Darcy wrote:
>
>                 The only synchronization point needed is to make sure
>                 that all bricks agree on the inode state and which
>                 client owns it. This can be achieved without locking,
>                 using a method similar to what I implemented in the
>                 DFC translator. Besides the lock-less architecture,
>                 the main advantage is that much more aggressive
>                 caching strategies can be implemented very near to
>                 the final user, increasing the throughput of the file
>                 system considerably. Special care has to be taken
>                 with things that can fail on background writes
>                 (basically brick space and user access rights). Those
>                 should be handled appropriately on the client side to
>                 guarantee the future success of writes. Of course
>                 this is only a high level overview. A deeper analysis
>                 should be done to see what to do in each special
>                 case. What do you think ?
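>
>                 To make it a bit more concrete (the state names below
>                 are only illustrative, not the exact DFC ones), the
>                 per-inode record that all bricks must agree on could
>                 be as small as:
>
>                     /* Hypothetical per-inode synchronization state,
>                      * replicated on all bricks. */
>                     typedef enum {
>                             INODE_ST_I,  /* idle: no owner, no caching */
>                             INODE_ST_S,  /* shared: clients may cache
>                                             reads */
>                             INODE_ST_X,  /* exclusive: one client owns
>                                             it and may delay writes */
>                     } inode_st_t;
>
>                     typedef struct {
>                             inode_st_t state;
>                             uuid_t     owner;  /* holder when in X */
>                     } inode_sync_t;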
>
>
>             I think this is a great idea for where we can go - and
>             need to go - in the long term. However, it's important to
>             recognize that it *is* the long term. We had to solve
>             almost exactly the same problems in MPFS long ago.
>             Whether the synchronization uses locks or not *locally*
>             is meaningless, because all of the difficult problems
>             have to do with recovering the *distributed* state. What
>             happens when a brick fails while holding an inode in any
>             state but I? How do we recognize it, what do we do about
>             it, how do we handle the case where it comes back and
>             needs to re-acquire its previous state? How do we make
>             sure that a brick can successfully flush everything it
>             needs to before it yields a lock/lease/whatever? That's
>             going to require some kind of flow control, which is
>             itself a pretty big project. It's not impossible, but it
>             took multiple people some years for MPFS, and ditto for
>             every other project (e.g. Ceph or XtreemFS) which adopted
>             similar approaches. GlusterFS's historical avoidance of
>             this complexity certainly has some drawbacks, but it has
>             also been key to us making far more progress in other
>             areas.
>
>         Well, it's true that there will be a lot of tricky cases that
>         will need to be handled to be sure that data integrity and
>         system responsiveness are guaranteed. However, I think they
>         are not more difficult than what can happen currently if a
>         client dies or loses communication while it holds a lock on a
>         file.
>
>         Anyway, I think there is great potential in this mechanism
>         because it can allow the implementation of powerful caches,
>         even SSD-based ones, which could improve performance a lot.
>
>         Of course there is a lot of work in solving all potential
>         failures and designing the right thing. An important
>         consideration is that all these methods try to solve a
>         problem that is seldom found (i.e. having more than one
>         client modifying the same file at the same time). So a
>         solution that has almost zero overhead for the normal case
>         and allows the implementation of aggressive caching
>         mechanisms seems like a big win.
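>
>         To illustrate the "almost zero overhead" point (names reused
>         from the sketches above, all hypothetical): the client fast
>         path would be a purely local check, and only contended inodes
>         would pay for a network round trip:
>
>             /* Hypothetical client-side write path. */
>             int
>             cached_writev (xlator_t *this, inode_t *inode,
>                            struct iovec *vec, int count)
>             {
>                     inode_sync_t *sync = local_sync_state (inode);
>
>                     if (sync->state == INODE_ST_X)
>                             /* normal case: write-behind into the
>                              * local (possibly SSD-backed) cache */
>                             return cache_write (this, inode, vec,
>                                                 count);
>
>                     /* rare case: negotiate X ownership with the
>                      * bricks, then retry the write */
>                     return acquire_then_write (this, inode, vec,
>                                                count);
>             }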
>
>
>             To move forward on this, I think we need a *much* more
>             detailed idea of how we're going to handle the nasty
>             cases. Would some sort of online collaboration - e.g.
>             Hangouts - make more sense than continuing via email?
>
>         Of course, we can talk on IRC or somewhere else if you prefer.
>
>         Xavi
>
>
>         _______________________________________________
>         Gluster-devel mailing list
>         Gluster-devel at nongnu.org
>         https://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>
