[Gluster-devel] Data classification infrastructure

Fri Dec 5 17:52:51 UTC 2014

> With the upcoming data compliance features in GlusterFS, a common
> infrastructure[1] to support various mechanisms such as tiering, bitrot
> detection etc. would prove to be helpful. Such an infrastructure extends
> the current changelog design (keeping NSR in mind) and removes
> constraints that limited its adoption to a wide variety of use cases.
> 
> The write up can be found here:
> https://gist.github.com/vshankar/346843ea529f3af35339
> 
> Thanks to Kotresh and Joseph for spending time on this.
> 
> Comments/suggestions are more than welcome.

It looks like a lot of work went into this.  Kudos for that.  Here are
some quick thoughts.

* I wouldn't worry too much about NSR in this design.  NSR is evolving
toward a full-data-logging design.  I don't think changelog should (or
is likely to) evolve in that same direction.  As noted in the document,
NSR is also unique in other ways such as durability requirements, so I
think it makes sense to exclude it from the list of valid changelog use
cases.

* For putting changelog on its own SSD, how do the changelog translator
and libgfchangelog each know where that is?  The first seems to be a
simple translator option.  The second, and particularly coordination
between the two, might require a bit more effort.

* One of the key issues here is multiple consumers, particularly issues
such as backpressure and garbage collection in the presence of same.
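
  A minimal sketch of the bookkeeping this implies, under stated
  assumptions: the class SharedLog, its methods, and the consumer names
  are hypothetical and are not taken from the changelog translator or
  libgfchangelog.  Each consumer acknowledges a position, entries are
  reclaimed only once every consumer has passed them, and writers are
  refused while the slowest consumer is too far behind.

```python
class SharedLog:
    """Hypothetical append-only log shared by several named consumers."""

    def __init__(self, max_backlog):
        self.max_backlog = max_backlog  # retained entries allowed before writes are refused
        self.base = 0                   # sequence number of entries[0]
        self.entries = []
        self.acked = {}                 # consumer name -> next sequence number it needs

    def register(self, name):
        # A new consumer starts at the current tail and sees only later entries.
        self.acked[name] = self.base + len(self.entries)

    def append(self, record):
        # Backpressure: refuse the write while the retained backlog is full.
        if len(self.entries) >= self.max_backlog:
            return False
        self.entries.append(record)
        return True

    def read(self, name, limit=10):
        # Return up to `limit` entries this consumer has not acknowledged yet.
        start = self.acked[name] - self.base
        return self.entries[start:start + limit]

    def ack(self, name, upto):
        # The consumer has processed every entry below sequence number `upto`.
        tail = self.base + len(self.entries)
        self.acked[name] = max(self.acked[name], min(upto, tail))
        self._collect()

    def _collect(self):
        # Garbage collection: drop entries that every consumer has acknowledged.
        if not self.acked:
            return
        low = min(self.acked.values())
        drop = low - self.base
        if drop > 0:
            del self.entries[:drop]
            self.base = low


log = SharedLog(max_backlog=3)
log.register("tiering")
log.register("bitrot")
for record in ("a", "b", "c"):
    log.append(record)
log.ack("tiering", 3)   # nothing is reclaimed: "bitrot" is still at 0
log.ack("bitrot", 2)    # "a" and "b" can now be dropped
```

  Even in this toy form, one stalled consumer pins the whole log and
  eventually blocks writers, which is the policy question a real design
  would have to answer.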

* Is the LRU/LFU cache really part of changelog, or should it be
separate?  Either way, we probably need a lot more detail to address
similar issues of currency, space usage, garbage collection, etc.