[Gluster-devel] Data classification infrastructure
Jeff Darcy
jdarcy at redhat.com
Fri Dec 5 17:52:51 UTC 2014
> With the upcoming data compliance features in GlusterFS, a common
> infrastructure[1] to support various mechanisms such as tiering, bitrot
> detection etc. would prove to be helpful. Such an infrastructure extends
> the current changelog design (keeping NSR in mind) and removes
> constraints that limited its adoption to a wide variety of use cases.
>
> The write up can be found here:
> https://gist.github.com/vshankar/346843ea529f3af35339
>
> Thanks to Kotresh and Joseph for spending time on this.
>
> Comments/suggestions are more than welcome.
It looks like a lot of work went into this. Kudos for that. Here are
some quick thoughts.
* I wouldn't worry too much about NSR in this design. NSR is evolving
toward a full-data-logging design. I don't think changelog should (or
is likely to) evolve in that same direction. As noted in the document,
NSR is also unique in other ways such as durability requirements, so I
think it makes sense to exclude it from the list of valid changelog use
cases.
* For putting changelog on its own SSD, how do the changelog translator
and libgfchangelog each know where that is? The first seems to be a
simple translator option. The second, and particularly coordination
between the two, might require a bit more effort.
* One of the key issues here is multiple consumers, particularly issues
such as backpressure and garbage collection when more than one consumer
is present.
* Is the LRU/LFU cache really part of changelog, or should it be
separate? Either way, we probably need a lot more detail to address
similar issues of currency, space usage, garbage collection, etc.
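
To make the multiple-consumer concern concrete, here is a minimal sketch,
not a proposal for the actual changelog code. It assumes an in-memory log
in which each consumer acknowledges the entries it has processed. The class
name, method names, and the max_pending limit are all hypothetical. Garbage
collection can only reclaim entries that every consumer has acknowledged,
and the producer is refused (backpressure) once the slowest consumer falls
too far behind.

```python
class MultiConsumerLog:
    """Hypothetical append-only log shared by several consumers."""

    def __init__(self, max_pending):
        self.max_pending = max_pending  # assumed cap on retained entries
        self.base = 0                   # sequence number of entries[0]
        self.entries = []
        self.acked = {}                 # consumer -> next sequence it needs

    def _tail(self):
        return self.base + len(self.entries)

    def register(self, name):
        # A new consumer starts at the current tail of the log.
        self.acked[name] = self._tail()

    def append(self, record):
        # Backpressure: refuse new records while the log is full.
        if len(self.entries) >= self.max_pending:
            return False
        self.entries.append(record)
        return True

    def read(self, name):
        # Return (sequence, record) pairs the consumer has not yet acked.
        start = self.acked[name] - self.base
        return [(self.base + i, rec)
                for i, rec in enumerate(self.entries) if i >= start]

    def ack(self, name, seq):
        # Record progress, then reclaim whatever everyone has finished with.
        self.acked[name] = min(max(self.acked[name], seq + 1), self._tail())
        return self.gc()

    def gc(self):
        # Only entries below the slowest consumer's position can be dropped.
        low = min(self.acked.values(), default=self._tail())
        dropped = low - self.base
        del self.entries[:dropped]
        self.base = low
        return dropped
```

In this sketch a single slow consumer (say, bitrot detection lagging
behind tiering) pins the whole log and eventually stalls the producer,
which is exactly the kind of interaction I think the design needs to
spell out.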