[Gluster-devel] Data classification proposal

Dan Lambright dlambrig at redhat.com
Mon Jun 23 22:16:52 UTC 2014


Rather than using the keyword "unclaimed", my instinct was to explicitly list which bricks have not been "claimed".  Perhaps you have something more subtle in mind; it is not apparent to me from your response. Can you provide an example of why the keyword is necessary and why a list could not be provided in its place? If the list is somehow "difficult to figure out", due to a particularly complex setup or some such, I'd prefer that a CLI/GUI build that list rather than having sysadmins hand-edit this file.
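
For example, instead of

	rule tier-2
		select %{unclaimed}

I would expect an explicit list (brick names here are illustrative):

	rule tier-2
		select disk-group0-1 disk-group0-2 disk-group0-3

or, with wild-cards, "select disk-group0*".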

The key-value piece seems like syntactic sugar - an "alias". If so, let the brick name itself be the alias; no notion of SSD or physical location needs to be inserted. Unless I am missing a case where it *is* necessary, I stand by that value judgement as a philosophy: do not put anything into the configuration file that you don't require. Can you provide an example of where it is necessary?
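
For example, the media type can live in the brick name itself and be selected
with a wild-card:

	brick host1:/brick ssd-group0-1

	rule tier-1
		select ssd-*

No media-type key is needed; the name carries the information.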

As to your point on filtering (which files go into which tier/group): I wrote a little further down in the email that I do not see a way around regular expressions within the filter-condition keyword. My understanding of your proposal is that the select statement does not do file-name filtering; the "filter-condition" option does. I'm OK with that.
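
For example, a filter rule in your syntax might carry a regular expression
directly in the condition (the pattern syntax here is hypothetical):

	rule all
		select tier-1
		select tier-2
		type features/filter
		# route log and temp files to the slow tier
		option filter-condition-1 name:.*\.(log|tmp)$
		option filter-target-1 tier-2
		option default-subvol tier-1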

As far as the "user stories" idea goes, that seems like a good next step.

----- Original Message -----
From: "Jeff Darcy" <jdarcy at redhat.com>
To: "Dan Lambright" <dlambrig at redhat.com>
Cc: "Gluster Devel" <gluster-devel at gluster.org>
Sent: Monday, June 23, 2014 5:24:14 PM
Subject: Re: [Gluster-devel] Data classification proposal

> A frustrating aspect of Linux is the complexity of /etc configuration file
> formats (rsyslog.conf, logrotate, cron, yum repo files, etc.).  In that spirit,
> I would simplify the "select" in the data classification proposal (copied
> below) to only accept a list of bricks/sub-tiers with wild-cards '*', rather
> than full-blown regular expressions or key/value pairs.

Then how does *the user* specify which files should go into which tier/group?
If we don't let them specify that in configuration, then it can only be done
in code and we've taken a choice away from them.

> I would drop the
> "unclaimed" keyword

Then how do you specify any kind of default rule for files not matched
elsewhere?  If certain files can be placed only in certain locations due to
security or compliance considerations, how would they specify the location(s)
for files not subject to any such limitation?

> and not have the keywords "media type" and "rack". It does
> not seem necessary to introduce new keys for the underlying block device
> type (SSD vs disk) any more than we need to express the filesystem (XFS vs
> ext4).

The idea is to let users specify whatever criteria matter *to them*; media
type and rack/row are just examples to get them started.

> In other words, I think tiering can be fully expressed in the
> configuration file while still abstracting the underlying storage.

Yes, *tiering* can be expressed using a simpler syntax.  I was trying for
something that could also support placement policies other than strict
linear "above" vs. "below" with only the migration policies we've written
into code.

> That
> said, the configuration file could be built up by a CLI or GUI, and richer
> expressibility could exist at that level.
> 
> example:
> 
> brick host1:/brick ssd-group0-1
> 
> brick host2:/brick ssd-group0-2
> 
> brick host3:/brick disk-group0-1
> 
> rule tier-1
> 	select ssd-group0*
> 
> rule tier-2
> 	select disk-group0*
> 
> rule all
> 	select tier-1
> 	# use repeated "select" to establish order
> 	select tier-2
> 	type features/tiering
> 
> The filtering option's regular expressions seem hard to avoid. Even if just
> the name of the file satisfies most use cases (that we know of?), I do not
> think there is any way to avoid regular expressions in the filter options.
> (Down the road, if we were to allow complete flexibility in how files can be
> distributed across subvolumes, the filtering problems may start to look
> similar to 90s-era packet classification, with a solution along the lines of
> the Berkeley packet filter.)
> 
> There may be different rules by which data is distributed at the "tiering"
> level. For example, one tiering policy could treat the fast tier (first
> listed) as a "cache" for the slow tier (second listed). I think the
> "option" keyword could handle that.
> 
> rule all
> 	select tier-1
> 	# use repeated "select" to establish order
> 	select tier-2
> 	type features/tiering
> 	option tier-cache, mode=writeback, dirty-watermark=80
> 
> Another example tiering policy could be based on compliance; when a file
> needs to become read-only, it moves from the first listed tier to the
> second.
> 
> rule all
> 	select tier-1
> 	# use repeated "select" to establish order
> 	select tier-2
> 	type features/tiering
> 	option tier-retention

OK, good so far.  How would you handle the "replica 2.5" sanlock case with
the simplified syntax?  Or security-aware placement equivalent to this?

   rule secure
      select brick-0-*
      option encryption on

   rule insecure
      select brick-1-*
      option encryption off

   rule all
      select secure
      select insecure
      type features/filter
      option filter-condition-1 security-level:high
      option filter-target-1 secure
      option default-subvol insecure

In true agile fashion, maybe we should compile a set of "user stories" and
treat those as test cases for any proposed syntax.  That would need to
include at least

   * hot/cold tiering

   * HIPAA/EUPD style compliance (file must *always* or *never* be in X)

   * security-aware placement

   * multi-tenancy

   * sanlock case

I'm not trying to create complexity for its own sake.  If there's a
simpler syntax that doesn't eliminate some of these cases in favor of
tiering and nothing else, that would be great.

> ----- Original Message -----
> From: "Jeff Darcy" <jdarcy at redhat.com>
> To: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Friday, May 23, 2014 3:30:39 PM
> Subject: [Gluster-devel] Data classification proposal
> 
> One of the things holding up our data classification efforts (which include
> tiering but also other stuff) has been the extension of the same
> conceptual model from the I/O path to the configuration subsystem and
> ultimately to the user experience.  How does an administrator define a
> tiering policy without tearing their hair out?  How does s/he define a mixed
> replication/erasure-coding setup without wanting to rip *our* hair out?  The
> included Markdown document attempts to remedy this by proposing one out of
> many possible models and user interfaces.  It includes examples for some of
> the most common use cases, including the "replica 2.5" case we've been
> discussing recently.  Constructive feedback would be greatly appreciated.
> 
> 
> 
> # Data Classification Interface
> 
> The data classification feature is extremely flexible, to cover use cases
> from
> SSD/disk tiering to rack-aware placement to security or other policies.  With
> this flexibility comes complexity.  While this complexity does not affect the
> I/O path much, it does affect both the volume-configuration subsystem and the
> user interface to set placement policies.  This document describes one
> possible
> model and user interface.
> 
> The model we used is based on two kinds of information: brick descriptions
> and
> aggregation rules.  Both are contained in a configuration file (format TBD)
> which can be associated with a volume using a volume option.
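>
> For example, the association might be made with a volume option along these
> lines (the option name here is hypothetical; the actual mechanism is TBD):
>
> 	gluster volume set myvol data-classification-file /etc/glusterfs/dc.conf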
> 
> ## Brick Descriptions
> 
> A brick is described by a series of simple key/value pairs.  Predefined keys
> include:
> 
>  * **media-type**
>    The underlying media type for the brick.  In its simplest form this might
>    just be *ssd* or *disk*.  More sophisticated users might use something
>    like
>    *15krpm* to represent a faster disk, or *perc-raid5* to represent a brick
>    backed by a RAID controller.
> 
>  * **rack** (and/or **row**)
>    The physical location of the brick.  Some policy rules might be set up to
>    spread data across more than one rack.
> 
> User-defined keys are also allowed.  For example, some users might use a
> *tenant* or *security-level* tag as the basis for their placement policy.
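>
> For example, a brick might combine predefined and user-defined keys (the
> values here are illustrative):
>
> 	brick host4:/brick
> 		media-type = disk
> 		rack = 12
> 		tenant = acme
> 		security-level = high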
> 
> ## Aggregation Rules
> 
> Aggregation rules are used to define how bricks should be combined into
> subvolumes, and those potentially combined into higher-level subvolumes, and
> so
> on until all of the bricks are accounted for.  Each aggregation rule consists
> of the following parts:
> 
>  * **id**
>    The base name of the subvolumes the rule will create.  If a rule is
>    applied
>    multiple times this will yield *id-0*, *id-1*, and so on.
> 
>  * **selector**
>    A "filter" for which bricks or lower-level subvolumes the rule will
>    aggregate.  This is an expression similar to a *WHERE* clause in SQL,
>    using
>    brick/subvolume names and properties in lieu of columns.  These values are
>    then matched against literal values or regular expressions, using the
>    usual
>    set of boolean operators to arrive at a *yes* or *no* answer to the
>    question
>    of whether this brick/subvolume is affected by this rule.
> 
>  * **group-size** (optional)
>    The number of original bricks/subvolumes to be combined into each produced
>    subvolume.  The special default value zero means to collect all original
>    bricks or subvolumes into one final subvolume.  In this case, *id* is used
>    directly instead of having a numeric suffix appended.
> 
>  * **type** (optional)
>    The type of the generated translator definition(s).  Examples might
>    include
>    "AFR" to do replication, "EC" to do erasure coding, and so on.  The more
>    general data classification task includes the definition of new
>    translators
>    to do tiering and other kinds of filtering, but those are beyond the scope
>    of this document.  If no type is specified, cluster/dht will be used to do
>    random placement among its constituents.
> 
>  * **tag** and **option** (optional, repeatable)
>    Additional tags and/or options to be applied to each newly created
>    subvolume.  See the "replica 2.5" example to see how this can be used.
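>
> Putting these parts together, a single rule might look like this (names and
> values are illustrative):
>
> 	rule fast-pairs
> 		select media-type = ssd
> 		group-size 2
> 		type cluster/afr
> 		tag speed=fast
>
> With four matching bricks, this would produce two-way replicated subvolumes
> *fast-pairs-0* and *fast-pairs-1*, each carrying the *speed=fast* tag.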
> 
> Since each type might have unique requirements, such as ensuring that
> replication is done across machines or racks whenever possible, it is assumed
> that there will be corresponding type-specific scripts or functions to do the
> actual aggregation.  This might even be made pluggable some day (TBD).  Once
> all rule-based aggregation has been done, volume options are applied
> similarly
> to how they are now.
> 
> Astute readers might have noticed that it's possible for a brick to be
> aggregated more than once.  This is intentional.  If a brick is part of
> multiple aggregates, it will be automatically split into multiple bricks
> internally, but this will be invisible to the user.  (The "replica 2.5"
> example below demonstrates such overlapping selections.)
> 
> ## Examples
> 
> Let's start with a simple tiering example.  Here's what the
> data-classification
> config file might look like.
> 
> 	brick host1:/brick
> 		media-type = ssd
> 
> 	brick host2:/brick
> 		media-type = disk
> 
> 	brick host3:/brick
> 		media-type = disk
> 
> 	rule tier-1
> 		select media-type = ssd
> 
> 	rule tier-2
> 		select media-type = disk
> 
> 	rule all
> 		select tier-1
> 		# use repeated "select" to establish order
> 		select tier-2
> 		type features/tiering
> 
> This would create a DHT subvolume named *tier-2* for the bricks on *host2* and
> *host3*.  Then it would add a features/tiering translator to treat *tier-1*
> as
> its upper tier and *tier-2* as its lower.  Here's a more complex example that
> adds replication and erasure coding to the mix.
> 
> 	# Assume 20 hosts, four fast and sixteen slow (named appropriately).
> 
> 	rule tier-1
> 		select *fast*
> 		group-size 2
> 		type cluster/afr
> 
> 	rule tier-2
> 		# special pattern matching otherwise-unused bricks
> 		select %{unclaimed}
> 		group-size 8
> 		type cluster/ec parity=2
> 		# i.e. two groups, each six data plus two parity
> 
> 	rule all
> 		select tier-1
> 		select tier-2
> 		type features/tiering
> 
> Lastly, here's an example of "replica 2.5" to do three-way replication for
> some
> files but two-way replication for the rest.
> 
> 	rule two-way-parts
> 		select *
> 		group-size 2
> 		type cluster/afr
> 
> 	rule two-way-pool
> 		select two-way-parts*
> 		tag special=no
> 
> 	rule three-way-parts
> 		# use overlapping selections to demonstrate splitting
> 		select *
> 		group-size 3
> 		type cluster/afr
> 
> 	rule three-way-pool
> 		select three-way-parts*
> 		tag special=yes
> 
> 	rule sanlock
> 		select two-way*
> 		select three-way*
> 		type features/filter
> 		# files named *.lock go in the replica-3 pool
> 		option filter-condition-1 name:*.lock
> 		option filter-target-1 three-way-pool
> 		# everything else goes in the replica-2 pool
> 		option default-subvol two-way-pool
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
> 

