[Gluster-devel] Data classification proposal

Tue Jun 24 14:59:54 UTC 2014

> It's possible to express your example using lists if their entries are allowed
> to overlap. I see that you wanted a way to express a matrix (overlapping
> rules) with gluster's tree-like syntax as backdrop.
> 
> A polytree (a DAG whose underlying undirected graph is a tree) may be a
> better term than matrix, i.e. when there are overlaps a node in the graph
> gets multiple in-arcs.
> 
> Syntax aside, we seem to part on "where" to solve the problem: config file or
> UX. I prefer that the UX have the logic to build the configuration file, given
> how complex it can be. My preference would be for the config file to be mostly
> "read only" with extremely simple syntax.
> 
> I'll put some more thought into this and believe this discussion has
> illuminated some good points.
> 
> Brick: host1:/SSD1  SSD1
> Brick: host1:/SSD2  SSD2
> Brick: host2:/SSD3  SSD3
> Brick: host2:/SSD4  SSD4
> Brick: host1:/DISK1 DISK1
> 
> rule rack4:
>   select SSD1, SSD2, DISK1
> 
> # some files should go on ssds in rack 4
> rule A:
>   option filter-condition *.lock
>   select SSD1, SSD2
> 
> # some files should go on ssds anywhere
> rule B:
>   option filter-condition *.out
>   select SSD1, SSD2, SSD3, SSD4
> 
> # some files should go anywhere in rack 4
> rule C:
>   option filter-condition *.c
>   select rack4
> 
> # some files we just don't care
> rule D:
>   option filter-condition *.h
>   select SSD1, SSD2, SSD3, SSD4, DISK1
> 
> volume:
>   option filter-condition A,B,C,D

This seems to leave us with two options.  One option is that "select"
supports only explicit enumeration, so that adding a brick means editing
multiple rules that apply to it.  The other option is that "select"
supports wildcards.  Using a regex to match parts of a name is
effectively the same as matching the explicit tags we started with,
except that expressing complex Boolean conditions using a regex can get
more than a bit messy.  As Jamie Zawinski famously said:

> Some people, when confronted with a problem, think "I know, I'll use
> regular expressions." Now they have two problems.

I think it's nice to support regexes instead of plain strings in
lower-level rules, but relying on them alone to express complex
higher-level policies would IMO be a mistake.  Likewise, defining a
proper syntax for a config file seems both more flexible and easier than
defining one for a CLI, where the parsing options are even more limited.
What happens when someone wants to use Puppet (for example) to set this
up?  Then the user would express their will in Puppet syntax, which
would have to convert it to our CLI syntax, which would convert it to
our config-file syntax.  Why not allow them to skip a step where
information might get lost or mangled in translation?  We can still have
CLI commands to do the most common kinds of manipulation, as we do for
volfiles, but the final form can be more extensible.  It will still be
more comprehensible than Ceph's CRUSH maps.
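
To make the trade-off between explicit enumeration, wildcards, and tags
more concrete, here is a minimal Python sketch. It is only an
illustration, not a proposed implementation: the BRICKS table, the tag
names ("ssd", "disk", "rack4"), and the three select_* helpers are all
hypothetical. Only the brick names and the rack4 membership are taken
from the example quoted above.

```python
import re

# Hypothetical inventory: brick name -> set of tags. The tag names are
# assumptions; brick names and rack4 membership follow the quoted example.
BRICKS = {
    "SSD1": {"ssd", "rack4"},
    "SSD2": {"ssd", "rack4"},
    "SSD3": {"ssd"},
    "SSD4": {"ssd"},
    "DISK1": {"disk", "rack4"},
}


def select_explicit(names):
    """Explicit enumeration: every rule lists its bricks by name."""
    return sorted(n for n in names if n in BRICKS)


def select_regex(pattern):
    """Wildcard selection: match the whole brick name against a regex."""
    rx = re.compile(pattern)
    return sorted(n for n in BRICKS if rx.fullmatch(n))


def select_tags(required):
    """Tag selection: a brick matches if it carries all required tags."""
    required = set(required)
    return sorted(n for n, tags in BRICKS.items() if required <= tags)


if __name__ == "__main__":
    # Rule B ("ssds anywhere") is easy with either a regex or a tag.
    print(select_regex(r"SSD\d+"))
    print(select_tags({"ssd"}))
    # Rule A ("ssds in rack 4") is a Boolean AND of two properties. With
    # tags it is a set test; a regex can only do it if the rack is
    # somehow encoded in the brick name.
    print(select_tags({"ssd", "rack4"}))
```

In this sketch, adding a new SSD to rack 4 means adding one entry to
BRICKS with the right tags; rules written with select_tags pick it up
without being edited, whereas every rule written with select_explicit
would need to be touched.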