[Gluster-devel] The Performance Dataset Problem and the [RFC] Filetree scheme microformat
Csaba Henk
csaba at redhat.com
Wed May 29 05:36:44 UTC 2013
Hi All,
I wonder who is aware of the thing we call the Performance Dataset Problem.
Well I guess everyone would be aware of the problem itself, just not necessarily
familiar with this coinage :)
The behavior of Glusterfs and related utilities (eminently, geo-rep) might quite
well be affected by the content of the volume. E.g., having *many* small files is
a pattern that tends to defeat general optimization efforts and needs dedicated
care to handle well; likewise, a deep and branchy directory structure provokes
quite different behavioral patterns than a flat hierarchy.
So for proper testing of the software, we need a large variety of filetree layouts
and content. Problem is that these are hard to describe and hard to produce, and
to make it worse, the two come hand in hand. Because what can one do when one wants to
be more specific than saying vague things like "small files", "deep and branchy",
"flat"? Well, mostly s/he will end up coding a script that creates the hierarchy
s/he has in mind. That's an engineering effort which definitely arrives at a precise
description, but a highly inefficient one -- the resulting code will be full of ugly
loops and recursive constructs and various ad-hoc naming and numeric parameters.
It is quite hard for the reader to distill the idea from it (the best s/he can do is
to run it and then run find(1) on the created tree), and the script itself remains
quite specific, non-reusable imperative code.
But we could do better. The idea of a filetree layout could be communicated
clearly, had we a language that empowers us to speak about it. The creation of a filetree
of a particular layout could be performed by a general utility, had we a way to specify
that layout.
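
To make the contrast concrete, here is a rough Python sketch of the "general utility
plus layout description" idea. Note that the parameter names (depth, branch, files, size),
the LAYOUTS table and the create_tree/build helpers are all made up for this illustration;
they are NOT the microformat proposed in the gist, just a stand-in for "some way to specify
the layout".

```python
import os

def create_tree(root, depth, branch, files, size):
    """Create a uniform tree under root; return the number of files written.

    depth  -- levels of subdirectories below root (hypothetical parameter)
    branch -- subdirectories per directory
    files  -- regular files per directory
    size   -- bytes per file
    """
    os.makedirs(root, exist_ok=True)
    count = 0
    for i in range(files):
        with open(os.path.join(root, "f%d" % i), "wb") as fh:
            fh.write(b"\0" * size)
        count += 1
    if depth > 0:
        for j in range(branch):
            subdir = os.path.join(root, "d%d" % j)
            count += create_tree(subdir, depth - 1, branch, files, size)
    return count

# The vague words from above, pinned down as data instead of as code.
# These particular numbers are arbitrary examples.
LAYOUTS = {
    "flat": dict(depth=0, branch=0, files=100, size=16),
    "deep-and-branchy": dict(depth=3, branch=2, files=2, size=16),
}

def build(name, root):
    """Create the named layout under root using the one general creator."""
    return create_tree(root, **LAYOUTS[name])
```

The point is only that the loops and recursion live once in the general creator, while
each layout is a short declarative entry that a reader can take in at a glance. A real
scheme would of course need to express far more than uniform trees.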
The language to specify filetree layouts -- that's what I have made an attempt at.
We need to get it right before moving on to the tool that creates file trees in its terms.
We have some conflicting perspectives -- the language should be
- human friendly
- machine friendly
- compact
- versatile
I made up my mind to find a good trade-off and came up with this:
https://gist.github.com/csabahenk/5668160
Here is a printable copy of the current version of the document:
https://dl.dropboxusercontent.com/u/27330206/filetree-scheme.pdf
Please comment.
Thanks
Csaba