ian.latter at midnightcode.org
Tue May 8 04:27:24 UTC 2012
> > Is there anything written up on why you/all want every
> > node to be completely conscious of every other node?
> > I could see a couple of architectures that might work
> > better (be more scalable) if the config minutiae were
> > either not necessary to be shared or shared in only
> > cases where the config minutiae were a dependency.
> Well, these aren't exactly minutiae. Everything at file
> or directory level is fully distributed and will remain
> so. We're talking only about stuff at the volume or
> server level, which is very little data but very broad
> in scope. Trying to segregate that only adds complexity and
> compared to having it equally accessible to (or through)
Sorry, I didn't have time this morning to add more detail.
Note that my concern isn't bandwidth, it's flexibility;
the less knowledge each node needs, the more I can do
crazy things in user land, like running boxes in different
data centres, randomly powering things up and down,
randomly re-addressing, randomly replacing in-box
hardware, load balancing, NAT, etc. Shared knowledge
makes a dynamic environment difficult to construct; for
example, Gluster rejects the same volume-id when it is
presented to an existing cluster from a new GFID.
But there's no need to go even that far; let me pull
out an example of where shared knowledge may be
unnecessary.
The work that I was doing in Gluster (pre-glusterd) drove
out one primary "server" which fronted a Replicate
volume composed of both its own Distribute volume and
that of another server or two - themselves each serving
a single Distribute volume. So the client connected to
one server for one volume and the rest was black box /
magic (from the client's perspective: big, fast storage
in many locations); in that case it could be said that
the servers needed some shared knowledge, while the
client needed none.
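To make that concrete, the server-side stacking I mean
looked roughly like the following hand-written volfile
fragment (host and volume names are illustrative, and the
options are from memory, so treat it as a sketch rather
than a working config):

    # local Distribute volume (brick-a/brick-b defined
    # elsewhere in the same volfile)
    volume local-dist
      type cluster/distribute
      subvolumes brick-a brick-b
    end-volume

    # the other server's own Distribute volume
    volume remote-dist
      type protocol/client
      option transport-type tcp
      option remote-host server2
      option remote-subvolume dist
    end-volume

    # Replicate across the local and remote Distributes
    volume repl
      type cluster/replicate
      subvolumes local-dist remote-dist
    end-volume

The client then mounted "repl" from that one server and
never saw the layering behind it.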
The equivalent configuration in a glusterd world (from
my experiments) pushed all of the distribute knowledge
out to the client, and I haven't had a response as to
how to add replicate on top of distributed volumes in
this model, so I've lost replicate. But in this world,
the client must know about everything and the server is
simply a set of served/presented disks (as volumes). In
this glusterd world, then, why does any server need to
know of any other server, if the clients are doing all
of the heavy lifting?
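For contrast, the client-side graph that glusterd hands
to the mount in this model is, in spirit, something like
the sketch below (again from memory, not a verbatim
dump) - note that the distribute translator now lives in
the client:

    volume server1-client
      type protocol/client
      option transport-type tcp
      option remote-host server1
      option remote-subvolume brick1
    end-volume

    volume server2-client
      type protocol/client
      option transport-type tcp
      option remote-host server2
      option remote-subvolume brick2
    end-volume

    # distribute knowledge held by the client
    volume dist
      type cluster/distribute
      subvolumes server1-client server2-client
    end-volume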
The additional consideration is where a server both
consumes and presents, but this would be captured in
the client-side view; i.e. given where glusterd seems
to be driving, this knowledge seems to be needed on
the client side (within glusterfs, not glusterfsd).
To my mind this breaks the gluster architecture that I
read about in 2009, but I need to stress that I didn't
get a reply to the glusterd architecture question that I
posted about a month ago; so I don't know if glusterd
is currently limiting deployment options because:
- there is an intention to drive the heavy lifting to the
  client (for example, for performance reasons in big
  deployments),
- there are known limitations in the existing bricks/
  modules (for example, moving files through distribute),
- there is ultimately (long term) more flexibility seen
  in this model (and we're at a midway point between
  pre-glusterd and post, so it doesn't feel that way
  yet), or
- there is an intent to drive out a particular market
  outcome or match an existing storage model (the
  gluster presentation was driving towards cloud, and
  maybe those vendors don't use server-side intelligence).
As I don't have a clear big picture in my mind, my
apologies if I'm not considering all of the impacts.
> > RE ZK, I have an issue with it not being a binary at
> > the linux distribution level. This is the reason I don't
> > currently have Gluster's geo replication module in
> > place ..
> What exactly is your objection to interpreted or JIT-
> compiled languages? Performance? Security? It's an
> unusual position, to say the least.
Specifically, and primarily: space. Saturn builds
GlusterFS capacity from a 48 Megabyte Linux distribution,
and adding many Megabytes of Perl and/or Python and/or
PHP and/or Java for a single script is impractical.
My secondary concern is licensing (specifically in the
Java run-time environment case). Hadoop forced my
hand; GNU's JRE/compiler wasn't up to the task of
running Hadoop when I last looked at it (about 2 or 3
years ago now) - well, it could run a 2007-or-so
version, but not the ones current at that time - so
now I work with Gluster ..
Going back to ZkFarmer, and considering other
architectures: it depends on how you slice and dice
the problem as to how much external support you need.
> I've long felt that our ways of dealing with cluster
> membership and staging of config changes is not
> quite as robust and scalable as we might want.
By way of example:
The openMosix kernel extensions maintained their
own information exchange between cluster nodes; if
a node (IP) was added via the /proc interface, it was
"in" the cluster. Therefore cluster membership was a
user-space decision.
It could be as simple as a text list on each node, or
it could be left to a user space daemon which could
then gate cluster membership - this suited everyone
with a small cluster.
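From memory (so the exact paths and flags may be off),
the static variant was just a map file loaded into the
kernel with the user-space setpe tool, along these lines:

    # /etc/openmosix.map: node-number  address  range-size
    1  192.168.1.10  1
    2  192.168.1.11  1

    # push the map into the running kernel
    setpe -w -f /etc/openmosix.map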
The native daemon (omdiscd) used multicast
packets to find nodes and then stuff those IPs into
the /proc interface - this suited everyone with a
single-site, multicast-reachable cluster.
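Again from memory, running it was a one-liner, pointed
at the interface to discover on:

    # let multicast discovery populate the membership
    omdiscd -i eth0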
A colleague and I wrote a TCP variation to allow
multi-site discovery, with SSH public key exchanges
and IPSEC tunnel establishment as part of the
gating process - this suited those with a distributed/
part-time cluster. To ZooKeeper's point
(http://zookeeper.apache.org/), the discovery
protocol that we created was weak, and I've since
found a model/algorithm that allows for far more
robust discovery.
The point being that, depending on the final cluster
architecture for gluster (i.e. all nodes are peers and
thus all are cluster members; nodes are client or
server and both are cluster members; nodes are client
or server and only clients [or servers] are cluster
members; etc.), there may be simpler cluster
management options ..
Late night coder ..