[Gluster-devel] ZkFarmer

Tue May 8 04:27:24 UTC 2012

> > Is there anything written up on why you/all want every
> > node to be completely conscious of every other node?
> > 
> > I could see a couple of architectures that might work
> > better (be more scalable) if the config minutiae were 
> > either not necessary to be shared or shared in only 
> > cases where the config minutiae were a dependency.
> 
> Well, these aren't exactly minutiae.  Everything at file
> or directory level is fully distributed and will remain so.
> We're talking only about stuff at the volume or server
> level, which is very little data but very broad in scope.
> Trying to segregate that only adds complexity and
> subtracts convenience, compared to having it equally
> accessible to (or through) any server.

Sorry, I didn't have time this morning to add more detail.

Note that my concern isn't bandwidth, it's flexibility; the
less knowledge needed, the more I can do crazy things
in user land, like running boxes in different data centres
and randomly powering things up and down, randomly
re-addressing, randomly replacing in-box hardware, load
balancing, NAT, etc.  Requiring shared knowledge makes a
dynamic environment difficult to construct, for example
when Gluster rejects the same volume-id being presented
to an existing cluster from a new GFID.

But there's no need to get even that complicated; let
me pull out an example of where shared knowledge
may be unnecessary:

The work that I was doing in Gluster (pre glusterd) drove
out one primary "server" which fronted a Replicate 
volume of both its own Distribute volume and that of 
another server or two - themselves serving a single 
Distribute volume.  So the client connected to one 
server for one volume and the rest was black box /
magic (from the client's perspective - big fast storage
in many locations); in that case it could be said that 
servers needed some shared knowledge, while the 
clients didn't.

The equivalent configuration in a glusterd world (from
my experiments) pushed all of the distribute knowledge 
out to the client and I haven't had a response as to how
to add a replicate on distributed volumes in this model, 
so I've lost replicate.  But in this world, the client must 
know about everything and the server is simply a set
of served/presented disks (as volumes).  In this 
glusterd world, then, why does any server need to 
know of any other server, if the clients are doing all of 
the heavy lifting?  
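
To make the comparison concrete, here is a toy sketch of
who needs to be configured with whom in the two layouts
as I've described them.  The function names and the
"client" key are made up for illustration; this is my
reading of the two models, not a statement of how
glusterd actually behaves.

```python
# Toy model: for each party, the set of endpoints it must know about.
# All names here are hypothetical and only illustrate the argument.

def server_side_layout(servers):
    """Pre-glusterd style: one front server replicates to the rest."""
    front, others = servers[0], servers[1:]
    return {
        "client": {front},             # client mounts one volume from one server
        front: set(others),            # front server fronts the replicate volume
        **{s: set() for s in others},  # back-end servers serve local disks only
    }


def client_side_layout(servers):
    """glusterd style as I understand it: the client assembles the volume."""
    return {
        "client": set(servers),         # client must know every server
        **{s: set() for s in servers},  # each server only presents its own disks
    }


servers = ["s1", "s2", "s3"]
for name, layout in (("server side", server_side_layout(servers)),
                     ("client side", client_side_layout(servers))):
    for party, knows in layout.items():
        print(f"{name}: {party} must know {sorted(knows)}")
```

In the second layout every server's set is empty, which is
the basis of the question above.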

The additional consideration is where the server both
consumes and presents, but this would be captured in
the client side view.  i.e. given where glusterd seems
to be driving, this knowledge seems to be needed on 
the client side (within glusterfs, not glusterfsd).

To my mind this breaks the gluster architecture that I
read about in 2009, but I need to stress that I didn't get
a reply to the glusterd architecture question that I
posted about a month ago, so I don't know if glusterd
is currently limiting deployment options because:
  - there is an intention to drive the heavy lifting to the
    client (for example for performance reasons in big
    deployments), or;
  - there are known limitations in the existing bricks/
    modules (for example moving files thru distribute), 
    or;
  - there is ultimately (long term) more flexibility seen
    in this model (and we're at a midway point between 
    pre glusterd and post so it doesn't feel that way 
    yet), or;
  - there is an intent to drive out a particular market 
    outcome or match an existing storage model (the 
    gluster presentation was driving towards cloud,
    and maybe those vendors don't use server side
    implementations), etc.

I don't have a clear big picture in my mind, so if I'm
not considering all of the impacts, my apologies.

> > RE ZK, I have an issue with it not being a binary at
> > the linux distribution level.  This is the reason I don't
> > currently have Gluster's geo replication module in
> > place ..
> 
> What exactly is your objection to interpreted or JIT
> compiled languages?  Performance?  Security?  It's an
> unusual position, to say the least.
> 

Specifically, primarily, space.  Saturn builds GlusterFS
capacity from a 48 Megabyte Linux distribution and 
adding many Megabytes of Perl and/or Python and/or 
PHP and/or Java for a single script is impractical.

My secondary concern is licensing (specifically in the 
Java run-time environment case).  Hadoop forced my
hand; GNU's JRE/compiler wasn't up to the task of 
running Hadoop when I last looked at it (about 2 or 3 
years ago now) - well, it could run a 2007 or so 
version but not current ones at that time - so now I 
work with Gluster ..

Going back to ZkFarmer;

Considering other architectures; it depends on how 
you slice and dice the problem as to how much 
external support you need;
  > I've long felt that our ways of dealing with cluster 
  > membership and staging of config changes is not 
  > quite as robust and scalable as we might want.

By way of example;
  The openMosix kernel extensions maintained their
own information exchange between cluster nodes; if
a node (ip) was added via the /proc interface, it was
"in" the cluster.  Therefore cluster membership was
the hand-off/interface.  
  It could be as simple as a text list on each node, or 
it could be left to a user space daemon which could 
then gate cluster membership - this suited everyone
with a small cluster.
  The native daemon (omdiscd) used multicast 
packets to find nodes and then stuff those IPs into
the /proc interface - this suited everyone with a 
private/dedicated cluster.
  A colleague and I wrote a TCP variation to allow 
multi-site discovery with SSH public key exchanges 
and IPSEC tunnel establishment as part of the 
gating process - this suited those with a distributed/
part-time cluster.  To ZooKeeper's point 
(http://zookeeper.apache.org/), the discovery 
protocol that we created was weak and I've since 
found a model/algorithm that allows for far more 
robust discovery.
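
The hand-off described above can be sketched roughly as
follows: a member table that anything may write to, with
the gating policy supplied from outside.  This is only an
illustration; MembershipTable, feed and the gate
functions are names I've invented, not the openMosix
/proc interface or omdiscd, and the key check stands in
for the SSH/IPSEC exchange.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical names throughout; not the openMosix or Gluster API.
Gate = Callable[[str], bool]


class MembershipTable:
    """Stand-in for the kernel-side member list (the /proc hand-off)."""

    def __init__(self) -> None:
        self.members: Set[str] = set()

    def add(self, ip: str) -> None:
        # Once an address is written here, the node is "in" the cluster.
        self.members.add(ip)


def feed(table: MembershipTable, candidates: Iterable[str], gate: Gate) -> None:
    """Push candidate addresses into the table if the gate admits them."""
    for ip in candidates:
        if gate(ip):
            table.add(ip)


def allow_all(ip: str) -> bool:
    # Small cluster: a static text list where every entry is trusted.
    return True


def make_key_gate(known_keys: Dict[str, str], offered_keys: Dict[str, str]) -> Gate:
    # Multi-site: admit a node only if the key it offers matches the one on file.
    def gate(ip: str) -> bool:
        return ip in known_keys and offered_keys.get(ip) == known_keys[ip]
    return gate


table = MembershipTable()
feed(table, ["10.0.0.1", "10.0.0.2"], allow_all)
key_gate = make_key_gate(
    {"192.0.2.10": "key-a"},
    {"192.0.2.10": "key-a", "192.0.2.11": "key-x"},
)
feed(table, ["192.0.2.10", "192.0.2.11"], key_gate)
print(sorted(table.members))  # ['10.0.0.1', '10.0.0.2', '192.0.2.10']
```

The table itself never changes; only the source of
candidates and the gate do, which is why one interface
suited all three kinds of cluster.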

  The point being that, depending on the final cluster
architecture for gluster (i.e. all nodes are peers
and thus all are cluster members, nodes are client 
or server and both are cluster members, nodes are 
client or server and only clients [or servers] are 
cluster members, etc) there may be simpler cluster 
management options .. 
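
As a rough sketch of how the choice changes the size of
the membership problem, the options listed above can be
written out like this.  The Model names and the
cluster_members function are invented for illustration
and don't correspond to anything in gluster.

```python
from enum import Enum


class Model(Enum):
    # Hypothetical labels for the options listed above.
    ALL_PEERS = "all nodes are peers"
    ROLES_ALL_MEMBERS = "client or server, both are members"
    ONLY_SERVERS = "client or server, only servers are members"
    ONLY_CLIENTS = "client or server, only clients are members"


def cluster_members(nodes, model):
    """nodes maps a node name to its role, "client" or "server"."""
    if model in (Model.ALL_PEERS, Model.ROLES_ALL_MEMBERS):
        return set(nodes)
    wanted = "server" if model is Model.ONLY_SERVERS else "client"
    return {name for name, role in nodes.items() if role == wanted}


nodes = {"s1": "server", "s2": "server",
         "c1": "client", "c2": "client", "c3": "client"}
for model in Model:
    count = len(cluster_members(nodes, model))
    print(f"{model.value}: {count} members to track")
```

The fewer nodes that count as members, the less there is
for any membership mechanism to keep consistent.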

Cheers,

--
Ian Latter
Late night coder ..
http://midnightcode.org/