ian.latter at midnightcode.org
Tue May 8 23:08:32 UTC 2012
> On 05/08/2012 12:27 AM, Ian Latter wrote:
> > The equivalent configuration in a glusterd world (from
> > my experiments) pushed all of the distribute knowledge
> > out to the client and I haven't had a response as to how
> > to add a replicate on distributed volumes in this model,
> > so I've lost replicate.
> This doesn't seem to be a problem with replicate-first vs.
> distribute-first, but with client-side vs. server-side deployment of
> those translators. You *can* construct your own volfiles that do
> these things on the servers. It will work, but you won't get a lot
> of support for it. The issue here is that we have only a finite
> number of developers, and a near-infinite number of configurations.
> We can't properly qualify everything. One way we've tried to limit
> that space is by preferring distribute over replicate, because
> replicate does a better job of shielding distribute from brick
> failures than vice versa. Another is to deploy both on the clients,
> following the scalability rule of pushing effort to the most
> numerous components. The code can support other arrangements, but
> the people might not.
Sure, I have my own vol files that do (did) what I wanted,
and I was supporting myself (and users); the question
(and the point) is: what is the GlusterFS *intent*? I'll
write an rsyncd wrapper myself, to run on top of Gluster,
if the intent is not to allow the configuration I'm after
(an arbitrary number of disks in one multi-host
environment replicated to an arbitrary number of disks in
another multi-host environment, where ideally each
environment need not sum to the same data capacity,
presented as a single contiguous consumable storage layer
to an arbitrary number of unintelligent clients, and as
fault tolerant as I choose it to be, including the ability
to add, offline/online, and remove storage as I so
choose) .. or switch out the whole solution if Gluster is
heading away from my needs. I just need to know what the
direction is .. I may even be able to help get you there if
you tell me :)
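For concreteness, a hand-rolled server-side volfile for the layering
I'm describing might look roughly like this (a sketch only, in the
classic volfile syntax; the brick paths, hostnames and volume names
are invented, and replicate-over-distribute is exactly the arrangement
you've said is unqualified):

```
# Environment A exports its local bricks through distribute,
# then the two environments are paired by replicate.
volume posix-a1
  type storage/posix
  option directory /export/brick-a1
end-volume

volume posix-a2
  type storage/posix
  option directory /export/brick-a2
end-volume

volume dist-site-a
  type cluster/distribute
  subvolumes posix-a1 posix-a2
end-volume

# The other environment's distribute set, reached over the network;
# "dist-site-b" would be defined analogously on that side.
volume site-b
  type protocol/client
  option remote-host stor-b.example.com
  option remote-subvolume dist-site-b
end-volume

volume mirror
  type cluster/replicate
  subvolumes dist-site-a site-b
end-volume
```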
> BTW, a similar concern exists with respect to replication (i.e. AFR)
> across data centers. Performance is going to be bad, and there's not
> going to be much we can do about it.
Hmm .. that depends .. these sorts of statements need
context/qualification (in bandwidth and latency terms). For
example, the last multi-site environment that I did
architecture for was two DCs set 32km apart with a
redundant 20Gbps layer-2 (ethernet) stretch between
them - latency was 1ms average, 2ms max (the fiber
actually took a 70km path). Didn't run Gluster on it, but
we did stretch a number of things that "couldn't" be
stretched.
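Those latency numbers are consistent with simple light-in-fiber
arithmetic, which is a quick sanity check anyone can run when deciding
whether a stretch is feasible (the 2e8 m/s figure is the usual rough
approximation for propagation speed in fiber, about 2/3 of c):

```python
# Rough propagation-delay check for a stretched fiber path.
C_FIBER = 2.0e8  # approx. speed of light in fiber, m/s (~2/3 of c)

def rtt_ms(path_km: float) -> float:
    """Propagation-only round-trip time, in milliseconds."""
    one_way_s = (path_km * 1000) / C_FIBER
    return 2 * one_way_s * 1000

# The 70km actual fiber path above gives a 0.7ms round trip,
# comfortably inside the observed 1ms average.
print(round(rtt_ms(70), 2))  # -> 0.7
```

Queuing, serialisation and equipment hops add on top of this, which is
why the measured average (1ms) sits above the propagation floor.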
> > But in this world, the client must
> > know about everything and the server is simply a set
> > of served/presented disks (as volumes). In this
> > glusterd world, then, why does any server need to
> > know of any other server, if the clients are doing all of
> > the heavy lifting?
> First, because config changes have to apply across servers. Second,
> because server machines often spin up client processes for things
> like repair or
Yep, but my reading is that the configs that the servers
need are local - to make a disk a share (volume) - and
that, as you've described, the rest are "client processes"
(even when running on something built as a "server"), so if
you catered for all clients then you'd be set? I.e. AFR now
runs in the client?
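To make sure I've understood the client-side model: the servers would
only export bricks, and each client would wire up AFR itself with
something like the following (again a sketch; hostnames and volume
names are made up):

```
# Client-side volfile: the client mounts both servers' bricks and
# runs replicate (AFR) locally; the servers hold no cluster logic.
volume remote-a
  type protocol/client
  option remote-host server-a.example.com
  option remote-subvolume brick-a
end-volume

volume remote-b
  type protocol/client
  option remote-host server-b.example.com
  option remote-subvolume brick-b
end-volume

volume afr0
  type cluster/replicate
  subvolumes remote-a remote-b
end-volume
```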
And I am sick of the word-wrap on this client .. I think
you've finally convinced me to fix it ... what's normal
these days - still 80 chars?
Late night coder ..