[Gluster-users] How does replication work?
mark at mark.mielke.cc
Tue Sep 8 16:27:11 UTC 2009
On 09/08/2009 04:14 AM, Daniel Maher wrote:
> Alan Ivey wrote:
>> Like the subject implies, how does replication work exactly?
>> If a client is the only one that has the IP addresses defined for the
>> servers, does that mean that only a client writing a file ensures
>> that it goes to both servers? That would tell me that the servers
>> don't directly communicate with each other for replication.
>> If so, how does healing work? Since the client is the only
>> configuration with the multiple server IP addresses, is it the
>> client's "task" to make sure the server heals itself once it's back
>> online? If not, how do the servers know each other exist if not for the
>> client config file?
> You've answered your own question. :) AFAIK, in the recommended
> simple replication scenario, the client is actually responsible for
> replication, as each server is functionally independent.
> (This seems crazy to me, but yes, that's how it works.)
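To make the client-side model concrete, here is a toy sketch (not GlusterFS code; the server list and write_to_server() helper are invented for illustration) of the idea that the client, which is the only party that knows all the server addresses, fans each write out to every replica itself:

```python
# Hypothetical sketch of client-side replication: the client (not the
# servers) performs every write against every replica, which is why only
# the client configuration needs the server IP addresses.

SERVERS = ["server1:6996", "server2:6996"]  # only the client knows these

def write_to_server(server, path, data):
    # Stand-in for the real network call to one glusterfsd server.
    print(f"writing {len(data)} bytes of {path} to {server}")
    return True

def replicated_write(path, data):
    # The client issues the same write to each replica in turn; the
    # servers never talk to each other about it.
    results = [write_to_server(s, path, data) for s in SERVERS]
    return all(results)

replicated_write("/shared/file.txt", b"hello")
```

This also shows why a server that was down during the write ends up stale: nothing on the server side ever copies the data over afterwards, which is where self-healing comes in.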
For Alan: Active healing should only be necessary if the system is not
working properly. Healing should only be required after a system crash
or bug, a GlusterFS server or client crash or bug, or somebody messing
around with the backing store file system underneath. For systems that
are up and running without problems, healing should be completely
unnecessary.
For Daniel: Regarding the "seems crazy" part: crazy compared to what?
Every time I look at other solutions such as Lustre and see how they
rely on a single metadata server, which is itself supposed to be made
highly available by other means, I have to ask: are they really solving
the high availability problem, or are they just narrowing its scope? If
a whole cluster of 2 to 1000 nodes relies on a single server being up,
that server is the weakest link. Sure, having one weakest link to deal
with is easier to solve by traditional means than having 1000 weakest
links, but it seems clear that Lustre has not SOLVED the problem.
They've just reduced it to something that might be more manageable. Even
the
"traditional means" of shared disk storage such as GFS and OCFS rely on
a single piece of hardware - the shared storage. As a result, they make
the shared storage really expensive - dual interfaces, dual power
supplies, dual disks, ... but it's still one piece of hardware that
everything else is reliant on.
For "shared nothing", each node really does need to be fully independent
and able to make its own decisions. I think the GlusterFS folk have the
model right in this regard.
The remaining question is whether they have the *implementation* right. :-)
Right now they seem to be in a compromised position between simplicity,
performance, and correctness. It seems it is a difficult problem to have
all three no matter which model is selected (shared disk, shared
metadata only, shared nothing). The self-healing is a good feature, but
they seem to be leaning on it to provide correctness, so that they can
provide performance with some amount of simplicity. An example here is
how directory listings come from "the first up server". In theory, we
could have correctness through self-healing if directory listing always
queried all servers. The combined directory listing would be shown, and
self-healing would kick off in the background. But this would cost
performance, as every server in the cluster would be involved in every
directory listing. This is just one example.
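The trade-off above can be sketched in a few lines. This is a toy illustration, not GlusterFS code: list a directory by querying all replicas and taking the union, and treat any replica whose listing differs from the union as needing self-heal.

```python
# Toy model of "correctness through self-healing" for directory listings:
# merge the listings from every replica, and flag stale replicas.

def merged_listing(replica_listings):
    """replica_listings: one set of filenames per server.
    Returns (union_of_entries, indices_of_replicas_needing_heal)."""
    union = set().union(*replica_listings)
    needs_heal = [i for i, listing in enumerate(replica_listings)
                  if listing != union]
    return union, needs_heal

# Replica 0 missed a write while it was down:
files, stale = merged_listing([{"a", "b"}, {"a", "b", "c"}])
# The client sees the complete listing; replica 0 is flagged for healing.
```

The cost is visible even here: every listing touches every replica, which is exactly the performance hit the "first up server" shortcut avoids.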
I think GlusterFS has a lot of potential to close holes such as
these. I don't think it would be difficult to add things like an
automatic election model for defining which machines are considered
stable and the safest masters to use (simplest might be 'the one with
the highest glusterfsd uptime'?), and having clients choose to pull
things like directory listings only from the first stable / safest
master, and having the non-stable / non-safe machines go into automatic
full self-heal until they are back up-to-date with the master. In such a
model, I'd like to see the locks being held against the stable/safe
masters used for reads. Just throwing stuff out there...
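A minimal sketch of the election idea I'm floating here, with everything invented for illustration (the uptime figures, the server names, and the elect_stable_master() function; GlusterFS does not actually work this way today): pick the replica with the highest glusterfsd uptime as the stable master, and send everything else into full self-heal.

```python
# Hypothetical "highest uptime wins" election: the longest-running
# glusterfsd is assumed least likely to be stale, so clients would pull
# reads and directory listings from it while the others catch up.

def elect_stable_master(uptimes):
    """uptimes: dict of server name -> glusterfsd uptime in seconds.
    Returns (stable_master, servers_that_should_full_self_heal)."""
    master = max(uptimes, key=uptimes.get)
    stale = [s for s in uptimes if s != master]
    return master, stale

master, healing = elect_stable_master(
    {"server1": 864000, "server2": 120}  # server2 just rebooted
)
# server1 serves reads; server2 self-heals until it is back up to date.
```

Uptime is only the simplest possible criterion, of course; a real election would want to consider more than that.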
For me, I'm looking at this as - I have a problem to solve, and very few
solutions seem to meet my requirements. GlusterFS looks very close. Do I
write my own, which would probably start out only solving my
requirements, and since my requirements will probably grow, this would
mean eventually writing something the size of GlusterFS? Or do I start
looking in to this GlusterFS thing - point out the problems, and see if
I can help?
I'm leaning towards the latter - try it out, point out the problems, see
if I can help.
As it is, I think GlusterFS is very stable with sufficient performance
for the requirements of most potential users. It's the people who are
really trying to push it to its limits that are causing the majority of
the breakage being reported here. For these people, which includes me,
I've looked around - and the solutions out there that are competitive
are either very expensive, or insufficient.
Mark Mielke <mark at mielke.cc>