[Gluster-devel] Architecture advice
Gordan Bobic
gordan at bobich.net
Mon Jan 12 18:30:49 UTC 2009
Martin Fick wrote:
>> Not on the client, anyway. But if you're AFR-ing on
>> server side, then your client always talks to one server
>> anyway. The traditional way to handle server failure in that
>> case is to set up Heartbeat or RHCS to fail over the IP
>> address resource to the surviving server.
>>
>> The TCP connection will reset when the fail-over occurs -
>> I'm not sure how gracefully/transparently GlusterFS
>> reconnects.
> ...
>
> 1.4 supports a new HA translator that is meant for clients to contact servers that AFR each other. Like this:
>
>
>            Client
>               |
>              HA
>             /   \
>            /     \
>           /       \
>     Server A    Server B
>         |           |
>        AFR         AFR
>         | \       / |
>         |  \     /  |
>         |   \   /   |
>         |     X     |
>         |   /   \   |
>         |  /     \  |
>       Vol A       Vol B
>
>
>> I wasn't aware of there being a HA translator built
>> into GlusterFS, but unless you have proper fencing in place,
>> failing over IP addresses won't work. Without proper
>> cluster fencing in place you can easily find yourself in a
>> split-brain situation where both servers think they have the
>> same IP address and neither can talk to any of the clients.
>>
> ...
> No need for fencing simply because you now use the HA translator.
> The assumption in this case is that the servers can still talk
> to each other but that one server's connection to the clients
> may have died.

That means that 50% of the scope for failure will still wipe you out,
because you'll start split-braining. Not the way forward at all. A
fencing setup will at least preserve data integrity. The correct way
to handle a comms channel failure between client and server is to have
bonded interfaces going via different physical paths. Dealing _ONLY_
with the situation where both servers are alive and connected to each
other, but we can reach only one of them due to an obscure failure
somewhere in the network (e.g. a failed switch port or a failed NIC in
the server), is a pretty half-arsed edge case.

Why re-invent the wheel when the tools to deal with these failure modes
already exist?
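
To make the failure-mode argument concrete, here is a rough toy model in
Python. It is only a sketch and has nothing to do with actual GlusterFS
code. It assumes three links (client to A, client to B, and A to B), that
the client fails over to whichever server it can reach, and that with
fencing in place one server gets shut out whenever the two servers lose
sight of each other. The function name and the outcome labels are made up
for illustration.

```python
from itertools import product


def classify(client_a: bool, client_b: bool, a_b: bool, fencing: bool) -> str:
    """Classify one combination of link states (True means the link is up)."""
    if not (client_a or client_b):
        return "outage"  # neither server is reachable
    if a_b:
        return "ok"  # servers can still replicate to each other
    if client_a and client_b:
        # Servers cannot see each other, yet both can still take writes.
        # Assumption: fencing removes one of them, leaving a single writer.
        return "degraded" if fencing else "split-brain risk"
    return "degraded"  # only one server reachable; the other heals later


if __name__ == "__main__":
    for fencing in (False, True):
        print(f"fencing={fencing}")
        for client_a, client_b, a_b in product((True, False), repeat=3):
            result = classify(client_a, client_b, a_b, fencing)
            print(f"  client-A={client_a!s:5} client-B={client_b!s:5} "
                  f"A-B={a_b!s:5} -> {result}")
```

In this model the only cases that client-side failover covers on its own
are the ones where the A-B link is still up. Once that link is gone and
both servers remain reachable, only fencing keeps the two copies from
diverging.
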
> Any failures on the server side may still warrant a fencing setup,
> but AFR is not yet set up to work cooperatively with a fencing setup.
It doesn't have to be. If one server in AFR dies nothing spectacular
happens. Things time out and carry on. I don't see what cooperation
there would need to be. RHCS does its own heart-beating and fencing.
Mix and match as required.
Gordan