Amar Tumballi amar at kadalu.io
Tue Jan 14 08:11:55 UTC 2020


As we are gearing up for Release-8 and its planning, I wanted to bring up
one of my favorite topics, 'Thin-Arbiter' (or Tie-Breaker/Metro-Cluster etc.).

We already released thin-arbiter in v7.0, and it works great when there is
just one gluster cluster. Here I am talking about a situation that involves
multiple gluster clusters, and about easier management of thin-arbiter
nodes. (Ref: https://github.com/gluster/glusterfs/issues/763)

I am working towards the goal of hosting a thin-arbiter node service (free
of cost) to which any gluster deployment can connect, saving the cost of
the additional replica that is required today to avoid split-brain
situations. The storage and process needs of a tie-breaker are so small
that we can easily handle all gluster deployments to date on just one
machine. When I looked at the code with this goal in mind, I found that the
current implementation doesn't support it, mainly because it uses the
'volumename' in the name of the file it creates. This is fine for one
cluster, as we don't allow duplicate volume names within a single cluster,
and OK for multiple clusters only as long as the volume names do not
collide.
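
To make the collision concrete, here is a minimal sketch in Python. It is
not the AFR code: the file-name format, the ThinArbiterStore class and the
cluster labels are all hypothetical stand-ins, chosen only to show what
happens when two clusters that both have a volume called 'vol1' point at
the same thin-arbiter node.

```python
def ta_file_name(volname, subvol_index):
    # Hypothetical name format keyed on the volume name only;
    # the real on-disk name used by AFR may differ.
    return f"{volname}-ta-{subvol_index}"


class ThinArbiterStore:
    """Toy model of one shared thin-arbiter node (an assumption, not gluster code)."""

    def __init__(self):
        # Maps file name -> the cluster that first created it.
        self._owners = {}

    def register(self, cluster, file_name):
        # Returns True if this cluster owns the file, False if another
        # cluster already created a file with the same name.
        owner = self._owners.setdefault(file_name, cluster)
        return owner == cluster


store = ThinArbiterStore()
first = store.register("cluster-A", ta_file_name("vol1", 0))
second = store.register("cluster-B", ta_file_name("vol1", 0))
print(first, second)
```

In this toy model the second registration is rejected because both
clusters computed the same name; in a real deployment the two clusters
would silently share one file, which is the problem described above.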

To resolve this properly we have 2 options (as per my current thinking) to
make it a truly global service.

1. Add a 'volume-id' option to the afr volume itself, so that each instance
picks up the volume-id and uses it in the thin-arbiter file name. A variant
of this is submitted for review - https://review.gluster.org/23723 - but as
it takes the volume-id from io-stats, that particular patch fails in
brick-mux and shd-mux scenarios. A proper enhancement of this patch is to
provide the 'volume-id' option in AFR itself, so that glusterd (while
generating volfiles) sends the proper vol-id to each instance.

Pros: Minimal code changes to the above patch.
Cons: One more option to AFR (not exposed to users).

2. Add a *cluster-id* to glusterd, and pass it to all processes. Let
replicate use this in the thin-arbiter file name. This too will solve the
issue.

Pros: A cluster-id is good to have in any distributed system, especially
when there are deployments of 3 nodes each in different clusters.
It makes it easier to identify bricks and services as part of a cluster.

Cons: More code changes, and they are in the glusterd component.

On another note, option 1 above is purely for the Thin-Arbiter feature,
whereas option 2 would also be useful in debugging and in other solutions
that involve multiple clusters.
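
As a rough illustration of how either option would keep names unique on a
shared thin-arbiter node, here is a small Python sketch. The function
names and the name formats are assumptions made up for this example, and
the IDs are random UUIDs standing in for whatever glusterd would pass down;
nothing here reflects the actual patch.

```python
import uuid


def name_with_volume_id(volume_id, subvol_index):
    # Option 1 (sketch): key the file on the volume's ID, not its name.
    return f"{volume_id}-ta-{subvol_index}"


def name_with_cluster_id(cluster_id, volname, subvol_index):
    # Option 2 (sketch): keep the volume name, but prefix a cluster ID.
    return f"{cluster_id}-{volname}-ta-{subvol_index}"


# Two independent clusters, each with a volume named 'vol1'.
cluster_a, cluster_b = uuid.uuid4(), uuid.uuid4()
vol_a, vol_b = uuid.uuid4(), uuid.uuid4()

opt1_a = name_with_volume_id(vol_a, 0)
opt1_b = name_with_volume_id(vol_b, 0)
opt2_a = name_with_cluster_id(cluster_a, "vol1", 0)
opt2_b = name_with_cluster_id(cluster_b, "vol1", 0)

# With either scheme the two clusters no longer compute the same name.
print(opt1_a != opt1_b, opt2_a != opt2_b)
```

The difference between the two is visible in the names themselves: with
option 2 the cluster a file belongs to can be read off the name, which is
the debugging benefit mentioned above.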

Let me know what you all think about this. It would be good to discuss this
in next week's meeting and take it to completion.

