[Gluster-users] AFR arbiter volumes

David Gossage dgossage at carouselchecks.com
Wed Sep 9 15:57:44 UTC 2015


Once the volume is created as an arbiter volume, can it at a later time be
changed to a replica 3 volume with all bricks containing data?

*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284

On Tue, Sep 8, 2015 at 8:46 PM, Ravishankar N <ravishankar at redhat.com>
wrote:

> Sending out this mail for awareness/ feedback.
>
> -----------------------------------------------------------------------------
> *What:*
> Since glusterfs-3.7, AFR supports creation of arbiter volumes. These are
> a special type of replica 3 gluster volume where the 3rd brick is always
> configured as an arbiter node. What this means is that the 3rd brick
> stores only the file name and metadata (including gluster xattrs), but
> does not contain any data. Arbiter volumes prevent split-brains, consume
> less space than a normal replica 3 volume, and provide better consistency
> and availability than a replica 2 volume.
>
> *How:*
> You can create an arbiter volume with the following command:
>
> * gluster volume create <VOLNAME> replica 3 arbiter 1 host1:brick1
> host2:brick2 host3:brick3*
>
> Note that the syntax is similar to creating a normal replica 3 volume,
> with the exception of the *arbiter 1* keyword. As seen in the command
> above, the only permissible values for the replica count and arbiter
> count are 3 and 1 respectively. Also, the 3rd brick is always chosen as
> the arbiter brick; it is currently not possible to configure any other
> brick as the arbiter.
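>
> For illustration, a minimal sketch of creating and checking such a volume.
> The host names, brick paths and the volume name "testvol" are placeholders
> and not part of any particular setup:
>
>     # create a volume whose 3rd brick acts as the arbiter
>     gluster volume create testvol replica 3 arbiter 1 \
>         server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/brick1
>     gluster volume start testvol
>
>     # verify the brick layout and replica configuration
>     gluster volume info testvol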
>
> *Client/ mount behaviour:*
> By default, client quorum (cluster.quorum-type) is set to auto for a
> replica 3 volume (including arbiter volumes) when it is created; i.e. at
> least 2 bricks need to be up to satisfy quorum and allow writes. This
> setting should not be changed for arbiter volumes either. In addition,
> the arbiter volume performs some extra checks to prevent files from
> ending up in split-brain:
>
>     * Clients take full file locks when writing to a file as opposed to
> range locks in a normal replica 3 volume.
>
>     * If 2 bricks are up, one of them is the arbiter (i.e. the 3rd
> brick), and it blames the other up brick, then all FOPS will fail with
> ENOTCONN (Transport endpoint is not connected). If the arbiter doesn't
> blame the other brick, FOPS will be allowed to proceed. 'Blaming' here is
> with respect to the values of the AFR changelog extended attributes (see
> the getfattr sketch after this list).
>
>     * If 2 bricks are up and the arbiter is down, then FOPS will be
> allowed.
>
>     * In all cases, if there is only one source before the FOP is
> initiated and if the FOP fails on that source, the application will receive
> ENOTCONN.
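>
> As a hedged illustration of how 'blaming' can be inspected: the AFR
> changelog xattrs live on the bricks and can be dumped with getfattr. The
> brick path and volume name "testvol" below are placeholders:
>
>     # run on a server hosting a brick; attributes of the form
>     # trusted.afr.testvol-client-N hold the pending (blame) counters
>     # this brick records against brick N; all-zero values mean no blame
>     getfattr -d -m . -e hex /bricks/brick1/path/to/file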
>
> Note: It is possible to see whether a replica 3 volume has the arbiter
> configuration from the mount point. If
> *$mount_point/.meta/graphs/active/$VOLNAME-replicate-0/options/arbiter-count*
> exists and its value is 1, then it is an arbiter volume. The client
> volume graph will also have arbiter-count as an xlator option for the AFR
> translator.
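>
> For example, assuming a client mount at /mnt/testvol of a volume named
> testvol (both placeholders):
>
>     # should print 1 on an arbiter volume
>     cat /mnt/testvol/.meta/graphs/active/testvol-replicate-0/options/arbiter-count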
>
> *Self-heal daemon behaviour:*
>
> Since the arbiter brick does not store any data for the files, it cannot
> be used as a source for data self-heal. For example, if there are 2 source
> bricks B2 and B3 (B3 being the arbiter brick) and B2 is down, then data
> self-heal will not happen from B3 to the sink brick B1; it will remain
> pending until B2 comes up and the heal can happen from it. Note that
> metadata and entry self-heals can still happen from B3 if it is one of the
> sources.
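>
> Pending heals, including ones that stay blocked because the only data
> source is down, can be monitored with the usual heal commands ("testvol"
> again being a placeholder):
>
>     # list entries that still need healing on each brick
>     gluster volume heal testvol info
>
>     # kick off an index heal once the down brick is back online
>     gluster volume heal testvol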
>
>
> -----------------------------------------------------------------------------
>
> Please provide feedback if you have tried it out.
> *If you ever encounter a split-brain while using an arbiter volume, it is
> a BUG - do report it!*
> We have had users asking for a way to convert existing replica 2 volumes
> to arbiter volumes - this is definitely on our to-do list, in addition to
> some performance optimizations.
>
> Thanks,
> Ravi
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>

