[Gluster-users] Per-client prefered server?

Yannick Perret yannick.perret at liris.cnrs.fr
Fri Mar 4 10:05:57 UTC 2016


On 04/03/2016 10:56, Krutika Dhananjay wrote:
> Hi,
>
> Up until 3.5.x, there was a read child selection mode called
> 'first responder', where the brick that responds first for a particular
> client becomes the read child.
> When the replication module was largely rewritten in 3.6.0, this mode
> was removed.
>
> There exists a workaround, though. Could you share the output of 
> `gluster volume info <VOL>`?
>
Here it is (note: this volume is only used for testing, so I can
"play" with it):
Volume Name: HOME
Type: Replicate
Volume ID: ea90bcaf-990d-436a-b4fa-8fa20d67f924
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: sto1.liris.cnrs.fr:/glusterfs/home/data
Brick2: sto2.liris.cnrs.fr:/glusterfs/home/data
Options Reconfigured:
performance.quick-read: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.entry-self-heal: on
cluster.consistent-metadata: true
auth.ssl-allow: sto1.liris.cnrs.fr,sto2.liris.cnrs.fr,connect.liris.cnrs.fr
server.ssl: off
client.ssl: off
diagnostics.latency-measurement: off
diagnostics.count-fop-hits: off


BTW a first-responder option would be nice, but it is not exactly the
same thing: under heavy load on one server, a client could fall back
onto the other one. Our purpose is really to reduce bandwidth between
buildings when possible, not to use the fastest server :)
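
To make the difference concrete, here is a minimal stand-alone sketch of the
selection order I have in mind. It is not the actual AFR code:
select_read_child() and its arguments are hypothetical names, and the final
"first readable child" fallback is a simplification chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-alone model of the read-child choice; it is not the
 * real afr_read_subvol_select_by_policy().
 *   forced_child / read_child: preferred indices, or -1 when unset
 *   readable[i]: non-zero when child i holds a good copy
 * Returns the chosen child index, or -1 when no child is readable.
 */
int
select_read_child (int forced_child, int read_child,
                   int child_count, const unsigned char *readable)
{
        int i = 0;

        assert (readable != NULL);
        assert (child_count >= 0);

        /* per-client preference wins whenever that child is readable */
        if (forced_child >= 0 && forced_child < child_count
            && readable[forced_child])
                return forced_child;

        /* then the volume-wide read_child setting */
        if (read_child >= 0 && read_child < child_count
            && readable[read_child])
                return read_child;

        /* assumption: simplest possible fallback, first readable child */
        for (i = 0; i < child_count; i++) {
                if (readable[i])
                        return i;
        }

        return -1;
}
```

The choice depends only on the configured preference and on which children
are readable, never on response time, so a client keeps reading from its
local building even when that server is the slower one.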

--
Y.




> -Krutika
>
> ------------------------------------------------------------------------
>
>     *From: *"Yannick Perret" <yannick.perret at liris.cnrs.fr>
>     *To: *"Saravanakumar Arumugam" <sarumuga at redhat.com>,
>     gluster-users at gluster.org
>     *Sent: *Friday, March 4, 2016 2:43:16 PM
>     *Subject: *Re: [Gluster-users] Per-client prefered server?
>
>     On 03/03/2016 15:17, Saravanakumar Arumugam wrote:
>
>
>
>         On 03/03/2016 05:38 PM, Yannick Perret wrote:
>
>             Hello,
>
>             I can't find out whether it is possible to set a preferred
>             server on a per-client basis for replica volumes, so I ask
>             the question here.
>
>             The context: we have 2 storage servers, one in each
>             building. We also have several virtual machines in each
>             building, and they can migrate from one building to
>             another (depending on load, maintenance…).
>
>             So (for testing at this time) I set up a 2-way replica
>             volume, with one replica on each storage server of course.
>             As most of our volumes are "many reads - few writes", it
>             would be better for bandwidth if each client used the
>             "nearest" storage server (local building switch) - for
>             reading, of course. The 2 buildings have a good network
>             link, but we prefer to minimize - when not needed - data
>             transfers between them (this link is shared).
>
>             Can you see a solution for this kind of tuning? As far as
>             I understand, geo-replication is not really what I need,
>             is it?
>
>
>         Right, geo-replication "cannot" be used, since you wish to
>         carry out "write" operations on the Slave side.
>
>     Ok, thanks. I was pretty sure it was the case but I preferred to ask.
>
>
>             There is the "cluster.read-subvolume" option of course, but
>             we can have clients in both buildings, so a per-volume
>             option is not what we need. A per-client equivalent of this
>             option would be nice.
>
>             I tested a small patch of my own to do this - and it
>             seems to work fine as far as I can see - but 1. before
>             continuing in this way I would first like to check whether
>             another way exists and 2. I'm not familiar with the whole
>             code, so I'm not sure that my changes follow the usual
>             practice for glusterfs.
>
>         Maybe you should share that interesting patch :) and get
>         better feedback about your test case.
>
>
>     My "patch" is quite simple: I added in
>     afr_read_subvol_select_by_policy()
>     (xlators/cluster/afr/src/afr-common.c) a target selection similar
>     to the one driven by the "read_child" setting (see patches at
>     the end).
>
>     Of course I also added the definition of this "forced_child" in
>     afr.h, in the same way favorite_child and read_child are defined.
>
>
>     My real problem here is how to tell the client to change its
>     "forced-child" value.
>
>     I did this by reading it from a local file
>     (/var/lib/glusterd/forced-child) in init() and reconfigure() (from
>     xlators/cluster/afr/src/afr.c). This is fine at startup and when
>     the volume configuration changes, but I found that sending a SIGHUP
>     is not enough, because the client detects that nothing changed and
>     does not call reconfigure(). So for my tests I modified
>     glusterfsd/src/glusterfsd-mgmt.c so that if
>     /var/lib/glusterd/forced-child exists it behaves as if a
>     configuration change occurred (and so calls reconfigure(), which
>     reloads the forced-child value).
>
>
>     At this point it works as I expected, but I think it should be
>     possible to handle a new forced-child value without calling every
>     reconfigure(). Moreover this file contains a raw number; it would be
>     better to use a server name, which would then be converted into an
>     index.
>
>     If possible, might it be better to send the new value using the
>     'gluster' command on the client? I.e. something like 'gluster volume
>     set client.prefered-server SERVER-NAME'?
>
>
>     Any advice is welcome.
>
>
>     Regards,
>     --
>     Y.
>
>
>     <<<
>     --- glusterfs-3.6.7/xlators/cluster/afr/src/afr-common.c
>     2015-11-25 12:55:58.000000000 +0100
>     +++ glusterfs-3.6.7b/xlators/cluster/afr/src/afr-common.c
>     2015-12-10 14:59:18.898580772 +0100
>     @@ -764,10 +764,18 @@
>          int             i           = 0;
>          int             read_subvol = -1;
>          afr_private_t  *priv        = NULL;
>     -        afr_read_subvol_args_t local_args = {0,};
>     +    afr_read_subvol_args_t local_args = {0,};
>
>          priv = this->private;
>
>     +
>     +    /* if forced-child use it */
>     +    if ((priv->forced_child >= 0)
>     +        && (priv->forced_child < priv->child_count)
>     +        && (readable[priv->forced_child])) {
>     +        return priv->forced_child;
>     +    }
>     +
>          /* first preference - explicitly specified or local subvolume */
>          if (priv->read_child >= 0 && readable[priv->read_child])
>                      return priv->read_child;
>     >>>
>
>     <<<
>     @@ -83,6 +83,7 @@
>              unsigned int hash_mode;       /* for when read_child is
>     not set */
>              int favorite_child;  /* subvolume to be preferred in
>     resolving
>                                               split-brain cases */
>     +        int forced_child;    /* child to use (if possible) */
>
>              gf_boolean_t inodelk_trace;
>              gf_boolean_t entrylk_trace;
>     >>>
>
>     <<<
>     --- glusterfs-3.6.7/xlators/cluster/afr/src/afr.c 2015-11-25
>     12:55:58.000000000 +0100
>     +++ glusterfs-3.6.7b/xlators/cluster/afr/src/afr.c 2015-12-10
>     16:34:55.530790442 +0100
>     @@ -23,6 +23,7 @@
>
>      struct volume_options options[];
>
>     +
>      int32_t
>      notify (xlator_t *this, int32_t event,
>              void *data, ...)
>     @@ -106,9 +107,26 @@
>              int            ret         = -1;
>              int            index       = -1;
>              char          *qtype       = NULL;
>     +    FILE          *prefer      = NULL;
>     +        int            i           = -1;
>
>              priv = this->private;
>
>     +        /* if /var/lib/glusterd/forced-child exists read the content
>     +           and use it as prefered target for read */
>     +        priv->forced_child = -1;
>     +        prefer = fopen("/var/lib/glusterd/forced-child", "r");
>     +        if (prefer) {
>     +                if (fscanf(prefer, "%d", &i) == 1) {
>     +                        if ((i >= 0) && (i < priv->child_count)) {
>     +                                priv->forced_child = i;
>     +                gf_log (this->name, GF_LOG_INFO,
>     +                        "using %d as forced-child", i);
>     +                        }
>     +                }
>     +                fclose(prefer);
>     +        }
>     +
>          GF_OPTION_RECONF ("afr-dirty-xattr",
>                    priv->afr_dirty, options, str,
>                    out);
>     @@ -234,6 +252,7 @@
>              int            read_subvol_index = -1;
>              xlator_t      *fav_child   = NULL;
>              char          *qtype       = NULL;
>     +    FILE          *prefer      = NULL;
>
>              if (!this->children) {
>                      gf_log (this->name, GF_LOG_ERROR,
>     @@ -261,6 +280,21 @@
>
>              priv->read_child = -1;
>
>     +    /* if /var/lib/glusterd/forced-child exists read the content
>     +           and use it as prefered target for read */
>     +        priv->forced_child = -1;
>     +        prefer = fopen("/var/lib/glusterd/forced-child", "r");
>     +        if (prefer) {
>     +        if (fscanf(prefer, "%d", &i) == 1) {
>     +            if ((i >= 0) && (i < priv->child_count)) {
>     +                priv->forced_child = i;
>     +                gf_log (this->name, GF_LOG_INFO,
>     +                                        "using %d as
>     forced-child", i);
>     +            }
>     +        }
>     +        fclose(prefer);
>     +    }
>     +
>          GF_OPTION_INIT ("afr-dirty-xattr", priv->afr_dirty, str, out);
>
>          GF_OPTION_INIT ("metadata-splitbrain-forced-heal",
>     >>>
>
>     <<<
>     --- glusterfs-3.6.7/glusterfsd/src/glusterfsd-mgmt.c 2015-11-25
>     12:55:58.000000000 +0100
>     +++ glusterfs-3.6.7b/glusterfsd/src/glusterfsd-mgmt.c 2015-12-10
>     16:34:20.530789162 +0100
>     @@ -1502,7 +1502,9 @@
>              if (size == oldvollen && (memcmp (oldvolfile, rsp.spec,
>     size) == 0)) {
>                      gf_log (frame->this->name, GF_LOG_INFO,
>                              "No change in volfile, continuing");
>     -                goto out;
>     +        if (access("/var/lib/glusterd/forced-child", R_OK) != 0) {
>     +                    goto out; /* don't skip if exists to re-read
>     forced-child */
>     +        }
>              }
>
>              tmpfp = tmpfile ();
>     >>>
>
>     --
>     Y.
>
>
>
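
The quoted message above notes that the forced-child file holds a raw index
and that a server name would be nicer. A minimal sketch of such a conversion,
assuming the client has the brick host names available in brick order
(forced_child_from_host() and host_equal() are hypothetical helpers, not
existing glusterfs functions):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison, since host names are not case sensitive. */
static int
host_equal (const char *a, const char *b)
{
        while (*a && *b) {
                if (tolower ((unsigned char) *a)
                    != tolower ((unsigned char) *b))
                        return 0;
                a++;
                b++;
        }
        /* equal only if both strings ended at the same position */
        return *a == '\0' && *b == '\0';
}

/*
 * Hypothetical helper: hosts[] lists the brick host names in brick order.
 * Returns the index of the brick served by `name`, or -1 if none matches.
 */
int
forced_child_from_host (const char *const *hosts, int count,
                        const char *name)
{
        int i = 0;

        assert (hosts != NULL && name != NULL);

        for (i = 0; i < count; i++) {
                if (host_equal (hosts[i], name))
                        return i;
        }

        return -1;
}
```

With the two bricks of the HOME volume listed in order, the name
sto2.liris.cnrs.fr maps to index 1, and an unknown name gives -1, which the
caller can treat as "no forced child".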
