[Gluster-devel] Question about random and rr scheduler

Tue Mar 4 17:19:17 UTC 2008

Craig Tierney wrote:
> Amar S. Tumballi wrote:
>> On Mon, Mar 3, 2008 at 4:27 PM, Craig Tierney <Craig.Tierney at noaa.gov>
>> wrote:
>>
>>> I am setting up Gluster (1.3.7) with two servers.  I first
>>> tried configuring the clients as round-robin (rr). When I try and write
>>> to the filesystem for the first time, all of the files go
>>> to the first brick.  Subsequent writes will alternate between
>>> the two bricks.  When I try random, the first file is always
>>> created on the first brick.  Subsequent writes always go to
>>> the first brick (never the second).
>>>
>>
>> As the name suggests, the 'random' scheduler just calls random() and
>> takes the result modulo (%) the number of clients. Hence, the user does
>> not have much control over it right now.
>>
> 
>>
>>
>>> What I want is round-robin, or a working random.  However,
>>> for round-robin to work for me, I need the chosen server to
>>> be random, not always the first one.
>>>
>>> In the long-term, it wouldn't really matter because everything
>>> would average out.  However, I am creating filesystems that
>>> will be temporary, so I need the right behavior in the short term.
>>>
>>> Should random do what I need?  Should I look in the code
>>> and see how to get the Round-Robin schedule to start with
>>> a random index?
>>>
>>
>> Just initialize the index variable in the rr scheduler to start with a
>> random number. It should not be much work.
>>
>>
> 
> 
> I modified the rr scheduler to use a random index at initialization.
> I like the behavior much better.
> 

<patch deleted>


The patch I created actually wasn't working.  I didn't notice the behavior
until I tested it further.  The problem is that all 32 clients seem to
call time(NULL) within the same second, so the random number generator
is seeded with the same value on every client.  This is the same
behavior I saw when trying to use the random scheduler.
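
To illustrate the problem, here is a minimal sketch (not GlusterFS code;
first_pick and the seed value used with it are made up for the example).
Clients that pass the same value to srandom() get the same sequence from
random(), so they all choose the same first brick.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>

/* Hypothetical helper: the brick index a client would pick first if it
 * seeds the generator with `seed` and then takes random() modulo the
 * number of bricks.  brick_count must be greater than zero. */
int first_pick(unsigned int seed, int brick_count)
{
        srandom(seed);
        return (int)(random() % brick_count);
}
```

Calling first_pick(t, 2) with the same t on every client returns the same
brick index each time, which is consistent with the behavior described
above.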

Below is a new patch.  It adds a function called seed_random, which
seeds the random number generator with data from /dev/urandom.
This makes it much more likely that all clients will be seeded with a different
value.  In the event that some distro doesn't have /dev/urandom defined, the
function will fall back to using time(NULL).  There may be a better fallback
position than this though.
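
One possible refinement of the fallback, as a minimal sketch (not part of
the patch).  The name seed_from, its path parameter, and the PID mixing
are assumptions for illustration.  The sketch also checks the fread()
return value.  Mixing in the PID only helps clients running on the same
host; clients on different hosts could still end up with the same seed.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical variant of seed_random().  Seeds random() from `path`
 * (normally "/dev/urandom") and returns the seed that was used.  The
 * path is a parameter only so the fallback can be exercised. */
unsigned int seed_from(const char *path)
{
        unsigned int seed = 0;
        int ok = 0;
        FILE *fp = fopen(path, "rb");

        if (fp) {
                /* Only trust the value if a whole integer was read. */
                ok = (fread(&seed, sizeof(seed), 1, fp) == 1);
                fclose(fp);
        }
        if (!ok) {
                /* Assumed fallback: mix the PID into the time so that
                 * processes started in the same second on one host
                 * do not share a seed. */
                seed = (unsigned int)time(NULL) ^
                       ((unsigned int)getpid() << 16);
        }
        srandom(seed);
        return seed;
}
```

A caller would use seed_from("/dev/urandom") once at scheduler
initialization, before the first call to random().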

I tested the patch with both the random and rr schedulers.  The first files
written by clients are distributed more evenly now.
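
For reference, the rr change amounts to the following, sketched with a
stand-in struct (rr_state, rr_init and rr_next are hypothetical names,
not the real rr.c code).  It assumes the generator has already been
seeded and that there is at least one child.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>

/* Hypothetical stand-in for the scheduler state; not the real rr_buf. */
struct rr_state {
        int child_count;   /* number of bricks */
        int sched_index;   /* brick to use for the next file */
};

/* Start the rotation at a random brick instead of always at brick 0.
 * Assumes srandom() was called earlier and child_count > 0. */
void rr_init(struct rr_state *st, int child_count)
{
        st->child_count = child_count;
        st->sched_index = (int)(random() % child_count);
}

/* Return the brick for the next file and advance the rotation. */
int rr_next(struct rr_state *st)
{
        int chosen = st->sched_index;

        st->sched_index = (st->sched_index + 1) % st->child_count;
        return chosen;
}
```

Only the starting point is random; after that the bricks are still
visited in strict rotation.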




diff -urN glusterfs-1.3.7/scheduler/rr/src/rr.c ../glusterfs-1.3.7/scheduler/rr/src/rr.c
--- glusterfs-1.3.7/scheduler/rr/src/rr.c       2007-10-05 05:57:12.000000000 +0000
+++ ../glusterfs-1.3.7/scheduler/rr/src/rr.c    2008-03-04 17:05:56.250635645 +0000
@@ -49,7 +49,11 @@
      trav_xl = trav_xl->next;
    }
    rr_buf->child_count = index;
-  rr_buf->sched_index = 0;
+
+
+  seed_random(); /* Seed the random number generator from /dev/urandom */
+  rr_buf->sched_index = random()%index; /* Randomize the initial index */
+
    rr_buf->array = calloc (index + 1, sizeof (struct rr_sched_struct));
    trav_xl = xl->children;
    index = 0;

diff -urN glusterfs-1.3.7/scheduler/random/src/random.c ../glusterfs-1.3.7/scheduler/random/src/random.c
--- glusterfs-1.3.7/scheduler/random/src/random.c       2007-10-05 05:57:12.000000000 +0000
+++ ../glusterfs-1.3.7/scheduler/random/src/random.c    2008-03-04 17:06:19.815473947 +0000
@@ -29,7 +29,7 @@
    int32_t index = 0;

    /* Set the seed for the 'random' function */
-  srandom ((uint32_t) time (NULL));
+  seed_random();

    data_t *limit = dict_get (xl->options, "random.limits.min-free-disk");
    if (limit) {

--- glusterfs-1.3.7/libglusterfs/src/common-utils.c     2007-08-27 11:28:30.000000000 +0000
+++ ../glusterfs-1.3.7/libglusterfs/src/common-utils.c  2008-03-04 16:57:44.349142971 +0000
@@ -32,6 +32,7 @@
  #include <netinet/in.h>
  #include <arpa/inet.h>
  #include <signal.h>
+#include <time.h>

  #include "logging.h"
  #include "common-utils.h"
@@ -272,3 +273,31 @@
  {

  }
+
+
+
+/* Seed random() from /dev/urandom if present; otherwise fall back to time() */
+
+void seed_random() {
+
+        FILE *fp;
+        int val;
+
+        fp=fopen("/dev/urandom","r");
+        if (!fp) {
+                gf_log ("rr", GF_LOG_CRITICAL,
+                        "seed_random is unable to open /dev/urandom, defaulting to time");
+                srandom(time(NULL));
+                return;
+        }
+        /* This should read in a 4 byte integer */
+        fread(&val,sizeof(val),1,fp);
+        gf_log ("rr", GF_LOG_CRITICAL,
+                "Seeding seed_random with %d",val);
+        fclose(fp);
+        srandom(val);
+
+        return;
+}

--- glusterfs-1.3.7/libglusterfs/src/common-utils.h     2007-08-02 20:05:10.000000000 +0000
+++ ../glusterfs-1.3.7/libglusterfs/src/common-utils.h  2008-03-04 16:45:44.867605453 +0000
@@ -59,6 +59,7 @@

  #define VECTORSIZE(count) (count * (sizeof (struct iovec)))

+
  #define LOCK_INIT(x)    pthread_spin_init (x, 0)
  #define LOCK(x)         pthread_spin_lock (x)
  #define UNLOCK(x)       pthread_spin_unlock (x)
@@ -170,5 +171,8 @@
    return newptr;
  }

+void seed_random();
+
  #endif /* _COMMON_UTILS_H */

+

+
+

-- 
Craig Tierney (craig.tierney at noaa.gov)