[Gluster-devel] races in dict_foreach() causing crashes in tier-file-creat.t

Xavier Hernandez xhernandez at datalab.es
Mon Mar 14 08:31:24 UTC 2016


Hi,

On 11/03/16 13:55, Jeff Darcy wrote:
>> Raghavendra G and I discussed about this problem and the right way to
>> fix it is to take a copy(without dict_foreach) of the dictionary in
>> dict_foreach inside a lock and then loop over the local dictionary. I am
>> worried about the performance implication of this
>
> I'm worried about the correctness implications.  Any such copy will have
> to do the equivalent of dict_foreach even if it doesn't call the function
> of that name, and will be subject to the same races.
>
>> Also included Xavi, who earlier said we need to change dict.c but it is
>> a bigger change. May be the time has come? I would love to gather all
>> your inputs and implement a better version of dict if we need one.
>
> There are three solutions I can think of.
>
> (1) Have tier use STACK_WIND_COOKIE to pass the address of a lock down
> both paths, so the two can synchronize between themselves.
>
> (2) Have tier issue the lookup down the two paths *serially* instead
> of in parallel, so there's no contention on the dictionary.  This is
> probably most complicated *and* worst for performance, but I include
> it for the sake of completeness.
>
> (3) Enhance dict_t with a gf_lock_t that can be used to serialize
> access.  We don't have to use the lock in every invocation of
> dict_foreach (though we should probably investigate that).  For
> now, we can just use it in the code paths we know are contending.
>
> I suspect Xavi was suggesting (3) and I concur that it's the best
> long-term solution.

My initial idea was quite different and probably unpractical to 
implement due to its implications in existing code. However here it is:

The idea was to have a dict based on refcounting controlled by caller 
(instead of callee like it's done now) and without locks (only atomic 
operations to increase/decrease refcounting). When some dict needs to be 
modified, it's only done if refcount is 1, otherwise a copy is made 
before changing it. This means that any dict with more that one 
reference becomes read-only avoiding any interference between xlators.

The most critical change is the caller controlled refcounting. 
Currently, when some xlator needs to use a dict, it takes a ref. This 
makes impossible to know if the already existing ref (owned by the 
caller) will be used or not, so we should always do copies of dicts, 
which is inefficient.

When it's the caller who controls the refcounting, if it still needs the 
reference to do some other work after calling a function that needs the 
dict, it will increase the refcounting. Otherwise it will simply "give" 
its reference to the called function. With this approach we'll need to 
distinguish between functions that really need their own reference form 
those that can share the reference of the caller.


Example:

   Current code:

      dict = ...;

      /* We won't use dict anymore in this function, but we still hold
         a reference on the dict */
      do_something(..., dict, ...);
      /* We release our reference */
      dict_unref(dict);

      ---------

      /* Function that needs its own reference to dict */
      do_something(..., dict_t *dict, ...) {
          dict_ref(dict);

          /* We do something in background that uses the dict. It will
             take care of releasing the ref when not needed anymore. */
          do_something_in_background(..., dict, ...);
      }

      /* Function that doesn't need its own reference to dict */
      do_something(..., dict_t *dict, ...) {
          /* We do something with the dict here, but we don't return
             until all work has been done. In this case we can share the
             ref of the caller. */
          <do something with dict>
      }


   Proposed change:

     dict = ...;

     /* We won't use dict anymore, so we pass our reference to
        do_something() */
     do_something(..., dict, ...);

     /* If do_something() shares our reference, we'll need to release it
        here.

        dict_unref(dict);

     */

     /* Here we cannot use dict because we don't have any reference to
        it */

     ---------

     /* Function that needs its own reference to dict */
     do_something(..., dict_t *dict, ...) {
         /* We already have a reference that we can use */
         ...
         /* We do something in background that uses the dict. It will
            take care of releasing the ref when not needed anymore. */
         do_something_in_background(..., dict, ...);
         /* dict cannot be used here unless a ref has been taken before
            calling do_something_in_background(). */
     }

     /* Function that doesn't need its own reference to dict */
     do_something(..., dict_t *dict, ...) {
         /* We do something with the dict here, but we don't return
            until all work has been done. In this case we can share the
            ref of the caller. */
         <do something with dict>
     }


if we need to keep a reference on the caller:

     dict = ...;

     /* Acquire a new reference for do_something() if it won't share our
        reference */
     dict_ref(dict);

     do_something(..., dict, ...);

     /* Do whatever we need with our reference on dict. Note that our
        dict will be clean, exactly as we had it before calling
        do_something(), whatever this function has done. */
     <do anything on dict>

     /* Release our ref */
     dict_unref(dict);


Xavi

> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>


More information about the Gluster-devel mailing list