[Gluster-devel] Improving thread synchronization in GlusterD

Tue Dec 23 04:31:32 UTC 2014

As part of our ongoing effort to improve the reliability and robustness
of GlusterD, we are also targeting concurrency-related issues. This
proposal addresses one of them: thread synchronization.

The Big-lock
------------

GlusterD was originally designed as a single threaded application which
could handle just one transaction at a time. It was made multi-threaded to
improve responsiveness and support handling multiple transactions at a
time. This was needed for newer features like volume snapshots which could
leave GlusterD unresponsive for some periods of time.

Making GlusterD multi-threaded required the creation of a thread
synchronization mechanism, to protect the shared data-structures (mainly
everything under the GlusterD configuration, glusterd_conf_t struct) from
concurrent access from multiple threads. This was accomplished using the
Big-lock.

The Big-lock is an exclusive lock, so any thread that needs to use the
protected data must obtain the Big-lock first and give it up once
done.
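
As a rough illustration of the pattern described above, here is a minimal
sketch that uses a plain pthread mutex as the single exclusive lock. The
names `struct conf`, `shared_conf`, `big_lock`, `add_volume` and `add_peer`
are hypothetical and are not taken from the GlusterD sources; the actual lock
type used in GlusterD is also not shown here.

```c
#include <pthread.h>

/* Hypothetical stand-in for the shared state kept under glusterd_conf_t. */
struct conf {
    int volume_count;
    int peer_count;
};

static struct conf shared_conf;

/* One process-wide exclusive lock: the "Big-lock" of this sketch. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every access takes the same lock, even when the fields are unrelated. */
int add_volume(void)
{
    pthread_mutex_lock(&big_lock);
    int n = ++shared_conf.volume_count;
    pthread_mutex_unlock(&big_lock);
    return n;
}

int add_peer(void)
{
    pthread_mutex_lock(&big_lock);
    int n = ++shared_conf.peer_count;
    pthread_mutex_unlock(&big_lock);
    return n;
}
```

A thread calling `add_peer()` has to wait for a thread inside `add_volume()`
even though the two touch different fields, which is the coarseness the
Big-lock is named for.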

Problem with Big-lock
---------------------

The Big-lock synchronization solution was added into the GlusterD code to
solve problems that arose when GlusterD was made multi-threaded. This was
supposed to be a quick solution, to allow GlusterD to be shipped.

The Big-lock, as the name suggests, is a coarse-grained lock. The coarseness
of the lock leads to threads contending even when they are accessing
unrelated data, and it has led to some deadlocks.

One example of such a deadlock involves transactions and RPC. If a thread
holding the Big-lock blocks on network I/O, it may deadlock. This can
happen when the remote endpoint is disconnected: the callback code is
executed in the same thread that has already acquired the Big-lock, and all
network I/O handlers, including callbacks, are implemented to acquire the
Big-lock before executing. The callback therefore waits for a lock that its
own thread holds, and we have a deadlock.

To avoid this, we release the Big-lock whenever a thread could block on
network I/O. This comes at a price: it opens a window of time during which
other threads can update the shared data structures, leading to
inconsistencies.
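
The release-and-reacquire pattern, and the window it opens, can be sketched
as follows. This is only an illustration under stated assumptions: the
`generation` counter is a device added here to make the window observable,
and `peer_update_handler`, `transaction_step` and `no_op_io` are hypothetical
names, not GlusterD functions.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int volume_count;          /* shared state guarded by big_lock */
static unsigned long generation;  /* bumped on every update (illustration only) */

/* Like every network I/O handler, this takes the lock before running. */
void peer_update_handler(void)
{
    pthread_mutex_lock(&big_lock);
    volume_count++;
    generation++;
    pthread_mutex_unlock(&big_lock);
}

/* Stand-in for a blocking call during which nothing else happens. */
void no_op_io(void)
{
}

/*
 * One step of a transaction that has to block on I/O.
 * Returns true if the shared state was left untouched while the lock
 * was released, false if another code path updated it in the window.
 */
bool transaction_step(void (*blocking_io)(void))
{
    pthread_mutex_lock(&big_lock);
    unsigned long seen = generation;
    pthread_mutex_unlock(&big_lock);   /* give up the lock before blocking */

    /* If big_lock were still held here, a handler run from this call
     * would wait forever for a lock its own thread owns. */
    blocking_io();

    pthread_mutex_lock(&big_lock);
    bool unchanged = (seen == generation);
    pthread_mutex_unlock(&big_lock);
    return unchanged;
}
```

Calling `transaction_step(peer_update_handler)` returns false: anything the
transaction read before releasing the lock may no longer match the shared
state once it gets the lock back.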

The Big-lock, in its current state, doesn't even fully solve the problem
it set out to solve, and it introduces problems of its own. These problems
are only going to grow with new features and new code being added to
GlusterD.

Possible solutions
------------------

The most obvious solution would be to split up the Big-lock into more
fine-grained locks. We could go one step further and replace the mutex locks
(the Big-lock is a mutex lock) with readers-writer locks. This would bring
more flexibility and finer-grained control, at the cost of additional
overhead, mainly in the complexity of implementation.

As an alternative to readers-writer locks, we propose to use RCU as the
synchronization mechanism. RCU provides several advantages over
readers-writer locks while providing similar synchronization features.
These advantages make it preferable to readers-writer locks, even
though the implementation complexity remains nearly the same for both
approaches.

RCU
---

RCU, short for Read-Copy-Update, is a synchronization mechanism that can be
used as an alternative to readers-writer locks.

A good introduction to RCU can be found in this series of articles on LWN
[1][1] and [2][2]. The articles are with respect to the usage of RCU in the
Linux kernel, where it is used heavily.

The advantages that make RCU preferable to readers-writer locks are the
following:

- Wait-free reads
  RCU readers have no wait overhead. They can never be blocked by writers.
RCU readers need to notify when they are in their critical sections, but
this notification is much lighter than locks.

- Provides existence guarantees
  RCU guarantees that RCU-protected data in a reader's critical section will
remain in existence till the end of the critical section. This is achieved
by having the writers work on a copy of the data instead of modifying the
existing data in place, and by freeing the old version only after all
readers that could be using it have finished.

- Concurrent readers and writers
  Wait-free reads and the existence guarantee mean that it is possible to
have readers and writers in concurrent execution. Any readers already
executing when a writer starts will continue working with the original copy
of the data. The writer will work on a copy, and will use RCU methods to
swap/replace the original data without affecting existing readers. Any
readers coming online after the writer has published its copy will see the
new data.
  This does mean that some readers will continue to work with stale data,
but this isn't too big a problem as the data at least remains consistent
till the reader finishes.

- Read-side deadlock immunity
  RCU readers always run in a deterministic time as they never block. This
means that they can never become a part of a deadlock.

- No writer starvation
  As RCU readers never block writers, writers can never starve.
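
The copy-then-publish idea behind these points can be sketched without any
RCU library, using a C11 atomic pointer. This is a simplified illustration
and not a full RCU implementation: it assumes a single writer at a time and
leaves out grace-period tracking, so the old version is handed back to the
caller instead of being freed. `struct peer_info`, `current_peer`,
`read_port` and `update_port` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct peer_info {
    int port;
    int connected;
};

/* Readers and the writer share only this pointer. */
static _Atomic(struct peer_info *) current_peer;

/* Read side: one atomic load, no lock, never waits for the writer. */
int read_port(void)
{
    struct peer_info *p =
        atomic_load_explicit(&current_peer, memory_order_acquire);
    return p ? p->port : -1;
}

/*
 * Update side: copy the current version, modify the copy, publish it.
 * On success returns 0 and stores the previous version in *old_out.
 * The previous version must not be freed until every reader that may
 * still hold it has finished; a real RCU library tracks that for us.
 * Returns -1 if the copy could not be allocated.
 */
int update_port(int port, struct peer_info **old_out)
{
    struct peer_info *old =
        atomic_load_explicit(&current_peer, memory_order_relaxed);
    struct peer_info *fresh = malloc(sizeof(*fresh));

    if (fresh == NULL)
        return -1;
    if (old != NULL)
        *fresh = *old;          /* the "copy" in read-copy-update */
    else
        fresh->connected = 0;
    fresh->port = port;

    /* Publish: readers arriving after this store see the new version. */
    atomic_store_explicit(&current_peer, fresh, memory_order_release);
    *old_out = old;
    return 0;
}
```

A reader that loaded the pointer before the store keeps seeing a complete,
if stale, `struct peer_info`, while later readers see the new one. Deciding
when the old version can safely be freed is the part that an RCU
implementation adds on top of this.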

### Userspace RCU

The kernel uses features provided by the processor to implement its RCU.
Userspace applications cannot make use of these features, but instead can
use the Userspace RCU library.

liburcu [3][3] provides a userspace implementation of RCU, which is
portable across multiple platforms and operating systems. liburcu also
provides some common data structures and RCU protected APIs to use them.

An introduction to URCU and its APIs can be found in this article on LWN
[4][4].

Proposed implementation
-----------------------

> NOTE: This is still a high-level concept. We haven't yet gotten into the
> details of the implementation.

The Big-lock is currently used mainly to protect access to the various
configuration lists in GlusterD, including peers list, volumes list, bricks
lists and snapshots list. These lists currently use the list API provided
by libglusterfs.

For the initial implementation we will be replacing these lists with the
RCU protected list data structures and APIs provided by liburcu. If
implemented correctly, this should in itself solve a majority of the
problems we have. After this first change, we'll continue on to protect
other data structures in GlusterD with RCU.

If everything goes well, we hope to eventually make RCU a part of the
GlusterFS library and use it elsewhere in our codebase.

We are prototyping the implementation using the bullet-proof flavour of
liburcu [5][5]. We'll share our findings shortly.

### Open issues

1. Availability of liburcu on different distributions and flavours of Unix.
2. Choice of liburcu flavour for the main implementation.

[1]: https://lwn.net/Articles/262464/ "What is RCU, fundamentally?"
[2]: https://lwn.net/Articles/263130/ "What is RCU? Part 2: Usage"
[3]: http://urcu.so/ "Userspace RCU"
[4]: https://lwn.net/Articles/573424/ "User-space RCU"
[5]: https://lwn.net/Articles/573424/#URCU%20Flavors "URCU flavours"