[Gluster-devel] Notes on "brick multiplexing" in 4.0

Wed Jun 17 12:14:42 UTC 2015

> One more question. I keep hearing about QoS for volumes as a feature.
> How will we guarantee service quality for all the bricks from a single
> server? Even if we weren't doing QoS, we need to make sure that operations
> on one brick don't DoS the others. We already keep hearing from users about
> self-healing causing problems for the clients. Self-healing and rebalance
> running simultaneously on multiple volumes in a multiplexed-brick
> environment would most likely be disastrous.

Self-heal or rebalance running simultaneously on multiple volumes in a
*non*-multiplexed environment can be disastrous too.  It's the same I/O
load on the same machines, plus more context switches.  We could try to
use cgroups, if we don't mind giving up portability, but that would only
give us a very coarse level of control.  Within one process, we can
re-evaluate request and I/O queue lengths, thread states, memory usage,
and so on for every single request.  That finer level of control avoids
the risk of sacrificing utilization to ensure QoS.  If anything, the QoS
argument leads toward having just one glusterfsd process per server.  It
doesn't entirely solve the problem, since we still need to deal with
issues across servers, but it helps.
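
To make the "within one process" point concrete, here is a minimal sketch
in Python.  It is not GlusterFS code: the class, its methods, and the
max_queue_depth knob are all hypothetical, and the policy is only one
possible choice.  It shows a single admission check, shared by every brick
in the process, that is re-run on each request.  Client I/O is always
admitted, while self-heal or rebalance work is admitted only when the
combined queue still has headroom.

```python
from collections import defaultdict


class AdmissionController:
    """Hypothetical scheduler state shared by all bricks in one process."""

    def __init__(self, max_queue_depth=8):
        # Assumed tuning knob: total in-flight requests we are willing to carry.
        self.max_queue_depth = max_queue_depth
        self.client_queue = defaultdict(int)      # pending client requests per volume
        self.background_queue = defaultdict(int)  # pending heal/rebalance requests per volume

    def total_load(self):
        """Combined in-flight requests across every volume in the process."""
        return sum(self.client_queue.values()) + sum(self.background_queue.values())

    def admit(self, volume, background):
        """Decide, for one request, whether to run it now (True) or defer it (False)."""
        if not background:
            # Client I/O is never deferred in this sketch.
            self.client_queue[volume] += 1
            return True
        # Background work only uses whatever headroom is left right now.
        if self.total_load() >= self.max_queue_depth:
            return False
        self.background_queue[volume] += 1
        return True

    def complete(self, volume, background):
        """Record that one previously admitted request has finished."""
        queue = self.background_queue if background else self.client_queue
        if queue[volume] > 0:
            queue[volume] -= 1
```

Because the check sees the load from all volumes at once, background work
on one volume backs off when clients of another volume are busy and picks
up again as soon as they go idle.  Separate per-brick processes have no
such shared view without extra coordination.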