[Gluster-devel] Quality of Service in Glusterfs

Thu Mar 3 09:19:07 UTC 2016

Hi all,

This mail starts a discussion on how to provide QoS in Glusterfs. The thoughts here are in the context of the Glusterfs architecture up to and including the 3.x series; discussions and suggestions for 4.0 are welcome. Please note that this is very much a work in progress: QoS for storage is complex, relatively new (at least in the non-proprietary world), and the requirements can vary. One of the objectives of this exercise is to collect requirements and determine the scope of QoS.

Some of my own reading [1][2] and discussions with others led to three QoS guarantees (reservations, limits and proportional shares), described below:

Note: From what I know, QoS guarantees seem to be on throughput and are measured in terms of IOPS. Pointers to any QoS implementations/solutions targeting latency are welcome.

1. Reservations: This is a guaranteed minimum performance. However, given our architecture, total reservations cannot exceed the capacity of the weakest brick in the cluster. This is because a brick can become a hot-spot for I/O, and in the worst-case scenario all I/O might be directed to that brick. Is this acceptable? Note that cluster/distribute and cluster/shard can solve the problem of hot-spots for directories and files respectively to a certain extent. Even with distribute and sharding, the basic exercise of coming up with a capacity to be used for admission control of clients is still not solved (in other words, what is the total reservation we can provide even with sharding and distribute?). Any pointers or suggestions are much appreciated.

2. Limits: This is the maximum IOPS a client can attain. Note that total limits should not be greater than the total throughput of the volume. Having limits lets us control "noisy neighbors"; there have been attempts to solve the noisy neighbor problem using throttling [3].

3. Proportional Shares: When the clients have met their reservations but not exceeded their limits, the remaining capacity is shared among them in proportion to their weights.
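
To make the interplay of the three controls concrete, here is a minimal sketch in Python. It is an idealized steady-state picture, not a proposed implementation: the names ClientQoS, admit and allocate are hypothetical, capacities are assumed to be known IOPS figures, and weights are assumed to be positive. admit applies the worst-case rule from point 1 (total reservations must fit on the weakest brick), and allocate hands out spare capacity by weight without letting any client exceed its limit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClientQoS:
    name: str
    reservation: float  # guaranteed IOPS
    limit: float        # maximum IOPS
    weight: float       # proportional share, assumed > 0


def admit(existing: List[ClientQoS], new: ClientQoS,
          brick_capacities: List[float]) -> bool:
    """Worst-case admission control: every reservation must still be
    satisfiable if all I/O lands on the weakest brick."""
    if new.reservation > new.limit:
        return False
    total = sum(c.reservation for c in existing) + new.reservation
    return total <= min(brick_capacities)


def allocate(clients: List[ClientQoS], capacity: float) -> Dict[str, float]:
    """Give each client its reservation, then split what is left in
    proportion to weights, capping every client at its limit."""
    alloc = {c.name: c.reservation for c in clients}
    spare = capacity - sum(alloc.values())
    active = [c for c in clients if alloc[c.name] < c.limit]
    while spare > 1e-9 and active:
        total_weight = sum(c.weight for c in active)
        used = 0.0
        uncapped = []
        for c in active:
            want = spare * c.weight / total_weight
            room = c.limit - alloc[c.name]
            give = min(want, room)
            alloc[c.name] += give
            used += give
            if give < room:
                uncapped.append(c)
        # Capacity refused by capped clients is redistributed next round.
        spare -= used
        active = uncapped
    return alloc
```

With client A (reservation 100, limit 1000, weight 1), client B (reservation 200, limit 300, weight 3) and a capacity of 700 IOPS, B reaches its limit of 300 and the remainder goes to A, which ends up with 400.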

dmclock [2] seems to fit our requirements. Some of the positives I found are:

1. QoS can be implemented in a distributed system, requiring no communication between bricks/servers themselves.
2. It claims to adapt to varying capacity. One of the problems Jeff pointed out in using token bucket algorithms was determining the number of tokens to be introduced into the system. This algorithm has no such requirement. Note that unlike CPU, throughput of storage (at least for magnetic disks) is stateful.
3. It's intuitive and simple.
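
For readers who have not gone through [2], the tagging idea can be sketched as follows. This is a simplified, single-server illustration under stated assumptions, not the paper's full algorithm: every request receives a reservation, limit and proportional-share tag spaced 1/reservation, 1/limit and 1/weight apart; requests whose reservation tag is due are served first, otherwise the smallest share tag among clients still under their limit wins. The rho and delta parameters stand for the counts a dmclock client piggybacks on each request (requests served elsewhere in the reservation phase, and in total); with the default of 1 the sketch behaves like the single-server case. The class and method names are hypothetical, and the sketch omits the adjustment of reservation tags after a share-based dispatch as well as tag resynchronization for idle clients.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional, Tuple


@dataclass
class Client:
    reservation: float  # IOPS, assumed > 0
    limit: float        # IOPS, assumed > 0
    weight: float       # assumed > 0
    r_tag: Optional[float] = None
    l_tag: Optional[float] = None
    p_tag: Optional[float] = None
    queue: Deque[Tuple[float, float, float]] = field(default_factory=deque)


def next_tag(prev: Optional[float], increment: float, now: float) -> float:
    # The first request of a client is tagged with the current time.
    return now if prev is None else max(prev + increment, now)


class MClockSketch:
    def __init__(self) -> None:
        self.clients: Dict[str, Client] = {}

    def add_client(self, name, reservation, limit, weight) -> None:
        self.clients[name] = Client(reservation, limit, weight)

    def enqueue(self, name, now, rho=1, delta=1) -> None:
        c = self.clients[name]
        c.r_tag = next_tag(c.r_tag, rho / c.reservation, now)
        c.l_tag = next_tag(c.l_tag, delta / c.limit, now)
        c.p_tag = next_tag(c.p_tag, delta / c.weight, now)
        c.queue.append((c.r_tag, c.l_tag, c.p_tag))

    def dequeue(self, now):
        heads = [(n, c.queue[0]) for n, c in self.clients.items() if c.queue]
        # Constraint-based phase: serve reservations that are due.
        due = [(t[0], n) for n, t in heads if t[0] <= now]
        if due:
            return self._pop(min(due)[1], "reservation")
        # Weight-based phase: only clients that are under their limit.
        eligible = [(t[2], n) for n, t in heads if t[1] <= now]
        if eligible:
            return self._pop(min(eligible)[1], "weight")
        return None  # nothing may be dispatched yet

    def _pop(self, name, phase):
        self.clients[name].queue.popleft()
        return name, phase
```

Note that nothing in the sketch needs an estimate of the server's capacity: dequeue is simply called whenever the backend can take another request, which is what makes the approach attractive compared to sizing a token bucket.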

Some of the questions/open-topics (from my understanding):

1. How do we allocate costs to different fops? What standard can we use for comparing different fops like lookup, readdirp, read, write etc.? This gets more complicated because of fop overloading. For example, a lookup can fetch the entire content of the file if it is "small enough". How can we convert this information into a single number representing reservations/limits? In other words, what does the term IOPS represent in a distributed file-system like Glusterfs?

2. Since IOPS seem to depend on the workload, which workloads should we use to test our QoS guarantees?

3. Relation of QoS to throttling. Should throttling be implemented as part of QoS? Since throttling concerns itself with enforcing limits on resource consumption, I assume similar functionality can be achieved by setting limits for different clients (self-heal-daemon, rebalance process, regular clients etc.).

4. Granularity of "clients". Should it be
   * a single application?
   * a single mount process?
   * a set of applications running on a mount?
   * an abstract tenant which can span multiple mount points?

  How do we pass this "client" information through a POSIX file-system interface?

5. Should "security/isolation" be part of QoS guarantees? Multi-tenancy support attempts to solve the data isolation problem. QoS tries to solve the performance isolation problem. Are there any parallels between them? What is the scope of this exercise?

6. The caching layer is on the clients. If we are going to implement the QoS engine on the bricks, how do we reconcile the two? What about caching in VFS?

7. Other unknowns/requirements which I am not aware of.
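
On question 1, one possible starting point, purely as an illustration, is to charge each fop a fixed base cost plus a size-dependent term, so that a lookup which also returns file content costs more than a bare lookup. The sketch below assumes this model; the function name, the table and every number in it are made-up placeholders, not measurements.

```python
# Hypothetical base costs per fop, in "normalized I/O units".
# These values are placeholders chosen only for illustration.
BASE_COST = {
    "lookup": 1.0,
    "readdirp": 2.0,
    "read": 1.0,
    "write": 1.5,
}

# Hypothetical payload size that counts as one extra unit.
UNIT_BYTES = 4096


def fop_cost(fop: str, payload_bytes: int = 0) -> float:
    """Return the normalized cost of one fop carrying payload_bytes."""
    return BASE_COST[fop] + payload_bytes / UNIT_BYTES
```

Under these placeholder numbers, fop_cost("lookup") is 1.0 while a lookup that also fetches an 8 KB file costs 3.0. A reservation or limit expressed in "IOPS" would then really be a budget of such normalized units per second.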

Thanks to Jeff, Vijay and Steve for their inputs so far.

[1] https://github.com/kubernetes/kubernetes/blob/release-1.1/docs/proposals/resource-qos.md
[2] https://labs.vmware.com/download/122/
[3] https://www.gluster.org/pipermail/gluster-devel/2016-January/048007.html

regards,
Raghavendra