[Gluster-devel] An update on GlusterD-2.0

Kaushal M kshlmster at gmail.com
Wed Jun 17 15:05:14 UTC 2015

At the Gluster Summit, everyone agreed that GlusterD should be the
first component to be targeted for GlusterFS-4.0. We had good
discussions on what GlusterD lacks currently and what is required for
GlusterD-2.0. KP and I had promised to send an update to the mailing list
summarizing the discussions, and this is it.

Along with the summary, we'll also be discussing our plans for Manila
integration and how we are planning to do it with GlusterD-2.0.

## Gluster Summit Summary
At the summit, KP and I presented a talk titled *GLUSTERD NEXT GEN
SCALABILITY*. The slides can be viewed at [1][1]. Unfortunately, there
is no video recording of it.

The summary of the presentation is below.

### Problems
GlusterD, as it currently stands, is not scalable, which will prevent
GlusterFS as a whole from scaling. The scaling issues can be classified
as follows:
- Node scalability
- Maintenance scalability
- Integration scalability
- Contribution scalability

#### Node scalability
This is the general scalability we all think about: scale in terms of
nodes/machines/clients. GlusterD scalability is held back here
because of the store, transaction and synchronization mechanisms used
in GlusterD.

#### Maintenance scalability
Maintenance scalability has to do with the problems we as GlusterD
maintainers have faced. This is mainly related to the huge, monolithic
code base of the current GlusterD, which makes splitting maintenance
and ownership tasks hard.

#### Integration scalability
Integration scalability can be split into internal and external
integration.

Internal integration is the integration dealing with new features
being added to GlusterFS. Every new feature being added needs to touch
GlusterD or the CLI in some way. This has generally been done with
copy/paste coding, which has added to the maintenance overhead.

External integration is the integration of Gluster with other projects
or other projects with Gluster. This is hurt by the
non-availability of a proper API for GlusterD operations. All
interaction with GlusterD currently happens only via the CLI, and the
output we provide is generally too inconsistent to be parsed
programmatically.

#### Contribution scalability
Getting new contributors for GlusterD is hard. New contributors are
put off because GlusterD is hard to understand, and because there
isn't enough documentation.

So GlusterD-2.0 will need to
- be scalable to 1000s of nodes
- have lower maintenance costs
- enable better external and internal integrations
- make it easier for newer contributors

### Design characteristics for GlusterD-2.0
For GlusterD-2.0 to satisfy the requirements listed above and solve
the problems described earlier, it should have the following
characteristics.

#### Centralized store
This should help with our node scalability issues. GlusterD-2.0
will be built around a centralized store. This means that, instead of
the GlusterD volume and peer information being persisted on all nodes,
it will be stored only on a subset of the nodes in a trusted storage
pool.

We are looking at solutions like etcd and consul, both of which
provide a distributed key/value store (and some more useful features
on top), to provide the centralized store. The transaction mechanism
for Gluster operations will be built around this centralized store.

Moving to an external store provider and a transaction framework built
around it will reduce a lot of the complexity in GlusterD.
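
To make the idea of a transaction built around a key/value store more
concrete, here is a minimal sketch. It uses a toy in-memory store with
a compare-and-swap operation; it does not use the etcd or consul APIs,
and the class, function and key names (`CentralStore`, `start_volume`,
`volumes/<name>`) are assumptions made up for this example.

```python
import threading


class CentralStore:
    """Toy in-memory stand-in for a distributed key/value store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (value, revision)

    def get(self, key):
        """Return (value, revision); a missing key is (None, 0)."""
        with self._lock:
            return self._data.get(key, (None, 0))

    def compare_and_swap(self, key, expected_revision, value):
        """Write value only if the key is still at expected_revision."""
        with self._lock:
            _, revision = self._data.get(key, (None, 0))
            if revision != expected_revision:
                return False
            self._data[key] = (value, revision + 1)
            return True


def start_volume(store, name):
    """Hypothetical transaction: mark a volume as started exactly once."""
    key = "volumes/" + name
    info, revision = store.get(key)
    if info is None or info["status"] == "started":
        return False
    updated = dict(info, status="started")
    # Fails if another node changed the volume after we read it.
    return store.compare_and_swap(key, revision, updated)


store = CentralStore()
store.compare_and_swap("volumes/vol1", 0, {"status": "created"})
first = start_volume(store, "vol1")   # succeeds
second = start_volume(store, "vol1")  # refused, already started
```

Because every node reads and writes the same store, a conflicting
update is rejected by the store itself rather than by a custom
synchronization protocol between the daemons.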

#### Plugins
This is mainly for the maintainability and internal integration aspects.
GlusterD-2.0 will have a pluggable, modular design. We expect all the
commands of GlusterD to be implemented as plugins, along with certain
other parts of GlusterD such as volgen, volume-set and the REST API.
This will allow new features to be integrated into GlusterD easily.
The code for these plugins is expected to live with the feature, and
not in GlusterD.

A plugin design requires well-defined plugin interfaces that let a
plugin not just plug into GlusterD-2.0, but also interact with
GlusterD well. I'll be posting a document on this to the mailing list.

The GlusterD maintainers/team will implement a few required core
commands, including volume create, start, stop, delete, expand, shrink
and set/reset options.
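
As a rough illustration of what a command plugin interface could look
like, here is a minimal sketch. The names `CommandPlugin`,
`PluginRegistry` and `volume-create` are hypothetical and chosen only
for this example; the actual interfaces are still to be defined.

```python
from abc import ABC, abstractmethod


class CommandPlugin(ABC):
    """Hypothetical interface that every command plugin would implement."""

    name = ""

    @abstractmethod
    def run(self, args):
        """Execute the command and return a result dictionary."""


class PluginRegistry:
    """Keeps track of registered plugins and dispatches commands to them."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        if plugin.name in self._plugins:
            raise ValueError("duplicate plugin: " + plugin.name)
        self._plugins[plugin.name] = plugin

    def dispatch(self, name, args):
        if name not in self._plugins:
            return {"ok": False, "error": "unknown command: " + name}
        return self._plugins[name].run(args)


class VolumeCreate(CommandPlugin):
    """Example plugin; a real one would live alongside its feature."""

    name = "volume-create"

    def run(self, args):
        return {"ok": True, "volume": args["name"], "status": "created"}


registry = PluginRegistry()
registry.register(VolumeCreate())
result = registry.dispatch("volume-create", {"name": "vol1"})
```

The point of the sketch is that the core only knows the interface and
the registry, so a new feature can add a command without touching the
core code.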

#### REST API
To provide a better method for external projects to hook on to,
GlusterD-2.0 will provide an HTTP REST API, with JSON output and proper
API versioning. We will be looking for inspiration from other storage
solutions to define the APIs presented to the users and admins.
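
A minimal sketch of versioned routing with JSON output is shown below.
The `/v1/volumes` paths, the `handle` function and the sample data are
hypothetical; the real API has not been defined yet. The routing is a
plain function so that it can be read without any HTTP server around it.

```python
import json

# Hypothetical in-memory state standing in for real cluster data.
VOLUMES = {"vol1": {"name": "vol1", "status": "started"}}


def handle(method, path):
    """Route a request and return (status_code, json_body)."""
    parts = [p for p in path.split("/") if p]
    # The first path segment carries the API version.
    if not parts or parts[0] != "v1":
        return 404, json.dumps({"error": "unsupported API version"})
    if method != "GET":
        return 405, json.dumps({"error": "method not allowed"})
    if parts[1:] == ["volumes"]:
        return 200, json.dumps({"volumes": sorted(VOLUMES)})
    if len(parts) == 3 and parts[1] == "volumes":
        volume = VOLUMES.get(parts[2])
        if volume is None:
            return 404, json.dumps({"error": "no such volume"})
        return 200, json.dumps(volume)
    return 404, json.dumps({"error": "not found"})


status, body = handle("GET", "/v1/volumes")
```

Keeping the version in the path would let a later `/v2` change the
output format without breaking existing clients.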

#### Logging
To help with debuggability, GlusterD-2.0 will provide better logs that
should allow easier tracking of operations across machines. We are
currently looking at the oVirt logging mechanism, which tags all logs
of an operation with a unique id, for inspiration.
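
The idea of tagging every log line of an operation with one id can be
sketched with Python's standard `logging` module. This is only an
illustration of the technique, not the oVirt mechanism itself; the
logger name, the `operation_logger` helper and the id format are
assumptions.

```python
import io
import logging
import uuid

# Collect output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("[%(op_id)s] %(message)s"))

base = logging.getLogger("glusterd2-sketch")
base.setLevel(logging.INFO)
base.addHandler(handler)
base.propagate = False


def operation_logger(op_id=None):
    """Return a logger that stamps every record with one operation id."""
    op_id = op_id or uuid.uuid4().hex
    return logging.LoggerAdapter(base, {"op_id": op_id})


log = operation_logger("op-1234")
log.info("volume create requested")
log.info("brick prepared on node2")
lines = stream.getvalue().splitlines()
```

If the same id is passed along to every node taking part in an
operation, searching the logs of all machines for that id gives the
full trace of the operation.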

#### Documentation
We hope to do a much better job of documenting the technical details
of GlusterD-2.0, starting with design documents. We still don't have

## Manila and GlusterD-2.0

Manila is the File System as a Service component of OpenStack. Manila
has two GlusterFS providers, neither of which currently supports the
full Manila API. One uses a sub-directory over NFS approach and the
other uses a volume based approach. Both of them require specific
features from GlusterFS to be completely functional by the next
OpenStack release, which is expected in October 2015. Csaba listed
these requirements at [2][2]. This lists two possible approaches,
1. Implement directory level operations (required by the sub-directory approach)
2. Implement intelligent volume creation (IVC) (required for the
volume based approach)
(Please read the linked mail-thread for more context)

Approach 1 requires direct modification of the GlusterFS codebase and
is relatively more complex, as it requires more features to be
implemented. Approach 2 can be implemented externally to the current
codebase, as a layer above the current GlusterD, and we expect the
effort required to be relatively less.

We've decided to go with approach 2, because we believe it is
achievable in the given time. The requirements we listed for
intelligent volume creation gave rise to a design which aligns with
the GlusterD-2.0 design. The requirements we listed are as follows.

The IVC service should,

1. be able to create a GlusterFS volume without the layout being
provided by the user.
2. be able to automatically provision bricks (i.e. create LVs, format, mount)
3. provide a REST endpoint for the Manila driver to use
4. provide a method for an administrator to list the storage space
available on the cluster.
5. be accessible on any of the nodes in the trusted storage
pool. (This is one additional requirement we added that helps align
the design to GlusterD-2.0)
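
To illustrate the first and fourth requirements, here is a minimal
sketch of how a layout could be chosen from per-node free space. The
placement policy (one brick per replica, on the nodes with the most
free space), the function names and the sample numbers are assumptions
made up for this example, not the planned design.

```python
def choose_bricks(free_space_gb, size_gb, replica_count):
    """Pick one brick per replica on distinct nodes with enough free space."""
    candidates = [
        (free, node) for node, free in free_space_gb.items() if free >= size_gb
    ]
    if len(candidates) < replica_count:
        raise ValueError("not enough nodes with free space")
    # Prefer the nodes with the most free space; break ties by node name.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [node for _, node in candidates[:replica_count]]


def total_free(free_space_gb):
    """Storage space available on the whole cluster, for the administrator."""
    return sum(free_space_gb.values())


# Hypothetical free space per node, in GB.
cluster = {"node1": 500, "node2": 120, "node3": 800, "node4": 40}
layout = choose_bricks(cluster, 100, 2)
```

The user only supplies a size and a replica count; which nodes end up
holding the bricks is decided by the service from the shared
availability information.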

Given the above requirements, the IVC service would have the following
design characteristics.

1. The IVC service will have a daemon running on each node in the storage pool.
2. The IVC service will require a shared storage mechanism. This is
required to make the storage availability information available to all
nodes in the pool. This shared storage is similar to the centralized
store feature of GlusterD-2.0.
3. The IVC service will require an orchestration mechanism to run
provisioning steps on different nodes. This would require the presence
of a transaction framework, which is also required by GlusterD-2.0.
4. The IVC service will provide a REST endpoint for the Manila driver.
GlusterD-2.0 will also be providing REST access points.

Also, there have been discussions around getting an intelligent volume
creation method for GlusterFS-4.0, which would anyway have to be
implemented in GlusterD-2.0.

Given the above, we can use this as a starting point for
GlusterD-2.0, instead of having a separate IVC service.

We'll keep updating the mailing lists and the community with our
progress as we go along.


PS: I know Luis has announced Heketi, which is the IVC service. We'll
be working with him so that we make effective use of all our efforts.

[1]: http://redhat.slides.com/kmadappa/glusterd-next-generation-gluster-summit-2015/
[2]: http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/11278
