[Gluster-devel] Monitoring and acting on LVM thin-pool consumption

Niels de Vos ndevos at redhat.com
Tue Apr 10 09:38:31 UTC 2018


Recently I have been implementing "volume clone" support in Heketi. This
uses the snapshot+clone functionality from Gluster. In order to create
snapshots and clone them, it is required to use LVM thin-pools on the
bricks. This is where my current problem originates.

When there are cloned volumes, the bricks of these volumes use the same
thin-pool as the original bricks. This makes sense, and allows cloning
to be really fast! There is no need to copy data from one brick to a new
one, because the thin-pool provides copy-on-write semantics.

Unfortunately it can be rather difficult to estimate how large the
thin-pool should be when the initial Gluster Volume is created.
Over-allocation is likely needed, but by how much? It may not be clear
how many clones will be made, nor what percentage of the data will
change on each of the clones.
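
To make the estimation problem concrete, here is a rough back-of-the-envelope
sketch. It assumes that every clone shares all unchanged blocks with the
original and that only changed blocks consume extra space in the pool. The
function name, the parameters and the 10% headroom default are hypothetical
and only for illustration. The sketch ignores thin-pool metadata and any
blocks that diverge because the original volume itself keeps changing.

```python
def estimate_thin_pool_gib(data_gib, clones, changed_fraction, headroom=0.1):
    """Rough lower bound (GiB) for a thin-pool backing one brick and its clones.

    Assumption: each clone rewrites `changed_fraction` of the original data,
    and everything else stays shared through copy-on-write.
    """
    if data_gib < 0 or clones < 0:
        raise ValueError("data_gib and clones must not be negative")
    if not 0.0 <= changed_fraction <= 1.0:
        raise ValueError("changed_fraction must be between 0 and 1")

    shared = data_gib                               # blocks of the original brick
    diverged = clones * data_gib * changed_fraction  # blocks rewritten by clones
    return (shared + diverged) * (1.0 + headroom)


if __name__ == "__main__":
    # 100 GiB brick, 4 clones, each changing a quarter of the data
    print(estimate_thin_pool_gib(100, 4, 0.25))
```

The result is only as good as the guesses for the number of clones and the
changed fraction, which is exactly the part that is hard to know up front.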

A wrong estimate can easily cause the thin-pool to become full. When
that happens, the filesystem on the bricks will go read-only. Mounting
the filesystem read-write may not be possible at all. I've even seen
/dev entries for the LV getting removed. This makes for a horrible
Gluster experience, and it can be tricky to recover from it.

In order to make thin-provisioning more stable in Gluster, I would like
to see integrated monitoring of (thin) LVs and some form of acting on
crucial events. One idea would be to make the Gluster Volume read-only
when it detects that a brick is almost out-of-space. This is close to
what local filesystems do when their block-device is having issues.
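
As a minimal sketch of what such monitoring could look like, the following
polls 'lvs' for the data usage of thin-pools and reports the ones above a
threshold. This is not an existing Gluster or Heketi feature. The function
names and the 90% threshold are hypothetical. The action to take is left
out on purpose; one assumed option would be setting 'features.read-only'
on the affected Gluster Volume, which also needs a mapping from thin-pool
to volume that is not shown here.

```python
import os
import subprocess

# Fields requested from lvs; data_percent is empty for non-thin LVs.
LVS_FIELDS = "vg_name,lv_name,segtype,data_percent"


def read_lvs_report():
    """Run lvs and return its raw report (usually requires root)."""
    cmd = ["lvs", "--noheadings", "--separator", "|", "-o", LVS_FIELDS]
    env = dict(os.environ, LC_ALL="C")  # keep '.' as the decimal separator
    result = subprocess.run(cmd, check=True, capture_output=True,
                            text=True, env=env)
    return result.stdout


def parse_thin_pool_usage(report):
    """Map 'vg/lv' to data usage in percent, for thin-pool LVs only."""
    usage = {}
    for line in report.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 4:
            continue  # blank or unexpected line
        vg, lv, segtype, data_percent = fields
        if segtype != "thin-pool" or not data_percent:
            continue
        usage[f"{vg}/{lv}"] = float(data_percent)
    return usage


def pools_over_threshold(usage, threshold=90.0):
    """Return the names of the thin-pools at or above the threshold."""
    return sorted(name for name, pct in usage.items() if pct >= threshold)


if __name__ == "__main__":
    for pool in pools_over_threshold(parse_thin_pool_usage(read_lvs_report())):
        # Assumption: this is where the Gluster Volume would be made read-only.
        print(f"thin-pool {pool} is almost full")
```

Polling like this is simple, but it can miss a pool that fills up between
two runs, so an event-driven approach would be preferable.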

The 'dmeventd' process already monitors LVM, and by default writes to
'dmesg'. Checking dmesg for warnings is not really a nice solution, so
maybe we should write a plugin for dmeventd. Possibly something already
exists that we can use, or take inspiration from.

Please provide ideas, thoughts and any other comments. Thanks!
Niels

