[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks

Gandalf Corvotempesta gandalf.corvotempesta at gmail.com
Mon Nov 14 11:08:46 UTC 2016

2016-11-14 11:50 GMT+01:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
> To make gluster stable for VM images we had to add all these new features
> and then fix all the bugs Lindsay/Kevin reported. We just fixed a corruption
> issue that can happen with replace-brick which will be available in 3.9.0
> and 3.8.6. The only 2 other known issues that can lead to corruptions are
> add-brick and the bug you filed Gandalf. Krutika just 5 minutes back saw
> something that could possibly lead to the corruption for the add-brick bug.
> Is that really the Root cause? We are not sure yet, we need more time.
> Without Lindsay/Kevin/David Gossage's support this workload would have been
> in much worse condition. These bugs are not easy to re-create thus not easy
> to fix. At least that has been Krutika's experience.

Ok, but these changes should be placed in a "test" version and not
marked as stable.
I don't see any development release, only stable releases here.
Do you want all the features? Try the "beta/rc/unstable/alpha/dev" version.
Do you want the stable version, without known bugs but slow on VM
workloads? Use the "-stable" version.

If you release as stable, users tend to upgrade their clusters and use
the newer features (that you are marking as stable).
What if I upgrade a production cluster to a stable version and try an
add-brick that leads to data corruption?
Do I have to restore terabytes worth of data? Gluster is made for
scale-out; what if my cluster held 500TB of VMs?
Try to restore 500TB from a backup...

This is unacceptable. add-brick/replace-brick should be common "daily"
operations. You should heavily test these for regressions and bugs.

> One more take away is to get the
> documentation right. Lack of documentation led Alex to try the worst
> possible combo for storing VMs on gluster. So we as community failed in some
> way there as well.
>       Krutika will be sending out VM usecase related documentation after
> 28th of this month. If you have any other feedback, do let us know.

Yes, lack of updated docs or a reference architecture is a big issue.

