[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks

Mon Nov 14 16:01:35 UTC 2016

On Mon, Nov 14, 2016 at 8:54 AM, Niels de Vos <ndevos at redhat.com> wrote:

> On Mon, Nov 14, 2016 at 04:50:44PM +0530, Pranith Kumar Karampuri wrote:
> > On Mon, Nov 14, 2016 at 4:38 PM, Gandalf Corvotempesta <
> > gandalf.corvotempesta at gmail.com> wrote:
> >
> > > 2016-11-14 11:50 GMT+01:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
> > > > To make gluster stable for VM images we had to add all these new
> > > > features and then fix all the bugs Lindsay/Kevin reported. We just
> > > > fixed a corruption issue that can happen with replace-brick; the fix
> > > > will be available in 3.9.0 and 3.8.6. The only 2 other known issues
> > > > that can lead to corruption are add-brick and the bug you filed,
> > > > Gandalf. Just 5 minutes ago Krutika saw something that could possibly
> > > > lead to the corruption in the add-brick bug. Is that really the root
> > > > cause? We are not sure yet; we need more time. Without
> > > > Lindsay/Kevin/David Gossage's support this workload would have been
> > > > in much worse condition. These bugs are not easy to re-create and
> > > > thus not easy to fix. At least that has been Krutika's experience.
> > >
> > > Ok, but these changes should be placed in a "test" version and not
> > > marked as stable.
> > > I don't see any development release, only stable releases here.
> > > Do you want all features? Try the "beta/rc/unstable/alpha/dev"
> > > version.
> > > Do you want the stable version without known bugs but slow on VM
> > > workloads? Use the "-stable" version.
> > >
> > > If you release as stable, users tend to upgrade their cluster and use
> > > the newer features (that you are marking as stable).
> > > What if I upgrade a production cluster to a stable version and try an
> > > add-brick that leads to data corruption?
> > > Do I have to restore terabytes worth of data? Gluster is made for
> > > scale-out; what if my cluster was made of 500TB of VMs?
> > > Try to restore 500TB from a backup...
> > >
> > > This is unacceptable. add-brick/replace-brick should be common "daily"
> > > operations. You should heavily check these for regressions or bugs.
> > >
> >
> > This is a very good point. Adding other maintainers.
>
> Obviously this is unacceptable for versions that have sharding as a
> functional (not experimental) feature. All supported features are
> expected to function without major problems (like corruption) for all
> standard Gluster operations. Add-brick/replace-brick are surely such
> Gluster operations.
>
> Of course it is possible that this does not always happen, and our tests
> did not catch the problem. In that case, we really need to have a bug
> report with all the details, and preferably a script that can be used to
> reproduce and detect the failure.
>

I believe the following bug covers the particular issue raised in this
email chain:

https://bugzilla.redhat.com/show_bug.cgi?id=1387878

Kevin found the bug, and Lindsay filed the report after she was able to
recreate it.
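
Niels asks above for a script that can reproduce and detect the failure.
As a rough sketch of the detection half only (not a reproducer, and not
taken from the bug report), one could checksum every file on the mounted
volume before an add-brick/rebalance and again afterwards, then list the
files whose contents changed. The function names here are hypothetical,
and the sketch assumes nothing is writing to the files while checksums
are taken (for example, the VMs are shut down or the files are static
test images):

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each file's path (relative to root) to its checksum."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sums[os.path.relpath(full, root)] = sha256_of(full)
    return sums


def changed_files(before, after):
    """List files whose checksum differs, or that are missing afterwards."""
    return sorted(path for path, digest in before.items()
                  if after.get(path) != digest)
```

Usage would be along the lines of calling snapshot() on the mount point
(say /mnt/vmstore, an assumed path) before the brick operation, calling it
again once the operation has finished, and passing both results to
changed_files(). An empty list means no file differed under these
assumptions; any entry is a candidate for the corruption described here.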

>
> FWIW sharding has several open bugs (like any other component), but it
> is not immediately clear to me if the problem reported in this email is
> in Bugzilla yet. These are the bugs that are expected to get fixed in
> upcoming minor releases:
>   https://bugzilla.redhat.com/buglist.cgi?component=sharding&f1=bug_status&f2=version&o1=notequals&o2=notequals&product=GlusterFS&query_format=advanced&v1=CLOSED&v2=mainline
>
> HTH,
> Niels
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>