[Gluster-users] Sharding?

Cedric Lemarchand yipikai7 at gmail.com
Fri Mar 10 13:20:59 UTC 2017


> On 10 Mar 2017, at 12:05, Krutika Dhananjay <kdhananj at redhat.com> wrote:
> 
> On Fri, Mar 10, 2017 at 4:09 PM, Cedric Lemarchand <yipikai7 at gmail.com> wrote:
> 
> > On 10 Mar 2017, at 10:33, Alessandro Briosi <ab1 at metalit.com> wrote:
> >
> > On 10/03/2017 10:28, Kevin Lemonnier wrote:
> >>> I haven't done any tests yet, but I was under the impression that
> >>> the sharding feature isn't so stable/mature yet.
> >>> In the back of my mind I remember reading something about a
> >>> bug/situation which caused data corruption.
> >>> Can someone confirm that sharding is stable enough to be used in
> >>> production and won't cause any data loss?
> >> There were a few bugs, yeah. I can tell you that in 3.7.15 (and I assume
> >> later versions) it works well as long as you don't try to add new bricks
> >> to your volumes (we use it in production for HA virtual machine disks).
> >> Apparently that bug was fixed recently, so the latest versions should be
> >> pretty stable.
> >
> > I'm using 3.8.9, so I suppose all known bugs have been fixed there (including the one with adding bricks)
> >
> > I'll then proceed with some tests before going to production.
> 
> I am still asking myself how such a bug could happen in clustered storage software, where adding bricks is a base feature of any scalable solution like Gluster. Or is it that STM releases are really under-tested compared to LTM ones? Could we say that STM releases are not meant for production, or are at least really risky?
> 
> Not entirely true. The same bug existed in the LTM release too.
> 
> I did try reproducing the bug on my setup as soon as Lindsay, Kevin and others started reporting it, but it was never reproducible on my setup.
> The absence of proper logging in libgfapi upon failures only made it harder to debug, even when the users successfully recreated the issue and shared
> their logs. It was only after Satheesaran recreated it successfully with a FUSE mount that the real debugging could begin, because the fuse-bridge translator
> logged the exact error code for the failure.
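
For context, the operations under discussion look roughly like this (a sketch only; the volume name, brick paths and shard size below are invented for illustration, not taken from this thread):

    # Sharding is a per-volume option
    gluster volume set datastore features.shard on

    # Optional: choose a shard size; it only applies to files
    # created after the option is set
    gluster volume set datastore features.shard-block-size 64MB

    # Expanding the volume; a replica-3 volume needs bricks added in
    # multiples of three. This is the step reported above as the
    # trigger for the corruption bug on sharded volumes.
    gluster volume add-brick datastore node4:/bricks/b1 node5:/bricks/b1 node6:/bricks/b1
    gluster volume rebalance datastore start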

Indeed, an unreproducible bug is pretty hard to fix … thanks for the feedback. What would be the best way to find out about critical bugs in the different Gluster releases? Maybe browsing https://review.gluster.org/ or https://bugzilla.redhat.com; any advice?
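
One concrete way to do that is Bugzilla's query interface; a sketch, assuming the standard buglist.cgi parameters and the GlusterFS product on bugzilla.redhat.com (adjust component and severity to taste):

    https://bugzilla.redhat.com/buglist.cgi?product=GlusterFS&component=sharding&bug_severity=urgent&bug_severity=high

The release notes published with each Gluster release also list the bugs fixed in it.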

Cheers

Cédric
