[Gluster-users] Failed rebalance resulting in major problems
jbd at podomatic.com
Wed Nov 6 20:57:17 UTC 2013
You're right-- I probably should have dialed it back a bit! It's
frustrating sometimes when I post about such a major issue and never see
a response.
In my case, I run into gfid bugs regularly, almost always in situations
where I have copied an entire directory tree into a GlusterFS mount. There
have been no connectivity issues between nodes, no node restarts, etc, for
months, but once in a while, I get a gfid mismatch and must manually
correct the situation.
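For the archives, here is the rough shape of my manual fix. This is a
sketch only -- the brick path, mount point, and gfid below are made-up
examples, and you should confirm which copy is the stale one before
deleting anything:

  # Compare the gfid xattr for the affected file on each replica brick
  getfattr -n trusted.gfid -e hex /export/brick1/path/to/file
  # -> trusted.gfid=0xaabbccdd112233445566778899aabbcc

  # On the brick holding the stale copy, remove the file AND its
  # hardlink under .glusterfs (the hex gfid above maps to the
  # aa/bb/aabbccdd-... path below)
  rm /export/brick1/path/to/file
  rm /export/brick1/.glusterfs/aa/bb/aabbccdd-1122-3344-5566-778899aabbcc

  # Stat the file through a client mount to trigger self-heal
  stat /mnt/glusterfs/path/to/file

The .glusterfs hardlink is the step people miss -- if you only remove
the file itself, the brick keeps the old gfid and the mismatch can come
right back.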
I would certainly purchase GlusterFS support if I had any option other than
Red Hat-- they only support Red Hat Storage, and that isn't a good fit for
my environment at this time. If GlusterFS is successful the way it could
be, there will definitely be an opportunity for a firm to support it on
other platforms.
FWIW, I've created a Github repo to store my scripts for navigating
GlusterFS issues. If they remain relevant and the repo gets activity, I'll
move them to the Gluster Forge. https://github.com/justindossey/gluster-scripts
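For the sticky-bit files Shawn describes below: those look to me like
DHT linkfiles -- the zero-byte pointers DHT creates when a file's data
lives on a different subvolume than its name hashes to. A starting
point for finding them (untested against his volume, and /export/brick1
is a placeholder for the actual brick path):

  # DHT linkfiles are zero-length, mode ---------T (1000), and carry a
  # linkto xattr naming the subvolume that holds the real file
  find /export/brick1 -path '*/.glusterfs' -prune -o \
    -type f -perm 1000 -size 0 -print | while read -r f; do
      getfattr -n trusted.glusterfs.dht.linkto "$f"
  done

Anything matching all three tests is almost certainly a pointer rather
than data. The 000-permission files are scarier: I wouldn't script a
blanket chmod without first recording the modes of the surviving good
copies.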
On Wed, Nov 6, 2013 at 12:15 PM, Joe Julian <joe at julianfamily.org> wrote:
> On 11/06/2013 11:52 AM, Justin Dossey wrote:
>> I had a very similar experience with a rebalance on 3.3.1, and it took
>> weeks to get everything straightened out. I would be happy to share the
>> scripts I wrote to correct the permissions issues if you wish, though I'm
>> not sure it would be appropriate to share them directly on this list.
>> Perhaps I should just create a project on Github that is devoted to
>> collecting scripts people use to fix their GlusterFS environments!
>> After that (awful) experience, I am loath to run further rebalances.
>> I've even spent days evaluating alternatives to GlusterFS, as my
>> experience with this list over the last six months indicates that support
>> for community users is minimal, even in the face of major bugs such as the
>> one with rebalancing and the continuing "gfid different on subvolume" bugs
>> with 3.3.2.
> I'm one of the oldest GlusterFS users around here and one of the biggest
> proponents, and even I have been loath to rebalance until 3.4.1.
> There are no open bugs for gfid mismatches that I could find. The last
> time someone mentioned that error in IRC it was 2am, I was at a convention,
> and I told the user how to solve that problem (
> http://irclog.perlgeek.de/gluster/2013-06-14#i_7196149 ). It was caused
> by split-brain. If you have a bug, it would be more productive to file it
> rather than make negative comments about a community of people who have no
> requirement to help anybody, but do it anyway just because they're nice.
> This is going to sound snarky because it's in text, but I mean this
> sincerely. If community support is not sufficient, you might consider
> purchasing support from a company that provides it professionally.
>> Let me know what you think of the Github thing and I'll proceed.
> Even better, put them up on http://forge.gluster.org
>> On Tue, Nov 5, 2013 at 9:05 PM, Shawn Heisey <gluster at elyograg.org> wrote:
>>> We recently added storage servers to our gluster install, running 3.3.1
>>> on CentOS 6. It went from 40TB usable (8x2 distribute-replicate) to
>>> 80TB usable (16x2). There was a little over 20TB of used space on the volume.
>>> The add-brick went through without incident, but the rebalance failed
>>> after moving 1.5TB of the approximately 10TB that needed to be moved. A
>>> side issue is that it took four days for that 1.5TB to move. I'm aware
>>> that gluster has overhead, and that there's only so much speed you can
>>> get out of gigabit, but a 100Mb/s half-duplex link could have copied the
>>> data faster if it had been a straight copy.
>>> After I discovered that the rebalance had failed, I noticed that there
>>> were other problems. There are a small number of completely lost files
>>> (91 that I know about so far), a huge number of permission issues (over
>>> 800,000 files changed to 000), and about 32000 files that are throwing
>>> read errors via the fuse/nfs mount but seem to be available directly on
>>> bricks. That last category of problem file has the sticky bit set, with
>>> almost all of them having ---------T permissions. The good files on
>>> bricks typically have the same permissions, but are readable by root. I
>>> haven't worked out the scripting necessary to automate all the fixing
>>> that needs to happen yet.
>>> We really need to know what happened. We do plan to upgrade to 3.4.1,
>>> but there were some reasons that we didn't want to upgrade before adding
>>> the new storage:
>>> * Upgrading will result in service interruption to our clients, which
>>> mount via NFS. It would likely be just a hiccup, with quick failover,
>>> but it's still a service interruption.
>>> * We have a pacemaker cluster providing the shared IP address for NFS
>>> mounting. It's running CentOS 6.3. A "yum upgrade" to upgrade gluster
>>> will also upgrade to CentOS 6.4. The pacemaker in 6.4 is incompatible
>>> with the pacemaker in 6.3, which will likely result in
>>> longer-than-expected downtime for the shared IP address.
>>> * We didn't want to risk potential problems with running gluster 3.3.1
>>> on the existing servers and 3.4.1 on the new servers.
>>> * We needed the new storage added right away, before we could schedule
>>> maintenance to deal with the upgrade issues.
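>>> One option we may try for the second point (untested on our end) is
>>> limiting yum to just the gluster packages, so the 6.3 -> 6.4 jump
>>> doesn't come along for the ride:
>>>
>>>   # upgrade only the gluster packages, leaving CentOS itself at 6.3
>>>   yum update 'glusterfs*'
>>>
>>> I haven't verified that the 3.4.1 packages are happy with the 6.3
>>> base libraries, though, and it does nothing for the mixed-version
>>> concern.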
>>> Something that would be extremely helpful would be obtaining the
>>> services of an expert-level gluster consultant who can look over
>>> everything we've done to see if there is anything we've done wrong and
>>> how we might avoid problems in the future. I don't know how much the
>>> company can authorize for this, but we obviously want it to be as cheap
>>> as possible. We are in Salt Lake City, UT, USA. It would be preferable
>>> to have the consultant be physically present at our location.
>>> I'm working on redacting one bit of identifying info from our rebalance
>>> log, then I can put it up on dropbox for everyone to examine.
>> Justin Dossey
>> CTO, PodOmatic