[Gluster-users] Failed rebalance resulting in major problems

Thu Nov 7 21:04:42 UTC 2013

(resending because my reply only went to Lukáš)

On 11/7/2013 3:20 AM, Lukáš Bezdička wrote:
> I strongly suggest not using 3.3.1 or whole 3.3 branch. I would only 
> go for 3.4.1 on something close to production and even there I 
> wouldn't yet use rebalance/shrinking. We give gluster heavy testing 
> before it goes to production. As for updating, why don't you build 
> your own packages? We have been maintaining our own builds for several 
> years now, with our patches, which happily end up in gluster upstream sooner or later.

When I built the system, version 3.3.1 (and CentOS 6.3) was the latest 
that was available.  Before I added the new storage last week, I got 
onto the IRC channel and asked whether I should install the same version 
on the new servers, install the new version on the new servers, or 
upgrade the entire cluster before adding anything.  I got no actual 
answers to that question, and there wasn't really a lot of discussion 
that I noticed.  If someone did answer my question at that time, I 
missed it.

I decided to play it safe by installing version 3.3.1 on the new 
servers.  It was a slightly newer revision, but I was told that there 
were only packaging differences and that the code itself was unchanged.  I
installed CentOS 6.4, which I figured would be safe because Gluster is 
user-space and it's typically safe to upgrade RHEL/CentOS minor versions.

Before we deployed, I did do tests on my testbed where I added new 
storage bricks, did rebalances, removed bricks, etc. There were no 
problems with adding bricks or rebalancing, but I had nowhere near as 
many files or as much space used as we have in production.  I did encounter a
bug with removing bricks, which I filed: 
https://bugzilla.redhat.com/show_bug.cgi?id=862347

Except for the 91 files that appear to be simply gone and unrecoverable, 
I am pretty much done dealing with the fallout ... but I still have 
nearly 9TB of data that needs to migrate before the bricks will be 
evenly filled, and I can't be sure that this won't happen when I request 
another rebalance, or next time we need to increase the volume size by 
adding bricks.  I really need an expert to evaluate our setup and make 
recommendations.

I sent a request off to Red Hat Consulting for help with this, but I 
haven't heard anything back from them.

Thanks,
Shawn
