[Gluster-users] Failed rebalance resulting in major problems

Mon Nov 11 19:15:43 UTC 2013

On 11/6/2013 1:15 PM, Joe Julian wrote:
> I'm one of the oldest GlusterFS users around here and one of the biggest
> proponents, and even I have been loath to rebalance until 3.4.1.

I wish that you'd said this when I was in the IRC channel asking for 
opinions about whether to upgrade before adding storage.  If you did, 
then it was when I wasn't at my computer, and it somehow got lost in the 
scrollback without nick notification highlighting.

If anyone at all had mentioned that there were known rebalance problems 
with 3.3.1, I'd have paid attention to that ... but it would have been 
especially potent coming from you.

If I upgrade, can I expect to complete this rebalance, which still needs 
to move about 9 TB of data?  Am I likely to run into problems with the 
rolling no-downtime upgrade process?  We are in the process of making 
backups of the brick filesystems onto external USB drives, just in case.

Is this possibly a result of my split-network architecture?  I have a 
total of six gluster peers. The four servers with bricks have two 
networks, both gigabit - a back-end network where they can talk to each 
other, and a network (with a default gateway) where they can talk to the 
other two peers.  Name resolution for gluster on those machines is done 
via hosts files that override DNS.  The hosts files use the back-end 
network, DNS uses the other network.
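To make the split concrete, here is a rough Python sketch of how one could check, on each brick server, that every peer name in the hosts file maps to a back-end address, and see what the system resolver returns for the same name. The subnet and hostnames in the usage note are made-up placeholders, not my real ones, and the helper names (parse_hosts, on_backend, report) are just for illustration.

```python
import ipaddress
import socket


def parse_hosts(text):
    """Parse hosts-file text into {name: address}; the first entry for a name wins."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, addr)
    return mapping


def on_backend(addr, subnet):
    """Return True if addr lies inside the given subnet (CIDR notation)."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(subnet)


def resolve(name):
    """Ask the system resolver for name; return None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def report(hosts_text, subnet):
    """Yield (name, hosts_addr, resolver_addr, hosts_addr_on_backend) per hosts entry."""
    for name, addr in parse_hosts(hosts_text).items():
        yield name, addr, resolve(name), on_backend(addr, subnet)
```

On a brick server this could be run as report(open("/etc/hosts").read(), "10.0.0.0/24"), with the subnet replaced by the real back-end range. Note that resolve() goes through the normal resolver order, so on a machine where the hosts file overrides DNS it will return the hosts-file address; the DNS answer would have to be checked from one of the peers that uses DNS only.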

The other two peers have no bricks, but act as NFS/CIFS entry points 
for the rest of the network (network access servers, or NAS). Their name 
resolution is all DNS.  Those NAS servers also have a number of other 
network cards in them so that various networks can reach the storage 
without traversing our central firewall and overloading it.

Thanks,
Shawn