[Gluster-users] Is rebalance in progress or not?
hunter86_bg at yahoo.com
Sun Mar 15 10:07:41 UTC 2020
On March 15, 2020 11:50:32 AM GMT+02:00, Alexander Iliev <ailiev+gluster at mamul.org> wrote:
>I was having some issues with one of my Gluster nodes so I ended up
>re-installing it. Now I want to re-add the bricks for my main volume and
>I'm having the following issue - when I try to add the bricks I get:
> > # gluster volume add-brick store1 replica 3 <bricks ...>
> > volume add-brick: failed: Pre Validation failed on 172.31.35.132.
>Volume name store1 rebalance is in progress. Please retry after
>But then if I get the rebalance status I get:
> > # gluster volume rebalance store1 status
> > volume rebalance: store1: failed: Rebalance not started for volume
>And if I try to start the rebalancing I get:
> > # gluster volume rebalance store1 start
> > volume rebalance: store1: failed: Rebalance on store1 is already
>Looking at the logs of the first node, when I try to start the
>operation I see this:
> > [2020-03-15 09:41:31.883651] E [MSGID: 106276]
>stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67
>On the second node the logs are showing stuff that indicates that a
>rebalance operation is indeed in progress:
> > [2020-03-15 09:47:34.190042] I [MSGID: 109081]
>[dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of
> > [2020-03-15 09:47:34.775691] I
>[dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate data
>called on /redacted
> > [2020-03-15 09:47:36.019403] I
>[dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration
>operation on dir /redacted took 1.24 secs
>Some background on what led to this situation:
>The volume was originally a replica 3 distributed replicated volume on
>three nodes. In order to detach the faulty node I lowered the replica
>count to 2 and removed the bricks from that node from the volume. I
>cleaned up the storage (formatted the bricks and cleaned the
>trusted.gfid and trusted.glusterfs.volume-id extended attributes) and
>purged the gluster packages from the system, then I re-installed the
>gluster packages and did a `gluster peer probe` from another node.
>I'm running Gluster 6.6 on CentOS 7.7 on all nodes.
>I feel stuck at this point, so any guidance will be greatly
Did you try going to the second node (the one that thinks the rebalance is running) and stopping the rebalance?
gluster volume rebalance VOLNAME stop
Then add the new bricks (and increase the replica count), and once the heal is over, rebalance again.
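
The sequence above could be sketched roughly as the script below. This is only an illustration, not a tested procedure: the `run` and `plan` helpers and the `DRY_RUN`, `VOLNAME` and `NEW_BRICKS` variables are made up for this example, and the brick list has to be filled in by hand. By default it only prints the commands; it assumes the first step is run on the node that believes the rebalance is in progress.

```shell
#!/bin/sh
# Sketch only. Prints the commands by default (DRY_RUN=1);
# set DRY_RUN=0 to actually execute them.
VOLNAME="${VOLNAME:-store1}"     # volume name taken from the thread
NEW_BRICKS="${NEW_BRICKS:-}"     # hypothetical: space-separated host:/path bricks
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

plan() {
    # 1. Stop the rebalance that this node thinks is still running.
    run gluster volume rebalance "$VOLNAME" stop

    # 2. Re-add the bricks and raise the replica count back to 3.
    #    NEW_BRICKS is left unquoted on purpose so it splits into separate bricks.
    run gluster volume add-brick "$VOLNAME" replica 3 $NEW_BRICKS

    # 3. Inspect pending heals; repeat by hand until no entries are listed.
    run gluster volume heal "$VOLNAME" info

    # 4. Only after the heal has finished, start the rebalance again.
    run gluster volume rebalance "$VOLNAME" start
}

plan
```

Keeping it in dry-run mode first gives a chance to review the exact brick list before anything touches the volume. Step 3 is a manual check, so the script is not meant to be run unattended with DRY_RUN=0.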