[Gluster-users] Pausing rebalance

Kaushal M kshlmster at gmail.com
Tue Dec 10 06:00:21 UTC 2013


On Tue, Dec 10, 2013 at 11:09 AM, Franco Broi <franco.broi at iongeo.com> wrote:
> On Tue, 2013-12-10 at 10:56 +0530, shishir gowda wrote:
>> Hi Franco,
>>
>>
>> If a file is under migration, and a rebalance stop is encountered,
>> then rebalance process exits only after the completion of the
>> migration.
>>
>> That might be one of the reasons why you saw the 'rebalance is in
>> progress' message while trying to add the brick.
>
> The status said it was stopped. I didn't do a top on the machine, but are
> you saying that it was still rebalancing despite saying it had stopped?
>

The 'stopped' status is a little misleading. The rebalance process
could have been migrating a large file when the stop command was
issued, in which case the process keeps migrating that file and only
quits once it has finished. During that period the status says
'stopped', but the rebalance process is actually still running, which
prevents other operations from happening. Ideally, we would have a
'stopping' status to convey the correct meaning. For now, the only way
to verify that a rebalance has actually stopped is to check the
rebalance process itself: it is a 'glusterfs' process whose
command-line arguments contain 'rebalance'.
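For example, on the node that was running the rebalance, something
like the following (just a rough sketch; the exact arguments on the
process's command line vary between versions) shows whether it is
still alive:

    # look for a glusterfs process whose arguments mention rebalance
    ps auxww | grep '[g]lusterfs' | grep rebalance

Once that process has exited, operations like add-brick should go
through.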

>>
>> Could you please share the average file size in your setup?
>>
>
> Bit hard to say, I just copied some data from our main processing
> system. The sizes range from very small to tens of gigabytes.
>
>>
>> You could always check the rebalance status command to ensure the
>> rebalance has indeed completed/stopped before proceeding with the
>> add-brick. 'add-brick force' should not be used while a rebalance is
>> on-going in normal scenarios. I do see that in your case the status
>> shows stopped/completed; the glusterd logs would help in triaging the
>> issue.
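>>
>> For instance, the usual sequence would look something like the
>> following (illustrative only; the brick name here is just the one
>> from your add-brick attempt, and the final start kicks off a fresh
>> rebalance that includes the new brick):
>>
>>         gluster volume rebalance test-volume status
>>         # wait until every node reports completed/stopped
>>         gluster volume add-brick test-volume nas4-10g:/data14/gvol
>>         gluster volume rebalance test-volume start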
>
> See attached.
>
>>
>>
>> Rebalance re-writes layouts and migrates data. If an add-brick is
>> done while this is happening, the cluster might go into an imbalanced
>> state. Hence the check for whether a rebalance is in progress when
>> doing an add-brick.
>
> I can see that, but as far as I could tell the rebalance had stopped
> according to the status.
>
> Just to be clear, what command restarts the rebalancing?
>
>>
>>
>> With regards,
>> Shishir
>>
>>
>>
>> On 10 December 2013 10:39, Franco Broi <franco.broi at iongeo.com> wrote:
>>
>>         Before attempting a rebalance on my existing distributed Gluster
>>         volume I thought I'd do some testing with my new storage. I created
>>         a volume consisting of 4 bricks on the same server and wrote some
>>         data to it. I then added a new brick from another server. I ran the
>>         fix-layout and wrote some new files and could see them on the new
>>         brick. All good so far, so I started the data rebalance. After it
>>         had been running for a while I wanted to add another brick, which I
>>         obviously couldn't do while it was running, so I stopped it. Even
>>         with it stopped it wouldn't let me add a brick, so I tried
>>         restarting it, but it wouldn't let me do that either. I presume you
>>         just reissue the start command as there's no restart?
>>
>>         [root@nas3 ~]# gluster vol rebalance test-volume status
>>              Node   Rebalanced-files      size   scanned   failures   skipped      status   run time in secs
>>         ---------   ----------------   -------   -------   --------   -------   ---------   ----------------
>>         localhost                  7   611.7GB      1358          0        10     stopped            4929.00
>>         localhost                  7   611.7GB      1358          0        10     stopped            4929.00
>>          nas4-10g                  0    0Bytes      1506          0         0   completed               8.00
>>         volume rebalance: test-volume: success:
>>         [root@nas3 ~]# gluster vol add-brick test-volume nas4-10g:/data14/gvol
>>         volume add-brick: failed: Volume name test-volume rebalance is in progress. Please retry after completion
>>         [root@nas3 ~]# gluster vol rebalance test-volume start
>>         volume rebalance: test-volume: failed: Rebalance on test-volume is already started
>>
>>         In the end I used the force option to make it start, but was that
>>         the right thing to do?
>>
>>         glusterfs 3.4.1 built on Oct 28 2013 11:01:59
>>         Volume Name: test-volume
>>         Type: Distribute
>>         Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066
>>         Status: Started
>>         Number of Bricks: 5
>>         Transport-type: tcp
>>         Bricks:
>>         Brick1: nas3-10g:/data9/gvol
>>         Brick2: nas3-10g:/data10/gvol
>>         Brick3: nas3-10g:/data11/gvol
>>         Brick4: nas3-10g:/data12/gvol
>>         Brick5: nas4-10g:/data13/gvol
>>
>>
>>         _______________________________________________
>>         Gluster-users mailing list
>>         Gluster-users at gluster.org
>>         http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users


