[Gluster-users] Pausing rebalance
Franco Broi
franco.broi at iongeo.com
Tue Dec 10 06:32:46 UTC 2013
Thanks for clearing that up. I had to wait about 30 minutes for all
rebalancing activity to cease, then I was able to add a new brick.
What does it use to migrate the files? The copy rate was pretty slow
considering both bricks were on the same server; I only saw about
200MB/s. Each brick is a 16-disk ZFS raidz2, and copying with dd I can
get well over 500MB/s.
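
For what it's worth, the dd comparison I have in mind is roughly the
following; the file paths here are only examples, not the exact
commands I ran:

    # sequential write onto one brick, flushing to disk before timing ends
    dd if=/dev/zero of=/data9/gvol/ddtest bs=1M count=10240 conv=fdatasync
    # sequential read of a large existing file back off another brick
    dd if=/data12/gvol/somebigfile of=/dev/null bs=1M
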
On Tue, 2013-12-10 at 11:30 +0530, Kaushal M wrote:
> On Tue, Dec 10, 2013 at 11:09 AM, Franco Broi <franco.broi at iongeo.com> wrote:
> > On Tue, 2013-12-10 at 10:56 +0530, shishir gowda wrote:
> >> Hi Franco,
> >>
> >>
> >> If a file is under migration when a rebalance stop is issued, the
> >> rebalance process exits only after that file's migration has
> >> completed.
> >>
> >> That might be one of the reasons why you saw the "rebalance is in
> >> progress" message while trying to add the brick.
> >
> > The status said it was stopped. I didn't do a top on the machine but are
> > you saying that it was still rebalancing despite saying it had stopped?
> >
>
> The 'stopped' status is a little misleading. The rebalance process
> could have been migrating a large file when the stop command was
> issued, in which case it continues migrating that file and only quits
> once it has finished. During this period the status says 'stopped',
> but the rebalance process is actually still running, which prevents
> other operations from happening. Ideally we would have a 'stopping'
> status to convey the correct meaning, but for now the only way to
> verify that a rebalance has actually stopped is to monitor the
> rebalance process itself: it is a 'glusterfs' process with 'rebalance'
> somewhere in its arguments.
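>
> (As a quick illustrative check, not an official CLI command, you can
> look for that process on each server, e.g.:
>
>     ps aux | grep '[g]lusterfs' | grep rebalance
>
> Once it is gone on all nodes, the stop has fully taken effect and the
> add-brick should go through.)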
>
> >>
> >> Could you please share the average file size in your setup?
> >>
> >
> > Bit hard to say; I just copied some data from our main processing
> > system. The sizes range from very small to tens of gigabytes.
> >
> >>
> >> You could always check the rebalance status command to ensure the
> >> rebalance has indeed completed/stopped before proceeding with the
> >> add-brick. Using add-brick force while a rebalance is ongoing is not
> >> recommended in normal scenarios. I do see that in your case the
> >> statuses show stopped/completed; the glusterd logs would help in
> >> triaging the issue.
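> >>
> >> For example, with the names from your setup, the check before the
> >> add-brick would just be (a sketch of the sequence, not new syntax):
> >>
> >>     gluster volume rebalance test-volume status
> >>     # only once every node reports completed/stopped and no rebalance
> >>     # process is left running:
> >>     gluster volume add-brick test-volume nas4-10g:/data14/gvol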
> >
> > See attached.
> >
> >>
> >>
> >> Rebalance re-writes layouts and migrates data. If an add-brick is
> >> done while this is happening, the cluster might end up in an
> >> imbalanced state; hence the check for an in-progress rebalance when
> >> doing an add-brick.
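> >>
> >> Once the add-brick has succeeded, the usual follow-up (sketched here
> >> with your volume name) is either a layout fix alone, or a full
> >> rebalance, which rewrites layouts and also migrates existing data
> >> onto the new brick:
> >>
> >>     gluster volume rebalance test-volume fix-layout start
> >>     gluster volume rebalance test-volume start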
> >
> > I can see that but as far as I could tell, the rebalance had stopped
> > according to the status.
> >
> > Just to be clear, what command restarts the rebalancing?
> >
> >>
> >>
> >> With regards,
> >> Shishir
> >>
> >>
> >>
> >> On 10 December 2013 10:39, Franco Broi <franco.broi at iongeo.com> wrote:
> >>
> >> Before attempting a rebalance on my existing distributed Gluster
> >> volume I thought I'd do some testing with my new storage. I created
> >> a volume consisting of 4 bricks on the same server and wrote some
> >> data to it. I then added a new brick from another server. I ran the
> >> fix-layout, wrote some new files and could see them on the new
> >> brick. All good so far, so I started the data rebalance. After it
> >> had been running for a while I wanted to add another brick, which I
> >> obviously couldn't do while it was running, so I stopped it. Even
> >> with it stopped it wouldn't let me add a brick, so I tried
> >> restarting it, but it wouldn't let me do that either. I presume you
> >> just reissue the start command as there's no restart?
> >>
> >> [root@nas3 ~]# gluster vol rebalance test-volume status
> >>      Node  Rebalanced-files     size  scanned  failures  skipped     status  run time in secs
> >> ---------  ----------------  -------  -------  --------  -------  ---------  ----------------
> >> localhost                 7  611.7GB     1358         0       10    stopped           4929.00
> >> localhost                 7  611.7GB     1358         0       10    stopped           4929.00
> >>  nas4-10g                 0   0Bytes     1506         0        0  completed              8.00
> >> volume rebalance: test-volume: success:
> >> [root@nas3 ~]# gluster vol add-brick test-volume nas4-10g:/data14/gvol
> >> volume add-brick: failed: Volume name test-volume rebalance is in progress. Please retry after completion
> >> [root@nas3 ~]# gluster vol rebalance test-volume start
> >> volume rebalance: test-volume: failed: Rebalance on test-volume is already started
> >>
> >> In the end I used the force option to make it start, but was that
> >> the right thing to do?
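> >>
> >> (For the record, that was essentially the plain start command with
> >> force appended:
> >>
> >>     gluster vol rebalance test-volume start force
> >>
> >> though the exact invocation isn't shown in the output above.)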
> >>
> >> glusterfs 3.4.1 built on Oct 28 2013 11:01:59
> >> Volume Name: test-volume
> >> Type: Distribute
> >> Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066
> >> Status: Started
> >> Number of Bricks: 5
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: nas3-10g:/data9/gvol
> >> Brick2: nas3-10g:/data10/gvol
> >> Brick3: nas3-10g:/data11/gvol
> >> Brick4: nas3-10g:/data12/gvol
> >> Brick5: nas4-10g:/data13/gvol
> >>
> >>