[Gluster-users] Rebalance times in 3.2.5 vs 3.4.2

Matt Edwards matted at MIT.EDU
Fri Feb 28 02:54:46 UTC 2014


Hi Viktor,

Thanks for the tips.  I'm a bit confused, since the clients mount the share
fine, and "gluster peer status" and "gluster volume status all detail" are
happy.

What is the expected output of "rebalance status" for just a fix-layout
run?  I believe the last time I did that, the status was always 0s (which
makes some sense, as files aren't moving) and the log was empty, but the
operation seemed to complete successfully.  Does a file rebalance first
require a fix-layout operation internally, and is it possible that my
volume is still in that phase?  Or am I making up an overly optimistic
scenario?

Thanks,

Matt


On Thu, Feb 27, 2014 at 8:33 PM, Viktor Villafuerte <
viktor.villafuerte at optusnet.com.au> wrote:

> Hi Matt,
>
> if the 'status' says 0 for everything, that's not good. Normally when I
> do a rebalance the numbers should change (go up). Also the rebalance log
> should show files being moved around.
>
> For the errors - my (limited) experience with Gluster is that the 'W'
> warnings are normally harmless and they show up quite a bit. For the
> actual error 'E' you could try playing with 'auth.allow' as suggested here
>
> http://gluster.org/pipermail/gluster-users/2011-November/009094.html
>
>
> Normally when rebalancing I count the files on the bricks and on the
> Gluster mount to make sure they eventually add up. I also grep for and
> count the '-T' link files to watch that count go down while the 'rw'
> count goes up.
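The counting Viktor describes could be scripted roughly like this. The brick
and mount paths are placeholders, and the '-T' entries are DHT link files,
which carry only the sticky bit and so show up in `ls -l` as `---------T`:

```shell
# Sketch of the sanity checks above; BRICK and MOUNT are hypothetical paths.
BRICK=/data/brick1          # a brick's backing directory on one server
MOUNT=/mnt/glustervol       # the FUSE mount of the volume

# File counts on the bricks should eventually add up to the mount's total
# (skip the internal .glusterfs directory on the brick).
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -print | wc -l
find "$MOUNT" -type f | wc -l

# DHT link files carry only the sticky bit, so ls shows them as ---------T.
# Their count should drop as data migrates, while regular -rw files rise.
ls -lR "$BRICK" | grep -c -- '---------T'
ls -lR "$BRICK" | grep -c -- '^-rw'
```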
>
> v
>
>
>
>
> On Thu 27 Feb 2014 00:57:28, Matt Edwards wrote:
> > Hopefully I'm not derailing this thread too far, but I have a related
> > rebalance progress/speed issue.
> >
> > I have a rebalance process that's been running for 3-4 days.  Is
> > there a good way to see if it's running successfully, or might this be a
> > sign of some problem?
> >
> > This is on a 4-node distribute setup with v3.4.2 and 45T of data.
> >
> > The *-rebalance.log has been silent since some informational messages
> > when the rebalance started.  There were a few initial warnings and
> > errors that I observed, though:
> >
> >
> > E [client-handshake.c:1397:client_setvolume_cbk] 0-cluster2-client-0:
> > SETVOLUME on remote-host failed: Authentication failed
> >
> > W [client-handshake.c:1365:client_setvolume_cbk] 0-cluster2-client-4:
> > failed to set the volume (Permission denied)
> >
> > W [client-handshake.c:1391:client_setvolume_cbk] 0-cluster2-client-4:
> > failed to get 'process-uuid' from reply dict
> >
> > W [socket.c:514:__socket_rwv] 0-cluster2-client-3: readv failed (No data
> > available)
> >
> >
> > "gluster volume status" reports that the rebalance is in progress, the
> > process listed in vols/<volname>/rebalance/<hash>.pid is still running on
> > the server, but "gluster volume rebalance <volname> status" reports 0 for
> > everything (files scanned or rebalanced, failures, run time).
> >
> > Thanks,
> >
> > Matt
> >
> >
> > On Thu, Feb 27, 2014 at 12:39 AM, Shylesh Kumar <shmohan at redhat.com>
> wrote:
> >
> > > Hi Viktor,
> > >
> > > Lots of optimizations and improvements went into 3.4, so it should be
> > > faster than 3.2.
> > > Just to see what's happening, could you please check the rebalance logs
> > > in /var/log/glusterfs/<volname>-rebalance.log and see whether there is
> > > any progress?
> > >
> > > Thanks,
> > > Shylesh
> > >
> > >
> > > Viktor Villafuerte wrote:
> > >
> > >> Can anybody confirm or dispute that this is normal/abnormal?
> > >>
> > >> v
> > >>
> > >>
> > >> On Tue 25 Feb 2014 15:21:40, Viktor Villafuerte wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> I have a distributed replicated set with 2 servers (replicas) and am
> > >>> trying to add another set of replicas: 1 x (1x1) => 2 x (1x1)
> > >>>
> > >>> I have about 23G of data which I copy onto the first replica, check
> > >>> everything, and then add the other set of replicas and eventually
> > >>> rebalance fix-layout, migrate-data.
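For reference, the expansion Viktor describes would look roughly like this.
The volume and brick names are made up, and the rebalance syntax differs
between versions: on 3.2, fix-layout and migrate-data are separate steps,
while on 3.4 a plain "start" performs both phases. The run() guard only
prints each command unless RUN=1, so the sketch can be read safely:

```shell
# Sketch only: volume and brick names are hypothetical.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "would run: $*"; fi; }

# Grow 1 x (1x1) to 2 x (1x1): add a second replica pair...
run gluster volume add-brick testvol server3:/data/brick server4:/data/brick

# ...spread the directory layout over the new bricks, then migrate data.
# (3.2 syntax; on 3.4, "rebalance testvol start" performs both phases.)
run gluster volume rebalance testvol fix-layout start
run gluster volume rebalance testvol migrate-data start
run gluster volume rebalance testvol status
```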
> > >>>
> > >>> Now on
> > >>>
> > >>> Gluster v3.2.5 this took about 30 mins (to rebalance + migrate-data)
> > >>>
> > >>> on
> > >>>
> > >>> Gluster v3.4.2 this has been running for almost 4 hours and it's
> > >>> still not finished
> > >>>
> > >>>
> > >>> As I may have to do this in production, where the amount of data is
> > >>> significantly larger than 23G, I'm looking at about three weeks of
> > >>> waiting for the rebalance :)
> > >>>
> > >>> Now my question is whether this is how it's meant to be. I can see
> > >>> that v3.4.2
> > >>> gives me more info about the rebalance process etc, but that surely
> > >>> cannot justify the enormous time difference.
> > >>>
> > >>> Is this normal/expected behaviour? If so I will have to stick with
> > >>> v3.2.5 as it seems way quicker.
> > >>>
> > >>> Please, let me know if there is any 'well known' option/way/secret to
> > >>> speed the rebalance up on v3.4.2.
> > >>>
> > >>>
> > >>> thanks
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Regards
> > >>>
> > >>> Viktor Villafuerte
> > >>> Optus Internet Engineering
> > >>> t: 02 808-25265
> > >>> _______________________________________________
> > >>> Gluster-users mailing list
> > >>> Gluster-users at gluster.org
> > >>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > >>>
> > >>
> > >
>
> --
> Regards
>
> Viktor Villafuerte
> Optus Internet Engineering
> t: 02 808-25265
>
