[Gluster-users] Rebalance times in 3.2.5 vs 3.4.2

Matt Edwards matted at MIT.EDU
Thu Feb 27 05:57:28 UTC 2014


Hopefully I'm not derailing this thread too far, but I have a related
rebalance progress/speed issue.

I have a rebalance process that has been running for 3-4 days.  Is there a
good way to tell whether it is actually making progress, or is a run this
long a sign of some problem?

This is on a 4-node distribute setup with v3.4.2 and 45T of data.

The *-rebalance.log has been silent since a few informational messages
logged when the rebalance started.  I did see some warnings and errors at
startup, though:


E [client-handshake.c:1397:client_setvolume_cbk] 0-cluster2-client-0:
SETVOLUME on remote-host failed: Authentication failed

W [client-handshake.c:1365:client_setvolume_cbk] 0-cluster2-client-4:
failed to set the volume (Permission denied)

W [client-handshake.c:1391:client_setvolume_cbk] 0-cluster2-client-4:
failed to get 'process-uuid' from reply dict

W [socket.c:514:__socket_rwv] 0-cluster2-client-3: readv failed (No data
available)
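
A minimal sketch for tallying entries like the ones above by level and
function, which makes it easier to see whether the handshake errors are
one-off or recurring. The regular expression is an assumption based only on
the layout of the excerpt above (a one-letter level followed by
[source-file:line:function]); the optional leading "[timestamp]" prefix is
also an assumption, and the name tally is hypothetical.

```python
import re
from collections import Counter

# Assumed layout: optional "[timestamp] ", then a one-letter level,
# then "[source-file:line:function]". The timestamp prefix is an assumption;
# the excerpt above does not show one.
ENTRY = re.compile(r"^(?:\[[^\]]*\]\s+)?([A-Z])\s+\[([^:\]]+):(\d+):([^\]]+)\]")


def tally(lines):
    """Count log entries per (level, function); wrapped continuation lines are skipped."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match:
            level, _source, _lineno, function = match.groups()
            counts[(level, function)] += 1
    return counts
```

It could be fed an open file handle for
/var/log/glusterfs/<volname>-rebalance.log, printing the sorted items of the
returned Counter.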


"gluster volume status" reports that the rebalance is in progress, and the
process listed in vols/<volname>/rebalance/<hash>.pid is still running on
the server.  However, "gluster volume rebalance <volname> status" reports 0
for everything (files scanned or rebalanced, failures, run time).
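
The two manual checks described above (is the process named in the pid file
still alive, and how long has the rebalance log been quiet) can be sketched
roughly as follows. This assumes a Linux server and uses only the Python
standard library; the function names are hypothetical, and the pid-file and
log paths have to be supplied by the caller, since the full paths are not
spelled out here.

```python
import os
import time


def pid_from_file(pid_file):
    """Read the integer pid stored in a pid file."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def process_is_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def seconds_since_modified(path, now=None):
    """Seconds elapsed since the file at path was last written."""
    now = time.time() if now is None else now
    return now - os.stat(path).st_mtime
```

A live process combined with a log that has not been written for days does
not by itself say whether the rebalance is stuck or simply quiet, so this
only automates the observation, not the diagnosis.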

Thanks,

Matt


On Thu, Feb 27, 2014 at 12:39 AM, Shylesh Kumar <shmohan at redhat.com> wrote:

> Hi Viktor,
>
> Lots of optimizations and improvements went in for 3.4 so it should be
> faster than 3.2.
> Just to see what's happening, could you please check the rebalance log,
> which will be in /var/log/glusterfs/<volname>-rebalance.log, and see
> whether there is any progress?
>
> Thanks,
> Shylesh
>
>
> Viktor Villafuerte wrote:
>
>> Can anybody confirm or dispute whether this is normal?
>>
>> v
>>
>>
>> On Tue 25 Feb 2014 15:21:40, Viktor Villafuerte wrote:
>>
>>> Hi all,
>>>
>>> I have a distributed-replicated set with 2 servers (replicas) and am
>>> trying to add another set of replicas: 1 x (1x1) => 2 x (1x1)
>>>
>>> I have about 23G of data which I copy onto the first replica, check
>>> everything and then add the other set of replicas and eventually
>>> rebalance fix-layout, migrate-data.
>>>
>>> Now on
>>>
>>> Gluster v3.2.5 this took about 30 mins (to rebalance + migrate-data)
>>>
>>> on
>>>
>>> Gluster v3.4.2 this has been running for almost 4 hours and it's still
>>> not finished
>>>
>>>
>>> As I may have to do this in production, where the amount of data is
>>> significantly larger than 23G, I'm looking at about three weeks of wait
>>> to rebalance :)
>>>
>>> Now my question is whether this is as it's meant to be. I can see that v3.4.2
>>> gives me more info about the rebalance process etc, but that surely
>>> cannot justify the enormous time difference.
>>>
>>> Is this normal/expected behaviour? If so I will have to stick with the
>>> v3.2.5 as it seems way quicker.
>>>
>>> Please, let me know if there is any 'well known' option/way/secret to
>>> speed the rebalance up on v3.4.2.
>>>
>>>
>>> thanks
>>>
>>>
>>>
>>> --
>>> Regards
>>>
>>> Viktor Villafuerte
>>> Optus Internet Engineering
>>> t: 02 808-25265
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>
>>