[Gluster-devel] Gluster 5.10 rebalance stuck

Shreyansh Shah shreyansh.shah at alpha-grep.com
Fri Nov 11 09:13:46 UTC 2022


Hi Gluster Devs,
Any leads on the above? We are kinda stuck at the moment.

On Mon, Nov 7, 2022 at 2:13 PM Strahil Nikolov <hunter86_bg at yahoo.com>
wrote:

> Hi Dev list,
>
> How can I find the details about the rebalance_status/status IDs? Is it
> actually normal that some systems are in '4' and others in '3'?
>
> Is it safe to forcefully start a new rebalance?
>
> Best Regards,
> Strahil Nikolov
>
> On Mon, Nov 7, 2022 at 9:15, Shreyansh Shah
> <shreyansh.shah at alpha-grep.com> wrote:
> Hi Strahil,
> Adding the info below:
>
> --------------------------------------
> Node IP = 10.132.0.19
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=27054
> size=7104425578505
> scanned=72141
> failures=10
> skipped=19611
> run-time=92805.000000
> --------------------------------------
> Node IP = 10.132.0.20
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=23945
> size=7126809216060
> scanned=71208
> failures=7
> skipped=18834
> run-time=94029.000000
> --------------------------------------
> Node IP = 10.132.1.12
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=12533
> size=12945021256
> scanned=40398
> failures=14
> skipped=1194
> run-time=92201.000000
> --------------------------------------
> Node IP = 10.132.1.13
> rebalance_status=1
> status=3
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=41483
> size=8845076025598
> scanned=179920
> failures=25
> skipped=62373
> run-time=130017.000000
> --------------------------------------
> Node IP = 10.132.1.14
> rebalance_status=1
> status=3
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=43603
> size=7834691799355
> scanned=204140
> failures=2878
> skipped=87761
> run-time=130016.000000
> --------------------------------------
> Node IP = 10.132.1.15
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=29968
> size=6389568855140
> scanned=69320
> failures=7
> skipped=17999
> run-time=93654.000000
> --------------------------------------
> Node IP = 10.132.1.16
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=23226
> size=5899338197718
> scanned=56169
> failures=7
> skipped=12659
> run-time=94030.000000
> --------------------------------------
> Node IP = 10.132.1.17
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=17538
> size=6247281008602
> scanned=50038
> failures=8
> skipped=11335
> run-time=92203.000000
> --------------------------------------
> Node IP = 10.132.1.18
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=20394
> size=6395008466977
> scanned=50060
> failures=7
> skipped=13784
> run-time=92103.000000
> --------------------------------------
> Node IP = 10.132.1.19
> rebalance_status=1
> status=1
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=0
> size=0
> scanned=0
> failures=0
> skipped=0
> run-time=0.000000
> --------------------------------------
> Node IP = 10.132.1.20
> rebalance_status=1
> status=3
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=0
> size=0
> scanned=24
> failures=0
> skipped=2
> run-time=1514.000000
>
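To make the dump above easier to compare across peers, here is a minimal Python sketch that splits it into per-node records and groups the nodes by their status value. The function names (parse_node_dump, group_by_status) are hypothetical. The status labels are an assumption taken from the order of the gf_defrag_status_t enum in the GlusterFS sources (0 not started, 1 started, 2 stopped, 3 complete, 4 failed); nothing in this thread confirms that mapping.

```python
from collections import defaultdict

# ASSUMPTION: labels follow the gf_defrag_status_t enum order in the
# GlusterFS sources; this thread does not confirm the mapping.
ASSUMED_STATUS_LABELS = {
    0: "not started",
    1: "started",
    2: "stopped",
    3: "complete",
    4: "failed",
}


def parse_node_dump(text):
    """Split a dump of 'Node IP = ...' blocks into one dict per node."""
    nodes = []
    current = None
    for raw in text.splitlines():
        # Drop mail quoting ("> ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        if line.startswith("Node IP"):
            current = {"node": line.split("=", 1)[1].strip()}
            nodes.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
        # Separator lines and blanks contain no "=" and are skipped.
    return nodes


def group_by_status(nodes):
    """Map each numeric status value to the list of node IPs reporting it."""
    groups = defaultdict(list)
    for node in nodes:
        groups[int(node["status"])].append(node["node"])
    return dict(groups)


if __name__ == "__main__":
    sample = """\
Node IP = 10.132.1.18
rebalance_status=1
status=4
--------------------------------------
Node IP = 10.132.1.19
rebalance_status=1
status=1
--------------------------------------
Node IP = 10.132.1.20
rebalance_status=1
status=3
"""
    grouped = group_by_status(parse_node_dump(sample))
    for status, ips in sorted(grouped.items()):
        label = ASSUMED_STATUS_LABELS.get(status, "unknown")
        print(f"status={status} ({label}, assumed): {', '.join(ips)}")
```

Fed the full dump above, this groups 10.132.1.19 alone under status=1, puts 10.132.1.13, 10.132.1.14 and 10.132.1.20 under status=3, and the remaining seven nodes under status=4.
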
> On Thu, Nov 3, 2022 at 10:10 PM Strahil Nikolov <hunter86_bg at yahoo.com>
> wrote:
>
> And what about the other servers?
>
> On Thu, Nov 3, 2022 at 16:21, Shreyansh Shah
> <shreyansh.shah at alpha-grep.com> wrote:
> Hi Strahil,
> Thank you for your reply. node_state.info contains the data below:
>
> root@gluster-11:/usr/var/lib/glusterd/vols/data# cat node_state.info
> rebalance_status=1
> status=3
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=0
> size=0
> scanned=24
> failures=0
> skipped=2
> run-time=1514.000000
>
>
>
>
> On Thu, Nov 3, 2022 at 4:00 PM Strahil Nikolov <hunter86_bg at yahoo.com>
> wrote:
>
> I would check the details in
> /var/lib/glusterd/vols/<VOLUME_NAME>/node_state.info
>
> Best Regards,
> Strahil Nikolov
>
> On Wed, Nov 2, 2022 at 9:06, Shreyansh Shah
> <shreyansh.shah at alpha-grep.com> wrote:
> Hi,
> I would really appreciate it if someone could help with the above issue.
> We are stuck: we cannot run a rebalance because of it, so we cannot get
> peak performance from the setup while the data is unbalanced.
> Adding the gluster volume info (without the bricks) below. Please let me know
> if any other details or logs are needed.
>
> Volume Name: data
> Type: Distribute
> Volume ID: 75410231-bb25-4f14-bcde-caf18fce1d31
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 41
> Transport-type: tcp
> Options Reconfigured:
> server.event-threads: 4
> network.ping-timeout: 90
> client.keepalive-time: 60
> server.keepalive-time: 60
> storage.health-check-interval: 60
> performance.client-io-threads: on
> nfs.disable: on
> transport.address-family: inet
> performance.cache-size: 8GB
> performance.cache-refresh-timeout: 60
> cluster.min-free-disk: 3%
> client.event-threads: 4
> performance.io-thread-count: 16
>
>
>
> On Fri, Oct 28, 2022 at 11:40 AM Shreyansh Shah <
> shreyansh.shah at alpha-grep.com> wrote:
>
> Hi,
> We are running a GlusterFS 5.10 server volume. We recently added a few new
> bricks and started a rebalance operation. After a couple of days the
> rebalance appeared stuck: one of the peers showed In-Progress with no files
> being read or transferred, and the rest showed Failed or Completed. We
> stopped it using "gluster volume rebalance data stop". Now, when we try to
> start it again, we get the errors below. Any assistance would be appreciated.
>
>
> root@gluster-11:~# gluster volume rebalance data status
> volume rebalance: data: failed: Rebalance not started for volume data.
> root@gluster-11:~# gluster volume rebalance data start
> volume rebalance: data: failed: Rebalance on data is already started
> root@gluster-11:~# gluster volume rebalance data stop
> volume rebalance: data: failed: Rebalance not started for volume data.
>
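The three replies above contradict each other: status and stop say the rebalance was never started, while start says it already is. Below is a minimal Python sketch of a check that flags this combination from the CLI messages. The helper name and the choice to match on these message substrings are assumptions for illustration; it only inspects text passed to it and does not call gluster itself.

```python
# Substrings copied from the CLI messages quoted above.
NOT_STARTED = "Rebalance not started"
ALREADY_STARTED = "is already started"


def rebalance_state_conflicts(status_msg, start_msg):
    """Hypothetical helper: True when 'status' reports no rebalance
    while 'start' refuses because one is supposedly already running."""
    return NOT_STARTED in status_msg and ALREADY_STARTED in start_msg


if __name__ == "__main__":
    status_msg = ("volume rebalance: data: failed: "
                  "Rebalance not started for volume data.")
    start_msg = ("volume rebalance: data: failed: "
                 "Rebalance on data is already started")
    print(rebalance_state_conflicts(status_msg, start_msg))
```
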
>
>
> --
> Regards,
> Shreyansh Shah
> AlphaGrep Securities Pvt. Ltd.

-- 
Regards,
Shreyansh Shah
AlphaGrep Securities Pvt. Ltd.