[Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
Nithya Balachandran
nbalacha at redhat.com
Mon Oct 8 08:43:07 UTC 2018
Hi Mauro,
Yes, a rebalance consists of 2 operations for every directory (the corresponding commands are sketched below):
1. Fix the layout for the new volume config (newly added or removed bricks)
2. Migrate files to their new hashed subvols based on the new layout
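For reference, the two phases map onto these CLI invocations (volume name as used in this thread; only a sketch, to be run from any one of the servers):

   gluster volume rebalance tier2 fix-layout start   # phase 1 only: recalculate directory layouts
   gluster volume rebalance tier2 start              # phases 1 + 2: fix layouts and migrate data
   gluster volume rebalance tier2 status             # monitor progress on every node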
Are you running a rebalance because you added new bricks to the volume? As per an earlier email, you have already run a fix-layout.
On s04, please check the rebalance log file to see why the rebalance failed.
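For example (assuming the default log location on that node):

   less /var/log/glusterfs/tier2-rebalance.log
   grep ' E ' /var/log/glusterfs/tier2-rebalance.log | tail -20   # last error-level entries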
Regards,
Nithya
On 8 October 2018 at 13:22, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
> Hi All,
>
> for your information, this is the current rebalance status:
>
> [root at s01 ~]# gluster volume rebalance tier2 status
> Node        Rebalanced-files   size     scanned   failures   skipped   status        run time in h:m:s
> localhost             551922   20.3TB   2349397          0     61849   in progress            55:25:38
> s02-stg               287631   13.2TB    959954          0     30262   in progress            55:25:39
> s03-stg               288523   12.7TB    973111          0     30220   in progress            55:25:39
> s04-stg                    0   0Bytes         0          0         0   failed                  0:00:37
> s05-stg                    0   0Bytes         0          0         0   completed              48:33:03
> s06-stg                    0   0Bytes         0          0         0   completed              48:33:02
> Estimated time left for rebalance to complete : 1023:49:56
> volume rebalance: tier2: success
>
> Rebalance is migrating files onto the s05 and s06 servers, and onto s04 too (even though its task is marked as failed).
> The s05 and s06 tasks are completed.
>
> Questions:
>
> 1) It seems that rebalance is moving files but is also fixing the layout: is that normal?
> 2) When the rebalance completes, what do we need to do before returning the gluster storage to the users? Do we have to launch the rebalance again in order to involve the s04 server too, or run a fix-layout to fix any remaining errors on s04?
>
> Thank you very much,
> Mauro
>
>
> On 7 October 2018, at 10:29, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>
>
> Hi All,
>
> here are some important updates about the issue mentioned below.
> After the rebalance failed on all the servers, I decided to:
>
> - stop gluster volume
> - reboot the servers
> - start gluster volume
> - change some gluster volume options
> - start the rebalance again
>
> The options that I changed, after reading some threads on the gluster-users mailing list, are listed below:
>
> BEFORE CHANGE:
> gluster volume set tier2 network.ping-timeout 02
> gluster volume set all cluster.brick-multiplex on
> gluster volume set tier2 cluster.server-quorum-ratio 51%
> gluster volume set tier2 cluster.server-quorum-type server
> gluster volume set tier2 cluster.quorum-type auto
>
> AFTER CHANGE:
>
> gluster volume set tier2 network.ping-timeout 42
> gluster volume set all cluster.brick-multiplex off
> gluster volume set tier2 cluster.server-quorum-ratio none
> gluster volume set tier2 cluster.server-quorum-type none
> gluster volume set tier2 cluster.quorum-type none
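>
> (A quick way to double-check that the new values were actually applied — only a sketch, the grep pattern is just an example:)
>
> gluster volume get tier2 all | grep -E 'ping-timeout|quorum|brick-multiplex'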
>
> The result was that the rebalance started moving data from the s01, s02 and s03 servers to the s05 and s06 servers (the newly added ones), but it failed on the s04 server after 37 seconds.
> The rebalance is still running and moving data, as you can see from the output:
>
> [root at s01 ~]# gluster volume rebalance tier2 status
> Node        Rebalanced-files   size     scanned   failures   skipped   status        run time in h:m:s
> localhost             286680   12.6TB   1217960          0     43343   in progress            32:10:24
> s02-stg               126291   12.4TB    413077          0     21932   in progress            32:10:25
> s03-stg               126516   11.9TB    433014          0     21870   in progress            32:10:25
> s04-stg                    0   0Bytes         0          0         0   failed                  0:00:37
> s05-stg                    0   0Bytes         0          0         0   in progress            32:10:25
> s06-stg                    0   0Bytes         0          0         0   in progress            32:10:25
> Estimated time left for rebalance to complete : 624:47:48
> volume rebalance: tier2: success
>
> When the rebalance is completed, we are planning to re-launch it to try to involve the s04 server as well.
> Do you have any idea about what happened in my previous message, and why the rebalance is now running even though it does not involve the s04 server?
>
> Attached is the complete tier2-rebalance.log file from the s04 server.
>
> Thank you very much for your help,
> Mauro
>
>
> <tier2-rebalance.log.gz>
>
> On 6 October 2018, at 02:01, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>
>
> Hi All,
>
> since we need to restore the gluster storage as soon as possible, we decided to ignore the few files that could be lost and to go ahead.
> So we cleaned all the brick contents on servers s04, s05 and s06.
>
> As planned some days ago, we executed the following commands:
>
> gluster peer detach s04
> gluster peer detach s05
> gluster peer detach s06
>
> gluster peer probe s04
> gluster peer probe s05
> gluster peer probe s06
>
> gluster volume add-brick tier2 \
>   s04-stg:/gluster/mnt1/brick  s05-stg:/gluster/mnt1/brick  s06-stg:/gluster/mnt1/brick \
>   s04-stg:/gluster/mnt2/brick  s05-stg:/gluster/mnt2/brick  s06-stg:/gluster/mnt2/brick \
>   s04-stg:/gluster/mnt3/brick  s05-stg:/gluster/mnt3/brick  s06-stg:/gluster/mnt3/brick \
>   s04-stg:/gluster/mnt4/brick  s05-stg:/gluster/mnt4/brick  s06-stg:/gluster/mnt4/brick \
>   s04-stg:/gluster/mnt5/brick  s05-stg:/gluster/mnt5/brick  s06-stg:/gluster/mnt5/brick \
>   s04-stg:/gluster/mnt6/brick  s05-stg:/gluster/mnt6/brick  s06-stg:/gluster/mnt6/brick \
>   s04-stg:/gluster/mnt7/brick  s05-stg:/gluster/mnt7/brick  s06-stg:/gluster/mnt7/brick \
>   s04-stg:/gluster/mnt8/brick  s05-stg:/gluster/mnt8/brick  s06-stg:/gluster/mnt8/brick \
>   s04-stg:/gluster/mnt9/brick  s05-stg:/gluster/mnt9/brick  s06-stg:/gluster/mnt9/brick \
>   s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick \
>   s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick \
>   s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick \
>   force
>
> gluster volume rebalance tier2 fix-layout start
>
> Everything seemed to be fine and the fix-layout completed.
>
> [root at s01 ~]# gluster volume rebalance tier2 status
> Node        status                 run time in h:m:s
> localhost   fix-layout completed             12:11:6
> s02-stg     fix-layout completed            12:11:18
> s03-stg     fix-layout completed            12:11:12
> s04-stg     fix-layout completed            12:11:20
> s05-stg     fix-layout completed            12:11:14
> s06-stg     fix-layout completed            12:10:47
> volume rebalance: tier2: success
>
> [root at s01 ~]# gluster volume info
>
> Volume Name: tier2
> Type: Distributed-Disperse
> Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 12 x (4 + 2) = 72
> Transport-type: tcp
> Bricks:
> Brick1: s01-stg:/gluster/mnt1/brick
> Brick2: s02-stg:/gluster/mnt1/brick
> Brick3: s03-stg:/gluster/mnt1/brick
> Brick4: s01-stg:/gluster/mnt2/brick
> Brick5: s02-stg:/gluster/mnt2/brick
> Brick6: s03-stg:/gluster/mnt2/brick
> Brick7: s01-stg:/gluster/mnt3/brick
> Brick8: s02-stg:/gluster/mnt3/brick
> Brick9: s03-stg:/gluster/mnt3/brick
> Brick10: s01-stg:/gluster/mnt4/brick
> Brick11: s02-stg:/gluster/mnt4/brick
> Brick12: s03-stg:/gluster/mnt4/brick
> Brick13: s01-stg:/gluster/mnt5/brick
> Brick14: s02-stg:/gluster/mnt5/brick
> Brick15: s03-stg:/gluster/mnt5/brick
> Brick16: s01-stg:/gluster/mnt6/brick
> Brick17: s02-stg:/gluster/mnt6/brick
> Brick18: s03-stg:/gluster/mnt6/brick
> Brick19: s01-stg:/gluster/mnt7/brick
> Brick20: s02-stg:/gluster/mnt7/brick
> Brick21: s03-stg:/gluster/mnt7/brick
> Brick22: s01-stg:/gluster/mnt8/brick
> Brick23: s02-stg:/gluster/mnt8/brick
> Brick24: s03-stg:/gluster/mnt8/brick
> Brick25: s01-stg:/gluster/mnt9/brick
> Brick26: s02-stg:/gluster/mnt9/brick
> Brick27: s03-stg:/gluster/mnt9/brick
> Brick28: s01-stg:/gluster/mnt10/brick
> Brick29: s02-stg:/gluster/mnt10/brick
> Brick30: s03-stg:/gluster/mnt10/brick
> Brick31: s01-stg:/gluster/mnt11/brick
> Brick32: s02-stg:/gluster/mnt11/brick
> Brick33: s03-stg:/gluster/mnt11/brick
> Brick34: s01-stg:/gluster/mnt12/brick
> Brick35: s02-stg:/gluster/mnt12/brick
> Brick36: s03-stg:/gluster/mnt12/brick
> Brick37: s04-stg:/gluster/mnt1/brick
> Brick38: s05-stg:/gluster/mnt1/brick
> Brick39: s06-stg:/gluster/mnt1/brick
> Brick40: s04-stg:/gluster/mnt2/brick
> Brick41: s05-stg:/gluster/mnt2/brick
> Brick42: s06-stg:/gluster/mnt2/brick
> Brick43: s04-stg:/gluster/mnt3/brick
> Brick44: s05-stg:/gluster/mnt3/brick
> Brick45: s06-stg:/gluster/mnt3/brick
> Brick46: s04-stg:/gluster/mnt4/brick
> Brick47: s05-stg:/gluster/mnt4/brick
> Brick48: s06-stg:/gluster/mnt4/brick
> Brick49: s04-stg:/gluster/mnt5/brick
> Brick50: s05-stg:/gluster/mnt5/brick
> Brick51: s06-stg:/gluster/mnt5/brick
> Brick52: s04-stg:/gluster/mnt6/brick
> Brick53: s05-stg:/gluster/mnt6/brick
> Brick54: s06-stg:/gluster/mnt6/brick
> Brick55: s04-stg:/gluster/mnt7/brick
> Brick56: s05-stg:/gluster/mnt7/brick
> Brick57: s06-stg:/gluster/mnt7/brick
> Brick58: s04-stg:/gluster/mnt8/brick
> Brick59: s05-stg:/gluster/mnt8/brick
> Brick60: s06-stg:/gluster/mnt8/brick
> Brick61: s04-stg:/gluster/mnt9/brick
> Brick62: s05-stg:/gluster/mnt9/brick
> Brick63: s06-stg:/gluster/mnt9/brick
> Brick64: s04-stg:/gluster/mnt10/brick
> Brick65: s05-stg:/gluster/mnt10/brick
> Brick66: s06-stg:/gluster/mnt10/brick
> Brick67: s04-stg:/gluster/mnt11/brick
> Brick68: s05-stg:/gluster/mnt11/brick
> Brick69: s06-stg:/gluster/mnt11/brick
> Brick70: s04-stg:/gluster/mnt12/brick
> Brick71: s05-stg:/gluster/mnt12/brick
> Brick72: s06-stg:/gluster/mnt12/brick
> Options Reconfigured:
> network.ping-timeout: 42
> features.scrub: Active
> features.bitrot: on
> features.inode-quota: on
> features.quota: on
> performance.client-io-threads: on
> cluster.min-free-disk: 10
> cluster.quorum-type: none
> transport.address-family: inet
> nfs.disable: on
> server.event-threads: 4
> client.event-threads: 4
> cluster.lookup-optimize: on
> performance.readdir-ahead: on
> performance.parallel-readdir: off
> cluster.readdir-optimize: on
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.stat-prefetch: on
> performance.cache-invalidation: on
> performance.md-cache-timeout: 600
> network.inode-lru-limit: 50000
> performance.io-cache: off
> disperse.cpu-extensions: auto
> performance.io-thread-count: 16
> features.quota-deem-statfs: on
> features.default-soft-limit: 90
> cluster.server-quorum-type: none
> diagnostics.latency-measurement: on
> diagnostics.count-fop-hits: on
> cluster.brick-multiplex: off
> cluster.server-quorum-ratio: 51%
>
> The last step should be the data rebalance across the servers, but the rebalance failed almost immediately with a lot of errors like the following:
>
> [2018-10-05 23:48:38.644978] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-tier2-client-70: Server lk version = 1
> [2018-10-05 23:48:44.735323] I [dht-rebalance.c:4512:gf_defrag_start_crawl] 0-tier2-dht: gf_defrag_start_crawl using commit hash 3720331860
> [2018-10-05 23:48:44.736205] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-7: Failed to get size and version [Input/output error]
> [2018-10-05 23:48:44.736266] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-7: Insufficient available children for this request (have 0, need 4)
> [2018-10-05 23:48:44.736282] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-7: Failed to update version and size [Input/output error]
> [2018-10-05 23:48:44.736377] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-8: Failed to get size and version [Input/output error]
> [2018-10-05 23:48:44.736436] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-8: Insufficient available children for this request (have 0, need 4)
> [2018-10-05 23:48:44.736459] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-8: Failed to update version and size [Input/output error]
> [2018-10-05 23:48:44.736460] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-10: Failed to get size and version [Input/output error]
> [2018-10-05 23:48:44.736537] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-9: Failed to get size and version [Input/output error]
> [2018-10-05 23:48:44.736571] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-10: Insufficient available children for this request (have 0, need 4)
> [2018-10-05 23:48:44.736574] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-9: Insufficient available children for this request (have 0, need 4)
> [2018-10-05 23:48:44.736604] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-9: Failed to update version and size [Input/output error]
> [2018-10-05 23:48:44.736604] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-10: Failed to update version and size [Input/output error]
> [2018-10-05 23:48:44.736827] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-11: Failed to get size and version [Input/output error]
> [2018-10-05 23:48:44.736887] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-11: Insufficient available children for this request (have 0, need 4)
> [2018-10-05 23:48:44.736904] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-11: Failed to update version and size [Input/output error]
> [2018-10-05 23:48:44.740337] W [MSGID: 122040] [ec-common.c:1097:ec_prepare_update_cbk] 0-tier2-disperse-6: Failed to get size and version [Input/output error]
> [2018-10-05 23:48:44.740381] E [MSGID: 122034] [ec-common.c:613:ec_child_select] 0-tier2-disperse-6: Insufficient available children for this request (have 0, need 4)
> [2018-10-05 23:48:44.740394] E [MSGID: 122037] [ec-common.c:2040:ec_update_size_version_done] 0-tier2-disperse-6: Failed to update version and size [Input/output error]
> [2018-10-05 23:48:50.066103] I [MSGID: 109081] [dht-common.c:4379:dht_setxattr] 0-tier2-dht: fixing the layout of /
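>
> (The "Insufficient available children (have 0, need 4)" messages seem to indicate that none of the bricks in those disperse sets were reachable from the rebalance process; a quick sanity check — only a sketch — is:)
>
> gluster volume status tier2           # every brick should show Online = Y
> gluster volume status tier2 clients   # clients connected to each brick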
>
> Attached you can find the first logs captured during the rebalance execution.
> In your opinion, is there a way to restore the gluster storage, or have all the data been lost?
>
> Thank you in advance,
> Mauro
>
> <rebalance_log.txt>
>
>
>
> On 4 October 2018, at 15:31, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>
>
> Hi Nithya,
>
> thank you very much.
> This is the current “gluster volume info” output after removing the bricks (and after the peer detach command).
>
> [root at s01 ~]# gluster volume info
>
> Volume Name: tier2
> Type: Distributed-Disperse
> Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 6 x (4 + 2) = 36
> Transport-type: tcp
> Bricks:
> Brick1: s01-stg:/gluster/mnt1/brick
> Brick2: s02-stg:/gluster/mnt1/brick
> Brick3: s03-stg:/gluster/mnt1/brick
> Brick4: s01-stg:/gluster/mnt2/brick
> Brick5: s02-stg:/gluster/mnt2/brick
> Brick6: s03-stg:/gluster/mnt2/brick
> Brick7: s01-stg:/gluster/mnt3/brick
> Brick8: s02-stg:/gluster/mnt3/brick
> Brick9: s03-stg:/gluster/mnt3/brick
> Brick10: s01-stg:/gluster/mnt4/brick
> Brick11: s02-stg:/gluster/mnt4/brick
> Brick12: s03-stg:/gluster/mnt4/brick
> Brick13: s01-stg:/gluster/mnt5/brick
> Brick14: s02-stg:/gluster/mnt5/brick
> Brick15: s03-stg:/gluster/mnt5/brick
> Brick16: s01-stg:/gluster/mnt6/brick
> Brick17: s02-stg:/gluster/mnt6/brick
> Brick18: s03-stg:/gluster/mnt6/brick
> Brick19: s01-stg:/gluster/mnt7/brick
> Brick20: s02-stg:/gluster/mnt7/brick
> Brick21: s03-stg:/gluster/mnt7/brick
> Brick22: s01-stg:/gluster/mnt8/brick
> Brick23: s02-stg:/gluster/mnt8/brick
> Brick24: s03-stg:/gluster/mnt8/brick
> Brick25: s01-stg:/gluster/mnt9/brick
> Brick26: s02-stg:/gluster/mnt9/brick
> Brick27: s03-stg:/gluster/mnt9/brick
> Brick28: s01-stg:/gluster/mnt10/brick
> Brick29: s02-stg:/gluster/mnt10/brick
> Brick30: s03-stg:/gluster/mnt10/brick
> Brick31: s01-stg:/gluster/mnt11/brick
> Brick32: s02-stg:/gluster/mnt11/brick
> Brick33: s03-stg:/gluster/mnt11/brick
> Brick34: s01-stg:/gluster/mnt12/brick
> Brick35: s02-stg:/gluster/mnt12/brick
> Brick36: s03-stg:/gluster/mnt12/brick
> Options Reconfigured:
> network.ping-timeout: 0
> features.scrub: Active
> features.bitrot: on
> features.inode-quota: on
> features.quota: on
> performance.client-io-threads: on
> cluster.min-free-disk: 10
> cluster.quorum-type: auto
> transport.address-family: inet
> nfs.disable: on
> server.event-threads: 4
> client.event-threads: 4
> cluster.lookup-optimize: on
> performance.readdir-ahead: on
> performance.parallel-readdir: off
> cluster.readdir-optimize: on
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.stat-prefetch: on
> performance.cache-invalidation: on
> performance.md-cache-timeout: 600
> network.inode-lru-limit: 50000
> performance.io-cache: off
> disperse.cpu-extensions: auto
> performance.io-thread-count: 16
> features.quota-deem-statfs: on
> features.default-soft-limit: 90
> cluster.server-quorum-type: server
> diagnostics.latency-measurement: on
> diagnostics.count-fop-hits: on
> cluster.brick-multiplex: on
> cluster.server-quorum-ratio: 51%
>
> Regards,
> Mauro
>
> On 4 October 2018, at 15:22, Nithya Balachandran <nbalacha at redhat.com> wrote:
>
> Hi Mauro,
>
>
> The files on s04 and s05 can be deleted safely as long as those bricks
> have been removed from the volume and their brick processes are not running.
>
>
> .glusterfs/indices/xattrop/xattrop-* are links to files that need to be healed.
> .glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008 links to files that bitrot (if enabled) says are corrupted. (none in this case)
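>
> For example, on a brick that is still part of the volume (paths as used in this thread), the pending entries can be listed with something like:
>
> ls /gluster/mnt1/brick/.glusterfs/indices/xattrop/
> ls /gluster/mnt1/brick/.glusterfs/quarantine/
>
> and "gluster volume heal tier2 info" should report the corresponding pending-heal entries per brick.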
>
>
>
> I will get back to you on s06. Can you please provide the output of gluster volume info again?
>
>
> Regards,
> Nithya
>
>
>
> On 4 October 2018 at 13:47, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>
>>
>> Dear Ashish, Dear Nithya,
>>
>> I’m writing this message only to summarize and simplify the information about the “not migrated” files left on the removed bricks on servers s04, s05 and s06.
>> Attached, you can find 3 files (1 file for each server) containing the lists of “not migrated” files and the related brick numbers.
>>
>> In particular:
>>
>> - the s04 and s05 bricks contain only “not migrated” files in the hidden directories “/gluster/mnt#/brick/.glusterfs” (I could delete them, couldn’t I?);
>> - the s06 bricks contain:
>>   - “not migrated” files in the hidden directories “/gluster/mnt#/brick/.glusterfs”;
>>   - “not migrated” files with size equal to 0;
>>   - “not migrated” files with size greater than 0.
>>
>>
>> I thought it was useful to collect and summarize this information in order to simplify your analysis.
>> Thank you very much,
>> Mauro
>>
>>
>>
>
>
>