[Gluster-users] distribute remove-brick has started migrating the wrong brick (glusterfs 3.8.13)
Stephen Remde
stephen.remde at gaist.co.uk
Tue Dec 18 09:26:31 UTC 2018
Nithya,
I've realised I will not have enough space on the other bricks in my
cluster to migrate the data off this server so that I can remove the
single brick - is there a workaround?
As you can see below, the new brick was created with the wrong RAID
configuration, so I want to remove it, recreate the RAID array, and re-add it.
xxxxxx Filesystem Size Used Avail Use% Mounted on
dc4-01 /dev/md0 95T 87T 8.0T 92% /export/md0
dc4-01 /dev/md1 95T 87T 8.4T 92% /export/md1
dc4-01 /dev/md2 95T 86T 9.3T 91% /export/md2
dc4-01 /dev/md3 95T 86T 8.9T 91% /export/md3
dc4-02 /dev/md0 95T 89T 6.5T 94% /export/md0
dc4-02 /dev/md1 95T 87T 8.4T 92% /export/md1
dc4-02 /dev/md2 95T 87T 8.6T 91% /export/md2
dc4-02 /dev/md3 95T 86T 8.8T 91% /export/md3
dc4-03 /dev/md0 95T 74T 21T 78% /export/md0
dc4-03 /dev/md1 102T 519G 102T 1% /export/md1
This is the backup storage, so if I HAVE to lose the 519 GB and resync,
that's an acceptable worst case.
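To make the space concern concrete, here is a rough back-of-the-envelope check (a hypothetical sketch, not gluster output; the figures are copied from the df listing above, in TiB, and it assumes the worst case in which everything stored on dc4-03 has to move to the other two nodes):

```python
# Rough capacity check (illustrative only; TiB figures copied from the
# df output above). Worst case: rebalance migrates data off *every*
# brick on the node hosting the removed brick, so dc4-01 and dc4-02
# would have to absorb everything currently stored on dc4-03.

avail_other_nodes = [8.0, 8.4, 9.3, 8.9,   # dc4-01 md0..md3 (free)
                     6.5, 8.4, 8.6, 8.8]   # dc4-02 md0..md3 (free)
used_on_dc4_03 = 74.0 + 519 / 1024         # md0 (~74 TiB) + md1 (~519 GiB)

shortfall = used_on_dc4_03 - sum(avail_other_nodes)
print(f"shortfall: {shortfall:.1f} TiB")   # positive => not enough room
```

If some of the data on dc4-03's remaining brick stays put under the new layouts, the shortfall shrinks accordingly; only the 519 GiB on md1 strictly has to leave the node.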
gluster> v info video-backup
Volume Name: video-backup
Type: Distribute
Volume ID: 887bdc2a-ca5e-4ca2-b30d-86831839ed04
Status: Started
Snapshot Count: 0
Number of Bricks: 10
Transport-type: tcp
Bricks:
Brick1: 10.0.0.41:/export/md0/brick
Brick2: 10.0.0.42:/export/md0/brick
Brick3: 10.0.0.43:/export/md0/brick
Brick4: 10.0.0.41:/export/md1/brick
Brick5: 10.0.0.42:/export/md1/brick
Brick6: 10.0.0.41:/export/md2/brick
Brick7: 10.0.0.42:/export/md2/brick
Brick8: 10.0.0.41:/export/md3/brick
Brick9: 10.0.0.42:/export/md3/brick
Brick10: 10.0.0.43:/export/md1/brick
Options Reconfigured:
cluster.rebal-throttle: aggressive
cluster.min-free-disk: 1%
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
Best,
Steve
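For anyone wondering why removing one brick touches its sibling at all: as Nithya explains below, DHT assigns each directory a hash-range layout across subvolumes, and decommissioning a subvolume recomputes every range. The toy model below (illustrative only - not the actual GlusterFS hashing or brick names) shows how a file living on a surviving brick can still change owner and therefore get migrated:

```python
# Toy model of DHT layout recomputation (not GlusterFS source code).
# Each brick owns a contiguous slice of a 32-bit hash space; removing a
# brick re-splits the space, so files on remaining bricks may fall into
# a different owner's range and must be moved.

import hashlib

def layout(bricks, space=2**32):
    """Split the hash space evenly across bricks (toy model)."""
    step = space // len(bricks)
    return {
        b: (i * step, space - 1 if i == len(bricks) - 1 else (i + 1) * step - 1)
        for i, b in enumerate(bricks)
    }

def owner(name, lay):
    """Hash a file name and find the brick whose range contains it."""
    h = int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32
    return next(b for b, (lo, hi) in lay.items() if lo <= h <= hi)

bricks = ["client-0", "client-2", "client-4", "client-9"]
before = layout(bricks)
after = layout([b for b in bricks if b != "client-9"])  # decommission one

for f in ["symlinks.txt", "A6/2017/57c81ed0"]:
    src, dst = owner(f, before), owner(f, after)
    if src != dst:
        # file migrates even when src is not the decommissioned brick
        print(f"{f}: migrates from {src} to {dst}")
```

The same effect is why the rebalance log shows video-backup-client-2 (the sibling brick) as a migration source while only video-backup-client-9 is decommissioned.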
On Wed, 12 Dec 2018 at 03:07, Nithya Balachandran <nbalacha at redhat.com>
wrote:
>
> This is the current behaviour of rebalance and nothing to be concerned
> about - it migrates data on all bricks on the nodes that host the
> bricks being removed. The data on the removed bricks will be moved to
> other bricks, and some of the data on the node's other bricks will
> simply be moved elsewhere to match the new directory layouts.
> I will fix this in the near future, but you don't need to stop the
> remove-brick operation.
>
> Regards,
> Nithya
>
> On Wed, 12 Dec 2018 at 06:36, Stephen Remde <stephen.remde at gaist.co.uk>
> wrote:
>
>> I requested a brick be removed from a distribute-only volume, and it seems to be migrating data from the wrong brick... unless I am reading this wrong, which I doubt, because the disk usage is definitely decreasing on the wrong brick.
>>
>> gluster> volume status
>> Status of volume: video-backup
>> Gluster process TCP Port RDMA Port Online Pid
>> ------------------------------------------------------------------------------
>> Brick 10.0.0.41:/export/md0/brick 49172 0 Y 5306
>> Brick 10.0.0.42:/export/md0/brick 49172 0 Y 3651
>> Brick 10.0.0.43:/export/md0/brick 49155 0 Y 2826
>> Brick 10.0.0.41:/export/md1/brick 49173 0 Y 5311
>> Brick 10.0.0.42:/export/md1/brick 49173 0 Y 3656
>> Brick 10.0.0.41:/export/md2/brick 49174 0 Y 5316
>> Brick 10.0.0.42:/export/md2/brick 49174 0 Y 3662
>> Brick 10.0.0.41:/export/md3/brick 49175 0 Y 5322
>> Brick 10.0.0.42:/export/md3/brick 49175 0 Y 3667
>> Brick 10.0.0.43:/export/md1/brick 49156 0 Y 4836
>>
>> Task Status of Volume video-backup
>> ------------------------------------------------------------------------------
>> Task : Rebalance
>> ID : 7895be7c-4ab9-440d-a301-c11dae0dd9e1
>> Status : completed
>>
>> gluster> volume remove-brick video-backup 10.0.0.43:/export/md1/brick start
>> volume remove-brick start: success
>> ID: f666a196-03c2-4940-bd38-45d8383345a4
>>
>> gluster> volume status
>> Status of volume: video-backup
>> Gluster process TCP Port RDMA Port Online Pid
>> ------------------------------------------------------------------------------
>> Brick 10.0.0.41:/export/md0/brick 49172 0 Y 5306
>> Brick 10.0.0.42:/export/md0/brick 49172 0 Y 3651
>> Brick 10.0.0.43:/export/md0/brick 49155 0 Y 2826
>> Brick 10.0.0.41:/export/md1/brick 49173 0 Y 5311
>> Brick 10.0.0.42:/export/md1/brick 49173 0 Y 3656
>> Brick 10.0.0.41:/export/md2/brick 49174 0 Y 5316
>> Brick 10.0.0.42:/export/md2/brick 49174 0 Y 3662
>> Brick 10.0.0.41:/export/md3/brick 49175 0 Y 5322
>> Brick 10.0.0.42:/export/md3/brick 49175 0 Y 3667
>> Brick 10.0.0.43:/export/md1/brick 49156 0 Y 4836
>>
>> Task Status of Volume video-backup
>> ------------------------------------------------------------------------------
>> Task : Remove brick
>> ID : f666a196-03c2-4940-bd38-45d8383345a4
>> Removed bricks:
>> 10.0.0.43:/export/md1/brick
>> Status : in progress
>>
>>
>> But when I check the rebalance log on the host with the brick being removed, it is actually migrating data from the other brick on the same host, 10.0.0.43:/export/md0/brick.
>>
>>
>> .....
>> [2018-12-11 11:59:52.572657] I [MSGID: 109086] [dht-shared.c:297:dht_parse_decommissioned_bricks] 0-video-backup-dht: *decommissioning subvolume video-backup-client-9*
>> ....
>> 29: volume video-backup-client-2
>> 30: type protocol/client
>> 31: option clnt-lk-version 1
>> 32: option volfile-checksum 0
>> 33: option volfile-key rebalance/video-backup
>> 34: option client-version 3.8.15
>> 35: option process-uuid node-dc4-03-25536-2018/12/11-11:59:47:551328-video-backup-client-2-0-0
>> 36: option fops-version 1298437
>> 37: option ping-timeout 42
>> 38: option remote-host 10.0.0.43
>> 39: option remote-subvolume /export/md0/brick
>> 40: option transport-type socket
>> 41: option transport.address-family inet
>> 42: option username 9e7fe743-ecd7-40aa-b3db-e112086b2fc7
>> 43: option password dab178d6-ecb4-4293-8c1d-6281ec2cafc2
>> 44: end-volume
>> ...
>> 112: volume video-backup-client-9
>> 113: type protocol/client
>> 114: option ping-timeout 42
>> 115: option remote-host 10.0.0.43
>> 116: option remote-subvolume /export/md1/brick
>> 117: option transport-type socket
>> 118: option transport.address-family inet
>> 119: option username 9e7fe743-ecd7-40aa-b3db-e112086b2fc7
>> 120: option password dab178d6-ecb4-4293-8c1d-6281ec2cafc2
>> 121: end-volume
>> ...
>> [2018-12-11 11:59:52.608698] I [dht-rebalance.c:3668:gf_defrag_start_crawl] 0-video-backup-dht: gf_defrag_start_crawl using commit hash 3766302106
>> [2018-12-11 11:59:52.609478] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /
>> [2018-12-11 11:59:52.615348] I [MSGID: 0] [dht-rebalance.c:3746:gf_defrag_start_crawl] 0-video-backup-dht: local subvols are video-backup-client-2
>> [2018-12-11 11:59:52.615378] I [MSGID: 0] [dht-rebalance.c:3746:gf_defrag_start_crawl] 0-video-backup-dht: local subvols are video-backup-client-9
>> ...
>> [2018-12-11 11:59:52.616554] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /
>> [2018-12-11 11:59:54.000363] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /symlinks.txt: attempting to move from video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:55.110549] I [MSGID: 109022] [dht-rebalance.c:1703:dht_migrate_file] 0-video-backup-dht: completed migration of /symlinks.txt from subvolume video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:58.100931] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /A6
>> [2018-12-11 11:59:58.107389] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /A6
>> [2018-12-11 11:59:58.132138] I [dht-rebalance.c:2866:gf_defrag_process_dir] 0-video-backup-dht: Migration operation on dir /A6 took 0.02 secs
>> [2018-12-11 11:59:58.330393] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /A6/2017
>> [2018-12-11 11:59:58.337601] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /A6/2017
>> [2018-12-11 11:59:58.493906] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/57c81ed09f31cd6c1c8990ae20160908101048: attempting to move from video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:58.706068] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/57c81ed09f31cd6c1c8990ae20160908120734132317: attempting to move from video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:58.783952] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161124091841: attempting to move from video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:58.843315] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161124135453: attempting to move from video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:58.951637] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161122111252: attempting to move from video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:59.005324] I [dht-rebalance.c:2866:gf_defrag_process_dir] 0-video-backup-dht: Migration operation on dir /A6/2017 took 0.67 secs
>> [2018-12-11 11:59:59.005362] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/58906aaaaca0515f5994104d20170213154555: attempting to move from video-backup-client-2 to video-backup-client-4
>>
>> etc...
>>
>> Can I stop/cancel it without data loss? How can I make gluster remove the correct brick?
>>
>> Thanks
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>