[Gluster-users] distribute remove-brick has started migrating the wrong brick (glusterfs 3.8.13)

Stephen Remde stephen.remde at gaist.co.uk
Tue Dec 18 15:40:45 UTC 2018


Nithya,

You are correct, but as you stated earlier, it also has to migrate data
from the other bricks on the same host. Does that mean another 74TB on
dc4-03 /dev/md0 needs to be migrated?
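
A rough way to frame this question is to compare the free space on the two other servers with the data currently held on dc4-03. The sketch below uses the Avail and Used figures from the df output quoted further down, assuming df -h units (powers of 1024). It treats the whole 74T on dc4-03 /dev/md0 as a worst-case upper bound, which is an assumption: per the explanation quoted in this thread, only some of that data would move, depending on the new directory layouts. The variable names are illustrative.

```python
# Avail column (in T, as printed by df) for the bricks on dc4-01 and dc4-02.
avail_t = {
    "dc4-01 /dev/md0": 8.0,
    "dc4-01 /dev/md1": 8.4,
    "dc4-01 /dev/md2": 9.3,
    "dc4-01 /dev/md3": 8.9,
    "dc4-02 /dev/md0": 6.5,
    "dc4-02 /dev/md1": 8.4,
    "dc4-02 /dev/md2": 8.6,
    "dc4-02 /dev/md3": 8.8,
}

# Used column for the two bricks on dc4-03.
used_md0_t = 74.0        # the other brick on the same host
used_md1_t = 519 / 1024  # 519G on the brick being removed, converted to T

free_elsewhere_t = sum(avail_t.values())
# Assumption: upper bound only, as if every file on md0 had to leave the host.
worst_case_t = used_md0_t + used_md1_t

print(f"free on dc4-01 and dc4-02: {free_elsewhere_t:.1f}T")
print(f"worst case to migrate:     {worst_case_t:.1f}T")
print(f"worst case fits: {worst_case_t <= free_elsewhere_t}")
```

With these figures the 519G on the brick being removed fits easily, but the worst case (about 74.5T) is larger than the roughly 66.9T free on the other two servers.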

> This is the current behaviour of rebalance and nothing to be concerned
> about - it will migrate data on all bricks on the nodes which host the
> bricks being removed
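
To see which brick a running remove-brick is actually draining, the source subvolumes in the rebalance log can be tallied. The sketch below is illustrative only: the function name and regular expression are assumptions, and the mapping from video-backup-client-N to the (N+1)th brick in the volume info is inferred from the two volfile entries quoted below (client-2 and client-9), not from gluster documentation. The sample input is three lines from the log excerpt in the original report.

```python
import re
from collections import Counter

# Brick order as listed by "v info video-backup" (Brick1 .. Brick10).
BRICKS = [
    "10.0.0.41:/export/md0/brick", "10.0.0.42:/export/md0/brick",
    "10.0.0.43:/export/md0/brick", "10.0.0.41:/export/md1/brick",
    "10.0.0.42:/export/md1/brick", "10.0.0.41:/export/md2/brick",
    "10.0.0.42:/export/md2/brick", "10.0.0.41:/export/md3/brick",
    "10.0.0.42:/export/md3/brick", "10.0.0.43:/export/md1/brick",
]

# Matches the dht_migrate_file "attempting to move" lines; captures the
# numeric suffix of the source and destination subvolumes.
MOVE_RE = re.compile(r"attempting to move from \S+-client-(\d+) to \S+-client-(\d+)")


def tally_sources(log_lines):
    """Count migration attempts per source brick (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        match = MOVE_RE.search(line)
        if not match:
            continue
        index = int(match.group(1))
        # Assumption: client-N is the (N+1)th brick in the volume info.
        name = BRICKS[index] if index < len(BRICKS) else f"client-{index}"
        counts[name] += 1
    return counts


SAMPLE = [
    "[2018-12-11 11:59:54.000363] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /symlinks.txt: attempting to move from video-backup-client-2 to video-backup-client-4",
    "[2018-12-11 11:59:58.493906] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/57c81ed09f31cd6c1c8990ae20160908101048: attempting to move from video-backup-client-2 to video-backup-client-4",
    "[2018-12-11 11:59:58.783952] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161124091841: attempting to move from video-backup-client-2 to video-backup-client-4",
]

for brick, attempts in tally_sources(SAMPLE).items():
    print(brick, attempts)
```

On the sample lines every attempt has 10.0.0.43:/export/md0/brick as its source, which matches the disk usage dropping on that brick rather than on the one being removed.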


Steve

On Tue, 18 Dec 2018 at 15:37, Nithya Balachandran <nbalacha at redhat.com>
wrote:

>
>
> On Tue, 18 Dec 2018 at 14:56, Stephen Remde <stephen.remde at gaist.co.uk>
> wrote:
>
>> Nithya,
>>
>> I've realised that I will not have enough space on the other bricks in my
>> cluster to migrate data off the server so that I can remove the single
>> brick. Is there a workaround?
>>
>> As you can see below, the new brick was created with the wrong RAID
>> configuration, so I want to remove it, recreate the RAID, and re-add it.
>>
>> Host   Filesystem      Size  Used Avail Use% Mounted on
>> dc4-01 /dev/md0         95T   87T  8.0T  92% /export/md0
>> dc4-01 /dev/md1         95T   87T  8.4T  92% /export/md1
>> dc4-01 /dev/md2         95T   86T  9.3T  91% /export/md2
>> dc4-01 /dev/md3         95T   86T  8.9T  91% /export/md3
>> dc4-02 /dev/md0         95T   89T  6.5T  94% /export/md0
>> dc4-02 /dev/md1         95T   87T  8.4T  92% /export/md1
>> dc4-02 /dev/md2         95T   87T  8.6T  91% /export/md2
>> dc4-02 /dev/md3         95T   86T  8.8T  91% /export/md3
>> dc4-03 /dev/md0         95T   74T   21T  78% /export/md0
>> dc4-03 /dev/md1        102T  519G  102T   1% /export/md1
>>
>>
> I believe this is the brick being removed - the one that has about 519G of
> data? If I have understood the scenario properly, there seems to be plenty
> of free space on the other bricks (most seem to have terabytes free). Is
> there something I am missing?
>
> Regards,
> Nithya
>
>
>> This is the backup storage, so if I HAVE to lose the 519GB and resync,
>> that's an acceptable worst-case.
>>
>> gluster> v info video-backup
>>
>> Volume Name: video-backup
>> Type: Distribute
>> Volume ID: 887bdc2a-ca5e-4ca2-b30d-86831839ed04
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 10
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.0.0.41:/export/md0/brick
>> Brick2: 10.0.0.42:/export/md0/brick
>> Brick3: 10.0.0.43:/export/md0/brick
>> Brick4: 10.0.0.41:/export/md1/brick
>> Brick5: 10.0.0.42:/export/md1/brick
>> Brick6: 10.0.0.41:/export/md2/brick
>> Brick7: 10.0.0.42:/export/md2/brick
>> Brick8: 10.0.0.41:/export/md3/brick
>> Brick9: 10.0.0.42:/export/md3/brick
>> Brick10: 10.0.0.43:/export/md1/brick
>> Options Reconfigured:
>> cluster.rebal-throttle: aggressive
>> cluster.min-free-disk: 1%
>> transport.address-family: inet
>> performance.readdir-ahead: on
>> nfs.disable: on
>>
>>
>> Best,
>>
>> Steve
>>
>>
>> On Wed, 12 Dec 2018 at 03:07, Nithya Balachandran <nbalacha at redhat.com>
>> wrote:
>>
>>>
>>> This is the current behaviour of rebalance and nothing to be concerned
>>> about - it will migrate data on all bricks on the nodes which host the
>>> bricks being removed. The data on the removed bricks will be moved to other
>>> bricks, and some of the data on the other bricks on the node will also be
>>> moved to other bricks based on the new directory layouts.
>>> I will fix this in the near future, but you don't need to stop the
>>> remove-brick operation.
>>>
>>> Regards,
>>> Nithya
>>>
>>> On Wed, 12 Dec 2018 at 06:36, Stephen Remde <stephen.remde at gaist.co.uk>
>>> wrote:
>>>
>>>> I requested that a brick be removed from a distribute-only volume, and it seems to be migrating data from the wrong brick. I doubt I am reading this wrong, because the disk usage is definitely decreasing on the wrong brick.
>>>>
>>>> gluster> volume status
>>>> Status of volume: video-backup
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick 10.0.0.41:/export/md0/brick           49172     0          Y       5306
>>>> Brick 10.0.0.42:/export/md0/brick           49172     0          Y       3651
>>>> Brick 10.0.0.43:/export/md0/brick           49155     0          Y       2826
>>>> Brick 10.0.0.41:/export/md1/brick           49173     0          Y       5311
>>>> Brick 10.0.0.42:/export/md1/brick           49173     0          Y       3656
>>>> Brick 10.0.0.41:/export/md2/brick           49174     0          Y       5316
>>>> Brick 10.0.0.42:/export/md2/brick           49174     0          Y       3662
>>>> Brick 10.0.0.41:/export/md3/brick           49175     0          Y       5322
>>>> Brick 10.0.0.42:/export/md3/brick           49175     0          Y       3667
>>>> Brick 10.0.0.43:/export/md1/brick           49156     0          Y       4836
>>>>
>>>> Task Status of Volume video-backup
>>>> ------------------------------------------------------------------------------
>>>> Task                 : Rebalance
>>>> ID                   : 7895be7c-4ab9-440d-a301-c11dae0dd9e1
>>>> Status               : completed
>>>>
>>>> gluster> volume remove-brick video-backup 10.0.0.43:/export/md1/brick start
>>>> volume remove-brick start: success
>>>> ID: f666a196-03c2-4940-bd38-45d8383345a4
>>>>
>>>> gluster> volume status
>>>> Status of volume: video-backup
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick 10.0.0.41:/export/md0/brick           49172     0          Y       5306
>>>> Brick 10.0.0.42:/export/md0/brick           49172     0          Y       3651
>>>> Brick 10.0.0.43:/export/md0/brick           49155     0          Y       2826
>>>> Brick 10.0.0.41:/export/md1/brick           49173     0          Y       5311
>>>> Brick 10.0.0.42:/export/md1/brick           49173     0          Y       3656
>>>> Brick 10.0.0.41:/export/md2/brick           49174     0          Y       5316
>>>> Brick 10.0.0.42:/export/md2/brick           49174     0          Y       3662
>>>> Brick 10.0.0.41:/export/md3/brick           49175     0          Y       5322
>>>> Brick 10.0.0.42:/export/md3/brick           49175     0          Y       3667
>>>> Brick 10.0.0.43:/export/md1/brick           49156     0          Y       4836
>>>>
>>>> Task Status of Volume video-backup
>>>> ------------------------------------------------------------------------------
>>>> Task                 : Remove brick
>>>> ID                   : f666a196-03c2-4940-bd38-45d8383345a4
>>>> Removed bricks:
>>>> 10.0.0.43:/export/md1/brick
>>>> Status               : in progress
>>>>
>>>>
>>>> But when I check the rebalance log on the host with the brick being removed, it is actually migrating data from the other brick on the same host, 10.0.0.43:/export/md0/brick:
>>>>
>>>>
>>>> .....
>>>> [2018-12-11 11:59:52.572657] I [MSGID: 109086] [dht-shared.c:297:dht_parse_decommissioned_bricks] 0-video-backup-dht: *decommissioning subvolume video-backup-client-9*
>>>> ....
>>>>  29: volume video-backup-client-2
>>>>  30:     type protocol/client
>>>>  31:     option clnt-lk-version 1
>>>>  32:     option volfile-checksum 0
>>>>  33:     option volfile-key rebalance/video-backup
>>>>  34:     option client-version 3.8.15
>>>>  35:     option process-uuid node-dc4-03-25536-2018/12/11-11:59:47:551328-video-backup-client-2-0-0
>>>>  36:     option fops-version 1298437
>>>>  37:     option ping-timeout 42
>>>>  38:     option remote-host 10.0.0.43
>>>>  39:     option remote-subvolume /export/md0/brick
>>>>  40:     option transport-type socket
>>>>  41:     option transport.address-family inet
>>>>  42:     option username 9e7fe743-ecd7-40aa-b3db-e112086b2fc7
>>>>  43:     option password dab178d6-ecb4-4293-8c1d-6281ec2cafc2
>>>>  44: end-volume
>>>> ...
>>>> 112: volume video-backup-client-9
>>>> 113:     type protocol/client
>>>> 114:     option ping-timeout 42
>>>> 115:     option remote-host 10.0.0.43
>>>> 116:     option remote-subvolume /export/md1/brick
>>>> 117:     option transport-type socket
>>>> 118:     option transport.address-family inet
>>>> 119:     option username 9e7fe743-ecd7-40aa-b3db-e112086b2fc7
>>>> 120:     option password dab178d6-ecb4-4293-8c1d-6281ec2cafc2
>>>> 121: end-volume
>>>> ...
>>>> [2018-12-11 11:59:52.608698] I [dht-rebalance.c:3668:gf_defrag_start_crawl] 0-video-backup-dht: gf_defrag_start_crawl using commit hash 3766302106
>>>> [2018-12-11 11:59:52.609478] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /
>>>> [2018-12-11 11:59:52.615348] I [MSGID: 0] [dht-rebalance.c:3746:gf_defrag_start_crawl] 0-video-backup-dht: local subvols are video-backup-client-2
>>>> [2018-12-11 11:59:52.615378] I [MSGID: 0] [dht-rebalance.c:3746:gf_defrag_start_crawl] 0-video-backup-dht: local subvols are video-backup-client-9
>>>> ...
>>>> [2018-12-11 11:59:52.616554] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /
>>>> [2018-12-11 11:59:54.000363] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /symlinks.txt: attempting to move from video-backup-client-2 to video-backup-client-4
>>>> [2018-12-11 11:59:55.110549] I [MSGID: 109022] [dht-rebalance.c:1703:dht_migrate_file] 0-video-backup-dht: completed migration of /symlinks.txt from subvolume video-backup-client-2 to video-backup-client-4
>>>> [2018-12-11 11:59:58.100931] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /A6
>>>> [2018-12-11 11:59:58.107389] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /A6
>>>> [2018-12-11 11:59:58.132138] I [dht-rebalance.c:2866:gf_defrag_process_dir] 0-video-backup-dht: Migration operation on dir /A6 took 0.02 secs
>>>> [2018-12-11 11:59:58.330393] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /A6/2017
>>>> [2018-12-11 11:59:58.337601] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /A6/2017
>>>> [2018-12-11 11:59:58.493906] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/57c81ed09f31cd6c1c8990ae20160908101048: attempting to move from video-backup-client-2 to video-backup-client-4
>>>> [2018-12-11 11:59:58.706068] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/57c81ed09f31cd6c1c8990ae20160908120734132317: attempting to move from video-backup-client-2 to video-backup-client-4
>>>> [2018-12-11 11:59:58.783952] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161124091841: attempting to move from video-backup-client-2 to video-backup-client-4
>>>> [2018-12-11 11:59:58.843315] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161124135453: attempting to move from video-backup-client-2 to video-backup-client-4
>>>> [2018-12-11 11:59:58.951637] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161122111252: attempting to move from video-backup-client-2 to video-backup-client-4
>>>> [2018-12-11 11:59:59.005324] I [dht-rebalance.c:2866:gf_defrag_process_dir] 0-video-backup-dht: Migration operation on dir /A6/2017 took 0.67 secs
>>>> [2018-12-11 11:59:59.005362] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/58906aaaaca0515f5994104d20170213154555: attempting to move from video-backup-client-2 to video-backup-client-4
>>>>
>>>> etc...
>>>>
>>>> Can I stop/cancel it without data loss? How can I make gluster remove the correct brick?
>>>>
>>>> Thanks
>>>>
>>>
>>>
>>
>>

-- 

Dr Stephen Remde
Director, Innovation and Research


T: 01535 280066
M: 07764 740920
E: stephen.remde at gaist.co.uk
W: www.gaist.co.uk