[Gluster-users] Advice on moving volumes/bricks to new servers

Sun Mar 1 23:25:13 UTC 2020

Strahil Nikolov wrote on 01/03/2020 07:12:
> On March 1, 2020 2:02:33 AM GMT+02:00, Ronny Adsetts <ronny.adsetts at amazinginternet.com> wrote:
>>
>> I have a 4-server system running a distributed-replicate setup, 4 x (2
>> + 1) = 12. Bricks are staggered across the servers. Sharding is
>> enabled. (v info shown below)
>>
>> Now, the storage is slow on these servers and not really up to the
>> job so we have 4 new servers with SSDs. I have to move everything over
>> to the new servers whilst not taking down the storage.
>>
>> The four old servers are running Gluster 6.4 and the new ones, 6.5.
>>
>> So having read tons of docs and mailing lists, etc, I think I ought to
>> be able to use add-brick, remove-brick to get everything moved safely
>> like so:
>>
>> # gluster volume add-brick iscsi replica 3 arbiter 1 srv{13..15}:/brick1
>>
>> # gluster volume remove-brick iscsi replica 3 srv{1..3}:/brick1 start
>>
>> Then once complete, do:
>>
>> # gluster volume remove-brick iscsi replica 3 srv{1..3}:/brick1 commit
>>
>>
>> So I created a test volume to try this out. On the third add/remove of
>> 4, I get a 'failed' on the remove-brick status. The rebalance log
>> shows:
>>
>> [2020-02-28 22:25:28.133902] I [dht-rebalance.c:1589:dht_migrate_file] 0-testmigrate-dht: /linux-5.4.22/arch/arm/boot/dts/exynos4412-itop-scp-core.dtsi: attempting to move from testmigrate-replicate-0 to testmigrate-replicate-2
>> [2020-02-28 22:25:28.144258] W [MSGID: 108015] [afr-self-heal-name.c:138:__afr_selfheal_name_expunge] 0-testmigrate-replicate-0: expunging file a75a83b7-2c34-4077-b4fc-3126a9d6058a/exynos4210-smdkv310.dts (11a47b1f-2c24-4d4b-9402-9130125cf953) on testmigrate-client-6
>> [2020-02-28 22:25:28.146321] E [MSGID: 109023] [dht-rebalance.c:1707:dht_migrate_file] 0-testmigrate-dht: Migrate file failed:/linux-5.4.22/arch/arm/boot/dts/exynos4210-smdkv310.dts: lookup failed on testmigrate-replicate-0 [No such file or directory]
>> [2020-02-28 22:25:28.149104] E [MSGID: 109023] [dht-rebalance.c:2874:gf_defrag_migrate_single_file] 0-testmigrate-dht: migrate-data failed for /linux-5.4.22/arch/arm/boot/dts/exynos4210-smdkv310.dts [No such file or directory]
>>
>>
>> This is shown for 4 files.
>>
>> When I look at the FUSE-mounted volume, the file is there and correct
>> but the file permissions of this and lots of others are screwed. Lots
>> of dirs with d--------- permissions, lots of root:root owned files.
>>
>>
>> So any advice for how to proceed from here:
>>
>>
>> I did a force on the remove-brick as the data seemed to be in place
>> which is fine, but now I can't do an add-brick as gluster seems to
>> think a rebalance is taking place:
>>
>> ---
>> volume add-brick: failed: Pre Validation failed on terek-stor.amazing-internet.net. Volume name testmigrate rebalance is in progress. Please retry after completion
>> ---
>>
>> $ sudo gluster volume rebalance testmigrate status
>> volume rebalance: testmigrate: failed: Rebalance not started for volume testmigrate.
>>
>> Thanks for any insight anyone can offer.
>>
>> Ronny
>>
>>
>>
>>
>>
>> $ sudo gluster volume info iscsi
>>
>> Volume Name: iscsi
>> Type: Distributed-Replicate
>> Volume ID: 40ff42a7-5dee-4a98-991b-c4ba5bc50438
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 4 x (2 + 1) = 12
>> Transport-type: tcp
>> Bricks:
>> Brick1: ahren-stor.amazing-internet.net:/data/glusterfs/iscsi/brick1/brick
>> Brick2: mareth-stor.amazing-internet.net:/data/glusterfs/iscsi/brick1/brick
>> Brick3: terek-stor.amazing-internet.net:/data/glusterfs/iscsi/brick1a/brick (arbiter)
>> Brick4: walker-stor.amazing-internet.net:/data/glusterfs/iscsi/brick2/brick
>> Brick5: ahren-stor.amazing-internet.net:/data/glusterfs/iscsi/brick2/brick
>> Brick6: mareth-stor.amazing-internet.net:/data/glusterfs/iscsi/brick2a/brick (arbiter)
>> Brick7: terek-stor.amazing-internet.net:/data/glusterfs/iscsi/brick3/brick
>> Brick8: walker-stor.amazing-internet.net:/data/glusterfs/iscsi/brick3/brick
>> Brick9: ahren-stor.amazing-internet.net:/data/glusterfs/iscsi/brick3a/brick (arbiter)
>> Brick10: mareth-stor.amazing-internet.net:/data/glusterfs/iscsi/brick4/brick
>> Brick11: terek-stor.amazing-internet.net:/data/glusterfs/iscsi/brick4/brick
>> Brick12: walker-stor.amazing-internet.net:/data/glusterfs/iscsi/brick4a/brick (arbiter)
>> Options Reconfigured:
>> performance.client-io-threads: off
>> nfs.disable: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> performance.open-behind: off
>> performance.readdir-ahead: off
>> performance.strict-o-direct: on
>> network.remote-dio: disable
>> cluster.eager-lock: enable
>> cluster.quorum-type: auto
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 10000
>> features.shard: on
>> features.shard-block-size: 64MB
>> user.cifs: off
>> server.allow-insecure: on
>> cluster.choose-local: off
>> auth.allow: 127.0.0.1,172.16.36.*,172.16.40.*
>> ssl.cipher-list: HIGH:!SSLv2
>> server.ssl: on
>> client.ssl: on
>> ssl.certificate-depth: 1
>> performance.cache-size: 1GB
>> client.event-threads: 4
>> server.event-threads: 4
> 
> Have you checked the brick logs of 'testmigrate-replicate-0', which should be your srv1?
> 
> Maybe there were some pending heals at that time and the brick didn't have the necessary data.

Hi Strahil,

Thanks for your reply.

There shouldn't have been any pending heals, though admittedly I didn't check. The volume wasn't really in use; it just had some files on it for testing data integrity after the brick moves.
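
For anyone repeating this, a pending-heal check before each remove-brick could be as simple as summing the "Number of entries:" lines that `gluster volume heal <vol> info` prints for each brick. This is only a sketch that assumes that output format; the function name count_pending_heals is made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `gluster volume heal <vol> info` output on stdin
# and prints the total number of pending heal entries across all bricks.
# Assumes each brick section contains a "Number of entries: N" line.
count_pending_heals() {
    awk -F': ' '
        /^Number of entries:/ {
            gsub(/[ \t\r]+$/, "", $2)            # drop trailing whitespace
            if ($2 ~ /^[0-9]+$/) total += $2     # numeric count for this brick
            else unknown = 1                     # non-numeric value, e.g. "-"
        }
        END {
            if (unknown) { print "unknown"; exit 2 }
            print total + 0
        }'
}

# Usage on a live cluster (volume name taken from the test above):
#   gluster volume heal testmigrate info | count_pending_heals
```

Anything other than 0 would be a reason to wait before starting the next remove-brick.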

The only errors in the brick log for brick 1 at that point are:

[2020-02-28 22:25:51.577926] E [inodelk.c:513:__inode_unlock_lock] 0-testmigrate-locks:  Matching lock not found for unlock 0-9223372036854775807, by fdffffff on 0x7fc77c523740
[2020-02-28 22:25:51.577972] E [MSGID: 115053] [server-rpc-fops_v2.c:279:server4_inodelk_cbk] 0-testmigrate-server: 10: INODELK <gfid:44363cd1-fe2f-4f7e-99bd-5f6f20b61a30> (44363cd1-fe2f-4f7e-99bd-5f6f20b61a30), client: CTX_ID:9cd0075c-f181-4e3f-8b0c-6d543ce1075f-GRAPH_ID:0-PID:30477-HOST:walker.amazing-internet.net-PC_NAME:testmigrate-client-6-RECON_NO:-0, error-xlator: testmigrate-locks [Invalid argument]

I don't fully grok that, but based on my limited knowledge it doesn't seem related.

Looking in the rebalance log on that server, we're seeing lots of this:

[2020-03-01 23:15:45.072749] E [MSGID: 108006] [afr-common.c:5318:__afr_handle_child_down_event] 0-testmigrate-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.

So it seems that one node really isn't happy that the brick is down. The "remove-brick force" was probably the wrong thing to do while the split-brain files were still showing.
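
A more cautious ordering for a single replica set, written as a dry run that only prints the commands, might look like the following. The function name brick_swap_plan and its arguments are hypothetical; the add-brick and remove-brick invocations mirror the ones from my original message. The commit line is meant to be run by hand, and only once the status output reports the migration as completed with no failures.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints the command sequence for moving one
# replica set to new bricks. Nothing is executed against the cluster.
#   $1 = volume name
#   $2 = old bricks (space-separated, quoted as one argument)
#   $3 = new bricks (space-separated, quoted as one argument)
brick_swap_plan() {
    local vol=$1 old_bricks=$2 new_bricks=$3
    cat <<EOF
gluster volume heal ${vol} info
gluster volume add-brick ${vol} replica 3 arbiter 1 ${new_bricks}
gluster volume remove-brick ${vol} replica 3 ${old_bricks} start
gluster volume remove-brick ${vol} replica 3 ${old_bricks} status
gluster volume remove-brick ${vol} replica 3 ${old_bricks} commit
EOF
}

# Example with placeholder brick names:
#   brick_swap_plan iscsi "srv1:/brick1 srv2:/brick1 srv3:/brick1" \
#                         "srv13:/brick1 srv14:/brick1 srv15:/brick1"
```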

> Another way to migrate the data is to:
> 1. Add the new disks on the old srv1,2,3  
> 2. Add the new disks to the VG
> 3. pvmove all LVs to the new disks (I prefer to use the '--atomic' option)
> 4. vgreduce with the old disks
> 5. pvremove the old disks
> 6. Then just delete the block devices from the kernel and remove them physically
> 
> Of course, this requires 'hotplugging' and available slots on the systems.
> 
> Or you can stop 1 gluster node (no pending heals), remove the old disks -> swap with new.
> Then power up.
> Create the VG/LV and mount on the same place.
> Then you can just 'replace-brick' or 'reset-brick' and gluster will heal the data.
> Repeat for the other 3 nodes and you will be ready.

Anything involving physical access isn't really practical for various reasons. Nice ideas though.
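
For the archive, the LVM route described above would translate to roughly the following sequence, shown here as a dry run that prints the commands rather than executing them. The function name, volume group and device names are placeholders, and pvcreate is included on the assumption that the new disk has not been initialised as a PV yet. Detaching the old device from the kernel (step 6) is left out as it depends on the hardware.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints the LVM commands for moving a volume
# group from an old physical volume to a new one. Nothing is executed.
#   $1 = volume group, $2 = old PV device, $3 = new PV device
lvm_migration_plan() {
    local vg=$1 old_pv=$2 new_pv=$3
    cat <<EOF
pvcreate ${new_pv}
vgextend ${vg} ${new_pv}
pvmove --atomic ${old_pv} ${new_pv}
vgreduce ${vg} ${old_pv}
pvremove ${old_pv}
EOF
}

# Example with placeholder names:
#   lvm_migration_plan vg_gluster /dev/sdb /dev/sdc
```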

At this stage, I'm just a little nervous about moving the production data after tripping up on the test migration. The two gluster volumes hold a) VM disks and b) iSCSI device backing files. And of course there's a deadline. :-)

Ronny
-- 
Ronny Adsetts
Technical Director
Amazing Internet Ltd, London
t: +44 20 8977 8943
w: www.amazinginternet.com

Registered office: 85 Waldegrave Park, Twickenham, TW1 4TJ
Registered in England. Company No. 4042957
