[Gluster-users] Distributed re-balance issue

Wed May 24 14:32:28 UTC 2017

On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
>
> Hi,
>
>
> I have a distributed volume with 6 bricks, each 5TB, and it's
> hosting large qcow2 VM disks (I know it's not reliable, but it's
> not important data).
>
> I started with 5 bricks, then added another one and started the
> rebalance process. Everything went well, but now I'm looking at the
> bricks' free space and I found that one brick is at around 82% while
> the others range from 20% to 60%.
>
> The brick with the highest utilization is hosting more qcow2 disks
> than the other bricks, and whenever I start a rebalance it completes
> in 0 seconds without moving any data.
>

What is the average file size in the cluster? And roughly how many
files are there?
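
If it helps to gather those two numbers, here is a minimal Python
sketch that walks a directory tree and reports the file count and the
average size. The function name file_stats and the mount path in the
usage note are made up for illustration; also note that st_size is the
apparent size, so sparse qcow2 images may occupy less space on the
brick than this reports.

```python
import os


def file_stats(root):
    """Return (number_of_files, average_size_in_bytes) under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            count += 1
            total += st.st_size
    average = total / count if count else 0.0
    return count, average
```

For example, file_stats("/mnt/ctvvols") would scan a client mount of
the volume, assuming it is mounted at that (hypothetical) path.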

> What will happen when the brick becomes full?
>
Once a brick's usage goes beyond 90%, new files won't be created on
that brick, but existing files can still grow.
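
As a rough illustration of that threshold, the check can be sketched
in Python as below. This is only a sketch of the arithmetic, not
gluster's actual code. I am assuming the 90% figure corresponds to the
cluster.min-free-disk option with its default of 10% free space; the
function name and the byte counts are hypothetical.

```python
def accepts_new_files(used_bytes, total_bytes, min_free_pct=10.0):
    """True while the brick still has at least min_free_pct free space."""
    free_pct = 100.0 * (total_bytes - used_bytes) / total_bytes
    return free_pct >= min_free_pct


TB = 10**12
# A 5TB brick that is 82% used still has 18% free, so it is above the limit.
brick_at_82_pct = accepts_new_files(used_bytes=4_100 * 10**9, total_bytes=5 * TB)
# At 92% used only 8% is free, which is below the assumed 10% limit.
brick_at_92_pct = accepts_new_files(used_bytes=4_600 * 10**9, total_bytes=5 * TB)
```

So the brick at 82% is not refusing new files yet, but it does not
have much headroom left.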

> Can I move data manually from one brick to another?
>

No. It is not recommended; even though gluster will try to find the
file, it may break.

> Why is rebalance not distributing data evenly across all bricks?
>

Rebalance works based on the layout, so we need to see how the layouts
are distributed. If one of your bricks has a higher capacity, it will
get a larger layout range.
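
To compare layouts, you can read the trusted.glusterfs.dht xattr of
the same directory on each brick (for example with
"getfattr -n trusted.glusterfs.dht -e hex <directory on the brick>")
and work out how much of the 32-bit hash space each brick owns. Below
is a small Python sketch for that. It assumes the value is four
big-endian 32-bit fields with the range start and stop in the last
two, which is my understanding of the on-disk format in this release;
the sample values are made up, not taken from your volume.

```python
import struct

HASH_SPACE = 2**32  # DHT maps file names into a 32-bit hash range


def decode_layout(xattr_hex):
    """Return (start, stop) from a trusted.glusterfs.dht value in hex."""
    text = xattr_hex[2:] if xattr_hex.startswith("0x") else xattr_hex
    raw = bytes.fromhex(text)
    if len(raw) != 16:
        raise ValueError("expected a 16-byte layout value")
    # Assumed field order: count (or commit hash), type, start, stop.
    _first, _type, start, stop = struct.unpack(">IIII", raw)
    return start, stop


def share_of_hash_space(xattr_hex):
    """Fraction of the hash space covered by this brick's range."""
    start, stop = decode_layout(xattr_hex)
    if start == 0 and stop == 0:
        return 0.0  # treated here as "no range assigned"
    return (stop - start + 1) / HASH_SPACE


# Hypothetical values for one directory as seen on two bricks.
samples = {
    "brick-a": "0x" + "00000001" + "00000000" + "00000000" + "7fffffff",
    "brick-b": "0x" + "00000001" + "00000000" + "80000000" + "ffffffff",
}
shares = {name: share_of_hash_space(value) for name, value in samples.items()}
```

If the shares come out roughly equal across all six bricks, the
layout is not the reason for the imbalance and the uneven usage is
more likely down to a few very large files landing on one brick.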

>
> Nodes running CentOS 7.3
>
> Gluster 3.8.11
>
>
> Volume info;
>
> Volume Name: ctvvols
> Type: Distribute
> Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 6
> Transport-type: tcp
> Bricks:
> Brick1: ctv01:/vols/ctvvols
> Brick2: ctv02:/vols/ctvvols
> Brick3: ctv03:/vols/ctvvols
> Brick4: ctv04:/vols/ctvvols
> Brick5: ctv05:/vols/ctvvols
> Brick6: ctv06:/vols/ctvvols
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.quorum-type: none
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: off
> user.cifs: off
> network.ping-timeout: 10
> storage.owner-uid: 36
> storage.owner-gid: 36
>
>
> Rebalance log:
>
>
> [2017-05-23 14:45:12.637671] I
> [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration
> operation on dir
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0
> took 0.00 secs
> [2017-05-23 14:45:12.640043] I [MSGID: 109081]
> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> [2017-05-23 14:45:12.641516] I
> [dht-rebalance.c:2652:gf_defrag_process_dir] 0-ctvvols-dht: migrate
> data called on
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> [2017-05-23 14:45:12.642421] I
> [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration
> operation on dir
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> took 0.00 secs
> [2017-05-23 14:45:12.645610] I [MSGID: 109081]
> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> [2017-05-23 14:45:12.647034] I
> [dht-rebalance.c:2652:gf_defrag_process_dir] 0-ctvvols-dht: migrate
> data called on
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> [2017-05-23 14:45:12.647589] I
> [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration
> operation on dir
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> took 0.00 secs
> [2017-05-23 14:45:12.653291] I
> [dht-rebalance.c:3838:gf_defrag_start_crawl] 0-DHT: crawling
> file-system completed
> [2017-05-23 14:45:12.653323] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 23
> [2017-05-23 14:45:12.653508] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 24
> [2017-05-23 14:45:12.653536] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 25
> [2017-05-23 14:45:12.653556] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 26
> [2017-05-23 14:45:12.653580] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 27
> [2017-05-23 14:45:12.653603] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 28
> [2017-05-23 14:45:12.653623] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 29
> [2017-05-23 14:45:12.653638] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 30
> [2017-05-23 14:45:12.653659] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 31
> [2017-05-23 14:45:12.653677] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 32
> [2017-05-23 14:45:12.653692] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 33
> [2017-05-23 14:45:12.653711] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 34
> [2017-05-23 14:45:12.653723] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 35
> [2017-05-23 14:45:12.653739] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 36
> [2017-05-23 14:45:12.653759] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 37
> [2017-05-23 14:45:12.653772] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 38
> [2017-05-23 14:45:12.653789] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 39
> [2017-05-23 14:45:12.653800] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 40
> [2017-05-23 14:45:12.653811] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 41
> [2017-05-23 14:45:12.653822] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 42
> [2017-05-23 14:45:12.653836] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 43
> [2017-05-23 14:45:12.653870] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 44
> [2017-05-23 14:45:12.654413] I [MSGID: 109028]
> [dht-rebalance.c:4079:gf_defrag_status_get] 0-ctvvols-dht: Rebalance
> is completed. Time taken is 0.00 secs
> [2017-05-23 14:45:12.654428] I [MSGID: 109028]
> [dht-rebalance.c:4083:gf_defrag_status_get] 0-ctvvols-dht: Files
> migrated: 0, size: 0, lookups: 15, failures: 0, skipped: 0
> [2017-05-23 14:45:12.654552] W [glusterfsd.c:1327:cleanup_and_exit]
> (-->/lib64/libpthread.so.0(+0x7dc5) [0x7ff40ff88dc5]
> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7ff41161acd5]
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7ff41161ab4b] ) 0-:
> received signum (15), shutting down
>
>
>
> Appreciate your help
>
>
>
> -- 
>
> Respectfully
> Mahdi A. Mahdi
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
