[Gluster-users] Distributed re-balance issue

Wed May 24 17:15:18 UTC 2017

On 24 May 2017 at 21:55, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

> Hi,
>
>
> Thank you for your response.
>
> I have around 15 files, each a 2TB qcow image.
>
> One brick reached 96%, so I removed it with "remove-brick", waited until
> it went down to around 40%, and then stopped the removal process with
> "remove-brick stop".
>
> The issue is that brick1 drained its data to brick6 only, and when brick6
> reached around 90% I did the same thing as before and it drained its data
> to brick1 only.
>
> Now brick6 has reached 99% and I have only a few gigabytes left, which will
> fill up in the next half hour or so.
>
> Attached are the logs for all 6 bricks.
>
Hi,

Just to clarify, did you run a rebalance (gluster volume rebalance <vol>
start), or did you only run remove-brick?

> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> ------------------------------
> *From:* Nithya Balachandran <nbalacha at redhat.com>
> *Sent:* Wednesday, May 24, 2017 6:45:10 PM
> *To:* Mohammed Rafi K C
> *Cc:* Mahdi Adnan; gluster-users at gluster.org
> *Subject:* Re: [Gluster-users] Distributed re-balance issue
>
>
>
> On 24 May 2017 at 20:02, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>
>>
>>
>> On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
>>
>> Hi,
>>
>>
>> I have a distributed volume with 6 bricks, each 5TB, and it's hosting
>> large qcow2 VM disks (I know it's not reliable, but it's not important data).
>>
>> I started with 5 bricks, then added another one and started the
>> rebalance process. Everything went well, but now I'm looking at the bricks'
>> free space and I found one brick is at around 82% while the others range
>> from 20% to 60%.
>>
>> The brick with the highest utilization is hosting more qcow2 disks than the
>> other bricks, and whenever I start a rebalance it just completes in 0
>> seconds without moving any data.
>>
>>
>> What is the average file size in the cluster? And roughly how many files
>> are there?
>>
>>
>> What will happen when the brick becomes full?
>>
>> Once a brick's usage goes beyond 90%, new files won't be created on that
>> brick. But existing files can still grow.
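
For reference, the 90% rule described above can be modelled as a simple free-space check. This is only a sketch: it assumes the 90% figure corresponds to a 10% free-space reserve (I believe this is the default of the `cluster.min-free-disk` volume option, but please verify on your version), and `accepts_new_files` is a made-up helper name, not a Gluster function.

```python
def accepts_new_files(used_pct, min_free_pct=10.0):
    """Return True if a brick at used_pct utilisation would still be chosen
    for new files, under an assumed min_free_pct free-space reserve."""
    free_pct = 100.0 - used_pct
    return free_pct >= min_free_pct


if __name__ == "__main__":
    # Utilisation figures quoted in this thread
    for used in (60, 82, 96, 99):
        print(f"brick at {used}% used -> new files allowed: {accepts_new_files(used)}")
```

Existing files are not covered by this check, which is why a growing 2TB qcow image can still fill a brick that is already past the threshold.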
>>
>>
>> Can I move data manually from one brick to another?
>>
>>
>> No. It is not recommended; even though gluster will try to find the file,
>> it may break.
>>
>>
>> Why is rebalance not distributing data evenly across all bricks?
>>
>>
>> Rebalance works based on the layout, so we need to see how the layouts are
>> distributed. If one of your bricks has a higher capacity, it will have a
>> larger layout.
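
To see how the layouts are distributed, one option is to read the `trusted.glusterfs.dht` xattr of the same directory on every brick (for example with `getfattr -n trusted.glusterfs.dht -e hex <dir>` on the brick path) and compare the ranges. The sketch below is a hypothetical helper, not part of Gluster. It assumes the value is 16 bytes holding four big-endian 32-bit words, with the range start and stop in the last two; the sample value is invented for illustration and does not come from this volume.

```python
import struct

HASH_SPACE = 2 ** 32  # assumption: DHT hashes names into a 32-bit space


def layout_range(xattr_hex):
    """Return (start, stop) from a trusted.glusterfs.dht hex string.

    Assumed on-disk format: four big-endian uint32 words, where the
    last two are the start and stop of the hash range for this brick.
    """
    text = xattr_hex[2:] if xattr_hex.lower().startswith("0x") else xattr_hex
    raw = bytes.fromhex(text)
    if len(raw) != 16:
        raise ValueError("expected a 16-byte layout value")
    _word0, _word1, start, stop = struct.unpack(">4I", raw)
    return start, stop


def layout_share(start, stop):
    """Fraction of the 32-bit hash space covered by [start, stop]."""
    return (stop - start + 1) / HASH_SPACE


if __name__ == "__main__":
    # Invented sample value, for illustration only
    start, stop = layout_range("0x00000001000000000000000055555554")
    print(f"range {start:#010x}-{stop:#010x}, share {layout_share(start, stop):.3f}")
```

With six bricks of equal size one would expect each range to cover roughly one sixth of the hash space; a markedly larger or missing range on one brick would be worth a closer look.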
>>
>>
>
>
> That is correct. As Rafi said, the layout matters here. Can you please
> send across the rebalance logs from all 6 nodes?
>>
>>
>> Nodes running CentOS 7.3
>>
>> Gluster 3.8.11
>>
>>
>> Volume info;
>> Volume Name: ctvvols
>> Type: Distribute
>> Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 6
>> Transport-type: tcp
>> Bricks:
>> Brick1: ctv01:/vols/ctvvols
>> Brick2: ctv02:/vols/ctvvols
>> Brick3: ctv03:/vols/ctvvols
>> Brick4: ctv04:/vols/ctvvols
>> Brick5: ctv05:/vols/ctvvols
>> Brick6: ctv06:/vols/ctvvols
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> performance.low-prio-threads: 32
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> cluster.quorum-type: none
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 10000
>> features.shard: off
>> user.cifs: off
>> network.ping-timeout: 10
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>>
>>
>> Rebalance log:
>>
>>
>> [2017-05-23 14:45:12.637671] I [dht-rebalance.c:2866:gf_defrag_process_dir]
>> 0-ctvvols-dht: Migration operation on dir
>> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0
>> took 0.00 secs
>> [2017-05-23 14:45:12.640043] I [MSGID: 109081]
>> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
>> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
>> [2017-05-23 14:45:12.641516] I [dht-rebalance.c:2652:gf_defrag_process_dir]
>> 0-ctvvols-dht: migrate data called on
>> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
>> [2017-05-23 14:45:12.642421] I [dht-rebalance.c:2866:gf_defrag_process_dir]
>> 0-ctvvols-dht: Migration operation on dir
>> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
>> took 0.00 secs
>> [2017-05-23 14:45:12.645610] I [MSGID: 109081]
>> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
>> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
>> [2017-05-23 14:45:12.647034] I [dht-rebalance.c:2652:gf_defrag_process_dir]
>> 0-ctvvols-dht: migrate data called on
>> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
>> [2017-05-23 14:45:12.647589] I [dht-rebalance.c:2866:gf_defrag_process_dir]
>> 0-ctvvols-dht: Migration operation on dir
>> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
>> took 0.00 secs
>> [2017-05-23 14:45:12.653291] I [dht-rebalance.c:3838:gf_defrag_start_crawl]
>> 0-DHT: crawling file-system completed
>> [2017-05-23 14:45:12.653323] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 23
>> [2017-05-23 14:45:12.653508] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 24
>> [2017-05-23 14:45:12.653536] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 25
>> [2017-05-23 14:45:12.653556] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 26
>> [2017-05-23 14:45:12.653580] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 27
>> [2017-05-23 14:45:12.653603] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 28
>> [2017-05-23 14:45:12.653623] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 29
>> [2017-05-23 14:45:12.653638] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 30
>> [2017-05-23 14:45:12.653659] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 31
>> [2017-05-23 14:45:12.653677] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 32
>> [2017-05-23 14:45:12.653692] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 33
>> [2017-05-23 14:45:12.653711] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 34
>> [2017-05-23 14:45:12.653723] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 35
>> [2017-05-23 14:45:12.653739] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 36
>> [2017-05-23 14:45:12.653759] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 37
>> [2017-05-23 14:45:12.653772] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 38
>> [2017-05-23 14:45:12.653789] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 39
>> [2017-05-23 14:45:12.653800] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 40
>> [2017-05-23 14:45:12.653811] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 41
>> [2017-05-23 14:45:12.653822] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 42
>> [2017-05-23 14:45:12.653836] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 43
>> [2017-05-23 14:45:12.653870] I [dht-rebalance.c:2246:gf_defrag_task]
>> 0-DHT: Thread wokeup. defrag->current_thread_count: 44
>> [2017-05-23 14:45:12.654413] I [MSGID: 109028]
>> [dht-rebalance.c:4079:gf_defrag_status_get] 0-ctvvols-dht: Rebalance is
>> completed. Time taken is 0.00 secs
>> [2017-05-23 14:45:12.654428] I [MSGID: 109028]
>> [dht-rebalance.c:4083:gf_defrag_status_get] 0-ctvvols-dht: Files
>> migrated: 0, size: 0, lookups: 15, failures: 0, skipped: 0
>> [2017-05-23 14:45:12.654552] W [glusterfsd.c:1327:cleanup_and_exit]
>> (-->/lib64/libpthread.so.0(+0x7dc5) [0x7ff40ff88dc5]
>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7ff41161acd5]
>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7ff41161ab4b] ) 0-:
>> received signum (15), shutting down
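
When comparing rebalance logs from several nodes, it can help to pull out just the summary counters. A small sketch follows; the function name `rebalance_summary` is hypothetical, and the regex is based only on the summary line format shown in the log above, so it may need adjusting for other Gluster versions.

```python
import re

# Matches the "Files migrated: ..." summary line as it appears in this thread
SUMMARY_RE = re.compile(
    r"Files\s+migrated:\s*(\d+),\s*size:\s*(\d+),\s*lookups:\s*(\d+),"
    r"\s*failures:\s*(\d+),\s*skipped:\s*(\d+)"
)


def rebalance_summary(log_text):
    """Return the rebalance counters as a dict, or None if no summary is found.

    Mail quote markers ('>') are stripped and wrapped lines are re-joined
    first, so text pasted from an email works as well as a raw log file.
    """
    unquoted = re.sub(r"^[> ]+", "", log_text, flags=re.MULTILINE)
    flat = " ".join(unquoted.split())
    match = SUMMARY_RE.search(flat)
    if match is None:
        return None
    keys = ("migrated", "size", "lookups", "failures", "skipped")
    return dict(zip(keys, map(int, match.groups())))


if __name__ == "__main__":
    sample = (
        ">> [dht-rebalance.c:4083:gf_defrag_status_get] 0-ctvvols-dht: Files\n"
        ">> migrated: 0, size: 0, lookups: 15, failures: 0, skipped: 0\n"
    )
    print(rebalance_summary(sample))
```

For the log above this gives 0 files migrated and 15 lookups, which suggests the rebalance did look at the files but decided none of them needed to move.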
>>
>>
>>
>> Appreciate your help
>>
>>
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>