[Gluster-users] Distributed re-balance issue

Nithya Balachandran nbalacha at redhat.com
Wed May 24 17:16:53 UTC 2017


On 24 May 2017 at 22:45, Nithya Balachandran <nbalacha at redhat.com> wrote:

>
>
> On 24 May 2017 at 21:55, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>
>> Hi,
>>
>>
>> Thank you for your response.
>>
>> I have around 15 files; each is a 2 TB qcow2 image.
>>
>> One brick reached 96%, so I started draining it with "remove-brick start"
>> and, once it dropped to around 40%, stopped the removal process with
>> "remove-brick stop".
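[Editor's note] The drain-and-stop workflow described above can be sketched as a dry run. The script below only prints the gluster commands rather than executing them; the volume and brick names are taken from the "Volume info" section later in this thread, and brick1 is assumed to be ctv01's brick.

```shell
#!/bin/sh
# Dry-run sketch of the remove-brick drain workflow described above.
# Volume/brick names come from the "Volume info" section of this thread.
VOL=ctvvols
BRICK=ctv01:/vols/ctvvols

build_cmd() {
    # Build (but do not run) a gluster remove-brick command line.
    echo "gluster volume remove-brick $VOL $BRICK $1"
}

# Start draining, watch progress, then abort the removal:
for action in start status stop; do
    build_cmd "$action"
done
```

Note that stopping a remove-brick mid-way leaves whatever data has already migrated on the other bricks, which is consistent with the lopsided utilization reported here.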
>>
>> The issue is that brick1 drained its data to brick6 only, and when brick6
>> reached around 90% I did the same thing as before, and it drained the data
>> back to brick1 only.
>>
>> Now brick6 has reached 99%, and I have only a few gigabytes left, which
>> will fill up in the next half hour or so.
>>
>> Attached are the logs for all 6 bricks.
>>
> Hi,
>
> Just to clarify, did you run a rebalance (gluster volume rebalance <vol>
> start), or did you only run remove-brick?
>
>
> On re-reading your original email, I see you did run a rebalance. Did it
> complete? Also, which bricks are full at the moment?
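[Editor's note] Both questions can be answered from the CLI. A minimal dry-run sketch (it only prints the commands; hostnames and mount paths are assumed from the volume info below):

```shell
#!/bin/sh
# Dry-run sketch: commands to check rebalance completion and per-brick usage.
VOL=ctvvols

status_cmd() {
    # Shows per-node migrated files/size and whether the rebalance completed.
    echo "gluster volume rebalance $VOL status"
}

df_cmd() {
    # Run on each node to see which brick filesystems are full.
    echo "ssh $1 df -h /vols/$VOL"
}

status_cmd
for host in ctv01 ctv02 ctv03 ctv04 ctv05 ctv06; do
    df_cmd "$host"
done
```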


>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> ------------------------------
>> *From:* Nithya Balachandran <nbalacha at redhat.com>
>> *Sent:* Wednesday, May 24, 2017 6:45:10 PM
>> *To:* Mohammed Rafi K C
>> *Cc:* Mahdi Adnan; gluster-users at gluster.org
>> *Subject:* Re: [Gluster-users] Distributed re-balance issue
>>
>>
>>
>> On 24 May 2017 at 20:02, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>>
>>>
>>>
>>> On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
>>>
>>> Hi,
>>>
>>>
>>> I have a distributed volume with 6 bricks, each with 5TB, and it's
>>> hosting large qcow2 VM disks (I know it's not reliable, but it's not
>>> important data).
>>>
>>> I started with 5 bricks, then added another one and started the
>>> rebalance process. Everything went well, but now I'm looking at the
>>> bricks' free space and I found one brick is around 82% full while the
>>> others range from 20% to 60%.
>>>
>>> The brick with the highest utilization is hosting more qcow2 disks than
>>> the other bricks, and whenever I start a rebalance it just completes in
>>> 0 seconds without moving any data.
>>>
>>>
>>> What is your average file size in the cluster? And roughly how many
>>> files do you have?
>>>
>>>
>>> What will happen when the brick becomes full?
>>>
>>> Once a brick's usage goes beyond 90%, new files won't be created on that
>>> brick, but existing files can still grow.
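[Editor's note] The 90% figure appears to correspond to gluster's min-free-disk style threshold. A small, self-contained sketch of the check; the used/total values below are illustrative, not taken from this thread:

```shell
#!/bin/sh
# Sketch: compute a brick's usage percentage and compare it against the
# ~90% threshold mentioned above. Arguments are illustrative KB used / KB total.
usage_pct() {
    awk -v u="$1" -v t="$2" 'BEGIN { printf "%d", (u / t) * 100 }'
}

over_threshold() {
    # Prints "yes" when the brick has crossed 90% used, "no" otherwise.
    if [ "$(usage_pct "$1" "$2")" -ge 90 ]; then echo yes; else echo no; fi
}
```

Usage: `over_threshold 4800 5000` prints `yes` (96% used, like the brick in this thread), while `over_threshold 2000 5000` prints `no`.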
>>>
>>>
>>> Can I move data manually from one brick to the other?
>>>
>>>
>>> No, it is not recommended. Even though gluster will try to find the
>>> file, things may break.
>>>
>>>
>>> Why is rebalance not distributing data evenly across all bricks?
>>>
>>>
>>> Rebalance works based on the layout, so we need to see how the layouts
>>> are distributed. If one of your bricks has a higher capacity, it will be
>>> assigned a larger layout range.
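[Editor's note] The per-directory layout Rafi refers to can be inspected through the trusted.glusterfs.dht extended attribute on each brick's copy of a directory. A dry-run sketch (it only prints the getfattr invocations; hostnames and brick paths are taken from the volume info above):

```shell
#!/bin/sh
# Sketch: print the commands to read the DHT layout xattr of a directory
# on every brick. The hex value encodes the hash range assigned to that brick.
layout_cmd() {
    echo "ssh $1 getfattr -n trusted.glusterfs.dht -e hex $2"
}

for host in ctv01 ctv02 ctv03 ctv04 ctv05 ctv06; do
    layout_cmd "$host" /vols/ctvvols
done
```

Comparing the ranges across bricks shows whether one brick was assigned a disproportionately large slice of the hash space.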
>>>
>>>
>>
>>
>>> That is correct. As Rafi said, the layout matters here. Can you please
>>> send across the rebalance logs from all 6 nodes?
>>>
>>>
>>> Nodes running CentOS 7.3
>>>
>>> Gluster 3.8.11
>>>
>>>
>>> Volume info;
>>> Volume Name: ctvvols
>>> Type: Distribute
>>> Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 6
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: ctv01:/vols/ctvvols
>>> Brick2: ctv02:/vols/ctvvols
>>> Brick3: ctv03:/vols/ctvvols
>>> Brick4: ctv04:/vols/ctvvols
>>> Brick5: ctv05:/vols/ctvvols
>>> Brick6: ctv06:/vols/ctvvols
>>> Options Reconfigured:
>>> nfs.disable: on
>>> performance.readdir-ahead: on
>>> transport.address-family: inet
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.low-prio-threads: 32
>>> network.remote-dio: enable
>>> cluster.eager-lock: enable
>>> cluster.quorum-type: none
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> cluster.locking-scheme: granular
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 10000
>>> features.shard: off
>>> user.cifs: off
>>> network.ping-timeout: 10
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>>
>>>
>>> Rebalance log:
>>>
>>>
>>> [2017-05-23 14:45:12.637671] I [dht-rebalance.c:2866:gf_defrag_process_dir]
>>> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-840eb
>>> a7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0 took 0.00 secs
>>> [2017-05-23 14:45:12.640043] I [MSGID: 109081]
>>> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
>>> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4
>>> 206-848a-d73e85a1cc35
>>> [2017-05-23 14:45:12.641516] I [dht-rebalance.c:2652:gf_defrag_process_dir]
>>> 0-ctvvols-dht: migrate data called on /31e0b341-4eeb-4b71-b280-840eb
>>> a7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
>>> [2017-05-23 14:45:12.642421] I [dht-rebalance.c:2866:gf_defrag_process_dir]
>>> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-840eb
>>> a7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35 took 0.00 secs
>>> [2017-05-23 14:45:12.645610] I [MSGID: 109081]
>>> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
>>> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4
>>> d90-abf5-de757dd04078
>>> [2017-05-23 14:45:12.647034] I [dht-rebalance.c:2652:gf_defrag_process_dir]
>>> 0-ctvvols-dht: migrate data called on /31e0b341-4eeb-4b71-b280-840eb
>>> a7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
>>> [2017-05-23 14:45:12.647589] I [dht-rebalance.c:2866:gf_defrag_process_dir]
>>> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-840eb
>>> a7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078 took 0.00 secs
>>> [2017-05-23 14:45:12.653291] I [dht-rebalance.c:3838:gf_defrag_start_crawl]
>>> 0-DHT: crawling file-system completed
>>> [2017-05-23 14:45:12.653323] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 23
>>> [2017-05-23 14:45:12.653508] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 24
>>> [2017-05-23 14:45:12.653536] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 25
>>> [2017-05-23 14:45:12.653556] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 26
>>> [2017-05-23 14:45:12.653580] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 27
>>> [2017-05-23 14:45:12.653603] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 28
>>> [2017-05-23 14:45:12.653623] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 29
>>> [2017-05-23 14:45:12.653638] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 30
>>> [2017-05-23 14:45:12.653659] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 31
>>> [2017-05-23 14:45:12.653677] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 32
>>> [2017-05-23 14:45:12.653692] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 33
>>> [2017-05-23 14:45:12.653711] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 34
>>> [2017-05-23 14:45:12.653723] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 35
>>> [2017-05-23 14:45:12.653739] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 36
>>> [2017-05-23 14:45:12.653759] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 37
>>> [2017-05-23 14:45:12.653772] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 38
>>> [2017-05-23 14:45:12.653789] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 39
>>> [2017-05-23 14:45:12.653800] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 40
>>> [2017-05-23 14:45:12.653811] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 41
>>> [2017-05-23 14:45:12.653822] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 42
>>> [2017-05-23 14:45:12.653836] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 43
>>> [2017-05-23 14:45:12.653870] I [dht-rebalance.c:2246:gf_defrag_task]
>>> 0-DHT: Thread wokeup. defrag->current_thread_count: 44
>>> [2017-05-23 14:45:12.654413] I [MSGID: 109028]
>>> [dht-rebalance.c:4079:gf_defrag_status_get] 0-ctvvols-dht: Rebalance is
>>> completed. Time taken is 0.00 secs
>>> [2017-05-23 14:45:12.654428] I [MSGID: 109028]
>>> [dht-rebalance.c:4083:gf_defrag_status_get] 0-ctvvols-dht: Files
>>> migrated: 0, size: 0, lookups: 15, failures: 0, skipped: 0
>>> [2017-05-23 14:45:12.654552] W [glusterfsd.c:1327:cleanup_and_exit]
>>> (-->/lib64/libpthread.so.0(+0x7dc5) [0x7ff40ff88dc5]
>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7ff41161acd5]
>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7ff41161ab4b] ) 0-:
>>> received signum (15), shutting down
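[Editor's note] Given the "Rebalance is completed. Time taken is 0.00 secs ... Files migrated: 0" result in the log above, the usual next steps (an editorial suggestion, not advice from the thread itself) are a fix-layout pass and, if data still does not move, a forced rebalance. Dry-run sketch:

```shell
#!/bin/sh
# Dry-run sketch: rebalance variants relevant to the "0 files migrated" log above.
VOL=ctvvols

rebal_cmd() {
    echo "gluster volume rebalance $VOL $1"
}

rebal_cmd "fix-layout start"   # recompute layout ranges only; moves no data
rebal_cmd "start force"        # migrate files even when the hash says they may stay
rebal_cmd "status"             # confirm files/size actually migrated this time
```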
>>>
>>>
>>>
>>> Appreciate your help
>>>
>>>
>>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>
>>
>

