[Gluster-users] gluster remove-brick
mohammad kashif
kashif.alig at gmail.com
Mon Feb 4 13:23:03 UTC 2019
Hi Nithya
I tried attaching the logs but they were too big, so I have put them in a
Google Drive folder accessible by everyone:
https://drive.google.com/drive/folders/1744WcOfrqe_e3lRPxLpQ-CBuXHp_o44T?usp=sharing
I am sharing the rebalance logs, which cover the period when I ran
fix-layout after adding the new nodes and then started the remove-brick
operation.
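For reference, the sequence of commands was roughly the following (a
sketch using the volume and brick names shown further down; the exact
hostnames are obfuscated):

  gluster volume rebalance atlasglust fix-layout start
  gluster volume rebalance atlasglust status
  gluster volume remove-brick atlasglust \
      pplxgluster07.**:/glusteratlas/brick007/gv0 start
  gluster volume remove-brick atlasglust \
      pplxgluster07.**:/glusteratlas/brick007/gv0 status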
All of the nodes have at least 8 TB of disk space available:
Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/sdb         73T   65T   8.0T   90%  /glusteratlas/brick001
/dev/sdb         73T   65T   8.0T   90%  /glusteratlas/brick002
/dev/sdb         73T   65T   8.0T   90%  /glusteratlas/brick003
/dev/sdb         73T   65T   8.0T   90%  /glusteratlas/brick004
/dev/sdb         73T   65T   8.0T   90%  /glusteratlas/brick005
/dev/sdb         80T   67T    14T   83%  /glusteratlas/brick006
/dev/sdb         37T  1.6T    35T    5%  /glusteratlas/brick007
/dev/sdb         89T   15T    75T   17%  /glusteratlas/brick008
/dev/sdb         89T   14T    76T   16%  /glusteratlas/brick009
brick007 is the one I am removing
gluster volume info
Volume Name: atlasglust
Type: Distribute
Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
Status: Started
Snapshot Count: 0
Number of Bricks: 9
Transport-type: tcp
Bricks:
Brick1: pplxgluster01**:/glusteratlas/brick001/gv0
Brick2: pplxgluster02.**:/glusteratlas/brick002/gv0
Brick3: pplxgluster03.**:/glusteratlas/brick003/gv0
Brick4: pplxgluster04.**:/glusteratlas/brick004/gv0
Brick5: pplxgluster05.**:/glusteratlas/brick005/gv0
Brick6: pplxgluster06.**:/glusteratlas/brick006/gv0
Brick7: pplxgluster07.**:/glusteratlas/brick007/gv0
Brick8: pplxgluster08.**:/glusteratlas/brick008/gv0
Brick9: pplxgluster09.**:/glusteratlas/brick009/gv0
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
auth.allow: ***
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.md-cache-timeout: 600
performance.parallel-readdir: off
performance.cache-size: 1GB
performance.client-io-threads: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
performance.cache-invalidation: on
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
Thanks
On Mon, Feb 4, 2019 at 11:37 AM Nithya Balachandran <nbalacha at redhat.com>
wrote:
> Hi,
>
>
> On Mon, 4 Feb 2019 at 16:39, mohammad kashif <kashif.alig at gmail.com>
> wrote:
>
>> Hi Nithya
>>
>> Thanks for replying so quickly. It is very much appreciated.
>>
>> There are lots of "[No space left on device]" errors, which I cannot
>> understand as there is plenty of space on all of the nodes.
>>
>
> This means that Gluster could not find sufficient space for the file.
> Would you be willing to share your rebalance log file?
> Please provide the following information (a sketch of the commands to
> gather it follows the list):
>
> - The gluster version
> - The gluster volume info for the volume
> - How full are the individual bricks for the volume?
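>
> A minimal sketch of commands that would gather this information,
> assuming the volume name atlasglust and the brick mount points listed
> in the reply above:
>
>   gluster --version
>   gluster volume info atlasglust
>   df -h /glusteratlas/brick007    # repeat for each brick mount point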
>
>
>
>> A little bit of background will be useful in this case. I had a cluster
>> of seven nodes of varying capacity (73, 73, 73, 46, 46, 46, 46 TB). The
>> cluster was almost 90% full, so every node had roughly 8 to 15 TB of
>> free space. I added two new nodes with 100 TB each and ran fix-layout,
>> which completed successfully.
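>>
>> A minimal sketch of that expansion step, assuming the two new bricks
>> are brick008 and brick009 as listed in the volume info above
>> (hostnames obfuscated); fix-layout then ran as already shown:
>>
>>   gluster volume add-brick atlasglust \
>>       pplxgluster08.**:/glusteratlas/brick008/gv0 \
>>       pplxgluster09.**:/glusteratlas/brick009/gv0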
>>
>> After that I started the remove-brick operation. I don't think that at
>> any point any of the nodes were 100% full. Looking at my Ganglia graphs,
>> there was always a minimum of 5 TB available on every node.
>>
>> I was keeping an eye on the remove-brick status; for a very long time
>> there were no failures, and then at some point these 17000 failures
>> appeared and stayed at that level.
>>
>> Thanks
>>
>> Kashif
>>
>>
>> On Mon, Feb 4, 2019 at 5:09 AM Nithya Balachandran <nbalacha at redhat.com>
>> wrote:
>>
>>> Hi,
>>>
>>> The status shows quite a few failures. Please check the rebalance logs
>>> to see why that happened. We can decide what to do based on the errors.
>>> Once you run a commit, the brick will no longer be part of the volume
>>> and you will not be able to access those files via the client.
>>> Do you have sufficient space on the remaining bricks for the files on
>>> the removed brick?
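>>>
>>> A minimal sketch of the relevant commands, assuming the volume and
>>> brick names used elsewhere in this thread (check the status output and
>>> the rebalance log errors before committing):
>>>
>>>   gluster volume remove-brick atlasglust \
>>>       nodename:/glusteratlas/brick007/gv0 status
>>>   gluster volume remove-brick atlasglust \
>>>       nodename:/glusteratlas/brick007/gv0 commit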
>>>
>>> Regards,
>>> Nithya
>>>
>>> On Mon, 4 Feb 2019 at 03:50, mohammad kashif <kashif.alig at gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I have a pure distributed gluster volume with nine nodes and am trying
>>>> to remove one node. I ran:
>>>> gluster volume remove-brick atlasglust
>>>> nodename:/glusteratlas/brick007/gv0 start
>>>>
>>>> It completed, but with around 17000 failures:
>>>>
>>>> Node      Rebalanced-files   size     scanned   failures   skipped   status      run time in h:m:s
>>>> --------  ----------------   ------   -------   --------   -------   ---------   -----------------
>>>> nodename  4185858            27.5TB   6746030   17488      0         completed   405:15:34
>>>>
>>>> I can see that there is still 1.5 TB of data on the node which I was
>>>> trying to remove.
>>>>
>>>> I am not sure what to do now. Should I run the remove-brick command
>>>> again so that the files which failed can be retried?
>>>>
>>>> Or should I run commit first and then try to remove the node again?
>>>>
>>>> Please advise, as I don't want to lose any files.
>>>>
>>>> Thanks
>>>>
>>>> Kashif
>>>>
>>>>
>>>>