[Gluster-users] gluster remove-brick
kashif.alig at gmail.com
Mon Feb 4 13:23:03 UTC 2019
I tried attaching the logs but the file was too big, so I have put it on a
drive accessible to everyone.
I am attaching the rebalance logs, which cover the period when I ran
fix-layout after adding the new disks and then started the remove-brick
operation.
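For anyone looking through the same logs, a minimal sketch of pulling the error counts out of a rebalance log. The sample lines and the /tmp path below are made up for illustration; on a real node the log would be something like /var/log/glusterfs/atlasglust-rebalance.log:

```shell
# Hypothetical sample of rebalance-log lines like those quoted in this thread.
cat > /tmp/rebalance-sample.log <<'EOF'
[2019-02-01 10:00:01.1] E [MSGID: 109023] failed to migrate data [No space left on device]
[2019-02-01 10:00:02.2] E [MSGID: 109023] failed to migrate data [No space left on device]
[2019-02-01 10:00:03.3] I [dht-rebalance.c] completed migration of file
EOF

# Count the out-of-space failures:
grep -c "No space left on device" /tmp/rebalance-sample.log
```

The same grep against the real log should show whether all 17000 failures are out-of-space errors or whether something else is mixed in.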
All of the nodes have at least 8 TB of disk space available:
/dev/sdb 73T 65T 8.0T 90% /glusteratlas/brick001
/dev/sdb 73T 65T 8.0T 90% /glusteratlas/brick002
/dev/sdb 73T 65T 8.0T 90% /glusteratlas/brick003
/dev/sdb 73T 65T 8.0T 90% /glusteratlas/brick004
/dev/sdb 73T 65T 8.0T 90% /glusteratlas/brick005
/dev/sdb 80T 67T 14T 83% /glusteratlas/brick006
/dev/sdb 37T 1.6T 35T 5% /glusteratlas/brick007
/dev/sdb 89T 15T 75T 17% /glusteratlas/brick008
/dev/sdb 89T 14T 76T 16% /glusteratlas/brick009
brick007 is the one I am removing
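As a quick sanity check on whether the remaining bricks can absorb brick007's data, some back-of-the-envelope shell arithmetic using the df numbers above (rounded to whole TB):

```shell
# Free space on the bricks that remain (001-006, 008, 009), in TB,
# taken from the df output above.
FREE_REMAINING=$((8 + 8 + 8 + 8 + 8 + 14 + 75 + 76))
# Data still left on brick007 (~1.6 TB, rounded up).
USED_ON_BRICK007=2

if [ "$FREE_REMAINING" -gt "$USED_ON_BRICK007" ]; then
  echo "enough aggregate space: ${FREE_REMAINING}TB free for ${USED_ON_BRICK007}TB"
fi
```

Aggregate free space may not be the whole story, though: DHT places each file on a single brick, and rebalance can refuse to move a file to a brick that has crossed the cluster.min-free-disk threshold (10% by default). Bricks 001 through 005 are sitting at exactly 90% usage, which may explain "No space left on device" failures even though space is free in aggregate.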
gluster volume info
Volume Name: atlasglust
Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
Snapshot Count: 0
Number of Bricks: 9
On Mon, Feb 4, 2019 at 11:37 AM Nithya Balachandran <nbalacha at redhat.com>
> On Mon, 4 Feb 2019 at 16:39, mohammad kashif <kashif.alig at gmail.com>
>> Hi Nithya
>> Thanks for replying so quickly. It is very much appreciated.
>> There are lots of "[No space left on device]" errors, which I cannot
>> understand as there is plenty of space on all of the nodes.
> This means that Gluster could not find sufficient space for the file.
> Would you be willing to share your rebalance log file?
> Please provide the following information:
> - The gluster version
> - The gluster volume info for the volume
> - How full are the individual bricks for the volume?
>> A little bit of background will be useful in this case. I had a cluster of
>> seven nodes of varying capacity (73, 73, 73, 46, 46, 46, 46 TB). The
>> cluster was almost 90% full, so every node had only about 8 to 15 TB of
>> free space. I added two new nodes with 100 TB each and ran fix-layout,
>> which completed successfully.
>> After that I started the remove-brick operation. I don't think any of the
>> nodes was 100% full at any point; looking at my Ganglia graphs, there was
>> always a minimum of 5 TB available on every node.
>> I was keeping an eye on the remove-brick status; for a very long time
>> there were no failures, then at some point these 17000 failures appeared
>> and the count stayed there.
>> On Mon, Feb 4, 2019 at 5:09 AM Nithya Balachandran <nbalacha at redhat.com>
>>> The status shows quite a few failures. Please check the rebalance logs
>>> to see why that happened. We can decide what to do based on the errors.
>>> Once you run a commit, the brick will no longer be part of the volume
>>> and you will not be able to access those files via the client.
>>> Do you have sufficient space on the remaining bricks for the files on
>>> the removed brick?
>>> On Mon, 4 Feb 2019 at 03:50, mohammad kashif <kashif.alig at gmail.com>
>>>> I have a pure distributed gluster volume with nine nodes and am trying
>>>> to remove one node. I ran:
>>>> gluster volume remove-brick atlasglust
>>>> nodename:/glusteratlas/brick007/gv0 start
>>>> It completed but with around 17000 failures
>>>> Node      Rebalanced-files   size     scanned    failures   skipped   status      run time in h:m:s
>>>> --------  ----------------   ------   --------   --------   -------   ---------   -----------------
>>>> nodename  4185858            27.5TB   6746030    17488      0         completed   405:15:34
>>>> I can see that there is still 1.5 TB of data on the node which I was
>>>> trying to remove.
>>>> I am not sure what to do now. Should I run the remove-brick command
>>>> again so that the files which failed can be tried again?
>>>> or should I run commit first and then try to remove node again?
>>>> Please advise, as I don't want to lose any files.
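For reference, a sketch of the usual retry sequence, using the volume and brick names from this thread; this assumes the failures in the log have been understood first, since commit is irreversible and removes the brick from the volume:

```shell
# Re-running start retries migration of the files still on the brick.
gluster volume remove-brick atlasglust nodename:/glusteratlas/brick007/gv0 start
gluster volume remove-brick atlasglust nodename:/glusteratlas/brick007/gv0 status
# Only once status shows "completed" with no failures:
gluster volume remove-brick atlasglust nodename:/glusteratlas/brick007/gv0 commit
```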
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org