[heketi-devel] Remove Device: Used to distribute all the bricks from device to other devices

Raghavendra Talur rtalur at redhat.com
Thu Mar 9 16:06:33 UTC 2017


The PR now has the API changes requested.

Before review, here is the current state diagram:



+-----------------+     disable/offline  +------------------+
|                 |--------------------->|                  |
|                 |                      | Offline/Disabled |
| Online/Enabled  |<---------------------|                  |
|                 |      enable/online   |                  |
+-----------------+                      +------------------+
      ^                                          ^        |
      |                                          |        |remove
      |                                   offline|        |
      |add                                       |        |
      |                                          |        |
      |                                          |        |
      |                                          |        v
+------------------+                     +-------------------+
|                  |                     |                   |
| Deleted          |                     |   Failed/Removed  |
|                  |<--------------------|                   |
|                  |          delete     |                   |
+------------------+                     +-------------------+


The current implementation *requires* the device to be in the "Offline"
state before it can be removed. Some of the operations shown above aren't
implemented yet.
Is this acceptable to all? Are there any concerns or suggestions?
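
As a rough illustration of the diagram, the transitions can be written as a
lookup table. This is only a sketch: the state and operation names below are
hypothetical shorthand for the labels in the diagram and are not taken from
the Heketi code.

```python
# Hypothetical state and operation names, transcribed from the diagram above.
TRANSITIONS = {
    ("online", "offline"): "offline",   # disable/offline
    ("offline", "online"): "online",    # enable/online
    ("offline", "remove"): "removed",   # device must be offline first
    ("removed", "offline"): "offline",
    ("removed", "delete"): "deleted",
    ("deleted", "add"): "online",
}


def next_state(state: str, operation: str) -> str:
    """Return the state reached by applying operation, or raise ValueError."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(
            f"operation {operation!r} is not allowed in state {state!r}"
        ) from None
```

With this table, "remove" on an online device is rejected, which matches the
requirement that the device be offline before removal.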

Thanks,
Raghavendra Talur


On Thu, Mar 9, 2017 at 10:43 AM, Luis Pabon <lpabon at chrysalix.org> wrote:
> Awesome.  I'll definitely review tomorrow.
>
> - Luis
>
> On Wed, Mar 8, 2017 at 7:59 PM, Raghavendra Talur <rtalur at redhat.com> wrote:
>>
>> Hi Luis,
>>
>> Please have a look at PR 710, which has the changes that you requested.
>>
>> To create the new PR, I have followed the revert-of-revert model for
>> merge commits, as suggested by Linus in
>> https://raw.githubusercontent.com/git/git/master/Documentation/howto/revert-a-faulty-merge.txt
>>
>> If you prefer it to be done in any other way, please let us know.
>>
>> Also, these changes do not include the API+Async changes or the code
>> refactored from the allocator.
>> I will send those in a few hours. Meanwhile, I wanted to put the simpler
>> stuff out for review.
>>
>> Thanks,
>> Raghavendra Talur
>>
>> On Wed, Feb 22, 2017 at 2:01 PM, Mohamed Ashiq Liyazudeen
>> <mliyazud at redhat.com> wrote:
>> > Hi,
>> >
>> > The new commit addresses all the comments. Please review and comment on
>> > the PR [1].
>> >
>> > Prerequisites, done:
>> > We have added VolumeId to BrickEntry, and a VolumeInfo executor call that
>> > returns the whole information for a volume from gluster itself. Instead
>> > of saving the brick peers (the brick set), we generate the brick peers
>> > from this information.
>> >
>> >
>> > How this works:
>> >
>> > For a device to be removed:
>> > First, if the device is empty, return OK to remove.
>> > Else:
>> > Get the list of bricks on the device to be removed and the corresponding
>> > list of VolumeEntry objects for those bricks.
>> > For each brick, call replace brick on its volume with the brick ID.
>> >
>> >
>> > Replace brick logic:
>> > 1) First we find the brick set in which the brick to be replaced is
>> > present. A brick set is the set to which a brick belongs; for example,
>> > in Distribute-Replicate 2x3 [A(A1,A2,A3), B(B1,B2,B3)], B2 is in set B.
>> > We need this because we should not place the new brick on the same node
>> > as another brick of the same set (quorum would be lost if that one node
>> > went down, and it is also not a good design).
>> > 2) Call the allocator to give out devices from the same cluster.
>> > 3) Ignore a device if:
>> > a) it is the device being removed, or
>> > b) it belongs to a node where one of the other bricks in the set is
>> > present.
>> > 4) With the above logic we can still use the simpleAllocator ring to
>> > decide brick placement, with a single zone or with multiple zones.
>> > 5) On failure, return Err; in case of NoSpaceError, respond with
>> > Replacementnotfound.
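
The filtering in steps 3 and 5 above can be sketched roughly as follows. This
is an illustration only, not the code in the PR: the Device type, its fields,
and pick_replacement are hypothetical names, and the free-space check is an
assumption about how "no space" would be detected.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for a device handed out by the allocator."""
    id: str
    node: str
    free: int  # free space, in the same unit as the brick size


def pick_replacement(
    candidates: Iterable[Device],
    removed_device_id: str,
    brick_set_nodes: Set[str],
    size: int,
) -> Optional[Device]:
    """Return the first usable device in allocator order, or None.

    brick_set_nodes holds the nodes of the other bricks in the brick set.
    """
    for dev in candidates:
        if dev.id == removed_device_id:   # rule 3a: the device being removed
            continue
        if dev.node in brick_set_nodes:   # rule 3b: node already holds a peer
            continue
        if dev.free < size:               # assumed check: not enough space
            continue
        return dev
    return None  # caller would report that no replacement was found
```

Because the candidates are consumed in the order the allocator returns them,
the ring ordering still decides placement among the devices that pass the
filter.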
>> >
>> >
>> > Note:
>> > A few basic tests have been added for the new VolumeId in BrickEntry, and
>> > all the failures caused by the change to executor.SimpleVolumeInfo from
>> > executor.VolumeInfo have been fixed.
>> > Device remove has been kept modular so that it can be reused for node
>> > remove.
>> >
>> >
>> > To be done:
>> > Tests to be added.
>> >
>> >
>> > [1] https://github.com/heketi/heketi/pull/676
>> >
>> > -- Ashiq,Talur
>> > ________________________________
>> > From: "Luis Pabon" <lpabon at chrysalix.org>
>> > To: "Mohamed Ashiq Liyazudeen" <mliyazud at redhat.com>
>> > Cc: heketi-devel at gluster.org
>> > Sent: Friday, February 17, 2017 1:49:32 AM
>> >
>> > Subject: Re: [heketi-devel] Remove Device: Used to distribute all the
>> > bricks
>> > from device to other devices
>> >
>> > FYI, barring some miracle, there is no way this feature will be in by
>> > Sunday.  This feature is one of the hardest parts of Heketi, which is why
>> > https://github.com/heketi/heketi/issues/161 has taken so long.
>> >
>> > The brick set is the heart of this change.  A brick set is how Heketi
>> > sets up the replicas in a ring.  For example: in a distributed replicated
>> > 2x3, brick A would need A1 and A2 as replicas.  Therefore, A,A1,A2 are a
>> > set.  Same applies for B,B1,B2.
>> >
>> > Replacing a device which contains B1 (for example), would need a
>> > replacement
>> > brick which satisfies B and B2 for the set to be complete.  Same thing
>> > applies for EC where it is A,A1...A(n).
>> >
>> > This is a big change, which requires a good algorithm, execution, and
>> > testing.
>> >
>> > - Luis
>> >
>> > On Thu, Feb 16, 2017 at 2:25 PM, Mohamed Ashiq Liyazudeen
>> > <mliyazud at redhat.com> wrote:
>> >>
>> >> Hi Luis,
>> >>
>> >> I agree on adding the VolumeId part to the db for bricks. I didn't get
>> >> what you mean by brick peers, though.
>> >>
>> >> I wanted to understand the allocator's behavior better with respect to
>> >> the number of zones. If you see our example topology file, it has 4 nodes
>> >> with multiple devices, but 2 nodes are associated with each zone. There
>> >> are only two zones now, so while creating a replica 3 volume, how does
>> >> the allocator create the ring of devices? Mainly, in this case we cannot
>> >> ignore both zones.
>> >>
>> >> I also wanted to know how we approach volume expand. I thought it would
>> >> do something similar: give the state of the existing volume (where the
>> >> present bricks are) to the allocator, and the allocator would give back
>> >> a ring without those zones or nodes. But I think (correct me if I am
>> >> wrong) the volume is changed by adding the appropriate bricks, in the
>> >> sense that bricks are added to a replica 3 (3x1) volume to make it a
>> >> distribute replica 3 (3x2). I agree this is the way to go; I am just
>> >> trying to understand the allocator better.
>> >>
>> >> We need this feature to be in by Sunday. I will mostly be working on it
>> >> and will definitely mail, but is there any place to chat with you in
>> >> case of doubts, for quick answers?
>> >>
>> >> Tomorrow, as the first thing, I will add the VolumeId and brick peers
>> >> (I am not sure what exactly that is).
>> >>
>> >> --
>> >> Ashiq
>> >>
>> >> ----- Original Message -----
>> >> From: "Luis Pabon" <lpabon at chrysalix.org>
>> >> To: "Mohamed Ashiq Liyazudeen" <mliyazud at redhat.com>
>> >> Cc: heketi-devel at gluster.org
>> >> Sent: Thursday, February 16, 2017 11:32:55 PM
>> >> Subject: Re: [heketi-devel] Remove Device: Used to distribute all the
>> >> bricks from device to other devices
>> >>
>> >> After we agree on the algorithm, the first PR would be to add the
>> >> necessary
>> >> framework to the DB to support #676.
>> >>
>> >> - Luis
>> >>
>> >> On Thu, Feb 16, 2017 at 1:00 PM, Luis Pabon <lpabon at chrysalix.org>
>> >> wrote:
>> >>
>> >> > Great summary.  Yes, the next step should be to figure out how to
>> >> > enhance
>> >> > the ring to return a brick for another zone.  It could be as simple
>> >> > as:
>> >> >
>> >> > If current bricks in set are in different zones:
>> >> >     Get a ring
>> >> >     Remove disks from the ring in zones already used
>> >> >     Return devices until one is found with the appropriate size
>> >> > else:
>> >> >     Get a ring
>> >> >     Return devices until one is found with the appropriate size
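
One possible reading of the pseudocode above, written out as a runnable
sketch. RingDevice, its fields, and pick_from_ring are hypothetical names used
for illustration; the real ring in Heketi's allocator may look quite
different.

```python
from typing import List, NamedTuple, Optional, Set


class RingDevice(NamedTuple):
    """Hypothetical ring entry: a device with its zone and free space."""
    id: str
    zone: int
    free: int


def pick_from_ring(
    ring: List[RingDevice], used_zones: Set[int], size: int
) -> Optional[RingDevice]:
    """Walk the ring in order and return the first device with enough space.

    used_zones holds the zones of the remaining bricks in the brick set. If
    those bricks sit in different zones, devices in the used zones are
    dropped from the ring first; otherwise the ring is used unchanged.
    """
    if len(used_zones) > 1:
        ring = [dev for dev in ring if dev.zone not in used_zones]
    for dev in ring:
        if dev.free >= size:
            return dev
    return None
```

Note that with only two zones and a brick set already spread over both, the
filtered ring is empty and the function returns None, so that case would need
separate handling.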
>> >> >
>> >> > Also, the order of the disks may matter.  This part I am not sure of,
>> >> > but we may need to keep track of the order in which the bricks were
>> >> > added to the volume during 'create'.  This may be necessary to
>> >> > determine which of the bricks in the brick set are in different zones.
>> >> >
>> >> > We may have to add a new DB entry in the Brick Entry.  For example:
>> >> > Brick
>> >> > peers, and Volume ID
>> >> >
>> >> > - Luis
>> >> >
>> >> > On Wed, Feb 15, 2017 at 2:17 PM, Mohamed Ashiq Liyazudeen <
>> >> > mliyazud at redhat.com> wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> This mail talks about the PR [1].
>> >> >>
>> >> >> Let me start off with what we plan to do in it.
>> >> >>
>> >> >> We only support this feature for Replicate and Distribute Replicate
>> >> >> volumes.
>> >> >> Refer: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
>> >> >>
>> >> >> It removes all the bricks from the device and starts these bricks on
>> >> >> other devices chosen by the allocator. Heal is triggered automatically
>> >> >> for replicate volumes on replace brick. For each brick, we allocate and
>> >> >> create a new brick to replace it, then stop the brick to be replaced
>> >> >> if it is not already down (kill the brick process). Then gluster
>> >> >> replace brick replaces the brick with the new one and also starts the
>> >> >> heal.
>> >> >>
>> >> >> If the other nodes do not have sufficient storage, then this command
>> >> >> should fail.
>> >> >>
>> >> >> 1) If there are no bricks, then tell the user it is clean to remove the
>> >> >> device.
>> >> >> 2) If there are bricks on the device, then find the volumes they are
>> >> >> related to from the list of volumes. BrickEntry does not have the name
>> >> >> of the volume it is associated with.
>> >> >> 3) Move the bricks to other devices by calling the allocator for the
>> >> >> devices.
>> >> >> 4) Eliminate the device to be removed and all the nodes which are
>> >> >> already associated with the volume.
>> >> >>
>> >> >> We missed the zone handling part. If there were a way to give the
>> >> >> zones and nodes already used by the volume to the allocator, then the
>> >> >> allocator could return devices from a node in a different zone. I
>> >> >> think 2, 3, and 4 will handle the case where there is only one zone.
>> >> >> Let us know if there are any other risks or better ways to use the
>> >> >> allocator.
>> >> >>
>> >> >> [1] https://github.com/heketi/heketi/pull/676
>> >> >>
>> >> >> --
>> >> >> Regards,
>> >> >> Mohamed Ashiq.L
>> >> >>
>> >> >> _______________________________________________
>> >> >> heketi-devel mailing list
>> >> >> heketi-devel at gluster.org
>> >> >> http://lists.gluster.org/mailman/listinfo/heketi-devel