[heketi-devel] Remove Device: Used to distribute all the bricks from device to other devices
Luis Pabon
lpabon at chrysalix.org
Thu Mar 9 21:17:08 UTC 2017
This looks great. Thanks for the state machine diagram.
On Thu, Mar 9, 2017 at 11:06 AM, Raghavendra Talur <rtalur at redhat.com>
wrote:
> The PR now has the API changes requested.
>
> Before review, here is the current state diagram
>
>
>
> +-----------------+   disable/offline    +------------------+
> |                 |--------------------->|                  |
> |                 |                      | Offline/Disabled |
> | Online/Enabled  |<---------------------|                  |
> |                 |    enable/online     |                  |
> +-----------------+                      +------------------+
>        ^        ^                                 |
>        |        |                                 |remove
>        | offline|                                 |
>        |add     |                                 |
>        |        |                                 |
>        |        |                                 |
>        |        |                                 v
> +------------------+                     +-------------------+
> |                  |                     |                   |
> |     Deleted      |                     |  Failed/Removed   |
> |                  |<--------------------|                   |
> |                  |       delete        |                   |
> +------------------+                     +-------------------+
>
>
>
> The current implementation *requires* the device to be in "Offline" state
> before it can be removed. Some of the operations shown above aren't
> implemented yet.
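>
> As a rough illustration of the offline-before-remove rule (the type and
> function names below are made up for this example, not the actual heketi
> code), the check could look something like this in Go:
>
> package main
>
> import (
>     "errors"
>     "fmt"
> )
>
> type DeviceState string
>
> const (
>     DeviceOnline  DeviceState = "online"
>     DeviceOffline DeviceState = "offline"
>     DeviceFailed  DeviceState = "failed"
>     DeviceDeleted DeviceState = "deleted"
> )
>
> // removeDevice follows the diagram: remove is only legal from Offline,
> // and it moves the device to Failed/Removed.
> func removeDevice(s DeviceState) (DeviceState, error) {
>     if s != DeviceOffline {
>         return s, errors.New("device must be offline before it can be removed")
>     }
>     return DeviceFailed, nil
> }
>
> func main() {
>     if _, err := removeDevice(DeviceOnline); err != nil {
>         fmt.Println(err) // rejected: device is still online
>     }
>     next, _ := removeDevice(DeviceOffline)
>     fmt.Println(next) // failed
> }
>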
> Is this acceptable to all? Are there any concerns or suggestions?
>
> Thanks,
> Raghavendra Talur
>
>
> On Thu, Mar 9, 2017 at 10:43 AM, Luis Pabon <lpabon at chrysalix.org> wrote:
> > Awesome. I'll definitely review tomorrow.
> >
> > - Luis
> >
> > On Wed, Mar 8, 2017 at 7:59 PM, Raghavendra Talur <rtalur at redhat.com>
> wrote:
> >>
> >> Hi Luis,
> >>
> >> Please have a look at PR 710, which has the changes you requested.
> >>
> >> I have followed the revert-of-revert model for merge commits, as
> >> suggested by Linus in
> >> https://raw.githubusercontent.com/git/git/master/Documentation/howto/revert-a-faulty-merge.txt
> >> to create a new PR.
> >>
> >> If you would prefer it done in any other way, please let us know.
> >>
> >> Also, these changes don't include the API+async changes or the
> >> refactored allocator code. I will send those in a few hours; meanwhile
> >> I wanted to put the simpler stuff out for review.
> >>
> >> Thanks,
> >> Raghavendra Talur
> >>
> >> On Wed, Feb 22, 2017 at 2:01 PM, Mohamed Ashiq Liyazudeen
> >> <mliyazud at redhat.com> wrote:
> >> > Hi,
> >> >
> >> > The new commit addresses all the comments. Please review and comment
> >> > on the PR.
> >> >
> >> > Prerequisites (done):
> >> > We added a VolumeId field to BrickEntry, and a VolumeInfo executor call
> >> > that returns the whole volume information from gluster itself (instead
> >> > of saving the brick peers (brick set) in the db, we generate them from
> >> > this information).
> >> >
> >> >
> >> > How does this work:
> >> >
> >> > For a device to be removed:
> >> > If the device is empty, return OK to remove.
> >> > Otherwise, get the list of bricks on the device to be removed and the
> >> > corresponding volume entries for those bricks, then call replace brick
> >> > on each affected volume with the brick id, as sketched below.
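> >> >
> >> > A rough outline of that flow in Go (illustrative names only; this is
> >> > not the code in the PR):
> >> >
> >> > package main
> >> >
> >> > import "fmt"
> >> >
> >> > type Brick struct {
> >> >     Id       string
> >> >     VolumeId string
> >> > }
> >> >
> >> > // removeDeviceBricks returns nil immediately for an empty device;
> >> > // otherwise it replaces every brick on the device, volume by volume.
> >> > func removeDeviceBricks(bricks []Brick, replace func(volumeId, brickId string) error) error {
> >> >     if len(bricks) == 0 {
> >> >         return nil // empty device: OK to remove
> >> >     }
> >> >     for _, b := range bricks {
> >> >         if err := replace(b.VolumeId, b.Id); err != nil {
> >> >             return err
> >> >         }
> >> >     }
> >> >     return nil
> >> > }
> >> >
> >> > func main() {
> >> >     bricks := []Brick{{"b1", "vol1"}, {"b2", "vol2"}}
> >> >     err := removeDeviceBricks(bricks, func(v, b string) error {
> >> >         fmt.Println("replace brick", b, "in volume", v)
> >> >         return nil
> >> >     })
> >> >     fmt.Println("done:", err == nil)
> >> > }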
> >> >
> >> >
> >> > In the replace brick logic:
> >> > 1) First we find the brick set in which the brick to be replaced is
> >> > present (for example, in a distribute-replicate 2x3 volume
> >> > [A(A1,A2,A3), B(B1,B2,B3)], B2 is in set B). We need this because the
> >> > new brick must not be placed on a node that already holds another brick
> >> > of the same set, which could break quorum if that node goes down and is
> >> > not a good design in any case.
> >> > 2) Call the allocator to give out devices from the same cluster.
> >> > 3) Ignore a device if:
> >> > a) it is the device being removed, or
> >> > b) it belongs to a node where one of the other bricks in the set is
> >> > present.
> >> > 4) With the above logic we can still use the simple allocator ring to
> >> > decide brick placement, for both single-zone and multi-zone setups.
> >> > 5) On failure we return an error; in case of NoSpaceError we respond
> >> > that no replacement was found.
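> >> >
> >> > To make the device filtering in step 3 concrete, here is a rough sketch
> >> > in Go (the identifiers are invented for this example, not the actual
> >> > code in the PR):
> >> >
> >> > package main
> >> >
> >> > import "fmt"
> >> >
> >> > type Device struct {
> >> >     Id     string
> >> >     NodeId string
> >> > }
> >> >
> >> > // pickReplacement walks the devices returned by the allocator and skips
> >> > // (a) the device being removed and (b) any device on a node that
> >> > // already holds another brick of the same brick set.
> >> > func pickReplacement(ring []Device, removedDeviceId string,
> >> >     setNodes map[string]bool) (Device, bool) {
> >> >
> >> >     for _, d := range ring {
> >> >         if d.Id == removedDeviceId {
> >> >             continue // rule (a)
> >> >         }
> >> >         if setNodes[d.NodeId] {
> >> >             continue // rule (b)
> >> >         }
> >> >         return d, true
> >> >     }
> >> >     // nothing suitable: caller reports that no replacement was found
> >> >     return Device{}, false
> >> > }
> >> >
> >> > func main() {
> >> >     ring := []Device{{"d1", "n1"}, {"d2", "n2"}, {"d3", "n3"}}
> >> >     // d1 is being removed; n2 already holds a brick of this set.
> >> >     dev, ok := pickReplacement(ring, "d1", map[string]bool{"n2": true})
> >> >     fmt.Println(dev.Id, ok) // d3 true
> >> > }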
> >> >
> >> >
> >> > Note:
> >> > A few basic tests were added for the new VolumeId in BrickEntry, and
> >> > all the failures caused by the change from executor.VolumeInfo to
> >> > executor.SimpleVolumeInfo have been fixed.
> >> > Device remove has been kept modular so that it can be reused for node
> >> > remove.
> >> >
> >> >
> >> > To be done:
> >> > Tests to be added.
> >> >
> >> >
> >> > [1] https://github.com/heketi/heketi/pull/676
> >> >
> >> > -- Ashiq,Talur
> >> > ________________________________
> >> > From: "Luis Pabon" <lpabon at chrysalix.org>
> >> > To: "Mohamed Ashiq Liyazudeen" <mliyazud at redhat.com>
> >> > Cc: heketi-devel at gluster.org
> >> > Sent: Friday, February 17, 2017 1:49:32 AM
> >> >
> >> > Subject: Re: [heketi-devel] Remove Device: Used to distribute all the
> >> > bricks
> >> > from device to other devices
> >> >
> >> > FYI, barring some miracle, there is no way this feature will be in by
> >> > Sunday. This feature is one of the hardest parts of Heketi, which is
> >> > why https://github.com/heketi/heketi/issues/161 has taken so long.
> >> >
> >> > The brick set is the heart of this change. A brick set is how Heketi
> >> > sets
> >> > up the replicas in a ring. For example: in a distributed replicated
> >> > 2x3,
> >> > brick A would need A1 and A2 as replicas. Therefore, A,A1,A2 are a
> set.
> >> > Same applies for B,B1,B2.
> >> >
> >> > Replacing a device which contains B1 (for example), would need a
> >> > replacement
> >> > brick which satisfies B and B2 for the set to be complete. Same thing
> >> > applies for EC where it is A,A1...A(n).
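> >> >
> >> > Purely as an illustration of that structure (invented names, not
> >> > Heketi's actual types), a brick set and a lookup for the set a brick
> >> > belongs to might be modeled like this:
> >> >
> >> > package main
> >> >
> >> > import "fmt"
> >> >
> >> > // A volume is a ring of brick sets; each set holds the replicas
> >> > // (or EC members) that must stay on distinct nodes/zones.
> >> > type BrickSet []string
> >> >
> >> > // setOf returns the set containing the given brick, if any.
> >> > func setOf(sets []BrickSet, brick string) (BrickSet, bool) {
> >> >     for _, s := range sets {
> >> >         for _, b := range s {
> >> >             if b == brick {
> >> >                 return s, true
> >> >             }
> >> >         }
> >> >     }
> >> >     return nil, false
> >> > }
> >> >
> >> > func main() {
> >> >     // 2x3 distributed-replicated volume: two sets of three bricks.
> >> >     sets := []BrickSet{{"A", "A1", "A2"}, {"B", "B1", "B2"}}
> >> >     s, _ := setOf(sets, "B1")
> >> >     fmt.Println(s) // [B B1 B2]: a replacement must complete this set
> >> > }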
> >> >
> >> > This is a big change, which requires a good algorithm, execution, and
> >> > testing.
> >> >
> >> > - Luis
> >> >
> >> > On Thu, Feb 16, 2017 at 2:25 PM, Mohamed Ashiq Liyazudeen
> >> > <mliyazud at redhat.com> wrote:
> >> >>
> >> >> Hi Luis,
> >> >>
> >> >> I agree on adding the VolumeId to the db for bricks. However, I didn't
> >> >> get what you mean by brick peers.
> >> >>
> >> >> I wanted to understand the allocator's behaviour better with respect
> >> >> to the number of zones. If you look at our example topology file, it
> >> >> has 4 nodes with multiple devices, but 2 nodes are associated with each
> >> >> zone. So there are only two zones; when creating a replica-three
> >> >> volume, how does the allocator create the ring of devices? In this case
> >> >> we cannot ignore both zones.
> >> >>
> >> >> I also wanted to know how we approach volume expansion. I thought it
> >> >> would use something similar: give the state of the existing volume
> >> >> (where the present bricks are) to the allocator, and the allocator
> >> >> would give back a ring without those zones or nodes. But I think
> >> >> (correct me if I am wrong) the volume is expanded by adding the
> >> >> appropriate bricks, in the sense that a replica 3 (3x1) volume gets
> >> >> bricks added and becomes distribute-replicate 3 (3x2). I agree this is
> >> >> the way to go; I am just trying to understand the allocator better.
> >> >>
> >> >> We need this feature to be in by Sunday. I will mostly be working on
> >> >> it and will definitely mail, but is there any place to chat with you in
> >> >> case of doubts and quick questions?
> >> >>
> >> >> First thing tomorrow I will add the VolumeId and the brick peers (not
> >> >> sure what that is exactly).
> >> >>
> >> >> --
> >> >> Ashiq
> >> >>
> >> >> ----- Original Message -----
> >> >> From: "Luis Pabon" <lpabon at chrysalix.org>
> >> >> To: "Mohamed Ashiq Liyazudeen" <mliyazud at redhat.com>
> >> >> Cc: heketi-devel at gluster.org
> >> >> Sent: Thursday, February 16, 2017 11:32:55 PM
> >> >> Subject: Re: [heketi-devel] Remove Device: Used to distribute all the
> >> >> bricks from device to other devices
> >> >>
> >> >> After we agree on the algorithm, the first PR would be to add the
> >> >> necessary
> >> >> framework to the DB to support #676.
> >> >>
> >> >> - Luis
> >> >>
> >> >> On Thu, Feb 16, 2017 at 1:00 PM, Luis Pabon <lpabon at chrysalix.org>
> >> >> wrote:
> >> >>
> >> >> > Great summary. Yes, the next step should be to figure out how to
> >> >> > enhance
> >> >> > the ring to return a brick for another zone. It could be as simple
> >> >> > as:
> >> >> >
> >> >> > If current bricks in set are in different zones:
> >> >> >     Get a ring
> >> >> >     Remove disks from the ring in zones already used
> >> >> >     Return devices until one is found with the appropriate size
> >> >> > else:
> >> >> >     Get a ring
> >> >> >     Return devices until one is found with the appropriate size
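> >> >> >
> >> >> > In Go that sketch might look roughly like the following (the types
> >> >> > and names are placeholders, not the current allocator interface):
> >> >> >
> >> >> > package main
> >> >> >
> >> >> > import "fmt"
> >> >> >
> >> >> > type Device struct {
> >> >> >     Id   string
> >> >> >     Zone int
> >> >> >     Free uint64 // free space, e.g. in GiB
> >> >> > }
> >> >> >
> >> >> > // pickFromRing returns the first device in the ring with enough
> >> >> > // space, skipping zones already holding bricks of the set when the
> >> >> > // set spans more than one zone. usedZones contains only the zones
> >> >> > // that currently hold bricks of the set.
> >> >> > func pickFromRing(ring []Device, usedZones map[int]bool, size uint64) (Device, bool) {
> >> >> >     for _, d := range ring {
> >> >> >         if len(usedZones) > 1 && usedZones[d.Zone] {
> >> >> >             continue // zone already used by this brick set
> >> >> >         }
> >> >> >         if d.Free >= size {
> >> >> >             return d, true
> >> >> >         }
> >> >> >     }
> >> >> >     return Device{}, false
> >> >> > }
> >> >> >
> >> >> > func main() {
> >> >> >     // Brick set currently spans zones 1 and 2; zone 3 is still free.
> >> >> >     ring := []Device{{"d1", 1, 500}, {"d2", 2, 500}, {"d3", 3, 500}}
> >> >> >     d, ok := pickFromRing(ring, map[int]bool{1: true, 2: true}, 200)
> >> >> >     fmt.Println(d.Id, ok) // d3 true
> >> >> > }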
> >> >> >
> >> >> > Also, the order of the disks may matter. This part I am not sure of,
> >> >> > but we may need to record the order in which the bricks were added to
> >> >> > the volume during 'create'. This may be necessary to determine which
> >> >> > of the bricks in the brick set are in different zones.
> >> >> >
> >> >> > We may have to add new fields to the BrickEntry in the DB, for
> >> >> > example: brick peers and volume ID.
> >> >> >
> >> >> > - Luis
> >> >> >
> >> >> > On Wed, Feb 15, 2017 at 2:17 PM, Mohamed Ashiq Liyazudeen <
> >> >> > mliyazud at redhat.com> wrote:
> >> >> >
> >> >> >> Hi,
> >> >> >>
> >> >> >> This mail talks about the PR [1].
> >> >> >>
> >> >> >> Let me start off with what we plan to do here.
> >> >> >>
> >> >> >> We only support this feature for replicate and distribute-replicate
> >> >> >> volumes.
> >> >> >> Refer: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
> >> >> >>
> >> >> >> This removes all the bricks from the device and starts them on other
> >> >> >> devices chosen by the allocator. For replicate volumes, heal is
> >> >> >> triggered automatically on replace brick. We allocate and create a
> >> >> >> new brick as the replacement, stop the brick being replaced if it is
> >> >> >> not already down (kill the brick process), and then run gluster
> >> >> >> replace brick, which swaps in the new brick and also starts the heal.
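> >> >> >>
> >> >> >> As a rough sketch of that sequence (the executor interface below is
> >> >> >> hypothetical and only for illustration, not Heketi's real one):
> >> >> >>
> >> >> >> package main
> >> >> >>
> >> >> >> import "fmt"
> >> >> >>
> >> >> >> // Executor stands in for the node executor used to run gluster
> >> >> >> // commands; the real interface differs.
> >> >> >> type Executor interface {
> >> >> >>     StopBrick(host, brick string) error
> >> >> >>     ReplaceBrick(host, volume, oldBrick, newBrick string) error
> >> >> >> }
> >> >> >>
> >> >> >> // replaceBrickOnDevice stops the old brick if it is still running
> >> >> >> // (best effort in this sketch) and then asks gluster to swap in the
> >> >> >> // new brick; gluster starts the heal itself.
> >> >> >> func replaceBrickOnDevice(e Executor, host, volume, oldBrick, newBrick string) error {
> >> >> >>     _ = e.StopBrick(host, oldBrick)
> >> >> >>     return e.ReplaceBrick(host, volume, oldBrick, newBrick)
> >> >> >> }
> >> >> >>
> >> >> >> type fakeExec struct{}
> >> >> >>
> >> >> >> func (fakeExec) StopBrick(host, brick string) error {
> >> >> >>     fmt.Println("stop", brick, "on", host)
> >> >> >>     return nil
> >> >> >> }
> >> >> >>
> >> >> >> func (fakeExec) ReplaceBrick(host, volume, oldBrick, newBrick string) error {
> >> >> >>     fmt.Println("replace", oldBrick, "with", newBrick, "in", volume)
> >> >> >>     return nil
> >> >> >> }
> >> >> >>
> >> >> >> func main() {
> >> >> >>     _ = replaceBrickOnDevice(fakeExec{}, "node1", "vol1", "old-brick", "new-brick")
> >> >> >> }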
> >> >> >>
> >> >> >> If the other nodes do not have sufficient storage, this command
> >> >> >> should fail.
> >> >> >>
> >> >> >> 1) If there are no bricks, tell the user it is safe to remove the
> >> >> >> device.
> >> >> >> 2) If there are bricks on the device, find the volumes they belong to
> >> >> >> from the list of volumes (BrickEntry does not store the name of the
> >> >> >> volume it is associated with).
> >> >> >> 3) Move the bricks to other devices by calling the allocator for
> >> >> >> devices.
> >> >> >> 4) Eliminate the device to be removed and all the nodes already
> >> >> >> associated with the volume.
> >> >> >>
> >> >> >> We missed the zone handling part. If there were a way to give the
> >> >> >> allocator the zones and nodes already used by the volume, the
> >> >> >> allocator could return devices on nodes from different zones. I think
> >> >> >> steps 2, 3, and 4 will handle the case where there is only one zone.
> >> >> >> Let us know if there are any other risks or better ways to use the
> >> >> >> allocator.
> >> >> >>
> >> >> >> [1] https://github.com/heketi/heketi/pull/676
> >> >> >>
> >> >> >> --
> >> >> >> Regards,
> >> >> >> Mohamed Ashiq.L
> >> >> >>
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >> --
> >> >> Regards,
> >> >> Mohamed Ashiq.L
> >> >>
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Regards,
> >> > Mohamed Ashiq.L
> >> >
> >> >
> >> >
> >
> >
>
>
> _______________________________________________
> heketi-devel mailing list
> heketi-devel at gluster.org
> http://lists.gluster.org/mailman/listinfo/heketi-devel
>
>