<div dir="ltr">The PR now has the API changes requested.<div><br></div><div>Before review, here is the current state diagram</div><div><br></div><div><br></div><div><br></div><div><pre class="gmail-result--content--pre" style="margin-top:0px;margin-bottom:0px;padding:0px;overflow:hidden;line-height:1.2em;color:rgb(0,0,0)">+-----------------+ disable/offline +------------------+
|                 |--------------------->|                  |
|                 |                      | Offline/Disabled |
| Online/Enabled  |<---------------------|                  |
|                 |    enable/online     |                  |
+-----------------+                      +------------------+
   ^         ^                                     |
   |         |                                     | remove
   |  offline|                                     |
   | add     |                                     |
   |         |                                     |
   |         |                                     |
   |         |                                     v
+------------------+                     +-------------------+
|                  |                     |                   |
|      Deleted     |                     |  Failed/Removed   |
|                  |<--------------------|                   |
|                  |       delete        |                   |
+------------------+                     +-------------------+

The current implementation *requires* the device to be in the "Offline" state before it can be removed. Some of the operations shown above aren't implemented yet.
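To make the removal constraint concrete, here is a minimal Go sketch of the transitions the diagram allows. All type and state names here are hypothetical placeholders, not the actual heketi identifiers:

    package main

    import "fmt"

    // DeviceState is a hypothetical name; heketi's actual identifiers differ.
    type DeviceState string

    const (
        StateOnline  DeviceState = "online"  // Online/Enabled
        StateOffline DeviceState = "offline" // Offline/Disabled
        StateFailed  DeviceState = "failed"  // Failed/Removed
        StateDeleted DeviceState = "deleted" // Deleted
    )

    // validNext encodes the arrows in the diagram: a device can only be
    // removed (marked failed) from the offline state, and only deleted
    // once it has been removed.
    var validNext = map[DeviceState][]DeviceState{
        StateOnline:  {StateOffline},             // disable/offline
        StateOffline: {StateOnline, StateFailed}, // enable/online, remove
        StateFailed:  {StateDeleted},             // delete
        StateDeleted: {StateOnline},              // add
    }

    // CanTransition reports whether the state machine allows moving a
    // device from one state to another.
    func CanTransition(from, to DeviceState) bool {
        for _, next := range validNext[from] {
            if next == to {
                return true
            }
        }
        return false
    }

    func main() {
        fmt.Println(CanTransition(StateOnline, StateFailed))  // false: must go offline first
        fmt.Println(CanTransition(StateOffline, StateFailed)) // true: offline devices can be removed
    }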
Is this acceptable to all? Are there any concerns or suggestions?

Thanks,
Raghavendra Talur

On Thu, Mar 9, 2017 at 10:43 AM, Luis Pabon <lpabon@chrysalix.org> wrote:
> Awesome. I'll definitely review tomorrow.
>
> - Luis
>
> On Wed, Mar 8, 2017 at 7:59 PM, Raghavendra Talur <rtalur@redhat.com> wrote:
>>
>> Hi Luis,
>>
>> Please have a look at PR 710, which has the changes you requested.
>>
>> I have followed the revert-of-revert model for merge commits, as
>> suggested by Linus in
>> https://raw.githubusercontent.com/git/git/master/Documentation/howto/revert-a-faulty-merge.txt
>> for creating the new PR.
>>
>> If you prefer it to be done in any other way, please let us know.
>>
>> Also, these changes don't include the API+Async changes or the code
>> refactored out of the allocator. I will send those in a few hours;
>> meanwhile I wanted to put the simpler stuff out for review.
>>
>> Thanks,
>> Raghavendra Talur
>>
>> On Wed, Feb 22, 2017 at 2:01 PM, Mohamed Ashiq Liyazudeen
>> <mliyazud@redhat.com> wrote:
>> > Hi,
>> >
>> > The new commit addresses all the comments. Please review and comment
>> > on the PR.
>> >
>> > Prerequisites, done:
>> > We have added VolumeId to BrickEntry, and a VolumeInfo executor call
>> > which returns the whole volume information from gluster itself
>> > (instead of saving the brick peers (brick set), we generate the brick
>> > peers from this information).
>> >
>> > How does this work:
>> >
>> > For a device to be removed:
>> > First, if the device is empty, then return OK to remove.
>> > Otherwise, get the list of bricks on the device to be removed and the
>> > corresponding volume entries for those bricks, then call replace-brick
>> > for the volume with each brick's ID.
>> >
>> > In the replace-brick logic:
>> > 1) First we find the brick set in which the brick to be replaced is
>> > present (the set a brick belongs to; for example, in a
>> > distribute-replicate 2x3 volume [A(A1,A2,A3), B(B1,B2,B3)], B2 is in
>> > set B). The reason to find this is that we should not place the brick
>> > alongside another brick of the same set, which would cause quorum to
>> > be lost if that node went down, and is not a good design anyway.
>> > 2) Call the allocator to give out devices for the same cluster.
>> > 3) Ignore a device if:
>> >    a) it is the device being removed, or
>> >    b) it belongs to the same node where one of the other bricks in
>> >       the set is present.
>> > 4) With the above logic we can still reuse the SimpleAllocator ring
>> > to decide brick placement, with a single zone or multiple zones.
>> > 5) On failure, return an error; in the case of a NoSpaceError, we
>> > respond with ReplacementNotFound.
>> >
>> > Note:
>> > A few basic tests have been added for the new VolumeId in BrickEntry,
>> > and all the failures from the change of executor.VolumeInfo to
>> > executor.SimpleVolumeInfo have been fixed.
>> > Device remove is kept modular so that it can be reused for node
>> > remove.
>> >
>> > To be done:
>> > Tests to be added.
>> >
>> > [1] https://github.com/heketi/heketi/pull/676
>> >
>> > -- Ashiq, Talur
>> >
>> > From: "Luis Pabon" <lpabon@chrysalix.org>
>> > To: "Mohamed Ashiq Liyazudeen" <mliyazud@redhat.com>
>> > Cc: heketi-devel@gluster.org
>> > Sent: Friday, February 17, 2017 1:49:32 AM
>> > Subject: Re: [heketi-devel] Remove Device: Used to distribute all the
>> > bricks from device to other devices
>> >
>> > FYI: barring a miracle, there is no way this feature will be in by
>> > Sunday. This feature is one of the hardest parts of Heketi, which is
>> > why https://github.com/heketi/heketi/issues/161 has taken so long.
>> >
>> > The brick set is the heart of this change. A brick set is how Heketi
>> > sets up the replicas in a ring. For example, in a distributed
>> > replicated 2x3, brick A would need A1 and A2 as replicas. Therefore,
>> > A, A1, A2 are a set. The same applies to B, B1, B2.
>> >
>> > Replacing a device which contains B1 (for example) would need a
>> > replacement brick which satisfies B and B2 for the set to be
>> > complete. The same applies to EC, where the set is A, A1 ... A(n).
>> >
>> > This is a big change, which requires a good algorithm, execution,
>> > and testing.
>> >
>> > - Luis
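(As an aside: the brick-set constraint Luis describes above, together with the device filtering from steps 3a and 3b, might be sketched in Go roughly as below. Every type and field name here is a hypothetical stand-in, not heketi's actual schema.)

    package main

    import "fmt"

    // Hypothetical stand-ins for heketi's entries.
    type Device struct {
        Id     string
        NodeId string
    }

    type Brick struct {
        Id       string
        DeviceId string
        NodeId   string
    }

    // eligible applies the two "ignore" rules: skip the device being
    // removed (3a), and skip any device on a node that already hosts
    // another brick of the same brick set (3b).
    func eligible(candidate, removing Device, set []Brick, replaced Brick) bool {
        if candidate.Id == removing.Id {
            return false // 3a: never place the new brick on the device being removed
        }
        for _, b := range set {
            if b.Id == replaced.Id {
                continue // the brick being replaced does not pin its node
            }
            if b.NodeId == candidate.NodeId {
                return false // 3b: node already holds another brick of this set
            }
        }
        return true
    }

    func main() {
        // Set B of a 2x3 distribute-replicate volume; B2's device is
        // being removed.
        set := []Brick{
            {Id: "B1", DeviceId: "d1", NodeId: "n1"},
            {Id: "B2", DeviceId: "d2", NodeId: "n2"},
            {Id: "B3", DeviceId: "d3", NodeId: "n3"},
        }
        removing := Device{Id: "d2", NodeId: "n2"}
        fmt.Println(eligible(Device{Id: "d4", NodeId: "n1"}, removing, set, set[1])) // false: n1 hosts B1
        fmt.Println(eligible(Device{Id: "d5", NodeId: "n4"}, removing, set, set[1])) // true: fresh node
    }

Note that the brick being replaced is excluded from the node-collision check, so its own node remains a valid target for the replacement.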
>> > On Thu, Feb 16, 2017 at 2:25 PM, Mohamed Ashiq Liyazudeen
>> > <mliyazud@redhat.com> wrote:
>> >>
>> >> Hi Luis,
>> >>
>> >> I agree on adding the VolumeId part to the db for bricks. I didn't
>> >> get what you meant by brick peers, though.
>> >>
>> >> I wanted to understand the allocator's behavior based on the number
>> >> of zones. If you look at our example topology file, it has 4 nodes
>> >> with multiple devices, but two nodes are associated with each zone.
>> >> There are only two zones, so when creating a replica-three volume,
>> >> how does the allocator create the ring of devices? In this case we
>> >> cannot ignore both zones.
>> >>
>> >> I also wanted to know how we approach volume expansion. I thought it
>> >> would use something similar: give the allocator the state of the
>> >> existing volume (where the present bricks are), and the allocator
>> >> would give back a ring without those zones or nodes. But I think
>> >> (correct me if I am wrong) the volume is changed by adding
>> >> appropriate bricks, in the sense that a replica 3 (3x1) has bricks
>> >> added and becomes a distribute-replicate 3 (3x2). I agree this is
>> >> the way to go; I am just trying to understand the allocator better.
>> >>
>> >> We need this feature to be in by Sunday. I will be working on it
>> >> mostly. I will definitely mail, but is there any place to chat with
>> >> you in case of doubts and quick answers?
>> >>
>> >> First thing tomorrow I will add the VolumeId and the brick peers
>> >> (not sure what that is exactly).
>> >>
>> >> --
>> >> Ashiq
>> >>
>> >> ----- Original Message -----
>> >> From: "Luis Pabon" <lpabon@chrysalix.org>
>> >> To: "Mohamed Ashiq Liyazudeen" <mliyazud@redhat.com>
>> >> Cc: heketi-devel@gluster.org
>> >> Sent: Thursday, February 16, 2017 11:32:55 PM
>> >> Subject: Re: [heketi-devel] Remove Device: Used to distribute all the
>> >> bricks from device to other devices
>> >>
>> >> After we agree on the algorithm, the first PR would be to add the
>> >> necessary framework to the DB to support #676.
>> >>
>> >> - Luis
>> >>
>> >> On Thu, Feb 16, 2017 at 1:00 PM, Luis Pabon <lpabon@chrysalix.org>
>> >> wrote:
>> >>
>> >> > Great summary. Yes, the next step should be to figure out how to
>> >> > enhance the ring to return a brick from another zone. It could be
>> >> > as simple as:
>> >> >
>> >> >     If current bricks in set are in different zones:
>> >> >         Get a ring
>> >> >         Remove disks from the ring in zones already used
>> >> >         Return devices until one is found with the appropriate size
>> >> >     else:
>> >> >         Get a ring
>> >> >         Return devices until one is found with the appropriate size
>> >> >
>> >> > Also, the order of the disks may matter. This part I am not sure
>> >> > of, but we may need to record the order in which the bricks were
>> >> > added to the volume during 'create'. This may be necessary to
>> >> > determine which of the bricks in the brick set are in different
>> >> > zones.
>> >> >
>> >> > We may have to add new DB entries to the brick entry, for example:
>> >> > brick peers and volume ID.
>> >> >
>> >> > - Luis
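(Rendered in Go, Luis's ring sketch above could look something like the following; the types are illustrative only, not the real allocator API.)

    package main

    import "fmt"

    // Illustrative type only; heketi's real allocator types differ.
    type Disk struct {
        Id   string
        Zone int
        Free uint64 // free space, in some consistent unit
    }

    // pickReplacement follows the pseudocode above: when the existing
    // bricks in the set span more than one zone, disks in already-used
    // zones are removed from the ring before walking it for a disk with
    // enough free space.
    func pickReplacement(ring []Disk, usedZones map[int]bool, size uint64) (Disk, bool) {
        multiZone := len(usedZones) > 1
        for _, d := range ring {
            if multiZone && usedZones[d.Zone] {
                continue // zone already holds a brick of this set
            }
            if d.Free >= size {
                return d, true
            }
        }
        return Disk{}, false // caller would surface this as "no replacement found"
    }

    func main() {
        ring := []Disk{
            {Id: "d1", Zone: 1, Free: 100},
            {Id: "d2", Zone: 2, Free: 100},
            {Id: "d3", Zone: 3, Free: 100},
        }
        used := map[int]bool{1: true, 2: true}
        d, ok := pickReplacement(ring, used, 50)
        fmt.Println(d.Id, ok) // d3 true: only the disk in an unused zone qualifies
    }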
>> >> > On Wed, Feb 15, 2017 at 2:17 PM, Mohamed Ashiq Liyazudeen
>> >> > <mliyazud@redhat.com> wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> This mail talks about the PR [1].
>> >> >>
>> >> >> Let me start off with what is planned here.
>> >> >>
>> >> >> We only support this feature for Replicate and Distribute-Replicate
>> >> >> volumes.
>> >> >> Refer: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
>> >> >>
>> >> >> It removes all the bricks from the device and starts those bricks
>> >> >> on other devices chosen by the allocator. Heal is triggered
>> >> >> automatically for replicate volumes on replace-brick. We allocate
>> >> >> and create a new brick as the replacement, stop the brick to be
>> >> >> replaced if it is not already down (kill the brick process), and
>> >> >> then run gluster replace-brick, which replaces the brick with the
>> >> >> new one and also starts the heals.
>> >> >>
>> >> >> If the other nodes do not have sufficient storage, this command
>> >> >> should fail.
>> >> >>
>> >> >> 1) If there are no bricks, then tell the user it is clean to
>> >> >> remove the device.
>> >> >> 2) If there are bricks on the device, then find the volumes they
>> >> >> belong to from the list of volumes (a BrickEntry does not record
>> >> >> the name of the volume it is associated with).
>> >> >> 3) Move the bricks to other devices by calling the allocator for
>> >> >> devices.
>> >> >> 4) Eliminate the device to be removed and all the nodes already
>> >> >> associated with the volume.
>> >> >>
>> >> >> We missed the zone-handling part. If there were a way to give the
>> >> >> allocator the zones and nodes already used by the volume, the
>> >> >> allocator could return devices from nodes in different zones. I
>> >> >> think steps 2, 3, and 4 handle the case where there is only one
>> >> >> zone. Let us know if there are any other risks or better ways to
>> >> >> use the allocator.
>> >> >>
>> >> >> [1] https://github.com/heketi/heketi/pull/676
>> >> >>
>> >> >> --
>> >> >> Regards,
>> >> >> Mohamed Ashiq.L