<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 21, 2017 at 1:00 PM, Xavier Hernandez <span dir="ltr"><<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I'm ok with reverting the node-uuid content to the previous format and creating a new xattr for the new format. Currently, only rebalance will use it.<br><br>The only thing to consider is what can happen if we have a half-upgraded cluster where some clients have this change and some do not. Can rebalance work in this situation? If so, could there be any issue?<br></blockquote><div><br></div><div>I think there shouldn't be any problem, because this is an in-memory xattr, so layers below afr/ec will only see the node-uuid xattr.<br></div><div>This also gives us a chance to do whatever we want with this xattr in future without any backward-compatibility problems.<br><br></div><div>You can check <a href="https://review.gluster.org/#/c/17576/3/xlators/cluster/afr/src/afr-inode-read.c@1507">https://review.gluster.org/#/c/17576/3/xlators/cluster/afr/src/afr-inode-read.c@1507</a> for how Karthik implemented this in AFR (this got merged accidentally yesterday, but it looks like this is what we are settling on).<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>Xavi<div class="gmail-HOEnZb"><div class="gmail-h5"><br><br>On Wednesday, June 21, 2017 06:56 CEST, Pranith Kumar Karampuri <<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>> wrote:<br> <blockquote type="cite" cite="http://CAOgeEnaYPRwt97zYn-ecPPxTkBjZSWMqJ_xOF9JSyYvTa2SUNA@mail.gmail.com"><div dir="ltr"> <div class="gmail_extra"> <div class="gmail_quote">On Wed, Jun 21, 2017 at 10:07 AM, Nithya Balachandran <span dir="ltr"><<a 
href="mailto:nbalacha@redhat.com" target="_blank">nbalacha@redhat.com</a>></span> wrote:<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"> <div class="gmail_quote"><span>On 20 June 2017 at 20:38, Aravinda <span dir="ltr"><<a href="mailto:avishwan@redhat.com" target="_blank">avishwan@redhat.com</a>></span> wrote:</span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><span><span>On 06/20/2017 06:02 PM, Pranith Kumar Karampuri wrote:</span></span><blockquote type="cite"><div dir="ltr"><div><div><div><span><span>Xavi, Aravinda and I had a discussion on #gluster-dev and we agreed to go with the format Aravinda suggested for now. In future we want some more changes for dht to detect which subvolume went down and came back up; at that time we will revisit the solution suggested by Xavi.</span></span><br> </div><span><span>Susanth is doing the dht changes</span></span></div><span><span>Aravinda is doing geo-rep changes</span></span></div></div></blockquote><span> Done. Geo-rep patch sent for review <a class="gmail-m_-820786891820273391m_7352592457658641136m_-881474676573699517moz-txt-link-freetext" href="https://review.gluster.org/17582" target="_blank">https://review.gluster.org/17582</a></span><br> </div></blockquote><div> </div><div>The proposed changes to the node-uuid behaviour (while good) are going to break tiering. Tiering changes will take a little more time to be coded and tested. </div><div> </div><div>As this is a regression for 3.11 and a blocker for 3.11.1, I suggest we go back to the original node-uuid behaviour for now so as to unblock the release and target the proposed changes for the next 3.11 releases.</div></div></div></div></blockquote><div> </div><div>Let me see if I understand the changes correctly. 
We are restoring the behavior of the node-uuid xattr and adding a new xattr for parallel rebalance for both afr and ec, correct? Otherwise that is one more regression. If yes, we will also wait for Xavi's inputs. Jeff accidentally merged the afr patch yesterday, which makes these changes. If everyone is in agreement, we will leave it as is and add similar changes in ec as well. If we are not in agreement, then we will let the discussion progress :-)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><div> </div><div>Regards,</div><div>Nithya</div><div><div class="gmail-m_-820786891820273391h5"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">--<br>Aravinda<div><div class="gmail-m_-820786891820273391m_7352592457658641136h5"> <blockquote type="cite"><div dir="ltr"><div> </div>Thanks to all of you guys for the discussions!<div><div><div><div><div class="gmail_extra"> <div class="gmail_quote">On Tue, Jun 20, 2017 at 5:05 PM, Xavier Hernandez <span dir="ltr"><<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>></span> wrote:<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Aravinda,<br><br><span>On 20/06/17 12:42, Aravinda wrote:</span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span>I think the following format can be easily adopted by all components<br><br>UUIDs of a subvolume are separated by space and subvolumes are separated<br>by comma<br><br>For example, node1 and node2 form a replica pair with UUIDs U1 and U2<br>respectively and<br>node3 and node4 form a replica pair with UUIDs U3 and U4 respectively<br><br>node-uuid can return "U1 U2,U3 
U4"</span></blockquote><span> </span><br>While this is ok for the current implementation, I think it can be insufficient if there are more layers of xlators that need to indicate some sort of grouping. Some representation that can express hierarchy would be better. For example: "(U1 U2) (U3 U4)" (we can use spaces or commas as a separator).<br> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br><span>Geo-rep can split by "," and then split by space and take the first UUID<br>DHT can split the value by space or comma and get the list of unique UUIDs</span></blockquote><span> </span><br>This doesn't solve the problem I described in the previous email. Some more logic will need to be added to prevent more than one node from each replica-set from being active. If we have some explicit hierarchy information in the node-uuid value, more decisions can be taken.<br><br>An initial proposal I made was this:<br><br>DHT[2](AFR[2,0](NODE(U1), NODE(U2)), AFR[2,0](NODE(U1), NODE(U2)))<br><br>This is harder to parse, but gives a lot of information: DHT with 2 subvolumes, each subvolume is an AFR with replica 2 and no arbiters. It's also easily extensible with any new xlator that changes the layout.<br><br>However, maybe this is not the moment to do this, and we could probably implement it in a new xattr with a better name.<br><br>Xavi<div class="gmail-m_-820786891820273391m_7352592457658641136m_-881474676573699517HOEnZb"><div class="gmail-m_-820786891820273391m_7352592457658641136m_-881474676573699517h5"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>Another question is about the behavior when a node is down, existing<br>node-uuid xattr will not return that UUID if a node is down. 
What is the<br>behavior with the proposed xattr?<br><br>Let me know your thoughts.<br><br>regards<br>Aravinda VK<br><br>On 06/20/2017 03:06 PM, Aravinda wrote:<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Xavi,<br><br>On 06/20/2017 02:51 PM, Xavier Hernandez wrote:<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Aravinda,<br><br>On 20/06/17 11:05, Pranith Kumar Karampuri wrote:<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Adding more people to get a consensus about this.<br><br>On Tue, Jun 20, 2017 at 1:49 PM, Aravinda <<a href="mailto:avishwan@redhat.com" target="_blank">avishwan@redhat.com</a><br><mailto:<a href="mailto:avishwan@redhat.com" target="_blank">avishwan@redhat.com</a>>> wrote:<br><br><br> regards<br> Aravinda VK<br><br><br> On 06/20/2017 01:26 PM, Xavier Hernandez wrote:<br><br> Hi Pranith,<br><br> adding gluster-devel, Kotresh and Aravinda,<br><br> On 20/06/17 09:45, Pranith Kumar Karampuri wrote:<br><br><br><br> On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez<br> <<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a> <mailto:<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>><br> <mailto:<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a><br> <mailto:<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>><wbr>>> wrote:<br><br> On 20/06/17 09:31, Pranith Kumar Karampuri wrote:<br><br> The way geo-replication works is:<br> On each machine, it does getxattr of node-uuid and<br> check if its<br> own uuid<br> is present in the list. If it is present then it<br> will consider<br> it active<br> otherwise it will be considered passive. With this<br> change we are<br> giving<br> all uuids instead of first-up subvolume. 
So all<br> machines think<br> they are<br> ACTIVE which is bad apparently. So that is the<br> reason. Even I<br> felt bad<br> that we are doing this change.<br><br><br> And what about changing the content of node-uuid to<br> include some<br> sort of hierarchy ?<br><br> for example:<br><br> a single brick:<br><br> NODE(<guid>)<br><br> AFR/EC:<br><br> AFR[2](NODE(<guid>), NODE(<guid>))<br> EC[3,1](NODE(<guid>), NODE(<guid>), NODE(<guid>))<br><br> DHT:<br><br> DHT[2](AFR[2](NODE(<guid>), NODE(<guid>)),<br> AFR[2](NODE(<guid>),<br> NODE(<guid>)))<br><br> This gives a lot of information that can be used to<br>take the<br> appropriate decisions.<br><br><br> I guess that is not backward compatible. Shall I CC<br> gluster-devel and<br> Kotresh/Aravinda?<br><br><br> Is the change we did backward compatible ? if we only require<br> the first field to be a GUID to support backward compatibility,<br> we can use something like this:<br><br> No. But the necessary change can be made to Geo-rep code as well if<br> format is changed, Since all these are built/shipped together.<br><br> Geo-rep uses node-id as follows,<br><br> list = listxattr(node-uuid)<br> active_node_uuids = list.split(SPACE)<br> active_node_flag = True if self.node_id exists in active_node_uuids<br> else False</blockquote><br>How was this case solved ?<br><br>suppose we have three servers and 2 bricks in each server. 
A<br>replicated volume is created using the following command:<br><br>gluster volume create test replica 2 server1:/brick1 server2:/brick1<br>server2:/brick2 server3:/brick1 server3:/brick2 server1:/brick2<br><br>In this case we have three replica-sets:<br><br>* server1:/brick1 server2:/brick1<br>* server2:/brick2 server3:/brick1<br>* server3:/brick2 server1:/brick2<br><br>The old AFR implementation of node-uuid always returned the uuid of the<br>node of the first brick, so in this case we will get the uuids of all<br>three nodes, because each of them hosts the first brick of a replica-set.<br><br>Does this mean that with this configuration all nodes are active? Is<br>this a problem? Is there any other check to avoid this situation if<br>it's not good?</blockquote> Yes, all Geo-rep workers will become Active and participate in syncing.<br>Since changelogs have the same information in replica bricks, this<br>will lead to duplicate syncing and wasted network bandwidth.<br><br>Node-uuid based Active worker selection has been the default configuration in Geo-rep<br>till now. Geo-rep also has Meta Volume based synchronization for Active<br>workers using lock files. (This can be opted into through Geo-rep configuration;<br>with this config node-uuid will not be used.)<br><br>Kotresh proposed a solution to configure which worker becomes<br>Active. 
This will give more control to Admin to choose Active workers,<br>This will become default configuration from 3.12<br><a rel="noreferrer" href="https://github.com/gluster/glusterfs/issues/244" target="_blank">https://github.com/gluster/glu<wbr>sterfs/issues/244</a><br><br>--<br>Aravinda<br> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>Xavi<br> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br><br><br> Bricks:<br><br> <guid><br><br> AFR/EC:<br> <guid>(<guid>, <guid>)<br><br> DHT:<br> <guid>(<guid>(<guid>, ...), <guid>(<guid>, ...))<br><br> In this case, AFR and EC would return the same <guid> they<br> returned before the patch, but between '(' and ')' they put the<br> full list of guid's of all nodes. The first <guid> can be used<br> by geo-replication. The list after the first <guid> can be used<br> for rebalance.<br><br> Not sure if there's any user of node-uuid above DHT.<br><br> Xavi<br><br><br><br><br> Xavi<br><br><br> On Tue, Jun 20, 2017 at 12:46 PM, Xavier Hernandez<br> <<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a><br> <mailto:<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>><br><mailto:<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a><br> <mailto:<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>><wbr>><br> <mailto:<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a><br> <mailto:<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>><br><mailto:<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a><br> <mailto:<a href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>><wbr>>>><br> wrote:<br><br> Hi Pranith,<br><br> On 20/06/17 07:53, Pranith Kumar Karampuri<br>wrote:<br><br> hi 
Xavi,<br> We all made the mistake of not<br> sending about changing<br> behavior of<br> node-uuid xattr so that rebalance can use<br> multiple nodes<br> for doing<br> rebalance. Because of this on geo-rep all<br> the workers<br> are becoming<br> active instead of one per EC/AFR subvolume.<br> So we are<br> frantically trying<br> to restore the functionality of node-uuid<br> and introduce<br> a new<br> xattr for<br> the new behavior. Sunil will be sending out<br> a patch for<br> this.<br><br><br> Wouldn't it be better to change geo-rep<br>behavior<br> to use the<br> new data<br> ? I think it's better as it's now, since it<br> gives more<br> information<br> to upper layers so that they can take more<br> accurate decisions.<br><br> Xavi<br><br><br> --<br> Pranith<br><br><br><br><br><br> --<br> Pranith<br><br><br><br><br><br> --<br> Pranith<br><br><br><br><br><br><br>--<br>Pranith</blockquote></blockquote></blockquote></blockquote></div></div></blockquote></div><br><br clear="all"><br>--<div class="gmail-m_-820786891820273391m_7352592457658641136m_-881474676573699517gmail_signature"><div dir="ltr">Pranith</div></div></div></div></div></div></div></div></blockquote></div></div></div></blockquote></div></div></div></div></div></blockquote></div><br><br clear="all"><br>--<div class="gmail-m_-820786891820273391gmail_signature"><div dir="ltr">Pranith</div></div></div></div></blockquote><br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature"><div dir="ltr">Pranith<br></div></div>
</div></div>
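<div>For reference, the flat format Aravinda proposed in the thread (UUIDs of a subvolume separated by spaces, subvolumes separated by commas, e.g. "U1 U2,U3 U4") could be consumed roughly as sketched below. This is only an illustration under assumptions: the function names are hypothetical and are not taken from the Geo-rep or DHT code, and the value is assumed to be already read from the xattr as a string.</div>

```python
def parse_node_uuid(value):
    """Split a value such as 'U1 U2,U3 U4' into one UUID list per subvolume."""
    # Subvolumes are separated by commas, UUIDs inside a subvolume by spaces.
    return [group.split() for group in value.split(",") if group.strip()]


def georep_is_active(value, local_uuid):
    """Geo-rep style check (hypothetical helper): a worker is Active only if
    its UUID is the first one listed for some subvolume."""
    return any(group[0] == local_uuid for group in parse_node_uuid(value))


def dht_unique_uuids(value):
    """DHT style use (hypothetical helper): collect every UUID once,
    keeping the order in which they first appear."""
    seen = []
    for group in parse_node_uuid(value):
        for uuid in group:
            if uuid not in seen:
                seen.append(uuid)
    return seen


if __name__ == "__main__":
    value = "U1 U2,U3 U4"
    print(parse_node_uuid(value))
    print(georep_is_active(value, "U1"), georep_is_active(value, "U2"))
    print(dht_unique_uuids(value))
```

<div>As Xavi points out in the thread, taking the first UUID of each subvolume does not by itself prevent every node from becoming Active when each node hosts the first brick of some replica-set; that case needs extra logic or the meta-volume based configuration.</div>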