[Gluster-devel] geo-rep regression because of node-uuid change

Wed Jun 21 04:37:58 UTC 2017

On 20 June 2017 at 20:38, Aravinda <avishwan at redhat.com> wrote:

> On 06/20/2017 06:02 PM, Pranith Kumar Karampuri wrote:
>
> Xavi, Aravinda and I had a discussion on #gluster-dev and we agreed to go
> with the format Aravinda suggested for now. In future we want some more
> changes so that dht can detect which subvolume went down and came back up;
> at that time we will revisit the solution suggested by Xavi.
>
> Susanth is doing the dht changes
> Aravinda is doing geo-rep changes
>
> Done. Geo-rep patch sent for review https://review.gluster.org/17582
>
>
The proposed changes to the node-uuid behaviour (while good) are going to
break tiering. The tiering changes will take a little more time to be coded
and tested.

As this is a regression for 3.11 and a blocker for 3.11.1, I suggest we go
back to the original node-uuid behaviour for now so as to unblock the
release and target the proposed changes for the next 3.11 releases.

Regards,
Nithya

> --
> Aravinda
>
>
>
> Thanks to all of you guys for the discussions!
>
> On Tue, Jun 20, 2017 at 5:05 PM, Xavier Hernandez <xhernandez at datalab.es>
> wrote:
>
>> Hi Aravinda,
>>
>> On 20/06/17 12:42, Aravinda wrote:
>>
>>> I think following format can be easily adopted by all components
>>>
>>> UUIDs of a subvolume are separated by space and subvolumes are separated
>>> by comma
>>>
>>> For example, node1 and node2 are replica with U1 and U2 UUIDs
>>> respectively and
>>> node3 and node4 are replica with U3 and U4 UUIDs respectively
>>>
>>> node-uuid can return "U1 U2,U3 U4"
>>>
>>
>> While this is ok for current implementation, I think this can be
>> insufficient if there are more layers of xlators that require to indicate
>> some sort of grouping. Some representation that can represent hierarchy
>> would be better. For example: "(U1 U2) (U3 U4)" (we can use spaces or comma
>> as a separator).
>>
>>
>>> Geo-rep can split by "," and then split by space and take first UUID
>>> DHT can split the value by space or comma and get unique UUIDs list
>>>
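A minimal sketch of the two split strategies described above, assuming the
"U1 U2,U3 U4" format. The function names are hypothetical and are not taken
from the geo-rep or DHT code.

```python
def parse_node_uuid(value):
    """Split "U1 U2,U3 U4" into one UUID list per subvolume."""
    return [group.split() for group in value.split(",") if group.strip()]


def georep_candidates(value):
    """Geo-rep view: the first UUID of each subvolume."""
    return [uuids[0] for uuids in parse_node_uuid(value) if uuids]


def dht_unique_uuids(value):
    """DHT view: every distinct UUID, in order of first appearance."""
    seen = []
    for uuids in parse_node_uuid(value):
        for uuid in uuids:
            if uuid not in seen:
                seen.append(uuid)
    return seen


if __name__ == "__main__":
    print(georep_candidates("U1 U2,U3 U4"))
    print(dht_unique_uuids("U1 U2,U3 U4"))
```

Taking the first UUID per subvolume only helps if the xattr lists a node that
is actually up in that position; the sketch does not check that.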
>>
>> This doesn't solve the problem I described in the previous email. Some
>> more logic will need to be added to prevent more than one node from each
>> replica-set from being active. If we have some explicit hierarchy
>> information in the node-uuid value, more decisions can be taken.
>>
>> An initial proposal I made was this:
>>
>> DHT[2](AFR[2,0](NODE(U1), NODE(U2)), AFR[2,0](NODE(U3), NODE(U4)))
>>
>> This is harder to parse, but gives a lot of information: DHT with 2
>> subvolumes, each subvolume is an AFR with replica 2 and no arbiters. It's
>> also easily extensible with any new xlator that changes the layout.
>>
>> However maybe this is not the moment to do this, and probably we could
>> implement this in a new xattr with a better name.
>>
>> Xavi
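To give an idea of how much parsing the hierarchical form needs, here is a
rough recursive-descent sketch. The grammar (NAME, optional [n,...], optional
(child, ...)) is inferred from the example string only, and the function
names are hypothetical. Malformed input is not handled gracefully.

```python
import re

# A token is either a name/number/UUID or one of the punctuation characters.
TOKEN = re.compile(r"\s*([A-Za-z0-9_-]+|[\[\](),])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Consume one element and return (name, params, children)."""
    name = tokens.pop(0)
    params, children = [], []
    if tokens and tokens[0] == "[":
        tokens.pop(0)
        while tokens[0] != "]":
            tok = tokens.pop(0)
            if tok != ",":
                params.append(int(tok))
        tokens.pop(0)  # drop "]"
    if tokens and tokens[0] == "(":
        tokens.pop(0)
        while tokens[0] != ")":
            if tokens[0] == ",":
                tokens.pop(0)
                continue
            children.append(parse(tokens))
        tokens.pop(0)  # drop ")"
    return (name, params, children)


def first_node_per_replica_set(tree):
    """Pick the first NODE under every AFR/EC element (illustrative policy)."""
    name, _, children = tree
    if name == "NODE":
        return [children[0][0]]
    if name in ("AFR", "EC"):
        return first_node_per_replica_set(children[0])
    result = []
    for child in children:
        result.extend(first_node_per_replica_set(child))
    return result


if __name__ == "__main__":
    value = "DHT[2](AFR[2,0](NODE(U1), NODE(U2)), AFR[2,0](NODE(U3), NODE(U4)))"
    print(first_node_per_replica_set(parse(tokenize(value))))
```

With the hierarchy available, a consumer can apply whatever per-replica-set
policy it wants; "first node" is used here only as an example.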
>>
>>
>>
>>> Another question is about the behavior when a node is down: the existing
>>> node-uuid xattr will not return a node's UUID while that node is down.
>>> What is the behavior with the proposed xattr?
>>>
>>> Let me know your thoughts.
>>>
>>> regards
>>> Aravinda VK
>>>
>>> On 06/20/2017 03:06 PM, Aravinda wrote:
>>>
>>>> Hi Xavi,
>>>>
>>>> On 06/20/2017 02:51 PM, Xavier Hernandez wrote:
>>>>
>>>>> Hi Aravinda,
>>>>>
>>>>> On 20/06/17 11:05, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>> Adding more people to get a consensus about this.
>>>>>>
>>>>>> On Tue, Jun 20, 2017 at 1:49 PM, Aravinda <avishwan at redhat.com> wrote:
>>>>>>
>>>>>>
>>>>>>     regards
>>>>>>     Aravinda VK
>>>>>>
>>>>>>
>>>>>>     On 06/20/2017 01:26 PM, Xavier Hernandez wrote:
>>>>>>
>>>>>>         Hi Pranith,
>>>>>>
>>>>>>         adding gluster-devel, Kotresh and Aravinda,
>>>>>>
>>>>>>         On 20/06/17 09:45, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>             On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez
>>>>>>             <xhernandez at datalab.es> wrote:
>>>>>>
>>>>>>                 On 20/06/17 09:31, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>>                     The way geo-replication works is:
>>>>>>                     On each machine, it does getxattr of node-uuid and
>>>>>>                     checks if its own uuid is present in the list. If
>>>>>>                     it is present then it will consider itself active,
>>>>>>                     otherwise it will be considered passive. With this
>>>>>>                     change we are giving all uuids instead of the
>>>>>>                     first-up subvolume. So all machines think they are
>>>>>>                     ACTIVE, which is bad apparently. So that is the
>>>>>>                     reason. Even I felt bad that we are doing this
>>>>>>                     change.
>>>>>>
>>>>>>
>>>>>>                 And what about changing the content of node-uuid to
>>>>>>                 include some sort of hierarchy?
>>>>>>
>>>>>>                 for example:
>>>>>>
>>>>>>                 a single brick:
>>>>>>
>>>>>>                 NODE(<guid>)
>>>>>>
>>>>>>                 AFR/EC:
>>>>>>
>>>>>>                 AFR[2](NODE(<guid>), NODE(<guid>))
>>>>>>                 EC[3,1](NODE(<guid>), NODE(<guid>), NODE(<guid>))
>>>>>>
>>>>>>                 DHT:
>>>>>>
>>>>>>                 DHT[2](AFR[2](NODE(<guid>), NODE(<guid>)),
>>>>>>                        AFR[2](NODE(<guid>), NODE(<guid>)))
>>>>>>
>>>>>>                 This gives a lot of information that can be used to
>>>>>>                 take the appropriate decisions.
>>>>>>
>>>>>>
>>>>>>             I guess that is not backward compatible. Shall I CC
>>>>>>             gluster-devel and Kotresh/Aravinda?
>>>>>>
>>>>>>
>>>>>>         Is the change we did backward compatible? If we only require
>>>>>>         the first field to be a GUID to support backward
>>>>>>         compatibility, we can use something like this:
>>>>>>
>>>>>>     No. But the necessary change can be made to the Geo-rep code as
>>>>>>     well if the format is changed, since all these are built/shipped
>>>>>>     together.
>>>>>>
>>>>>>     Geo-rep uses node-uuid as follows,
>>>>>>
>>>>>>     value = getxattr(node-uuid)
>>>>>>     active_node_uuids = value.split(SPACE)
>>>>>>     active_node_flag = self.node_id in active_node_uuids
>>>>>>
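The regression can be reproduced on paper with a small sketch of that check.
The xattr values below are made up for illustration (two replica pairs, U1/U2
and U3/U4), and is_active is a hypothetical stand-in for the geo-rep logic,
not the actual geo-rep function.

```python
def is_active(node_uuid_value, my_uuid):
    """A worker is Active when its own UUID appears in the xattr value."""
    return my_uuid in node_uuid_value.split()


# Old behaviour: one UUID per replica set (the first subvolume that is up).
OLD_VALUE = "U1 U3"
# Changed behaviour: the UUIDs of all nodes are returned.
NEW_VALUE = "U1 U2 U3 U4"

if __name__ == "__main__":
    for uuid in ("U1", "U2", "U3", "U4"):
        print(uuid, is_active(OLD_VALUE, uuid), is_active(NEW_VALUE, uuid))
```

With the old value only U1 and U3 pass the check; with the new value every
worker does, which is the "all workers Active" symptom discussed here.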
>>>>>
>>>>> How was this case solved ?
>>>>>
>>>>> Suppose we have three servers and 2 bricks in each server. A
>>>>> replicated volume is created using the following command:
>>>>>
>>>>> gluster volume create test replica 2 server1:/brick1 server2:/brick1
>>>>> server2:/brick2 server3:/brick1 server3:/brick2 server1:/brick2
>>>>>
>>>>> In this case we have three replica-sets:
>>>>>
>>>>> * server1:/brick1 server2:/brick1
>>>>> * server2:/brick2 server3:/brick1
>>>>> * server3:/brick2 server1:/brick2
>>>>>
>>>>> The old AFR implementation for node-uuid always returned the uuid of
>>>>> the node of the first brick, so in this case we will get the uuids of
>>>>> all three nodes because each of them hosts the first brick of a
>>>>> replica-set.
>>>>>
>>>>> Does this mean that with this configuration all nodes are active ? Is
>>>>> this a problem ? Is there any other check to avoid this situation if
>>>>> it's not good ?
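The scenario can be checked with a short sketch. It assumes each of the three
servers contributes its two bricks exactly once and relies on the fact that
consecutive bricks on the command line form a replica set. The helper names
are hypothetical.

```python
def replica_sets(bricks, replica):
    """Group consecutive bricks into replica sets of the given size."""
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def first_brick_hosts(bricks, replica):
    """Hosts that own the first brick of at least one replica set."""
    return {group[0].split(":", 1)[0] for group in replica_sets(bricks, replica)}


BRICKS = [
    "server1:/brick1", "server2:/brick1",
    "server2:/brick2", "server3:/brick1",
    "server3:/brick2", "server1:/brick2",
]

if __name__ == "__main__":
    print(sorted(first_brick_hosts(BRICKS, 2)))
```

Every server ends up owning a "first brick", so a first-brick rule alone
marks all three nodes Active in this layout.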
>>>>>
>>>> Yes, all Geo-rep workers will become Active and participate in syncing.
>>>> Since changelogs will have the same information in replica bricks, this
>>>> will lead to duplicate syncing and wasted network bandwidth.
>>>>
>>>> Node-uuid based Active worker selection is the default configuration in
>>>> Geo-rep till now. Geo-rep also has Meta Volume based synchronization for
>>>> Active workers using lock files. (It can be opted into using Geo-rep
>>>> configuration; with this config node-uuid will not be used.)
>>>>
>>>> Kotresh proposed a solution to configure which worker becomes Active.
>>>> This will give more control to the Admin to choose Active workers. This
>>>> will become the default configuration from 3.12:
>>>> https://github.com/gluster/glusterfs/issues/244
>>>>
>>>> --
>>>> Aravinda
>>>>
>>>>
>>>>> Xavi
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>         Bricks:
>>>>>>
>>>>>>         <guid>
>>>>>>
>>>>>>         AFR/EC:
>>>>>>         <guid>(<guid>, <guid>)
>>>>>>
>>>>>>         DHT:
>>>>>>         <guid>(<guid>(<guid>, ...), <guid>(<guid>, ...))
>>>>>>
>>>>>>         In this case, AFR and EC would return the same <guid> they
>>>>>>         returned before the patch, but between '(' and ')' they put
>>>>>>         the full list of guids of all nodes. The first <guid> can be
>>>>>>         used by geo-replication. The list after the first <guid> can
>>>>>>         be used for rebalance.
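A small sketch of how a consumer could read this backward-compatible form,
assuming the value looks like "G1(G1, G2)". The helper names are hypothetical.
A consumer that splits on spaces, as in the geo-rep pseudocode quoted in this
thread, would have to isolate the leading guid first in a similar way.

```python
import re


def leading_guid(value):
    """The guid before the first '(' (the pre-patch return value)."""
    return value.split("(", 1)[0].strip()


def all_guids(value):
    """Every distinct guid inside the parentheses, in order of appearance."""
    _, _, rest = value.partition("(")
    guids = []
    for guid in re.findall(r"[A-Za-z0-9_-]+", rest):
        if guid not in guids:
            guids.append(guid)
    return guids


if __name__ == "__main__":
    print(leading_guid("G1(G1, G2)"))
    print(all_guids("G1(G1(G1, G2), G3(G3, G4))"))
```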
>>>>>>
>>>>>>         Not sure if there's any user of node-uuid above DHT.
>>>>>>
>>>>>>         Xavi
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>                 Xavi
>>>>>>
>>>>>>
>>>>>>                     On Tue, Jun 20, 2017 at 12:46 PM, Xavier Hernandez
>>>>>>                     <xhernandez at datalab.es> wrote:
>>>>>>
>>>>>>                         Hi Pranith,
>>>>>>
>>>>>>                         On 20/06/17 07:53, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>>                             hi Xavi,
>>>>>>                                    We all made the mistake of not
>>>>>>                             sending a mail about changing the behavior
>>>>>>                             of the node-uuid xattr so that rebalance
>>>>>>                             can use multiple nodes for doing rebalance.
>>>>>>                             Because of this, on geo-rep all the workers
>>>>>>                             are becoming active instead of one per
>>>>>>                             EC/AFR subvolume. So we are frantically
>>>>>>                             trying to restore the functionality of
>>>>>>                             node-uuid and introduce a new xattr for the
>>>>>>                             new behavior. Sunil will be sending out a
>>>>>>                             patch for this.
>>>>>>
>>>>>>
>>>>>>                         Wouldn't it be better to change geo-rep
>>>>>>                         behavior to use the new data? I think it's
>>>>>>                         better as it is now, since it gives more
>>>>>>                         information to upper layers so that they can
>>>>>>                         take more accurate decisions.
>>>>>>
>>>>>>                         Xavi
>>>>>>
>>>>>>
>>>>>>                             --
>>>>>>                             Pranith
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>                     --
>>>>>>                     Pranith
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>             --
>>>>>>             Pranith
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Pranith
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Pranith
>
>
>