[Gluster-devel] geo-rep regression because of node-uuid change

Tue Jun 20 10:57:17 UTC 2017

On Tue, Jun 20, 2017 at 4:12 PM, Aravinda <avishwan at redhat.com> wrote:

> I think the following format can be easily adopted by all components.
>
> UUIDs of a subvolume are separated by space and subvolumes are separated
> by comma.
>
> For example, node1 and node2 form a replica pair with UUIDs U1 and U2
> respectively, and node3 and node4 form a replica pair with UUIDs U3 and U4
> respectively.
>
> node-uuid can return "U1 U2,U3 U4"
>
> Geo-rep can split by "," and then split by space and take first UUID
> DHT can split the value by space or comma and get unique UUIDs list
>
> Another question is about the behavior when a node is down. The existing
> node-uuid xattr does not return the UUID of a node that is down.

After the change [1], if a node is down we send all zeros as the uuid for
that node, in the list of node uuids.

[1] https://review.gluster.org/#/c/17084/
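
A minimal sketch of how consumers could parse the proposed "U1 U2,U3 U4"
format while skipping the all-zero entries sent for nodes that are down.
The function names are hypothetical, the textual form of the zero UUID is
an assumption, and picking the first non-zero UUID per subvolume for
geo-rep is my reading of the proposal, not existing geo-rep code:

```python
# Assumed textual form of the "all zeros" UUID reported for a down node.
ZERO_UUID = "00000000-0000-0000-0000-000000000000"


def parse_node_uuid(value):
    """Split 'U1 U2,U3 U4' into [['U1', 'U2'], ['U3', 'U4']]."""
    return [group.split() for group in value.split(",") if group.strip()]


def georep_active_uuids(value):
    """Return one UUID per subvolume: the first one that is not all zeros."""
    active = []
    for group in parse_node_uuid(value):
        up = [u for u in group if u != ZERO_UUID]
        if up:
            active.append(up[0])
    return active


def dht_unique_uuids(value):
    """Return every distinct non-zero UUID, keeping first-seen order."""
    seen = []
    for group in parse_node_uuid(value):
        for u in group:
            if u != ZERO_UUID and u not in seen:
                seen.append(u)
    return seen


def is_active(my_uuid, value):
    """A geo-rep worker is Active only if its node was picked for a subvolume."""
    return my_uuid in georep_active_uuids(value)
```

With this, "U1 U2,U3 U4" makes the workers on U1 and U3 Active, and if U1
is reported as all zeros the worker on U2 takes over for that subvolume.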

Regards,
Karthik

> What is the behavior with the proposed xattr?
>
> Let me know your thoughts.
>
> regards
> Aravinda VK
>
>
> On 06/20/2017 03:06 PM, Aravinda wrote:
>
>> Hi Xavi,
>>
>> On 06/20/2017 02:51 PM, Xavier Hernandez wrote:
>>
>>> Hi Aravinda,
>>>
>>> On 20/06/17 11:05, Pranith Kumar Karampuri wrote:
>>>
>>>> Adding more people to get a consensus about this.
>>>>
>>>> On Tue, Jun 20, 2017 at 1:49 PM, Aravinda <avishwan at redhat.com> wrote:
>>>>
>>>>
>>>>     regards
>>>>     Aravinda VK
>>>>
>>>>
>>>>     On 06/20/2017 01:26 PM, Xavier Hernandez wrote:
>>>>
>>>>         Hi Pranith,
>>>>
>>>>         adding gluster-devel, Kotresh and Aravinda,
>>>>
>>>>         On 20/06/17 09:45, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>
>>>>             On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez
>>>>             <xhernandez at datalab.es> wrote:
>>>>
>>>>                 On 20/06/17 09:31, Pranith Kumar Karampuri wrote:
>>>>
>>>>                     The way geo-replication works is:
>>>>                     On each machine, it does a getxattr of node-uuid and
>>>>                     checks whether its own uuid is present in the list.
>>>>                     If it is present, the worker considers itself
>>>>                     active; otherwise it considers itself passive. With
>>>>                     this change we are returning all uuids instead of
>>>>                     only the uuid of the first-up subvolume, so all
>>>>                     machines think they are ACTIVE, which is bad. That
>>>>                     is the reason. Even I felt bad that we are doing
>>>>                     this change.
>>>>
>>>>
>>>>                 And what about changing the content of node-uuid to
>>>>                 include some sort of hierarchy?
>>>>
>>>>                 for example:
>>>>
>>>>                 a single brick:
>>>>
>>>>                 NODE(<guid>)
>>>>
>>>>                 AFR/EC:
>>>>
>>>>                 AFR[2](NODE(<guid>), NODE(<guid>))
>>>>                 EC[3,1](NODE(<guid>), NODE(<guid>), NODE(<guid>))
>>>>
>>>>                 DHT:
>>>>
>>>>                 DHT[2](AFR[2](NODE(<guid>), NODE(<guid>)),
>>>>                        AFR[2](NODE(<guid>), NODE(<guid>)))
>>>>
>>>>                 This gives a lot of information that can be used to
>>>>                 take the appropriate decisions.
>>>>
>>>>
>>>>             I guess that is not backward compatible. Shall I CC
>>>>             gluster-devel and Kotresh/Aravinda?
>>>>
>>>>
>>>>         Is the change we did backward compatible? If we only require
>>>>         the first field to be a GUID to support backward compatibility,
>>>>         we can use something like this:
>>>>
>>>>     No. But the necessary change can be made to the Geo-rep code as well
>>>>     if the format is changed, since all of these are built and shipped
>>>>     together.
>>>>
>>>>     Geo-rep uses node-uuid as follows:
>>>>
>>>>     value = getxattr(node-uuid)
>>>>     active_node_uuids = value.split(SPACE)
>>>>     active_node_flag = self.node_id in active_node_uuids
>>>>
>>>
>>> How was this case solved ?
>>>
>>> Suppose we have three servers and 2 bricks in each server. A replicated
>>> volume is created using the following command:
>>>
>>> gluster volume create test replica 2 server1:/brick1 server2:/brick1
>>> server2:/brick2 server3:/brick1 server3:/brick2 server1:/brick2
>>>
>>> In this case we have three replica-sets:
>>>
>>> * server1:/brick1 server2:/brick1
>>> * server2:/brick2 server3:/brick1
>>> * server3:/brick2 server1:/brick2
>>>
>>> The old AFR implementation of node-uuid always returned the uuid of the
>>> node hosting the first brick, so in this case we get the uuids of all
>>> three nodes, because each of them hosts the first brick of a replica-set.
>>>
>>> Does this mean that with this configuration all nodes are active? Is
>>> this a problem? Is there any other check to avoid this situation if it's
>>> not good?
>>>
>> Yes, all Geo-rep workers will become Active and participate in syncing.
>> Since the changelogs of replica bricks contain the same information, this
>> leads to duplicate syncing and wasted network bandwidth.
>>
>> Node-uuid based Active worker selection has been the default configuration
>> in Geo-rep till now. Geo-rep also has Meta Volume based synchronization
>> for Active workers using lock files. (This can be enabled through the
>> Geo-rep configuration; with this config node-uuid is not used.)
>>
>> Kotresh proposed a solution to configure which worker becomes Active.
>> This gives the Admin more control over choosing Active workers. It will
>> become the default configuration from 3.12:
>> https://github.com/gluster/glusterfs/issues/244
>>
>> --
>> Aravinda
>>
>>
>>> Xavi
>>>
>>>
>>>>
>>>>
>>>>         Bricks:
>>>>
>>>>         <guid>
>>>>
>>>>         AFR/EC:
>>>>         <guid>(<guid>, <guid>)
>>>>
>>>>         DHT:
>>>>         <guid>(<guid>(<guid>, ...), <guid>(<guid>, ...))
>>>>
>>>>         In this case, AFR and EC would return the same <guid> they
>>>>         returned before the patch, but between '(' and ')' they would
>>>>         put the full list of guids of all nodes. The first <guid> can be
>>>>         used by geo-replication. The list after the first <guid> can be
>>>>         used for rebalance.
>>>>
>>>>         Not sure if there's any user of node-uuid above DHT.
>>>>
>>>>         Xavi
>>>>
>>>>
>>>>
>>>>
>>>>                 Xavi
>>>>
>>>>
>>>>                     On Tue, Jun 20, 2017 at 12:46 PM, Xavier Hernandez
>>>>                     <xhernandez at datalab.es> wrote:
>>>>
>>>>                         Hi Pranith,
>>>>
>>>>                         On 20/06/17 07:53, Pranith Kumar Karampuri wrote:
>>>>
>>>>                             hi Xavi,
>>>>                                    We all made the mistake of not
>>>>                             announcing the change in behavior of the
>>>>                             node-uuid xattr, made so that rebalance can
>>>>                             use multiple nodes. Because of this, on
>>>>                             geo-rep all the workers are becoming active
>>>>                             instead of one per EC/AFR subvolume. So we
>>>>                             are frantically trying to restore the
>>>>                             functionality of node-uuid and introduce a
>>>>                             new xattr for the new behavior. Sunil will
>>>>                             be sending out a patch for this.
>>>>
>>>>
>>>>                         Wouldn't it be better to change geo-rep behavior
>>>>                         to use the new data? I think it's better as it is
>>>>                         now, since it gives more information to upper
>>>>                         layers so that they can take more accurate
>>>>                         decisions.
>>>>
>>>>                         Xavi
>>>>
>>>>
>>>>                             --
>>>>                             Pranith
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>                     --
>>>>                     Pranith
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>             --
>>>>             Pranith
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Pranith
>>>>
>>>
>>>
>>
>