[Gluster-devel] geo-rep regression because of node-uuid change

Tue Jun 20 09:36:52 UTC 2017

Hi Xavi,

On 06/20/2017 02:51 PM, Xavier Hernandez wrote:
> Hi Aravinda,
>
> On 20/06/17 11:05, Pranith Kumar Karampuri wrote:
>> Adding more people to get a consensus about this.
>>
>> On Tue, Jun 20, 2017 at 1:49 PM, Aravinda <avishwan at redhat.com>
>> wrote:
>>
>>
>>     regards
>>     Aravinda VK
>>
>>
>>     On 06/20/2017 01:26 PM, Xavier Hernandez wrote:
>>
>>         Hi Pranith,
>>
>>         adding gluster-devel, Kotresh and Aravinda,
>>
>>         On 20/06/17 09:45, Pranith Kumar Karampuri wrote:
>>
>>
>>
>>             On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez
>>             <xhernandez at datalab.es> wrote:
>>
>>                 On 20/06/17 09:31, Pranith Kumar Karampuri wrote:
>>
>>                     The way geo-replication works is:
>>                     On each machine, it does getxattr of node-uuid and
>>                     check if its own uuid is present in the list. If it
>>                     is present then it will consider it active otherwise
>>                     it will be considered passive. With this change we
>>                     are giving all uuids instead of first-up subvolume.
>>                     So all machines think they are ACTIVE which is bad
>>                     apparently. So that is the reason. Even I felt bad
>>                     that we are doing this change.
>>
>>
>>                 And what about changing the content of node-uuid to
>>                 include some sort of hierarchy ?
>>
>>                 for example:
>>
>>                 a single brick:
>>
>>                 NODE(<guid>)
>>
>>                 AFR/EC:
>>
>>                 AFR[2](NODE(<guid>), NODE(<guid>))
>>                 EC[3,1](NODE(<guid>), NODE(<guid>), NODE(<guid>))
>>
>>                 DHT:
>>
>>                 DHT[2](AFR[2](NODE(<guid>), NODE(<guid>)),
>>                        AFR[2](NODE(<guid>), NODE(<guid>)))
>>
>>                 This gives a lot of information that can be used to
>>                 take the appropriate decisions.
>>
>>
>>             I guess that is not backward compatible. Shall I CC
>>             gluster-devel and Kotresh/Aravinda?
>>
>>
>>         Is the change we did backward compatible ? if we only require
>>         the first field to be a GUID to support backward compatibility,
>>         we can use something like this:
>>
>>     No. But the necessary change can be made to the Geo-rep code as
>>     well if the format is changed, since all these are built/shipped
>>     together.
>>
>>     Geo-rep uses node-id as follows,
>>
>>     list = getxattr(node-uuid)
>>     active_node_uuids = list.split(SPACE)
>>     active_node_flag = True if self.node_id exists in active_node_uuids else False
>
> How was this case solved ?
>
> suppose we have three servers and 2 bricks in each server. A 
> replicated volume is created using the following command:
>
> gluster volume create test replica 2 server1:/brick1 server2:/brick1 
> server2:/brick2 server3:/brick1 server3:/brick2 server1:/brick2
>
> In this case we have three replica-sets:
>
> * server1:/brick1 server2:/brick1
> * server2:/brick2 server3:/brick1
> * server3:/brick2 server1:/brick2
>
> Old AFR implementation for node-uuid always returned the uuid of the 
> node of the first brick, so in this case we will get the uuid of the 
> three nodes because all of them are the first brick of a replica-set.
>
> Does this mean that with this configuration all nodes are active ? Is 
> this a problem ? Is there any other check to avoid this situation if 
> it's not good ?
Yes, all Geo-rep workers will become Active and participate in syncing. 
Since the changelogs on replica bricks contain the same information, 
this leads to duplicate syncing and wastes network bandwidth.
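
To make the check concrete, here is a rough Python sketch. It is not the
actual gsyncd code: the helper names (is_active, old_afr_node_uuids) and
the "uuid-N" strings are made up for illustration, and the third replica
set is assumed to be server3:/brick2 server1:/brick2. It applies the old
"first brick of each replica set" rule to the three-server example:

```python
# Illustrative sketch only; not the real Geo-rep (gsyncd) implementation.
# All names and UUID strings here are hypothetical placeholders.

def is_active(node_id, node_uuid_value):
    """A node is Active if its UUID appears in the space-separated value."""
    return node_id in node_uuid_value.split()


def old_afr_node_uuids(replica_sets, uuid_of):
    """Old AFR rule: each replica set reports the UUID of the node that
    hosts its first brick; the per-set values are joined with spaces."""
    firsts = [uuid_of[bricks[0].split(":")[0]] for bricks in replica_sets]
    return " ".join(firsts)


uuid_of = {"server1": "uuid-1", "server2": "uuid-2", "server3": "uuid-3"}
replica_sets = [
    ("server1:/brick1", "server2:/brick1"),
    ("server2:/brick2", "server3:/brick1"),
    ("server3:/brick2", "server1:/brick2"),  # assumed third pair
]

value = old_afr_node_uuids(replica_sets, uuid_of)
active = sorted(s for s, u in uuid_of.items() if is_active(u, value))
print(active)
```

Since every server hosts the first brick of some replica set, every UUID
ends up in the value and every node passes the check.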

Node-uuid based Active worker selection has been the default 
configuration in Geo-rep until now. Geo-rep also supports Meta Volume 
based synchronization for Active worker selection using lock files. 
(This can be enabled through Geo-rep configuration; with this config, 
node-uuid is not used.)
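
The lock-file idea can be sketched roughly as below. This only
illustrates the general technique and is not the Geo-rep code: the
function name try_become_active and the lock file name are hypothetical,
a temporary directory stands in for the Meta Volume mount, and flock() is
used only so that the demo works inside a single process (the real
implementation may use a different locking call):

```python
# Illustrative sketch of lock-file based Active election.
# Hypothetical names; a temp dir stands in for the Meta Volume mount.
import fcntl
import os
import tempfile


def try_become_active(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file descriptor if the lock was acquired (the worker
    is Active), or None if another worker already holds it (Passive).
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


with tempfile.TemporaryDirectory() as meta_dir:
    # One lock file per replica set; both workers of that set compete.
    lock_path = os.path.join(meta_dir, "subvol-1.lock")
    first = try_become_active(lock_path)
    second = try_become_active(lock_path)
    roles = ["Active" if fd is not None else "Passive"
             for fd in (first, second)]
    print(roles)
    os.close(first)  # closing the descriptor releases the lock
```

With one lock file per replica set, only the worker that holds the lock
syncs, and another worker can take over once the lock is released.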

Kotresh proposed a solution to configure which worker becomes Active. 
This gives the admin more control over choosing Active workers, and it 
will become the default configuration from 3.12.
https://github.com/gluster/glusterfs/issues/244

--
Aravinda

>
> Xavi
>
>>
>>
>>
>>         Bricks:
>>
>>         <guid>
>>
>>         AFR/EC:
>>         <guid>(<guid>, <guid>)
>>
>>         DHT:
>>         <guid>(<guid>(<guid>, ...), <guid>(<guid>, ...))
>>
>>         In this case, AFR and EC would return the same <guid> they
>>         returned before the patch, but between '(' and ')' they put the
>>         full list of guid's of all nodes. The first <guid> can be used
>>         by geo-replication. The list after the first <guid> can be used
>>         for rebalance.
>>
>>         Not sure if there's any user of node-uuid above DHT.
>>
>>         Xavi
>>
>>
>>
>>
>>                 Xavi
>>
>>
>>                     On Tue, Jun 20, 2017 at 12:46 PM, Xavier Hernandez
>>                     <xhernandez at datalab.es> wrote:
>>
>>                         Hi Pranith,
>>
>>                         On 20/06/17 07:53, Pranith Kumar Karampuri wrote:
>>
>>                             hi Xavi,
>>                                    We all made the mistake of not
>>                             sending about changing behavior of node-uuid
>>                             xattr so that rebalance can use multiple
>>                             nodes for doing rebalance. Because of this on
>>                             geo-rep all the workers are becoming active
>>                             instead of one per EC/AFR subvolume. So we
>>                             are frantically trying to restore the
>>                             functionality of node-uuid and introduce a
>>                             new xattr for the new behavior. Sunil will be
>>                             sending out a patch for this.
>>
>>
>>                         Wouldn't it be better to change geo-rep behavior
>>                         to use the new data ? I think it's better as
>>                         it's now, since it gives more information to
>>                         upper layers so that they can take more accurate
>>                         decisions.
>>
>>                         Xavi
>>
>>
>>                             --
>>                             Pranith
>>
>>
>>
>>
>>
>>                     --
>>                     Pranith
>>
>>
>>
>>
>>
>>             --
>>             Pranith
>>
>>
>>
>>
>>
>>
>> -- 
>> Pranith
>