[Gluster-devel] geo-rep regression because of node-uuid change

Tue Jun 20 09:21:03 UTC 2017

Hi Aravinda,

On 20/06/17 11:05, Pranith Kumar Karampuri wrote:
> Adding more people to get a consensus about this.
>
> On Tue, Jun 20, 2017 at 1:49 PM, Aravinda <avishwan at redhat.com> wrote:
>
>
>     regards
>     Aravinda VK
>
>
>     On 06/20/2017 01:26 PM, Xavier Hernandez wrote:
>
>         Hi Pranith,
>
>         adding gluster-devel, Kotresh and Aravinda,
>
>         On 20/06/17 09:45, Pranith Kumar Karampuri wrote:
>
>
>
>             On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez
>             <xhernandez at datalab.es> wrote:
>
>                 On 20/06/17 09:31, Pranith Kumar Karampuri wrote:
>
>                     The way geo-replication works is: on each machine,
>                     it does a getxattr of node-uuid and checks whether
>                     its own uuid is present in the list. If it is
>                     present, the worker considers itself ACTIVE;
>                     otherwise it is considered PASSIVE. With this change
>                     we are returning all uuids instead of only the uuid
>                     of the first-up subvolume, so all machines think
>                     they are ACTIVE, which is apparently bad. That is
>                     the reason. Even I felt bad that we are making this
>                     change.
>
>
>                 And what about changing the content of node-uuid to
>                 include some sort of hierarchy?
>
>                 for example:
>
>                 a single brick:
>
>                 NODE(<guid>)
>
>                 AFR/EC:
>
>                 AFR[2](NODE(<guid>), NODE(<guid>))
>                 EC[3,1](NODE(<guid>), NODE(<guid>), NODE(<guid>))
>
>                 DHT:
>
>                 DHT[2](AFR[2](NODE(<guid>), NODE(<guid>)),
>                        AFR[2](NODE(<guid>), NODE(<guid>)))
>
>                 This gives a lot of information that can be used to make
>                 the appropriate decisions.
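A minimal Python sketch of how a consumer could use such a hierarchy: it parses the proposed format into a tree, then picks one node per AFR/EC subvolume (what geo-rep needs) or lists every node (what rebalance needs). The grammar is inferred from the examples above, and the names `parse`, `geo_rep_nodes` and `all_nodes` are hypothetical. The "first NODE of each replica subvolume" policy is also an assumption; brick up/down status is not modelled.

```python
import re
from dataclasses import dataclass, field

# Matches "TYPE(" or "TYPE[params](" at the current position.
# Grammar inferred from the examples in this thread (an assumption).
_HEAD = re.compile(r"\s*([A-Za-z]+)(?:\[([^\]]*)\])?\(")


@dataclass
class Entry:
    kind: str                 # "NODE", "AFR", "EC", "DHT", ...
    params: str = ""          # text inside [...], e.g. "3,1" for EC
    guid: str = ""            # set only for NODE leaves
    children: list = field(default_factory=list)


def _skip_ws(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def _parse(text, pos):
    m = _HEAD.match(text, pos)
    if not m:
        raise ValueError(f"expected TYPE( at offset {pos}")
    kind, params, pos = m.group(1), m.group(2) or "", m.end()
    if kind == "NODE":
        # Leaf: everything up to the next ')' is the node guid.
        end = text.index(")", pos)
        return Entry(kind, params, guid=text[pos:end].strip()), end + 1
    children = []
    while True:
        child, pos = _parse(text, pos)
        children.append(child)
        pos = _skip_ws(text, pos)
        if pos >= len(text):
            raise ValueError("unterminated '('")
        if text[pos] == ",":
            pos += 1
        elif text[pos] == ")":
            return Entry(kind, params, children=children), pos + 1
        else:
            raise ValueError(f"unexpected {text[pos]!r} at offset {pos}")


def parse(text):
    entry, pos = _parse(text, 0)
    if _skip_ws(text, pos) != len(text):
        raise ValueError(f"trailing data at offset {pos}")
    return entry


def _first_node(entry):
    while entry.kind != "NODE":
        entry = entry.children[0]
    return entry


def geo_rep_nodes(entry):
    """One node per replica (AFR/EC) subvolume: its first NODE leaf."""
    if entry.kind in ("AFR", "EC"):
        return [_first_node(entry).guid]
    if entry.kind == "NODE":
        return [entry.guid]
    return [g for child in entry.children for g in geo_rep_nodes(child)]


def all_nodes(entry):
    """Every NODE leaf in the tree, e.g. for rebalance."""
    if entry.kind == "NODE":
        return [entry.guid]
    return [g for child in entry.children for g in all_nodes(child)]
```

For example, parsing `DHT[2](AFR[2](NODE(a), NODE(b)), AFR[2](NODE(c), NODE(d)))` gives `['a', 'c']` from `geo_rep_nodes` and `['a', 'b', 'c', 'd']` from `all_nodes`.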
>
>
>             I guess that is not backward compatible. Shall I CC
>             gluster-devel and
>             Kotresh/Aravinda?
>
>
>         Is the change we made backward compatible? If we only require
>         the first field to be a GUID to support backward compatibility,
>         we can use something like this:
>
>     No. But the necessary changes can be made to the geo-rep code as
>     well if the format changes, since all of these are built and
>     shipped together.
>
>     Geo-rep uses node-uuid as follows:
>
>     value = getxattr(node-uuid)
>     active_node_uuids = value.split(SPACE)
>     active_node_flag = self.node_id in active_node_uuids
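The active/passive check can be written as a small runnable Python sketch. It assumes, as in the pseudocode, that the xattr value is a space-separated list of uuids. The xattr key `trusted.glusterfs.node-uuid` and the helper names `read_node_uuids`, `parse_node_uuids` and `is_active` are assumptions for illustration, not geo-rep's actual code.

```python
import os

# Assumed name of the virtual xattr; not checked against geo-rep's source.
NODE_UUID_XATTR = "trusted.glusterfs.node-uuid"


def parse_node_uuids(value):
    """Split a space-separated node-uuid value into individual uuids."""
    return value.split()


def read_node_uuids(path, key=NODE_UUID_XATTR):
    """Hypothetical helper: read the xattr from a gluster mount (Linux only)."""
    raw = os.getxattr(path, key)
    # Drop a possible trailing NUL before decoding the bytes.
    return parse_node_uuids(raw.rstrip(b"\x00").decode())


def is_active(node_id, uuids):
    """A worker is ACTIVE if its own node uuid appears in the list."""
    return node_id in uuids
```

With the old behavior a replica pair reports a single uuid, so only the worker on that node passes `is_active`; once every node's uuid is reported, every worker does, which is the regression discussed here.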

How was this case solved?

Suppose we have three servers with two bricks on each server. A replicated
volume is created using the following command:

gluster volume create test replica 2 \
    server1:/brick1 server2:/brick1 \
    server2:/brick2 server3:/brick1 \
    server3:/brick2 server1:/brick2

In this case we have three replica-sets:

* server1:/brick1 server2:/brick1
* server2:/brick2 server3:/brick1
* server3:/brick2 server1:/brick2

The old AFR implementation of node-uuid always returned the uuid of the
node hosting the first brick, so in this case we will get the uuids of all
three nodes, because each of them hosts the first brick of some replica-set.
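The arithmetic can be checked with a short Python sketch of the old rule. It assumes every brick is up (so the "first brick" is simply the first member of each replica-set) and uses server names in place of uuids; the function names are illustrative only.

```python
# Replica-sets from the example: consecutive bricks of the create
# command pair up into one replica-set each.
REPLICA_SETS = [
    [("server1", "/brick1"), ("server2", "/brick1")],
    [("server2", "/brick2"), ("server3", "/brick1")],
    [("server3", "/brick2"), ("server1", "/brick2")],
]


def old_node_uuid(replica_set):
    """Old AFR behavior (all bricks up): the node of the first brick."""
    server, _brick = replica_set[0]
    return server


def active_nodes(replica_sets):
    """Nodes that find their own id in at least one replica-set's answer."""
    return {old_node_uuid(rs) for rs in replica_sets}


print(sorted(active_nodes(REPLICA_SETS)))  # prints ['server1', 'server2', 'server3']
```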

Does this mean that with this configuration all nodes are active? Is
this a problem? Is there any other check to avoid this situation if
it's not good?

Xavi

>
>
>
>         Bricks:
>
>         <guid>
>
>         AFR/EC:
>         <guid>(<guid>, <guid>)
>
>         DHT:
>         <guid>(<guid>(<guid>, ...), <guid>(<guid>, ...))
>
>         In this case, AFR and EC would return the same <guid> they
>         returned before the patch, but between '(' and ')' they would
>         put the full list of guids of all nodes. The first <guid> can be
>         used by geo-replication. The list after the first <guid> can be
>         used by rebalance.
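A minimal Python sketch of how both consumers could read this backward-compatible format: geo-replication takes only the leading <guid>, while rebalance walks the parenthesised lists down to the individual nodes. The grammar (a uuid optionally followed by a parenthesised, comma-separated list of entries) is inferred from the examples above; the function names are hypothetical, and entries without a list are assumed to be node uuids.

```python
import re

# A uuid-like token (hex digits and dashes in practice); an assumption.
_GUID = re.compile(r"\s*([0-9A-Za-z-]+)")


def _skip_ws(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def _parse(text, pos):
    """entry := GUID [ '(' entry (',' entry)* ')' ]; returns (entry, pos)."""
    m = _GUID.match(text, pos)
    if not m:
        raise ValueError(f"expected a uuid at offset {pos}")
    guid, pos = m.group(1), m.end()
    children = []
    if pos < len(text) and text[pos] == "(":
        pos += 1
        while True:
            child, pos = _parse(text, pos)
            children.append(child)
            pos = _skip_ws(text, pos)
            if pos >= len(text):
                raise ValueError("unterminated '('")
            if text[pos] == ",":
                pos += 1
            elif text[pos] == ")":
                pos += 1
                break
            else:
                raise ValueError(f"unexpected {text[pos]!r} at offset {pos}")
    return (guid, children), pos


def parse(text):
    entry, pos = _parse(text, 0)
    if _skip_ws(text, pos) != len(text):
        raise ValueError(f"trailing data at offset {pos}")
    return entry


def legacy_uuid(text):
    """What geo-rep would use: the single uuid returned before the patch."""
    guid, _children = parse(text)
    return guid


def node_uuids(text):
    """What rebalance would use: every leaf uuid, i.e. every node."""
    def leaves(entry):
        guid, children = entry
        if not children:
            return [guid]
        return [g for child in children for g in leaves(child)]
    return leaves(parse(text))
```

For an AFR pair, `legacy_uuid("u1(u1, u2)")` returns `"u1"` and `node_uuids("u1(u1, u2)")` returns `["u1", "u2"]`.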
>
>         Not sure if there's any user of node-uuid above DHT.
>
>         Xavi
>
>
>
>
>                 Xavi
>
>
>                     On Tue, Jun 20, 2017 at 12:46 PM, Xavier Hernandez
>                     <xhernandez at datalab.es> wrote:
>
>                         Hi Pranith,
>
>                         On 20/06/17 07:53, Pranith Kumar Karampuri wrote:
>
>                             Hi Xavi,
>                                    We all made the mistake of not sending
>                             a mail about changing the behavior of the
>                             node-uuid xattr so that rebalance can use
>                             multiple nodes. Because of this, on geo-rep
>                             all the workers are becoming active instead
>                             of one per EC/AFR subvolume. So we are
>                             frantically trying to restore the old
>                             functionality of node-uuid and introduce a
>                             new xattr for the new behavior. Sunil will be
>                             sending out a patch for this.
>
>
>                         Wouldn't it be better to change geo-rep behavior
>                         to use the new data? I think it's better as it
>                         is now, since it gives more information to upper
>                         layers so that they can make more accurate
>                         decisions.
>
>                         Xavi
>
>
>                             --
>                             Pranith