[Gluster-devel] geo-rep regression because of node-uuid change
Sunil Kumar Heggodu Gopala Acharya
sheggodu at redhat.com
Tue Jun 20 11:26:13 UTC 2017
EC also sends all zeros if the node is down.
Regards,
Sunil kumar Acharya
Senior Software Engineer
Red Hat
T: +91-8067935170
On Tue, Jun 20, 2017 at 4:27 PM, Karthik Subrahmanya <ksubrahm at redhat.com>
wrote:
>
>
> On Tue, Jun 20, 2017 at 4:12 PM, Aravinda <avishwan at redhat.com> wrote:
>
>> I think the following format can be easily adopted by all components:
>>
>> UUIDs within a subvolume are separated by spaces, and subvolumes are
>> separated by commas.
>>
>> For example, node1 and node2 are replicas with UUIDs U1 and U2
>> respectively, and node3 and node4 are replicas with UUIDs U3 and U4
>> respectively.
>>
>> node-uuid can return "U1 U2,U3 U4"
>>
>> Geo-rep can split by "," and then split by space and take the first UUID
>> of each subvolume. DHT can split the value by space and comma to get the
>> list of unique UUIDs. (A quick parsing sketch is included below.)
>>
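>> A rough sketch in Python of that parsing (illustrative only, not the
>> actual Geo-rep/DHT code):
>>
>> value = "U1 U2,U3 U4"   # example node-uuid value
>>
>> # Geo-rep: first UUID of each subvolume
>> first_uuids = [subvol.split()[0] for subvol in value.split(",")]
>>
>> # DHT/rebalance: flat set of unique UUIDs
>> all_uuids = set(value.replace(",", " ").split())
>>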
>> Another question is about the behavior when a node is down: the existing
>> node-uuid xattr will not return that node's UUID if the node is down.
>
> After the change [1], if a node is down we send all zeros as the UUID for
> that node in the list of node UUIDs.
>
> [1] https://review.gluster.org/#/c/17084/
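>
> For example (illustrative only), if the node that owns U2 is down, the
> returned value would carry an all-zero UUID in its place, something like:
> U1 00000000-0000-0000-0000-000000000000 U3 U4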
>
> Regards,
> Karthik
>
>> What is the behavior with the proposed xattr?
>>
>> Let me know your thoughts.
>>
>> regards
>> Aravinda VK
>>
>>
>> On 06/20/2017 03:06 PM, Aravinda wrote:
>>
>>> Hi Xavi,
>>>
>>> On 06/20/2017 02:51 PM, Xavier Hernandez wrote:
>>>
>>>> Hi Aravinda,
>>>>
>>>> On 20/06/17 11:05, Pranith Kumar Karampuri wrote:
>>>>
>>>>> Adding more people to get a consensus about this.
>>>>>
>>>>> On Tue, Jun 20, 2017 at 1:49 PM, Aravinda <avishwan at redhat.com> wrote:
>>>>>
>>>>>
>>>>> regards
>>>>> Aravinda VK
>>>>>
>>>>>
>>>>> On 06/20/2017 01:26 PM, Xavier Hernandez wrote:
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> adding gluster-devel, Kotresh and Aravinda,
>>>>>
>>>>> On 20/06/17 09:45, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez
>>>>> <xhernandez at datalab.es> wrote:
>>>>>
>>>>> On 20/06/17 09:31, Pranith Kumar Karampuri wrote:
>>>>>
>>>>> The way geo-replication works is:
>>>>> On each machine, it does a getxattr of node-uuid and checks if its
>>>>> own uuid is present in the list. If it is present then the worker is
>>>>> considered active, otherwise it is considered passive. With this
>>>>> change we are giving all uuids instead of just the first-up
>>>>> subvolume's uuid, so all machines think they are ACTIVE, which is
>>>>> bad apparently. That is the reason. Even I felt bad that we are
>>>>> doing this change.
>>>>>
>>>>>
>>>>> And what about changing the content of node-uuid to include some
>>>>> sort of hierarchy?
>>>>>
>>>>> for example:
>>>>>
>>>>> a single brick:
>>>>>
>>>>> NODE(<guid>)
>>>>>
>>>>> AFR/EC:
>>>>>
>>>>> AFR[2](NODE(<guid>), NODE(<guid>))
>>>>> EC[3,1](NODE(<guid>), NODE(<guid>), NODE(<guid>))
>>>>>
>>>>> DHT:
>>>>>
>>>>> DHT[2](AFR[2](NODE(<guid>), NODE(<guid>)),
>>>>>        AFR[2](NODE(<guid>), NODE(<guid>)))
>>>>>
>>>>> This gives a lot of information that can be used to
>>>>> take the
>>>>> appropriate decisions.
>>>>>
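>>>>> If we went this way, a consumer that only needs the node uuids could
>>>>> extract them without caring about the hierarchy. Purely illustrative
>>>>> Python (the format itself is only a proposal):
>>>>>
>>>>> import re
>>>>>
>>>>> value = "DHT[2](AFR[2](NODE(U1), NODE(U2)), AFR[2](NODE(U3), NODE(U4)))"
>>>>> uuids = re.findall(r"NODE\(([^)]+)\)", value)   # ['U1', 'U2', 'U3', 'U4']
>>>>>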
>>>>>
>>>>> I guess that is not backward compatible. Shall I CC
>>>>> gluster-devel and
>>>>> Kotresh/Aravinda?
>>>>>
>>>>>
>>>>> Is the change we did backward compatible? If we only require
>>>>> the first field to be a GUID to support backward compatibility,
>>>>> we can use something like this:
>>>>>
>>>>> No. But the necessary change can be made to the Geo-rep code as well if
>>>>> the format is changed, since all these are built/shipped together.
>>>>>
>>>>> Geo-rep uses node-uuid as follows (roughly, in Python):
>>>>>
>>>>> uuids = getxattr(path, "node-uuid")          # node-uuid virtual xattr
>>>>> active_node_uuids = uuids.split(" ")
>>>>> active_node_flag = self.node_id in active_node_uuids
>>>>>
>>>>
>>>> How was this case solved?
>>>>
>>>> Suppose we have three servers and 2 bricks on each server. A replicated
>>>> volume is created using the following command:
>>>>
>>>> gluster volume create test replica 2 server1:/brick1 server2:/brick1
>>>> server2:/brick2 server3:/brick1 server3:/brick2 server1:/brick2
>>>>
>>>> In this case we have three replica-sets:
>>>>
>>>> * server1:/brick1 server2:/brick1
>>>> * server2:/brick2 server3:/brick1
>>>> * server3:/brick2 server1:/brick2
>>>>
>>>> The old AFR implementation for node-uuid always returned the uuid of the
>>>> node of the first brick, so in this case we will get the uuids of all three
>>>> nodes because each of them hosts the first brick of a replica-set.
>>>>
>>>> Does this mean that with this configuration all nodes are active? Is
>>>> this a problem? Is there any other check to avoid this situation if it's
>>>> not good?
>>>>
>>> Yes, all Geo-rep workers will become Active and participate in syncing.
>>> Since changelogs have the same information on replica bricks, this will
>>> lead to duplicate syncing and wasted network bandwidth.
>>>
>>> Node-uuid based Active worker selection has been the default configuration
>>> in Geo-rep till now. Geo-rep also has Meta Volume based synchronization for
>>> Active worker selection using lock files (it can be enabled through Geo-rep
>>> configuration; with that config, node-uuid is not used).
>>>
>>> Kotresh proposed a solution to configure which worker becomes Active.
>>> This will give the admin more control over choosing Active workers, and it
>>> will become the default configuration from 3.12:
>>> https://github.com/gluster/glusterfs/issues/244
>>>
>>> --
>>> Aravinda
>>>
>>>
>>>> Xavi
>>>>
>>>>
>>>>>
>>>>>
>>>>> Bricks:
>>>>>
>>>>> <guid>
>>>>>
>>>>> AFR/EC:
>>>>> <guid>(<guid>, <guid>)
>>>>>
>>>>> DHT:
>>>>> <guid>(<guid>(<guid>, ...), <guid>(<guid>, ...))
>>>>>
>>>>> In this case, AFR and EC would return the same <guid> they
>>>>> returned before the patch, but between '(' and ')' they would put the
>>>>> full list of guids of all nodes. The first <guid> can be used
>>>>> by geo-replication, and the list after the first <guid> can be used
>>>>> for rebalance (see the sketch below).
>>>>>
>>>>> Not sure if there's any user of node-uuid above DHT.
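>>>>>
>>>>> A minimal sketch of how the two consumers could read that value at the
>>>>> AFR/EC level (illustrative only; the format is just a proposal):
>>>>>
>>>>> value = "U0(U1, U2)"
>>>>> first = value.split("(", 1)[0]                     # geo-rep: same guid as before the patch
>>>>> inner = value[value.find("(") + 1 : value.rfind(")")]
>>>>> members = [u.strip() for u in inner.split(",")]    # rebalance: all node guids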
>>>>>
>>>>> Xavi
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Xavi
>>>>>
>>>>>
>>>>> On Tue, Jun 20, 2017 at 12:46 PM, Xavier Hernandez
>>>>> <xhernandez at datalab.es> wrote:
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> On 20/06/17 07:53, Pranith Kumar Karampuri
>>>>> wrote:
>>>>>
>>>>> hi Xavi,
>>>>>       We all made the mistake of not sending a heads-up about
>>>>> changing the behavior of the node-uuid xattr so that rebalance can
>>>>> use multiple nodes. Because of this, on geo-rep all the workers are
>>>>> becoming active instead of one per EC/AFR subvolume. So we are
>>>>> frantically trying to restore the functionality of node-uuid and
>>>>> introduce a new xattr for the new behavior. Sunil will be sending
>>>>> out a patch for this.
>>>>>
>>>>>
>>>>> Wouldn't it be better to change geo-rep behavior to use the new
>>>>> data? I think it's better as it is now, since it gives more
>>>>> information to upper layers so that they can take more accurate
>>>>> decisions.
>>>>>
>>>>> Xavi
>>>>>
>>>>>
>>>>> --
>>>>> Pranith
>>>>>
>>>>
>>>>
>>>
>>
>