[Gluster-devel] geo-rep regression because of node-uuid change

Pranith Kumar Karampuri pkarampu at redhat.com
Tue Jun 20 09:05:42 UTC 2017


Adding more people to get a consensus about this.

On Tue, Jun 20, 2017 at 1:49 PM, Aravinda <avishwan at redhat.com> wrote:

>
> regards
> Aravinda VK
>
>
> On 06/20/2017 01:26 PM, Xavier Hernandez wrote:
>
>> Hi Pranith,
>>
>> adding gluster-devel, Kotresh and Aravinda,
>>
>> On 20/06/17 09:45, Pranith Kumar Karampuri wrote:
>>
>>>
>>>
>>> On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez
>>> <xhernandez at datalab.es> wrote:
>>>
>>>     On 20/06/17 09:31, Pranith Kumar Karampuri wrote:
>>>
>>>         The way geo-replication works is:
>>>         On each machine, it does a getxattr of node-uuid and checks
>>>         whether its own uuid is present in the list. If it is present,
>>>         the machine considers itself active; otherwise it considers
>>>         itself passive. With this change we are returning all uuids
>>>         instead of only that of the first-up subvolume, so all machines
>>>         think they are ACTIVE, which is bad. That is the reason. Even I
>>>         felt bad that we are making this change.
>>>
>>>
>>>     And what about changing the content of node-uuid to include some
>>>     sort of hierarchy?
>>>
>>>     for example:
>>>
>>>     a single brick:
>>>
>>>     NODE(<guid>)
>>>
>>>     AFR/EC:
>>>
>>>     AFR[2](NODE(<guid>), NODE(<guid>))
>>>     EC[3,1](NODE(<guid>), NODE(<guid>), NODE(<guid>))
>>>
>>>     DHT:
>>>
>>>     DHT[2](AFR[2](NODE(<guid>), NODE(<guid>)), AFR[2](NODE(<guid>),
>>>     NODE(<guid>)))
>>>
>>>     This gives a lot of information that can be used to take the
>>>     appropriate decisions.
>>>
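
The hierarchical format proposed above could be consumed with a small recursive parser. The following is only a sketch under assumptions: the `Node` class and the `parse` and `leaf_guids` helpers are hypothetical names, the GUIDs are placeholder strings, and nothing here reflects an existing GlusterFS implementation.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    kind: str                                          # "NODE", "AFR", "EC" or "DHT"
    params: List[int] = field(default_factory=list)    # e.g. [3, 1] for EC[3,1]
    guid: str = ""                                     # only set when kind == "NODE"
    children: List["Node"] = field(default_factory=list)

# Matches the start of an element: KIND, optional [params], then "(".
_HEAD = re.compile(r"\s*([A-Z]+)(?:\[([0-9,\s]*)\])?\(")
# Matches the separator after a child: "," (more children) or ")" (done).
_SEP = re.compile(r"\s*([,)])")

def _parse(text: str, pos: int) -> Tuple[Node, int]:
    m = _HEAD.match(text, pos)
    if not m:
        raise ValueError(f"expected element at offset {pos}")
    kind = m.group(1)
    params = [int(p) for p in m.group(2).split(",")] if m.group(2) else []
    pos = m.end()
    if kind == "NODE":
        end = text.index(")", pos)      # raises ValueError if unterminated
        return Node(kind, params, text[pos:end].strip()), end + 1
    children = []
    while True:
        child, pos = _parse(text, pos)
        children.append(child)
        sep = _SEP.match(text, pos)
        if not sep:
            raise ValueError(f"expected ',' or ')' at offset {pos}")
        pos = sep.end()
        if sep.group(1) == ")":
            return Node(kind, params, children=children), pos

def parse(text: str) -> Node:
    node, pos = _parse(text, 0)
    if text[pos:].strip():
        raise ValueError("trailing data after top-level element")
    return node

def leaf_guids(node: Node) -> List[str]:
    """Collect the GUIDs of all NODE leaves, left to right."""
    if node.kind == "NODE":
        return [node.guid]
    return [g for child in node.children for g in leaf_guids(child)]

# Placeholder GUIDs g1..g4 stand in for real node uuids.
tree = parse("DHT[2](AFR[2](NODE(g1), NODE(g2)), AFR[2](NODE(g3), NODE(g4)))")
per_subvolume = [leaf_guids(child) for child in tree.children]
```

With a tree like this, a consumer could pick one node per AFR/EC subvolume (`per_subvolume`) or use every leaf (`leaf_guids(tree)`), depending on what it needs.
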
>>>
>>> I guess that is not backward compatible. Shall I CC gluster-devel and
>>> Kotresh/Aravinda?
>>>
>>
>> Is the change we made backward compatible? If we only require the first
>> field to be a GUID to support backward compatibility, we can use something
>> like this:
>>
> No. But the necessary change can be made to the Geo-rep code as well if the
> format is changed, since all of these are built/shipped together.
>
> Geo-rep uses node-uuid as follows:
>
> value = getxattr(node-uuid)
> active_node_uuids = value.split(SPACE)
> active_node_flag = True if self.node_id in active_node_uuids else False
>
>
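
The check described above, and why returning all uuids makes every worker active, can be sketched in runnable form. This is a minimal illustration, not the actual Geo-rep code: `is_active` is a hypothetical helper, the uuid strings are placeholders, and the xattr value is passed in as a plain string instead of being read with getxattr.

```python
def is_active(node_uuid_value: str, my_uuid: str) -> bool:
    """Return True if my_uuid is one of the space-separated uuids."""
    return my_uuid in node_uuid_value.split()

# Placeholder uuids for the two nodes of a 2-way replica.
UUID_A = "uuid-a"
UUID_B = "uuid-b"
nodes = [UUID_A, UUID_B]

# Before the change: only the uuid of the first-up subvolume is returned.
old_value = UUID_A
# After the change: the uuids of all nodes are returned.
new_value = f"{UUID_A} {UUID_B}"

old_active = [u for u in nodes if is_active(old_value, u)]
new_active = [u for u in nodes if is_active(new_value, u)]
```

Here `old_active` contains a single node, while `new_active` contains both, which matches the regression reported in this thread.
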
>
>> Bricks:
>>
>> <guid>
>>
>> AFR/EC:
>> <guid>(<guid>, <guid>)
>>
>> DHT:
>> <guid>(<guid>(<guid>, ...), <guid>(<guid>, ...))
>>
>> In this case, AFR and EC would return the same <guid> they returned
>> before the patch, but between '(' and ')' they put the full list of guids
>> of all nodes. The first <guid> can be used by geo-replication. The list
>> after the first <guid> can be used for rebalance.
>>
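
A consumer of this second format might split the value as follows. This is a sketch under assumptions: `first_guid` and `leaf_guids` are hypothetical helper names, the guids are placeholders, and it assumes guids never contain whitespace, commas or parentheses.

```python
import re
from typing import List

_PUNCT = {"(", ")", ","}

def _tokens(value: str) -> List[str]:
    # Split into guid tokens and the punctuation characters "(", ")" and ",".
    return re.findall(r"[^\s(),]+|[(),]", value)

def first_guid(value: str) -> str:
    """The leading guid, i.e. what was returned before the patch."""
    return _tokens(value)[0]

def leaf_guids(value: str) -> List[str]:
    """Guids that are not followed by '(' (the per-node entries)."""
    toks = _tokens(value)
    return [
        t for i, t in enumerate(toks)
        if t not in _PUNCT and (i + 1 == len(toks) or toks[i + 1] != "(")
    ]

# Placeholder values for a brick, an AFR/EC subvolume and a DHT volume.
brick = "g1"
afr = "g1(g1, g2)"
dht = "g1(g1(g1, g2), g3(g3, g4))"
```

Under these assumptions, geo-replication would compare its own uuid against `first_guid(afr)`, while rebalance would iterate over `leaf_guids(dht)`.
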
>> Not sure if there's any user of node-uuid above DHT.
>>
>> Xavi
>>
>>
>>>
>>>
>>>     Xavi
>>>
>>>
>>>         On Tue, Jun 20, 2017 at 12:46 PM, Xavier Hernandez
>>>         <xhernandez at datalab.es> wrote:
>>>
>>>             Hi Pranith,
>>>
>>>             On 20/06/17 07:53, Pranith Kumar Karampuri wrote:
>>>
>>>                 hi Xavi,
>>>                        We all made the mistake of not sending out a note
>>>                 about changing the behavior of the node-uuid xattr so
>>>                 that rebalance can use multiple nodes for doing
>>>                 rebalance. Because of this, on geo-rep all the workers
>>>                 are becoming active instead of one per EC/AFR
>>>                 subvolume. So we are frantically trying to restore the
>>>                 functionality of node-uuid and introduce a new xattr
>>>                 for the new behavior. Sunil will be sending out a
>>>                 patch for this.
>>>
>>>
>>>             Wouldn't it be better to change geo-rep behavior to use the
>>>             new data? I think it's better as it is now, since it gives
>>>             more information to upper layers so that they can take more
>>>             accurate decisions.
>>>
>>>             Xavi
>>>
>>>
>>>                 --
>>>                 Pranith
>>>
>>>
>>>
>>>
>>>
>>>         --
>>>         Pranith
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Pranith
>>>
>>
>>
>


-- 
Pranith