[Gluster-devel] geo-rep regression because of node-uuid change
Aravinda
avishwan at redhat.com
Tue Jun 20 09:36:52 UTC 2017
Hi Xavi,
On 06/20/2017 02:51 PM, Xavier Hernandez wrote:
> Hi Aravinda,
>
> On 20/06/17 11:05, Pranith Kumar Karampuri wrote:
>> Adding more people to get a consensus about this.
>>
>> On Tue, Jun 20, 2017 at 1:49 PM, Aravinda <avishwan at redhat.com> wrote:
>>
>>
>> regards
>> Aravinda VK
>>
>>
>> On 06/20/2017 01:26 PM, Xavier Hernandez wrote:
>>
>> Hi Pranith,
>>
>> adding gluster-devel, Kotresh and Aravinda,
>>
>> On 20/06/17 09:45, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez
>> <xhernandez at datalab.es> wrote:
>>
>> On 20/06/17 09:31, Pranith Kumar Karampuri wrote:
>>
>> The way geo-replication works is: on each machine, it does a getxattr
>> of node-uuid and checks whether its own uuid is present in the list.
>> If it is present, the worker considers itself Active; otherwise it is
>> considered Passive. With this change we are returning all uuids instead
>> of only the first-up subvolume's, so all machines think they are
>> ACTIVE, which is apparently bad. That is the reason. Even I felt bad
>> that we were making this change.
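To make the regression concrete, here is a minimal Python sketch that models a single replica-2 AFR subvolume. The node names, the helpers `node_uuid_old`/`node_uuid_new`/`active_workers`, and the space-separated value format are illustrative assumptions, not GlusterFS code: the "old" behavior returns only the uuid of the first brick that is up, the "new" one returns every uuid, and each node then runs the geo-rep membership check described above.

```python
from typing import Dict, List

# Hypothetical replica set: node uuids in brick order (illustrative only).
REPLICA_SET: List[str] = ["uuid-node1", "uuid-node2"]


def node_uuid_old(bricks: List[str], up: Dict[str, bool]) -> str:
    """Assumed old behavior: uuid of the first brick that is up."""
    for uuid in bricks:
        if up.get(uuid, False):
            return uuid
    return ""


def node_uuid_new(bricks: List[str]) -> str:
    """Assumed new behavior: uuids of all bricks, space separated."""
    return " ".join(bricks)


def active_workers(bricks: List[str], xattr_value: str) -> List[str]:
    """A node's worker is Active if its own uuid appears in the value."""
    listed = xattr_value.split()
    return [uuid for uuid in bricks if uuid in listed]


if __name__ == "__main__":
    all_up = {uuid: True for uuid in REPLICA_SET}
    # Old behavior: exactly one Active worker for the replica set.
    print(active_workers(REPLICA_SET, node_uuid_old(REPLICA_SET, all_up)))
    # New behavior: every worker in the replica set becomes Active.
    print(active_workers(REPLICA_SET, node_uuid_new(REPLICA_SET)))
```

With the old value only `uuid-node1` passes the check; with the new value both workers do, which is the duplicate-syncing problem discussed later in the thread.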
>>
>>
>> And what about changing the content of node-uuid to include some
>> sort of hierarchy?
>>
>> for example:
>>
>> a single brick:
>>
>> NODE(<guid>)
>>
>> AFR/EC:
>>
>> AFR[2](NODE(<guid>), NODE(<guid>))
>> EC[3,1](NODE(<guid>), NODE(<guid>), NODE(<guid>))
>>
>> DHT:
>>
>> DHT[2](AFR[2](NODE(<guid>), NODE(<guid>)),
>> AFR[2](NODE(<guid>),
>> NODE(<guid>)))
>>
>> This gives a lot of information that can be used to take the
>> appropriate decisions.
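The hierarchical format proposed above can be produced by a small recursive renderer. This is a minimal sketch assuming each layer knows its type name and its bracketed parameters (e.g. `[2]` for AFR, `[3,1]` for EC); the `Node`/`Cluster` classes are hypothetical and not part of GlusterFS.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A single brick, identified by the uuid of the node hosting it."""
    uuid: str

    def render(self) -> str:
        return f"NODE({self.uuid})"


@dataclass
class Cluster:
    """An AFR, EC or DHT layer; `params` holds e.g. [2] or [3, 1]."""
    kind: str
    params: List[int]
    children: List["Subvol"] = field(default_factory=list)

    def render(self) -> str:
        # Parameters inside [...], children rendered recursively inside (...).
        args = ",".join(str(p) for p in self.params)
        inner = ", ".join(child.render() for child in self.children)
        return f"{self.kind}[{args}]({inner})"


Subvol = Union[Node, Cluster]

if __name__ == "__main__":
    dht = Cluster("DHT", [2], [
        Cluster("AFR", [2], [Node("a"), Node("b")]),
        Cluster("AFR", [2], [Node("c"), Node("d")]),
    ])
    print(dht.render())
```

Rendering the tree recursively keeps each translator responsible only for wrapping its children's strings, which mirrors how the value would be assembled layer by layer.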
>>
>>
>> I guess that is not backward compatible. Shall I CC gluster-devel and
>> Kotresh/Aravinda?
>>
>>
>> Is the change we did backward compatible? If we only require the
>> first field to be a GUID to support backward compatibility, we can
>> use something like this:
>>
>> No. But the necessary change can be made to the Geo-rep code as well
>> if the format is changed, since all these are built/shipped together.
>>
>> Geo-rep uses node-uuid as follows:
>>
>> value = getxattr(node-uuid)
>> active_node_uuids = value.split(SPACE)
>> active_node_flag = self.node_id in active_node_uuids
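A runnable version of the check above might look like the following sketch. Assumptions: the xattr name `trusted.glusterfs.node-uuid` (verify it against your GlusterFS version), that the value is a whitespace-separated list possibly ending in NUL bytes, and the hypothetical helper names. Reading the xattr uses Python's Linux-only `os.getxattr`; parsing is kept separate so it can be exercised without a Gluster mount.

```python
import os
from typing import List

# Assumption: xattr key used for node-uuid; verify for your release.
NODE_UUID_XATTR = "trusted.glusterfs.node-uuid"


def parse_node_uuids(raw: bytes) -> List[str]:
    """Split a raw node-uuid value into uuids (trailing NULs stripped)."""
    return raw.rstrip(b"\x00").decode("ascii").split()


def is_active(my_node_id: str, raw: bytes) -> bool:
    """Geo-rep style check: Active if our own uuid is in the list."""
    return my_node_id in parse_node_uuids(raw)


def read_node_uuid_xattr(mount_path: str) -> bytes:
    """Fetch the raw value from a local (hypothetical) aux mount; Linux only."""
    return os.getxattr(mount_path, NODE_UUID_XATTR)
```

Membership is tested on the split list rather than on the raw string, so one uuid can never match as a substring of another.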
>
> How was this case solved?
>
> Suppose we have three servers and 2 bricks in each server. A
> replicated volume is created using the following command:
>
> gluster volume create test replica 2 server1:/brick1 server2:/brick1
> server2:/brick2 server3:/brick1 server3:/brick2 server1:/brick2
>
> In this case we have three replica-sets:
>
> * server1:/brick1 server2:/brick1
> * server2:/brick2 server3:/brick1
> * server3:/brick2 server1:/brick2
>
> The old AFR implementation of node-uuid always returned the uuid of the
> node hosting the first brick, so in this case we will get the uuids of
> all three nodes, because each of them hosts the first brick of a
> replica-set.
>
> Does this mean that with this configuration all nodes are active? Is
> this a problem? Is there any other check to avoid this situation if
> it's not good?
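Xavi's scenario can be checked mechanically. This sketch assumes the intended layout in which each server hosts two bricks and the third replica set is server3:/brick2 with server1:/brick2, that bricks are grouped into replica sets in command-line order, and that all bricks are up; the helper names are hypothetical.

```python
from typing import List, Set

# Assumed brick order from the `gluster volume create ... replica 2` example.
BRICKS = [
    "server1:/brick1", "server2:/brick1",
    "server2:/brick2", "server3:/brick1",
    "server3:/brick2", "server1:/brick2",
]


def replica_sets(bricks: List[str], replica: int) -> List[List[str]]:
    """Group consecutive bricks into replica sets of size `replica`."""
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def active_nodes_old_afr(bricks: List[str], replica: int) -> Set[str]:
    """Old AFR rule described above: host of each set's first brick."""
    return {rs[0].split(":", 1)[0] for rs in replica_sets(bricks, replica)}


if __name__ == "__main__":
    for rs in replica_sets(BRICKS, 2):
        print(rs)
    print(sorted(active_nodes_old_afr(BRICKS, 2)))
```

Every server is the first brick of exactly one set, so under the old rule all three nodes would report themselves Active.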
Yes, all Geo-rep workers will become Active and participate in syncing.
Since changelogs will have the same information in replica bricks, this
will lead to duplicate syncing and wasted network bandwidth.
Node-uuid based Active worker selection has been the default
configuration in Geo-rep until now. Geo-rep also has Meta Volume based
synchronization of the Active worker using lock files (it can be enabled
through the Geo-rep configuration; with this config node-uuid is not used).
Kotresh proposed a solution to configure which worker becomes Active.
This will give the admin more control over choosing Active workers, and
it will become the default configuration from 3.12:
https://github.com/gluster/glusterfs/issues/244
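The lock-file approach mentioned above can be illustrated with a non-blocking exclusive lock per replica set: whichever worker gets the lock is Active, the others stay Passive. This is a sketch only. The directory stands in for a hypothetical meta volume mount, the `<subvol_id>.lock` naming is invented, and `fcntl.flock` is a swapped-in locking primitive; the real Geo-rep code may use a different call and file layout.

```python
import fcntl
import os
import tempfile
from typing import Optional


def try_become_active(lock_dir: str, subvol_id: str) -> Optional[int]:
    """Try to take an exclusive, non-blocking lock for one replica set.

    Returns the open fd (keep it open to stay Active), or None if another
    holder already has the lock and this worker should stay Passive.
    """
    path = os.path.join(lock_dir, f"{subvol_id}.lock")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


if __name__ == "__main__":
    # A temporary directory plays the role of the shared meta volume.
    with tempfile.TemporaryDirectory() as meta:
        first = try_become_active(meta, "replica-set-0")
        second = try_become_active(meta, "replica-set-0")
        print(first is not None, second is None)
        if first is not None:
            os.close(first)  # releasing the lock lets another worker take over
```

Because the lock is released when the fd is closed (or the worker dies), a Passive worker can retry periodically and take over, which is the failover property this scheme is meant to provide.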
--
Aravinda
>
> Xavi
>
>>
>>
>>
>> Bricks:
>>
>> <guid>
>>
>> AFR/EC:
>> <guid>(<guid>, <guid>)
>>
>> DHT:
>> <guid>(<guid>(<guid>, ...), <guid>(<guid>, ...))
>>
>> In this case, AFR and EC would return the same <guid> they returned
>> before the patch, but between '(' and ')' they would put the full list
>> of guids of all nodes. The first <guid> can be used by
>> geo-replication. The list after the first <guid> can be used for
>> rebalance.
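A consumer of the backward-compatible format above needs only a tiny recursive-descent parser. This sketch assumes the grammar `guid` or `guid(child, child, ...)` with arbitrary nesting; `Entry` and `parse_node_uuid` are hypothetical names. `entry.guid` is the leading field an unmodified reader would see, and `leaf_guids()` collects the full node list.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Entry:
    guid: str                                   # leading guid of this level
    children: List["Entry"] = field(default_factory=list)

    def leaf_guids(self) -> List[str]:
        """All guids at the bottom of the tree, left to right."""
        if not self.children:
            return [self.guid]
        out: List[str] = []
        for child in self.children:
            out.extend(child.leaf_guids())
        return out


def _parse(text: str, pos: int) -> Tuple[Entry, int]:
    start = pos
    while pos < len(text) and text[pos] not in "(),":
        pos += 1
    entry = Entry(text[start:pos].strip())
    if pos < len(text) and text[pos] == "(":
        pos += 1
        while True:
            child, pos = _parse(text, pos)
            entry.children.append(child)
            if pos >= len(text):
                raise ValueError("unterminated '('")
            if text[pos] == ")":
                return entry, pos + 1
            if text[pos] != ",":
                raise ValueError(f"unexpected {text[pos]!r} at {pos}")
            pos += 1
    return entry, pos


def parse_node_uuid(text: str) -> Entry:
    entry, pos = _parse(text, 0)
    if text[pos:].strip():
        raise ValueError("trailing data")
    return entry
```

For example, `parse_node_uuid("g0(g1(n1, n2), g2(n3, n4))")` gives a root guid of `g0` and leaf guids `n1` through `n4`, while a plain old-style value such as `"abc"` still parses to a single entry.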
>>
>> Not sure if there's any user of node-uuid above DHT.
>>
>> Xavi
>>
>>
>>
>>
>> Xavi
>>
>>
>> On Tue, Jun 20, 2017 at 12:46 PM, Xavier Hernandez
>> <xhernandez at datalab.es> wrote:
>>
>> Hi Pranith,
>>
>> On 20/06/17 07:53, Pranith Kumar Karampuri
>> wrote:
>>
>> Hi Xavi,
>> We all made the mistake of not sending a mail about changing the
>> behavior of the node-uuid xattr so that rebalance can use multiple
>> nodes. Because of this, on geo-rep all the workers are becoming
>> active instead of one per EC/AFR subvolume. So we are frantically
>> trying to restore the functionality of node-uuid and introduce a new
>> xattr for the new behavior. Sunil will be sending out a patch for
>> this.
>>
>>
>> Wouldn't it be better to change geo-rep behavior to use the new
>> data? I think it's better as it is now, since it gives more
>> information to upper layers so that they can take more accurate
>> decisions.
>>
>> Xavi
>>
>>
>> --
>> Pranith
>