[Gluster-devel] geo-rep regression because of node-uuid change
Aravinda
avishwan at redhat.com
Tue Jun 20 15:08:58 UTC 2017
On 06/20/2017 06:02 PM, Pranith Kumar Karampuri wrote:
> Xavi, Aravinda and I had a discussion on #gluster-dev and we agreed to
> go with the format Aravinda suggested for now and in future we wanted
> some more changes for dht to detect which subvolume went down came
> back up, at that time we will revisit the solution suggested by Xavi.
>
> Susanth is doing the DHT changes
> Aravinda is doing the Geo-rep changes
Done. Geo-rep patch sent for review https://review.gluster.org/17582
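For reference, here is a minimal sketch (illustrative names only, not the actual Gluster/geo-rep source) of how both consumers can parse the agreed node-uuid format "U1 U2,U3 U4", where subvolumes are separated by comma and UUIDs within a subvolume by space:

```python
# Illustrative sketch (not the actual Gluster/geo-rep source) of how
# both consumers can parse the agreed node-uuid format "U1 U2,U3 U4":
# subvolumes separated by comma, UUIDs within a subvolume by space.

def georep_is_active(value, own_uuid):
    # Geo-rep takes only the first UUID of each subvolume, so exactly
    # one worker per replica-set becomes Active.
    first_uuids = [subvol.split()[0] for subvol in value.split(",")]
    return own_uuid in first_uuids

def dht_unique_uuids(value):
    # DHT wants every unique UUID, in order of first appearance.
    seen = []
    for subvol in value.split(","):
        for uuid in subvol.split():
            if uuid not in seen:
                seen.append(uuid)
    return seen

value = "U1 U2,U3 U4"
print(georep_is_active(value, "U1"))  # True: U1 leads its subvolume
print(georep_is_active(value, "U2"))  # False: U2 is not first
print(dht_unique_uuids(value))        # ['U1', 'U2', 'U3', 'U4']
```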
--
Aravinda
>
> Thanks to all of you guys for the discussions!
>
> On Tue, Jun 20, 2017 at 5:05 PM, Xavier Hernandez
> <xhernandez at datalab.es <mailto:xhernandez at datalab.es>> wrote:
>
> Hi Aravinda,
>
> On 20/06/17 12:42, Aravinda wrote:
>
> I think following format can be easily adopted by all components
>
> UUIDs of a subvolume are separated by space, and subvolumes are
> separated by comma
>
> For example, node1 and node2 are replica with U1 and U2 UUIDs
> respectively and
> node3 and node4 are replica with U3 and U4 UUIDs respectively
>
> node-uuid can return "U1 U2,U3 U4"
>
>
> While this is ok for the current implementation, I think this can
> be insufficient if there are more layers of xlators that need to
> indicate some sort of grouping. Some representation that can
> represent hierarchy would be better. For example: "(U1 U2) (U3
> U4)" (we can use spaces or comma as a separator).
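The parenthesized grouping needs very little code to consume; a minimal sketch, assuming the exact "(U1 U2) (U3 U4)" form above (the regex and names are illustrative):

```python
import re

# Minimal sketch of parsing the parenthesized grouping suggested
# above, "(U1 U2) (U3 U4)"; the regex and names are illustrative.
def parse_groups(value):
    return [group.split() for group in re.findall(r"\(([^)]*)\)", value)]

print(parse_groups("(U1 U2) (U3 U4)"))  # [['U1', 'U2'], ['U3', 'U4']]
```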
>
>
> Geo-rep can split by "," and then split by space and take the
> first UUID of each group. DHT can split the value by space or
> comma and get the list of unique UUIDs.
>
>
> This doesn't solve the problem I described in the previous email.
> Some more logic will need to be added to prevent more than one
> node in each replica-set from becoming active. If we have some explicit
> hierarchy information in the node-uuid value, more decisions can
> be taken.
>
> An initial proposal I made was this:
>
> DHT[2](AFR[2,0](NODE(U1), NODE(U2)), AFR[2,0](NODE(U3), NODE(U4)))
>
> This is harder to parse, but gives a lot of information: DHT with
> 2 subvolumes, each subvolume is an AFR with replica 2 and no
> arbiters. It's also easily extensible with any new xlator that
> changes the layout.
>
> However maybe this is not the moment to do this, and probably we
> could implement this in a new xattr with a better name.
>
> Xavi
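The hierarchical descriptor is harder to parse than the flat list, but only slightly; here is a hypothetical recursive-descent sketch (the grammar and the dict layout are assumptions for illustration, not an agreed format):

```python
import re

# Hypothetical recursive-descent parser for the hierarchical
# node-uuid descriptor proposed above, e.g.
#   DHT[2](AFR[2,0](NODE(U1), NODE(U2)), AFR[2,0](NODE(U3), NODE(U4)))
# The grammar and dict layout are assumptions for illustration only.

TOKEN = re.compile(r"[A-Za-z0-9_-]+|\[[^\]]*\]|[(),]")

def parse(desc):
    tokens = TOKEN.findall(desc.replace(" ", ""))
    node, _ = _parse(tokens)
    return node

def _parse(tokens):
    # A node is NAME, an optional [p1,p2,...] parameter list, and an
    # optional parenthesized list of child nodes.
    name = tokens.pop(0)
    params = []
    if tokens and tokens[0].startswith("["):
        params = [int(p) for p in tokens.pop(0)[1:-1].split(",")]
    children = []
    if tokens and tokens[0] == "(":
        tokens.pop(0)                      # consume "("
        while tokens[0] != ")":
            if tokens[0] == ",":
                tokens.pop(0)              # skip child separator
                continue
            child, tokens = _parse(tokens)
            children.append(child)
        tokens.pop(0)                      # consume ")"
    return {"xl": name, "params": params, "children": children}, tokens
```

With such a tree available, geo-rep could, for example, mark a worker Active only when its UUID is the first leaf of its replica-set, and DHT could pick one leaf per subvolume.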
>
>
>
> Another question is about the behavior when a node is down: the
> existing node-uuid xattr will not return that node's UUID. What is
> the behavior with the proposed xattr?
>
> Let me know your thoughts.
>
> regards
> Aravinda VK
>
> On 06/20/2017 03:06 PM, Aravinda wrote:
>
> Hi Xavi,
>
> On 06/20/2017 02:51 PM, Xavier Hernandez wrote:
>
> Hi Aravinda,
>
> On 20/06/17 11:05, Pranith Kumar Karampuri wrote:
>
> Adding more people to get a consensus about this.
>
> On Tue, Jun 20, 2017 at 1:49 PM, Aravinda
> <avishwan at redhat.com <mailto:avishwan at redhat.com>> wrote:
>
>
> regards
> Aravinda VK
>
>
> On 06/20/2017 01:26 PM, Xavier Hernandez wrote:
>
> Hi Pranith,
>
> adding gluster-devel, Kotresh and Aravinda,
>
> On 20/06/17 09:45, Pranith Kumar Karampuri
> wrote:
>
>
>
> On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez
> <xhernandez at datalab.es <mailto:xhernandez at datalab.es>> wrote:
>
> On 20/06/17 09:31, Pranith Kumar
> Karampuri wrote:
>
> The way geo-replication works is:
> On each machine, it does getxattr of node-uuid and checks if its
> own uuid is present in the list. If it is present then it will
> consider itself active, otherwise it will be considered passive.
> With this change we are giving all uuids instead of the first-up
> subvolume, so all machines think they are ACTIVE, which is bad
> apparently. So that is the reason. Even I felt bad that we are
> doing this change.
>
>
> And what about changing the content of node-uuid to include some
> sort of hierarchy?
>
> for example:
>
> a single brick:
>
> NODE(<guid>)
>
> AFR/EC:
>
> AFR[2](NODE(<guid>), NODE(<guid>))
> EC[3,1](NODE(<guid>), NODE(<guid>), NODE(<guid>))
>
> DHT:
>
> DHT[2](AFR[2](NODE(<guid>), NODE(<guid>)),
> AFR[2](NODE(<guid>), NODE(<guid>)))
>
> This gives a lot of information that can be used to take the
> appropriate decisions.
>
>
> I guess that is not backward compatible. Shall I CC gluster-devel
> and Kotresh/Aravinda?
>
>
> Is the change we did backward compatible? If we only require the
> first field to be a GUID to support backward compatibility, we can
> use something like this:
>
> No. But the necessary change can be made to the Geo-rep code as
> well if the format is changed, since all these are built/shipped
> together.
>
> Geo-rep uses node-uuid as follows:
>
> value = getxattr(node-uuid)
> active_node_uuids = value.split(SPACE)
> active_node_flag = self.node_id in active_node_uuids
>
>
> How was this case solved?
>
> Suppose we have three servers and 2 bricks in each server. A
> replicated volume is created using the following command:
>
> gluster volume create test replica 2 server1:/brick1 server2:/brick1
> server2:/brick2 server3:/brick1 server3:/brick2 server1:/brick2
>
> In this case we have three replica-sets:
>
> * server1:/brick1 server2:/brick1
> * server2:/brick2 server3:/brick1
> * server3:/brick2 server1:/brick2
>
> Old AFR implementation for node-uuid always returned the uuid of
> the node of the first brick, so in this case we will get the uuids
> of all three nodes because each of them hosts the first brick of a
> replica-set.
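A small sketch of why this layout activates every node under the old first-brick behavior (brick placement assumed as intended above, two bricks per server; names are illustrative):

```python
# Sketch (illustrative only) of why the example layout makes every
# node Active under the old first-brick behavior. Brick placement is
# assumed as intended above: two bricks per server, three replica pairs.
replica_sets = [
    [("server1", "brick1"), ("server2", "brick1")],
    [("server2", "brick2"), ("server3", "brick1")],
    [("server3", "brick2"), ("server1", "brick2")],
]

# Old AFR node-uuid: each replica-set reports the uuid of the node
# hosting its first brick.
first_brick_nodes = [rs[0][0] for rs in replica_sets]
print(first_brick_nodes)  # every server leads some replica-set
```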
>
> Does this mean that with this configuration all nodes are active?
> Is this a problem? Is there any other check to avoid this situation
> if it's not good?
>
> Yes, all Geo-rep workers will become Active and participate in
> syncing. Since changelogs will have the same information in replica
> bricks, this will lead to duplicate syncing and consume network
> bandwidth.
>
> Node-uuid based Active worker selection has been the default
> configuration in Geo-rep till now. Geo-rep also has Meta Volume
> based synchronization for Active workers using lock files (this can
> be opted for via Geo-rep configuration; with that config, node-uuid
> will not be used).
>
> Kotresh proposed a solution to configure which workers become
> Active. This will give the Admin more control to choose Active
> workers. This will become the default configuration from 3.12:
> https://github.com/gluster/glusterfs/issues/244
>
> --
> Aravinda
>
>
> Xavi
>
>
>
>
> Bricks:
>
> <guid>
>
> AFR/EC:
> <guid>(<guid>, <guid>)
>
> DHT:
> <guid>(<guid>(<guid>, ...), <guid>(<guid>, ...))
>
> In this case, AFR and EC would return the same <guid> they
> returned before the patch, but between '(' and ')' they put the
> full list of guids of all nodes. The first <guid> can be used by
> geo-replication. The list after the first <guid> can be used for
> rebalance.
>
> Not sure if there's any user of node-uuid above DHT.
>
> Xavi
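A hypothetical parser for this backward-compatible form: the value starts with the same <guid> AFR/EC returned before the patch, optionally followed by the full member list in parentheses, e.g. "U1(U1, U2)". This sketch handles flat (non-nested) values only; the function name and the example values are illustrative:

```python
import re

# Hypothetical parser for the backward-compatible form described
# above: the value starts with the same <guid> AFR/EC returned before
# the patch, optionally followed by the full member list in
# parentheses, e.g. "U1(U1, U2)". Flat (non-nested) values only;
# names are illustrative.
def split_compat(value):
    m = re.match(r"\s*([^(\s]+)\s*(?:\((.*)\))?\s*$", value)
    head = m.group(1)                      # usable by geo-replication
    members = ([x.strip() for x in m.group(2).split(",")]
               if m.group(2) else [head])  # usable by rebalance
    return head, members

print(split_compat("U1(U1, U2)"))  # ('U1', ['U1', 'U2'])
print(split_compat("U1"))          # ('U1', ['U1'])
```

The nested DHT form "<guid>(<guid>(<guid>, ...), ...)" would need a recursive parser instead of a single regex.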
>
>
>
>
> Xavi
>
>
> On Tue, Jun 20, 2017 at 12:46 PM, Xavier Hernandez
> <xhernandez at datalab.es <mailto:xhernandez at datalab.es>>
> wrote:
>
> Hi Pranith,
>
> On 20/06/17 07:53, Pranith Kumar Karampuri wrote:
>
> hi Xavi,
> We all made the mistake of not sending a note about changing
> the behavior of the node-uuid xattr so that rebalance can use
> multiple nodes for doing rebalance. Because of this, on geo-rep
> all the workers are becoming active instead of one per EC/AFR
> subvolume. So we are frantically trying to restore the
> functionality of node-uuid and introduce a new xattr for the new
> behavior. Sunil will be sending out a patch for this.
>
>
> Wouldn't it be better to
> change geo-rep
> behavior
> to use the
> new data
> ? I think it's better as
> it's now, since it
> gives more
> information
> to upper layers so that
> they can take more
> accurate decisions.
>
> Xavi
>
>
> --
> Pranith