<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 06/20/2017 06:02 PM, Pranith Kumar Karampuri wrote:<br>
<blockquote type="cite"
cite="mid:CAOgeEnb13Xfsq4vLqjNKgquGL6g9xk3TdkdW6+DKbgxZ1m9WTw@mail.gmail.com">
<div dir="ltr">
<div>
<div>
<div>Xavi, Aravinda and I had a discussion on #gluster-dev
and we agreed to go with the format Aravinda suggested for
now. In the future we want some more changes so that dht can
detect which subvolume went down and came back up; at that
time we will revisit the solution suggested by Xavi.<br>
<br>
</div>
Susanth is doing the dht changes<br>
</div>
Aravinda is doing geo-rep changes<br>
</div>
</div>
</blockquote>
Done. Geo-rep patch sent for review <a class="moz-txt-link-freetext" href="https://review.gluster.org/17582">https://review.gluster.org/17582</a><br>
<br>
--<br>
Aravinda<br>
<br>
<blockquote type="cite"
cite="mid:CAOgeEnb13Xfsq4vLqjNKgquGL6g9xk3TdkdW6+DKbgxZ1m9WTw@mail.gmail.com">
<div dir="ltr">
<div><br>
</div>
Thanks to all of you guys for the discussions!<br>
<div>
<div>
<div>
<div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Jun 20, 2017 at 5:05
PM, Xavier Hernandez <span dir="ltr"><<a
href="mailto:xhernandez@datalab.es"
target="_blank" moz-do-not-send="true">xhernandez@datalab.es</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi
Aravinda,<span class=""><br>
<br>
On 20/06/17 12:42, Aravinda wrote:<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
                    I think the following format can be easily adopted
                    by all components:<br>
                    <br>
                    UUIDs within a subvolume are separated by spaces,
                    and subvolumes are separated<br>
                    by commas.<br>
                    <br>
                    For example, node1 and node2 form a replica pair with
                    UUIDs U1 and U2<br>
                    respectively, and<br>
                    node3 and node4 form a replica pair with UUIDs U3 and U4
                    respectively.<br>
                    <br>
                    node-uuid can return "U1 U2,U3 U4"<br>
</blockquote>
<br>
</span>
                While this is OK for the current implementation, I
                think it can be insufficient if there are more
                layers of xlators that need to indicate some
                sort of grouping. A representation that can
                express hierarchy would be better. For example:
                "(U1 U2) (U3 U4)" (we can use spaces or commas as the
                separator).<span class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<br>
                    Geo-rep can split by "," and then split by
                    space and take the first UUID<br>
                    DHT can split the value by space or comma and
                    get the list of unique UUIDs<br>
</blockquote>
<br>
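A minimal sketch of the two splitting strategies described above, assuming the value is exactly the "U1 U2,U3 U4" layout from the example. The function names are hypothetical and are not taken from the Geo-rep or DHT code.<br>

```python
def parse_node_uuid(value):
    """Split 'U1 U2,U3 U4' into [['U1', 'U2'], ['U3', 'U4']]."""
    return [group.split() for group in value.split(",") if group.strip()]


def georep_is_active(value, my_uuid):
    """Geo-rep view: Active only if my_uuid is listed first in its subvolume."""
    return any(group[0] == my_uuid for group in parse_node_uuid(value))


def dht_unique_uuids(value):
    """DHT view: flatten all subvolumes, keeping first-seen order."""
    seen = []
    for group in parse_node_uuid(value):
        for uuid in group:
            if uuid not in seen:
                seen.append(uuid)
    return seen


value = "U1 U2,U3 U4"
print(georep_is_active(value, "U1"))  # True
print(georep_is_active(value, "U2"))  # False
print(dht_unique_uuids(value))        # ['U1', 'U2', 'U3', 'U4']
```

With this layout exactly one UUID per subvolume is treated as Active, but nothing in the value says what happens when that first node is down.<br>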
</span>
                This doesn't solve the problem I described in the
                previous email. Some more logic will need to be
                added to prevent more than one node from each
                replica-set from being active. If we have some explicit
                hierarchy information in the node-uuid value, better
                decisions can be taken.<br>
<br>
An initial proposal I made was this:<br>
<br>
                DHT[2](AFR[2,0](NODE(U1), NODE(U2)),
                AFR[2,0](NODE(U3), NODE(U4)))<br>
<br>
This is harder to parse, but gives a lot of
information: DHT with 2 subvolumes, each subvolume
is an AFR with replica 2 and no arbiters. It's
also easily extensible with any new xlator that
changes the layout.<br>
<br>
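To give a feel for the parsing cost, here is a rough recursive-descent sketch for the nested notation. It assumes identifiers contain only letters, digits, "-" and "_", that the bracketed parameters are integers, and that the second replica set holds U3 and U4. All names are hypothetical, and malformed input is not handled gracefully.<br>

```python
import re

# One token is either an identifier/number or a single punctuation character.
TOKEN = re.compile(r"\s*([A-Za-z0-9_-]+|[\[\](),])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos != len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens, i=0):
    """Parse one item starting at tokens[i]; return (tree, next_index)."""
    name = tokens[i]
    i += 1
    params = []
    if i != len(tokens) and tokens[i] == "[":
        i += 1
        while tokens[i] != "]":
            if tokens[i] != ",":
                params.append(int(tokens[i]))
            i += 1
        i += 1  # skip "]"
    if i == len(tokens) or tokens[i] != "(":
        return name, i  # bare identifier: a leaf such as a UUID
    i += 1  # skip "("
    children = []
    while tokens[i] != ")":
        if tokens[i] == ",":
            i += 1
            continue
        child, i = parse(tokens, i)
        children.append(child)
    return {"type": name, "params": params, "children": children}, i + 1


def uuids(tree):
    """All leaf UUIDs below a subtree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [u for child in tree["children"] for u in uuids(child)]


def active_candidates(tree):
    """Pick the first UUID of every AFR/EC subtree (one per replica set)."""
    if isinstance(tree, str):
        return [tree]
    if tree["type"] in ("AFR", "EC"):
        return uuids(tree)[:1]
    return [u for child in tree["children"] for u in active_candidates(child)]


value = "DHT[2](AFR[2,0](NODE(U1), NODE(U2)), AFR[2,0](NODE(U3), NODE(U4)))"
tree, _ = parse(tokenize(value))
print(uuids(tree))              # ['U1', 'U2', 'U3', 'U4']
print(active_candidates(tree))  # ['U1', 'U3']
```

A consumer such as rebalance would use the flat list, while Geo-rep would only need one candidate per replica set.<br>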
However maybe this is not the moment to do this,
and probably we could implement this in a new
xattr with a better name.<br>
<br>
Xavi
<div class="HOEnZb">
<div class="h5"><br>
<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<br>
                      Another question is about the behavior when
                      a node is down. The existing<br>
                      node-uuid xattr does not return a node's UUID if
                      that node is down. What is the<br>
                      behavior with the proposed xattr?<br>
<br>
Let me know your thoughts.<br>
<br>
regards<br>
Aravinda VK<br>
<br>
On 06/20/2017 03:06 PM, Aravinda wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
Hi Xavi,<br>
<br>
On 06/20/2017 02:51 PM, Xavier Hernandez
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
Hi Aravinda,<br>
<br>
On 20/06/17 11:05, Pranith Kumar
Karampuri wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
Adding more people to get a consensus
about this.<br>
<br>
On Tue, Jun 20, 2017 at 1:49 PM,
                            Aravinda <<a
                              href="mailto:avishwan@redhat.com"
                              target="_blank"
                              moz-do-not-send="true">avishwan@redhat.com</a>>
wrote:<br>
<br>
<br>
regards<br>
Aravinda VK<br>
<br>
<br>
On 06/20/2017 01:26 PM, Xavier
Hernandez wrote:<br>
<br>
Hi Pranith,<br>
<br>
adding gluster-devel, Kotresh
and Aravinda,<br>
<br>
On 20/06/17 09:45, Pranith
Kumar Karampuri wrote:<br>
<br>
<br>
<br>
On Tue, Jun 20, 2017 at
1:12 PM, Xavier Hernandez<br>
                            <<a
                              href="mailto:xhernandez@datalab.es"
                              target="_blank"
                              moz-do-not-send="true">xhernandez@datalab.es</a>>
wrote:<br>
<br>
On 20/06/17 09:31,
Pranith Kumar Karampuri wrote:<br>
<br>
                            The way geo-replication works is:<br>
                            On each machine, it does a getxattr of
                            node-uuid and checks whether its own uuid is
                            present in the list. If it is present, the
                            machine considers itself active; otherwise it
                            is considered passive. With this change we are
                            giving all uuids instead of that of the
                            first-up subvolume, so all machines think they
                            are ACTIVE, which is apparently bad. That is
                            the reason. Even I felt bad that we are doing
                            this change.<br>
<br>
<br>
                            And what about changing the content of
                            node-uuid to include some sort of
                            hierarchy?<br>
<br>
for example:<br>
<br>
a single brick:<br>
<br>
NODE(<guid>)<br>
<br>
AFR/EC:<br>
<br>
AFR[2](NODE(<guid>),
NODE(<guid>))<br>
EC[3,1](NODE(<guid>),
NODE(<guid>),
NODE(<guid>))<br>
<br>
DHT:<br>
<br>
DHT[2](AFR[2](NODE(<guid>),
NODE(<guid>)),<br>
AFR[2](NODE(<guid>),<br>
NODE(<guid>)))<br>
<br>
                            This gives a lot of information that can be
                            used to take the appropriate decisions.<br>
                            <br>
                            <br>
                            I guess that is not backward compatible. Shall
                            I CC gluster-devel and Kotresh/Aravinda?<br>
<br>
<br>
                            Is the change we did backward compatible? If
                            we only require the first field to be a GUID
                            to support backward compatibility, we can use
                            something like this:<br>
<br>
                            No. But the necessary change can
                            be made to the Geo-rep code as well if<br>
                            the format is changed, since all of these
                            are built and shipped together.<br>
                            <br>
                            Geo-rep uses node-uuid as follows:<br>
                            <br>
                            list = getxattr(node-uuid)<br>
                            active_node_uuids =
                            list.split(SPACE)<br>
                            active_node_flag = True if
                            self.node_id exists in
                            active_node_uuids<br>
                            else False<br>
</blockquote>
<br>
How was this case solved ?<br>
<br>
                          Suppose we have three servers and 2
                          bricks in each server. A<br>
                          replicated volume is created using the
                          following command:<br>
                          <br>
                          gluster volume create test replica 2
                          server1:/brick1 server2:/brick1<br>
                          server2:/brick2 server3:/brick1
                          server3:/brick2 server1:/brick2<br>
                          <br>
                          In this case we have three replica-sets:<br>
                          <br>
                          * server1:/brick1 server2:/brick1<br>
                          * server2:/brick2 server3:/brick1<br>
                          * server3:/brick2 server1:/brick2<br>
<br>
                          The old AFR implementation of node-uuid
                          always returned the uuid of the<br>
                          node of the first brick, so in this case
                          we will get the uuids of all<br>
                          three nodes because each of them hosts the
                          first brick of a replica-set.<br>
                          <br>
                          Does this mean that with this
                          configuration all nodes are active? Is<br>
                          this a problem? Is there any other
                          check to avoid this situation if<br>
                          it's not good?<br>
</blockquote>
                        Yes, all Geo-rep workers will become Active
                        and participate in syncing.<br>
                        Since changelogs will have the same
                        information in replica bricks, this<br>
                        will lead to duplicate syncing and
                        wasted network bandwidth.<br>
<br>
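The scenario above can be reproduced with a small simulation. This is only a sketch: the UUID values are made up, all bricks are assumed to be up, the fifth brick is taken to be server3:/brick2, and the assumption is that each replica set reports the node UUID of its first brick and that these values are simply collected.<br>

```python
# Hypothetical node UUIDs for the three servers in the example.
NODE_UUID = {"server1": "U1", "server2": "U2", "server3": "U3"}

# Bricks in the order given to "gluster volume create ... replica 2".
BRICKS = [
    "server1:/brick1", "server2:/brick1",
    "server2:/brick2", "server3:/brick1",
    "server3:/brick2", "server1:/brick2",
]
REPLICA = 2


def replica_sets(bricks, replica):
    """Consecutive groups of 'replica' bricks form one replica set."""
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def old_node_uuid(bricks, replica):
    """UUID of the node hosting the first brick of each replica set."""
    return [NODE_UUID[rs[0].split(":")[0]] for rs in replica_sets(bricks, replica)]


def active_nodes(bricks, replica):
    """Servers whose own UUID appears in the reported list."""
    reported = set(old_node_uuid(bricks, replica))
    return sorted(name for name, uuid in NODE_UUID.items() if uuid in reported)


print(old_node_uuid(BRICKS, REPLICA))  # ['U1', 'U2', 'U3']
print(active_nodes(BRICKS, REPLICA))   # ['server1', 'server2', 'server3']
```

Every server hosts the first brick of some replica set, so every server finds its own UUID in the list and considers itself Active.<br>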
                        Node-uuid based Active worker selection has been the
                        default configuration in Geo-rep<br>
                        till now. Geo-rep also has Meta Volume
                        based synchronization for Active<br>
                        workers using lock files. (This can be opted into
                        using Geo-rep configuration;<br>
                        with this config, node-uuid will not be
                        used.)<br>
<br>
                        Kotresh proposed a solution to configure
                        which workers become<br>
                        Active. This will give the admin more control
                        over choosing Active workers.<br>
                        This will become the default configuration
                        from 3.12<br>
<a
href="https://github.com/gluster/glusterfs/issues/244"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://github.com/gluster/glu<wbr>sterfs/issues/244</a><br>
<br>
--<br>
Aravinda<br>
<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<br>
Xavi<br>
<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<br>
<br>
<br>
Bricks:<br>
<br>
<guid><br>
<br>
AFR/EC:<br>
<guid>(<guid>,
<guid>)<br>
<br>
DHT:<br>
<guid>(<guid>(<guid>,
...), <guid>(<guid>, ...))<br>
<br>
                            In this case, AFR and EC would
                            return the same <guid> they<br>
                            returned before the patch, but
                            between '(' and ')' they put the<br>
                            full list of GUIDs of all
                            nodes. The first <guid> can be
                            used<br>
                            by geo-replication. The list
                            after the first <guid> can be
                            used<br>
                            for rebalance.<br>
<br>
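A short sketch of how the two consumers could read this backward-compatible layout, assuming GUIDs contain only letters, digits and "-". The function names are hypothetical.<br>

```python
import re

GUID = re.compile(r"[0-9A-Za-z-]+")


def leading_guid(value):
    """Geo-rep view: the GUID that appears before the first '('."""
    match = GUID.match(value.strip())
    if match is None:
        raise ValueError("value does not start with a GUID")
    return match.group(0)


def all_guids(value):
    """Rebalance view: every distinct GUID inside the parentheses."""
    start = value.find("(")
    inner = value if start == -1 else value[start + 1:]
    return list(dict.fromkeys(GUID.findall(inner)))


value = "U1(U1(U1, U2), U3(U3, U4))"
print(leading_guid(value))  # U1
print(all_guids(value))     # ['U1', 'U2', 'U3', 'U4']
```

An old consumer that only looks at the leading token keeps working, which is the point of putting the previous return value first.<br>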
Not sure if there's any user
of node-uuid above DHT.<br>
<br>
Xavi<br>
<br>
<br>
<br>
<br>
Xavi<br>
<br>
<br>
On Tue, Jun 20,
2017 at 12:46 PM, Xavier Hernandez<br>
                            <<a
                              href="mailto:xhernandez@datalab.es"
                              target="_blank"
                              moz-do-not-send="true">xhernandez@datalab.es</a>><br>
wrote:<br>
<br>
Hi Pranith,<br>
<br>
On 20/06/17
07:53, Pranith Kumar Karampuri<br>
wrote:<br>
<br>
                            Hi Xavi,<br>
                            We all made the mistake of not sending a mail
                            about changing the behavior of the node-uuid
                            xattr so that rebalance can use multiple nodes
                            for doing rebalance. Because of this, on
                            geo-rep all the workers are becoming active
                            instead of one per EC/AFR subvolume. So we are
                            frantically trying to restore the
                            functionality of node-uuid and introduce a new
                            xattr for the new behavior. Sunil will be
                            sending out a patch for this.<br>
<br>
<br>
                            Wouldn't it be better to change geo-rep
                            behavior to use the new data? I think it's
                            better as it is now, since it gives more
                            information to upper layers so that they can
                            take more accurate decisions.<br>
<br>
Xavi<br>
<br>
<br>
--<br>
Pranith<br>
<br>
<br>
<br>
<br>
<br>
--<br>
Pranith<br>
<br>
<br>
<br>
<br>
<br>
--<br>
Pranith<br>
<br>
<br>
<br>
<br>
<br>
<br>
--<br>
Pranith<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<br>
-- <br>
<div class="gmail_signature"
data-smartmail="gmail_signature">
<div dir="ltr">Pranith<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>