[Gluster-devel] geo-rep regression because of node-uuid change
Pranith Kumar Karampuri
pkarampu at redhat.com
Fri Jul 7 14:41:56 UTC 2017
On Fri, Jul 7, 2017 at 3:05 PM, Xavier Hernandez <xhernandez at datalab.es>
wrote:
> On 07/07/17 11:25, Pranith Kumar Karampuri wrote:
>
>>
>>
>> On Fri, Jul 7, 2017 at 2:46 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
>>
>> On 07/07/17 10:12, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Fri, Jul 7, 2017 at 1:13 PM, Xavier Hernandez
>> <xhernandez at datalab.es> wrote:
>>
>> Hi Pranith,
>>
>> On 05/07/17 12:28, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Tue, Jul 4, 2017 at 2:26 PM, Xavier Hernandez
>> <xhernandez at datalab.es> wrote:
>>
>> Hi Pranith,
>>
>> On 03/07/17 08:33, Pranith Kumar Karampuri wrote:
>>
>> Xavi,
>> Now that the change has been reverted, we can resume this
>> discussion and decide on the exact format that covers tier, dht,
>> afr and ec. People working on geo-rep/dht/afr/ec had an internal
>> discussion and we all agreed that this proposal would be a good way
>> forward. I think once we agree on the format, decide on the initial
>> encoding/decoding functions of the xattr, and this change is merged,
>> we can send patches on afr/ec/dht and geo-rep to take it to closure.
>>
>> Could you propose the new format you have in mind that considers
>> all of the xlators?
>>
>>
>> My idea was to create a new xattr not bound to any particular
>> function but which could give enough information to be used in many
>> places.
>>
>> Currently we have another attribute called glusterfs.pathinfo that
>> returns hierarchical information about the location of a file. Maybe
>> we can extend this to unify all these attributes into a single
>> feature that could be used for multiple purposes.
>>
>> Since we have time to discuss it, I would like to design it with
>> more information than we have already talked about.
>>
>> First of all, the amount of information that this attribute can
>> contain is quite big if we expect to have volumes with thousands of
>> bricks. Even in the simplest case of returning only a UUID per
>> brick, we can easily go beyond the limit of 64KB.
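
A rough back-of-the-envelope check of the 64KB remark, as a small Python sketch. It assumes (this is not stated in the thread) that each UUID is stored in its canonical 36-character text form with a single separator byte between entries; the function name is only illustrative.

```python
# Rough capacity estimate: how many UUIDs fit in a 64 KiB xattr value.
# Assumption (not from the thread): canonical 36-character text UUIDs
# with one separator byte between consecutive entries.
XATTR_LIMIT = 64 * 1024   # 65536 bytes, the limit mentioned above
UUID_TEXT_LEN = 36        # e.g. "123e4567-e89b-12d3-a456-426614174000"
SEPARATOR_LEN = 1


def max_uuids(limit: int = XATTR_LIMIT) -> int:
    """Largest n such that n UUIDs plus n-1 separators fit in `limit` bytes."""
    return (limit + SEPARATOR_LEN) // (UUID_TEXT_LEN + SEPARATOR_LEN)


if __name__ == "__main__":
    print(max_uuids())
```

Under these assumptions the value fills up somewhere below two thousand bricks, before any hierarchy or per-xlator information is added.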
>>
>> Consider also, for example, what shard should return when pathinfo
>> is requested for a file. Probably it should return a list of shards,
>> each one with all its associated pathinfo. We are talking about big
>> amounts of data here.
>>
>> I think this kind of information doesn't fit very well in an
>> extended attribute. Another thing to consider is that most probably
>> the requester of the data only needs a fragment of it, so we are
>> generating big amounts of data only to be parsed and reduced later,
>> dismissing most of it.
>>
>> What do you think about using a very special virtual file to manage
>> all this information? It could be easily read using normal read
>> fops, so it could manage big amounts of data easily. Also, by
>> accessing only some parts of the file we could go directly where we
>> want, avoiding the read of all remaining data.
>>
>> A very basic idea could be this:
>>
>> Each xlator would have a reserved area of the file. We can reserve
>> up to 4GB per xlator (32 bits). The remaining 32 bits of the offset
>> would indicate the xlator we want to access.
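
The offset split described here can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the thread does not say which half of the 64-bit offset carries the xlator index, so the sketch assumes the upper 32 bits, and the function names are hypothetical.

```python
from typing import Tuple

# Assumption: the xlator index lives in the upper 32 bits of a 64-bit file
# offset, and the lower 32 bits address up to 4 GiB inside that xlator's area.
XLATOR_SHIFT = 32
AREA_MASK = (1 << XLATOR_SHIFT) - 1


def encode_offset(xlator_index: int, area_offset: int) -> int:
    """Combine an xlator index and an offset inside its area into one offset."""
    if not 0 <= xlator_index <= AREA_MASK:
        raise ValueError("xlator index does not fit in 32 bits")
    if not 0 <= area_offset <= AREA_MASK:
        raise ValueError("area offset does not fit in 32 bits")
    return (xlator_index << XLATOR_SHIFT) | area_offset


def decode_offset(offset: int) -> Tuple[int, int]:
    """Split a 64-bit offset back into (xlator index, offset inside its area)."""
    return offset >> XLATOR_SHIFT, offset & AREA_MASK
```

A reader would seek to `encode_offset(i, 0)` to reach the start of the area reserved for xlator `i`.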
>>
>> At offset 0 we have generic information about the volume. One of
>> the things that this information should include is a basic hierarchy
>> of the whole volume and the offset for each xlator.
>>
>> After reading this, the user will seek to the desired offset and
>> read the information related to the xlator it is interested in.
>>
>> All the information should be stored in an easily extensible format
>> that will be kept compatible even if new information is added in the
>> future (for example doing special mappings of the 32-bit offsets
>> reserved for the xlator).
>>
>> For example we can reserve the first megabyte of the xlator area to
>> have a mapping of attributes with their respective offsets.
>>
>> I think that using a binary format would simplify all this a lot.
>>
>> Do you think this is a way to explore or should I stop wasting time
>> here?
>>
>>
>> I think this just became a very big feature :-). Shall we just live
>> with it the way it is now?
>>
>>
>> I supposed so...
>>
>> The only thing we need to check is if shard needs to handle this
>> xattr. If so, what should it return? Only the UUIDs corresponding to
>> the first shard, or the UUIDs of all bricks containing at least one
>> shard? I guess that the first one is enough, but just to be sure...
>>
>> My proposal was to implement a new xattr, for example
>> glusterfs.layout, that contains enough information to be usable in
>> all current use cases.
>>
>>
>> Actually pathinfo is supposed to give this information and it
>> already has the following format (shown for a 5x2
>> distributed-replicate volume):
>>
>>
>> Yes, I know. I wanted to unify all information.
>>
>>
>> root@dhcp35-190 - /mnt/v3
>> 13:38:12 :) ⚡ getfattr -n trusted.glusterfs.pathinfo d
>> # file: d
>> trusted.glusterfs.pathinfo="((<DISTRIBUTE:v3-dht>
>> (<REPLICATE:v3-replicate-0>
>> <POSIX(/home/gfs/v3_0):dhcp35-190.lab.eng.blr.redhat.com:/home/gfs/v3_0/d>
>> <POSIX(/home/gfs/v3_1):dhcp35-190.lab.eng.blr.redhat.com:/home/gfs/v3_1/d>)
>> (<REPLICATE:v3-replicate-2>
>> <POSIX(/home/gfs/v3_5):dhcp35-190.lab.eng.blr.redhat.com:/home/gfs/v3_5/d>
>> <POSIX(/home/gfs/v3_4):dhcp35-190.lab.eng.blr.redhat.com:/home/gfs/v3_4/d>)
>> (<REPLICATE:v3-replicate-1>
>> <POSIX(/home/gfs/v3_3):dhcp35-190.lab.eng.blr.redhat.com:/home/gfs/v3_3/d>
>> <POSIX(/home/gfs/v3_2):dhcp35-190.lab.eng.blr.redhat.com:/home/gfs/v3_2/d>)
>> (<REPLICATE:v3-replicate-4>
>> <POSIX(/home/gfs/v3_8):dhcp35-190.lab.eng.blr.redhat.com:/home/gfs/v3_8/d>
>> <POSIX(/home/gfs/v3_9):dhcp35-190.lab.eng.blr.redhat.com:/home/gfs/v3_9/d>)
>> (<REPLICATE:v3-replicate-3>
>> <POSIX(/home/gfs/v3_6):dhcp35-190.lab.eng.blr.redhat.com:/home/gfs/v3_6/d>
>> <POSIX(/home/gfs/v3_7):dhcp35-190.lab.eng.blr.redhat.com:/home/gfs/v3_7/d>))
>> (v3-dht-layout (v3-replicate-0 0 858993458)
>> (v3-replicate-1 858993459 1717986917)
>> (v3-replicate-2 1717986918 2576980376)
>> (v3-replicate-3 2576980377 3435973835)
>> (v3-replicate-4 3435973836 4294967295)))"
>>
>>
>> root@dhcp35-190 - /mnt/v3
>> 13:38:26 :) ⚡ getfattr -n trusted.glusterfs.pathinfo d/a
>> # file: d/a
>> trusted.glusterfs.pathinfo="(<DISTRIBUTE:v3-dht>
>> (<REPLICATE:v3-replicate-1>
>> <POSIX(/home/gfs/v3_3):dhcp35-190.lab.eng.blr.redhat.com:/home/gfs/v3_3/d/a>
>> <POSIX(/home/gfs/v3_2):dhcp35-190.lab.eng.blr.redhat.com:/home/gfs/v3_2/d/a>))"
>>
>>
>>
>>
>> The idea would be that each xlator that makes a significant change
>> in the way or the place where files are stored should put
>> information in this xattr. The information should include:
>>
>> * Type (basically AFR, EC, DHT, ...)
>> * Basic configuration (replication and arbiter for AFR, data and
>>   redundancy for EC, # subvolumes for DHT, shard size for
>>   sharding, ...)
>> * Quorum imposed by the xlator
>> * UUID data coming from subvolumes (sorted by brick position)
>> * It should be easily extensible in the future
>>
>> The last point is very important to avoid the issues we have seen
>> now. We must be able to incorporate more information without
>> breaking backward compatibility. To do so, we can add tags for each
>> value.
>>
>> For example, a distribute 2, replica 2 volume with 1 arbiter should
>> be represented by this string:
>>
>> DHT[dist=2,quorum=1](
>>     AFR[rep=2,arbiter=1,quorum=2](
>>         NODE[quorum=2,uuid=<UUID1>](<path1>),
>>         NODE[quorum=2,uuid=<UUID2>](<path2>),
>>         NODE[quorum=2,uuid=<UUID3>](<path3>)
>>     ),
>>     AFR[rep=2,arbiter=1,quorum=2](
>>         NODE[quorum=2,uuid=<UUID4>](<path4>),
>>         NODE[quorum=2,uuid=<UUID5>](<path5>),
>>         NODE[quorum=2,uuid=<UUID6>](<path6>)
>>     )
>> )
>>
>
Yes, this looks simpler for now.
>
>> Some explanations:
>>
>> AFAIK DHT doesn't have quorum, so the default is '1'. We may decide
>> to omit it when it's '1' for any xlator.
>>
>> Quorum in AFR represents client-side enforced quorum. Quorum in NODE
>> represents the server-side enforced quorum.
>>
>> The <path> shown in each NODE represents the physical location of
>> the file (similar to current glusterfs.pathinfo) because this xattr
>> can be retrieved for a particular file using getxattr. This is nice,
>> but we can remove it for now if it's difficult to implement.
>>
>> We can decide to have a verbose string or try to omit some fields
>> when not strictly necessary. For example, if there are no arbiters,
>> we can omit the 'arbiter' tag instead of writing 'arbiter=0'. We
>> could also implicitly compute 'dist' and 'rep' from the number of
>> elements contained between '()'.
>>
>> What do you think ?
>>
>>
>> Quite a few people are already familiar with path-info. So I am of
>> the opinion that we give this information in that xattr itself.
>> This xattr hasn't changed after quorum/arbiter/shard came in, so
>> maybe it should?
>>
>>
>> Not sure how easy it would be to change the format of path-info to
>> incorporate the new information without breaking existing features
>> or even user scripts based on it. Maybe a new xattr would be easier
>> to implement and adapt.
>>
>>
>> Probably.
>>
>>
>>
>> I missed one important thing in the format: an xlator may have
>> per-subvolume information. This information can be placed just
>> before each subvolume information:
>>
>> DHT[dist=2,quorum=1](
>>     [hash-range=0x00000000-0x7fffffff]AFR[...](...),
>>     [hash-range=0x80000000-0xffffffff]AFR[...](...)
>> )
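
To make the per-subvolume tag concrete, here is a hedged Python sketch that produces hash-range tags like the two above by splitting the 32-bit hash space evenly. The even split is an assumption made only for illustration (real DHT layouts are assigned per directory and need not be even), and both function names are hypothetical.

```python
from typing import List, Tuple

HASH_SPACE = 1 << 32  # 32-bit hash space, as in the ranges shown above


def split_hash_space(subvolumes: int) -> List[Tuple[int, int]]:
    """Evenly split the hash space; the last range absorbs any remainder."""
    if subvolumes < 1:
        raise ValueError("need at least one subvolume")
    chunk = HASH_SPACE // subvolumes
    ranges = []
    for i in range(subvolumes):
        start = i * chunk
        end = HASH_SPACE - 1 if i == subvolumes - 1 else start + chunk - 1
        ranges.append((start, end))
    return ranges


def format_hash_range_tag(start: int, end: int) -> str:
    """Render one range in the '[hash-range=0x...-0x...]' form used above."""
    return f"[hash-range=0x{start:08x}-0x{end:08x}]"


if __name__ == "__main__":
    for start, end in split_hash_space(2):
        print(format_hash_range_tag(start, end) + "AFR[...](...)")
```

With five subvolumes the same split gives the decimal ranges that appear in the v3-dht-layout part of the pathinfo output quoted earlier in this mail.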
>>
>>
>> Yes, makes sense.
>>
>> In general I am better at solving problems someone faces, because things
>> will be more concrete. Do you think it is better to wait until the first
>> consumer of this functionality comes along and gives their inputs about
>> what would be nice to have vs must have? At the moment I am not sure how
>> to distinguish what must be there vs what is nice to have :-(.
>>
>
> The good thing is that using this format we can easily start with bare
> minimum information, like this:
>
> DHT(
>     AFR(
>         NODE[uuid=<UUID1>],
>         NODE[uuid=<UUID2>],
>         NODE[uuid=<UUID3>]
>     ),
>     AFR(
>         NODE[uuid=<UUID1>],
>         NODE[uuid=<UUID2>],
>         NODE[uuid=<UUID3>]
>     )
> )
>
> And add more information as it is needed, since it won't break backward
> compatibility.
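
As an illustration of why tagged values keep older consumers working, here is a minimal Python sketch of a parser for the bare-minimum form above. The grammar is inferred from the examples in this thread only (NAME, optional [key=value,...], optional (child, ...)); it does not handle the per-subvolume prefix tags such as [hash-range=...], and every name in it is hypothetical rather than part of any Gluster API.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

PUNCT = set("[](),=")


@dataclass
class Node:
    kind: str
    tags: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def parse_layout(text: str) -> Node:
    """Parse NAME[key=value,...](child, child, ...); both parts are optional."""
    # Tokens are either single punctuation characters or runs of other text.
    tokens = re.findall(r"[\[\](),=]|[^\s\[\](),=]+", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def parse_node() -> Node:
        kind = take()
        if kind in PUNCT:
            raise ValueError(f"expected an xlator name, got {kind!r}")
        node = Node(kind=kind)
        if peek() == "[":
            take("[")
            while True:
                key = take()
                take("=")
                node.tags[key] = take()  # unknown keys are kept, not rejected
                if peek() != ",":
                    break
                take(",")
            take("]")
        if peek() == "(":
            take("(")
            while True:
                node.children.append(parse_node())
                if peek() != ",":
                    break
                take(",")
            take(")")
        return node

    root = parse_node()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return root


def uuids_by_group(root: Node) -> List[List[str]]:
    """UUIDs of the leaf nodes, grouped by each direct child of the root."""
    return [[leaf.tags["uuid"] for leaf in group.children]
            for group in root.children]
```

A consumer that only reads `uuid` keeps working when tags such as `quorum` or `arbiter` are added later, because unknown keys are simply stored and ignored.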
>
> Xavi
>
>
>>
>> Xavi
>>
>>
>>
>>
>> Xavi
>>
>>
>>
>>
>> Xavi
>>
>>
>>
>>
>> On Wed, Jun 21, 2017 at 2:08 PM, Karthik Subrahmanya
>> <ksubrahm at redhat.com> wrote:
>>
>>
>>
>> On Wed, Jun 21, 2017 at 1:56 PM, Xavier Hernandez
>> <xhernandez at datalab.es> wrote:
>>
>> That's ok. I'm currently unable to write a patch for this on ec.
>>
>> Sunil is working on this patch.
>>
>> ~Karthik
>>
>> If no one can do it, I can try to do it in 6 - 7 hours...
>>
>> Xavi
>>
>>
>> On Wednesday, June 21, 2017 09:48 CEST, Pranith Kumar Karampuri
>> <pkarampu at redhat.com> wrote:
>>
>>
>>
>> On Wed, Jun 21, 2017 at 1:00 PM, Xavier Hernandez
>> <xhernandez at datalab.es> wrote:
>>
>> I'm ok with reverting node-uuid content to the previous format and
>> creating a new xattr for the new format. Currently, only rebalance
>> will use it.
>>
>> The only thing to consider is what can happen if we have a half
>> upgraded cluster where some clients have this change and some do
>> not. Can rebalance work in this situation? If so, could there be any
>> issue?
>>
>>
>> I think there shouldn't be any problem, because this is an
>> in-memory xattr so layers below afr/ec will only see the node-uuid
>> xattr. This also gives us a chance to do whatever we want to do in
>> future with this xattr without any problems about backward
>> compatibility.
>>
>> You can check
>> https://review.gluster.org/#/c/17576/3/xlators/cluster/afr/src/afr-inode-read.c@1507
>> for how Karthik implemented this in AFR (this got merged
>> accidentally yesterday, but looks like this is what we are
>> settling on)
>>
>>
>>
>> Xavi
>>
>>
>> On Wednesday, June 21, 2017 06:56 CEST, Pranith Kumar Karampuri
>> <pkarampu at redhat.com> wrote:
>>
>>
>>
>> On Wed, Jun 21, 2017 at 10:07 AM, Nithya Balachandran
>> <nbalacha at redhat.com> wrote:
>>
>>
>> On 20 June 2017 at 20:38, Aravinda <avishwan at redhat.com> wrote:
>>
>> On 06/20/2017 06:02 PM, Pranith Kumar Karampuri wrote:
>>
>> Xavi, Aravinda and I had a discussion on #gluster-dev and we agreed
>> to go with the format Aravinda suggested for now. In future we want
>> some more changes for dht to detect which subvolume went down and
>> came back up; at that time we will revisit the solution suggested
>> by Xavi.
>>
>> Susanth is doing the dht changes
>> Aravinda is doing geo-rep changes
>> Done. Geo-rep patch sent for review
>> https://review.gluster.org/17582
>>
>>
>>
>> The proposed changes to the node-uuid behaviour (while good) are
>> going to break tiering. Tiering changes will take a little more
>> time to be coded and tested.
>>
>> As this is a regression for 3.11 and a blocker for 3.11.1, I
>> suggest we go back to the original node-uuid behaviour for now so
>> as to unblock the release and target the proposed changes for the
>> next 3.11 releases.
>>
>>
>> Let me see if I understand the changes correctly. We are restoring
>> the behavior of node-uuid xattr and adding a new xattr for parallel
>> rebalance for both afr and ec, correct? Otherwise that is one more
>> regression. If yes, we will also wait for Xavi's inputs. Jeff
>> accidentally merged the afr patch yesterday which does these
>> changes. If everyone is in agreement, we will leave it as is and
>> add similar changes in ec as well. If we are not in agreement, then
>> we will let the discussion progress :-)
>>
>>
>>
>>
>> Regards,
>> Nithya
>>
>> --
>> Aravinda
>>
>>
>> Thanks to all of you guys for the discussions!
>>
>> On Tue, Jun 20, 2017 at 5:05 PM, Xavier Hernandez
>> <xhernandez at datalab.es> wrote:
>>
>> Hi Aravinda,
>>
>> On 20/06/17 12:42, Aravinda wrote:
>>
>> I think the following format can be easily adopted by all
>> components:
>>
>> UUIDs of a subvolume are separated by spaces and subvolumes are
>> separated by commas.
>>
>> For example, node1 and node2 are replicas with UUIDs U1 and U2
>> respectively, and node3 and node4 are replicas with UUIDs U3 and U4
>> respectively.
>>
>> node-uuid can return "U1 U2,U3 U4"
>>
>> While this is ok for the current implementation, I think this can
>> be insufficient if there are more layers of xlators that need to
>> indicate some sort of grouping. Some representation that can
>> express hierarchy would be better. For example: "(U1 U2) (U3 U4)"
>> (we can use spaces or commas as a separator).
>>
>>
>>
>> Geo-rep can split by "," and then split by space and take the
>> first UUID
>> DHT can split the value by space or comma and get the unique UUIDs
>> list
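
The two ways of consuming a value such as "U1 U2,U3 U4" can be sketched in Python as follows. This only illustrates the string handling described here; the function names are hypothetical and this is not the actual geo-rep or DHT code.

```python
from typing import List


def parse_node_uuid(value: str) -> List[List[str]]:
    """Split 'U1 U2,U3 U4' into one list of UUIDs per subvolume."""
    return [group.split() for group in value.split(",") if group.strip()]


def georep_first_uuids(value: str) -> List[str]:
    """Geo-rep style: split by ',' then by space, keep the first UUID of each group."""
    return [group[0] for group in parse_node_uuid(value)]


def dht_unique_uuids(value: str) -> List[str]:
    """DHT style: flatten all groups and keep each UUID once, in order."""
    seen: List[str] = []
    for group in parse_node_uuid(value):
        for uuid in group:
            if uuid not in seen:
                seen.append(uuid)
    return seen


if __name__ == "__main__":
    value = "U1 U2,U3 U4"
    print(georep_first_uuids(value))
    print(dht_unique_uuids(value))
```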
>>
>>
>> This doesn't solve the problem I described in the previous email.
>> Some more logic will need to be added to prevent more than one node
>> from each replica-set from being active. If we have some explicit
>> hierarchy information in the node-uuid value, more decisions can be
>> taken.
>>
>> An initial proposal I made was this:
>>
>> DHT[2](AFR[2,0](NODE(U1), NODE(U2)), AFR[2,0](NODE(U1), NODE(U2)))
>>
>> This is harder to parse, but gives a lot of information: DHT with 2
>> subvolumes, each subvolume is an AFR with replica 2 and no
>> arbiters. It's also easily extensible with any new xlator that
>> changes the layout.
>>
>> However maybe this is not the moment to do this, and probably we
>> could implement this in a new xattr with a better name.
>>
>> Xavi
>>
>>
>>
>> Another question is about the behavior when a node is down: the
>> existing node-uuid xattr will not return that UUID if a node is
>> down. What is the behavior with the proposed xattr?
>>
>> Let me know your thoughts.
>>
>> regards
>> Aravinda VK
>>
>> On 06/20/2017 03:06 PM, Aravinda wrote:
>>
>> Hi Xavi,
>>
>> On 06/20/2017 02:51 PM, Xavier Hernandez wrote:
>>
>> Hi Aravinda,
>>
>> On 20/06/17 11:05, Pranith Kumar Karampuri wrote:
>>
>> Adding more people to get a consensus about this.
>>
>>
>> On Tue, Jun 20, 2017 at 1:49 PM, Aravinda <avishwan at redhat.com>
>> wrote:
>>
>>
>>
>> regards
>> Aravinda VK
>>
>> On 06/20/2017 01:26 PM,
>
>
--
Pranith