[Gluster-devel] Discussion: Implications on geo-replication due to bitrot and tiering changes!!!!
Venky Shankar
yknev.shankar at gmail.com
Fri Dec 5 17:40:10 UTC 2014
On Thu, Dec 4, 2014 at 10:42 PM, Joseph Fernandes <josferna at redhat.com> wrote:
> On the data path I have seen a 3% dip in performance with the initial implementation, which is not finalized.
> Testing is still in progress, as we are trying to reduce the dip as much as possible through optimizations in the implementation and SQLite tunables.
> We will publish the final results once we are done.
Sure.
>
> Venky,
>
> Could you please let us know the performance impact on the IO path with changelog's
Sure, numbers should be out soon.
> "15 seconds by default and has proved to provide a good balance between replication performance (geo-rep) and IOPS rate"
> configuration?
>
> Plus, regarding the 15 sec delay, the tiering team needs to discuss the impact on the freshness of data.
>
> As discussed in person and reiterated MANY! times in many discussions with the changelog team,
I fail to understand why you bring up this point and detail the
approach "now". If this was discussed "many" times, it should have
been in this mailing list long back.
> 1) When we don't have geo-rep ON, i.e. when changelog is not ON, we will populate the DB inline with the IO path
> (we are progressively working on reducing the IO path performance hit)
> 2) When changelog is ON, we will have the DB fed by the libgfchangelog API. To remove the freshness issue we
> can have in-memory updates on an LRU, as we are not looking for sequential updates. Plus, we would need this in-memory
> data structure because changelog DOES NOT provide read statistics, which are required for tiering and are a VERY crucial part
> of detecting the HOTNESS of a file!
> 3) As far as tiering is concerned, we are not worried about crash consistency, because:
> a. For files which are COLD, the data is safe on the disk
> b. For files which are HOT, even though the data in memory is lost, these files will get HOT again and we will move them later.
> If they don't get HOT again, then the crash is without impact
Probably something that might have been discussed but I cannot recall:
could the objects that got evicted from the LRU/LFU be fed to the DB
(or any data store)?
Wouldn't that guarantee data freshness in the datastore with the cache
providing the list of "hot" files? That way you have data store
freshness (what you'd get from feeding via I/O path) and the LRU/LFU
sits there as usual.
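
To make the idea concrete, here is a minimal sketch in Python. It is not GlusterFS code, and every name in it (HotFileCache, make_sqlite_sink, the file_heat table, the gfid strings) is a hypothetical chosen for illustration. It assumes the cache tracks per-file read/write counters and that an entry is pushed to the data store only at the moment it is evicted:

```python
import sqlite3
import time
from collections import OrderedDict


class HotFileCache:
    """Small LRU of per-file access stats (hypothetical, for illustration only)."""

    def __init__(self, capacity, on_evict):
        self.capacity = capacity
        self.on_evict = on_evict      # called as on_evict(gfid, stats)
        self.entries = OrderedDict()  # gfid -> {"reads", "writes", "last_access"}

    def record(self, gfid, is_read, now=None):
        now = time.time() if now is None else now
        stats = self.entries.pop(gfid, None) or {
            "reads": 0, "writes": 0, "last_access": now}
        stats["reads" if is_read else "writes"] += 1
        stats["last_access"] = now
        self.entries[gfid] = stats    # re-insert at the most-recently-used end
        while len(self.entries) > self.capacity:
            # Pop the least recently used entry and hand it to the data store.
            old_gfid, old_stats = self.entries.popitem(last=False)
            self.on_evict(old_gfid, old_stats)

    def hot_files(self):
        # Most recently used first.
        return list(reversed(self.entries))


def make_sqlite_sink(conn):
    """Return an eviction callback that accumulates stats in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_heat ("
        "gfid TEXT PRIMARY KEY, reads INTEGER, writes INTEGER, last_access REAL)")

    def sink(gfid, stats):
        row = (stats["reads"], stats["writes"], stats["last_access"], gfid)
        cur = conn.execute(
            "UPDATE file_heat SET reads = reads + ?, writes = writes + ?, "
            "last_access = ? WHERE gfid = ?", row)
        if cur.rowcount == 0:  # first time this file reaches the store
            conn.execute(
                "INSERT INTO file_heat (reads, writes, last_access, gfid) "
                "VALUES (?, ?, ?, ?)", row)
        conn.commit()

    return sink


conn = sqlite3.connect(":memory:")
cache = HotFileCache(capacity=2, on_evict=make_sqlite_sink(conn))
cache.record("gfid-a", is_read=True, now=1.0)
cache.record("gfid-b", is_read=False, now=2.0)
cache.record("gfid-a", is_read=True, now=3.0)
cache.record("gfid-c", is_read=True, now=4.0)  # cache is full, so gfid-b is evicted
```

In this sketch the store only ever receives entries that have fallen out of the cache, so anything still cached at crash time is lost, which is the trade-off point 3 above already accepts for HOT files.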
Thoughts?
Venky
>
> ~ Joseph ( NOT Josef :) )
>
> ----- Original Message -----
> From: "Venky Shankar" <yknev.shankar at gmail.com>
> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>, dlambrig at redhat.com, josferna at redhat.com, "Vijay Bellur" <vbellur at redhat.com>
> Sent: Thursday, December 4, 2014 8:53:43 PM
> Subject: Re: [Gluster-devel] Discussion: Implications on geo-replication due to bitrot and tiering changes!!!
>
> [Adding Dan/Josef/Vijay]
>
> As of now, "rollover-time" is global to changelog translator, hence
> tuning that would affect all consumers subscribing to updates. It's 15
> seconds by default and has proved to provide a good balance between
> replication performance (geo-rep) and IOPS rate. Tuning to a lower
> value would imply doing a round of perf test for geo-rep to be safe.
>
> The question is if data tiering can compromise on data freshness. If
> yes, is there a hard limit? For BitRot, it should be OK as the policy
> for checksum calculation is lazy. Adding a bit more lag would not hurt
> much.
>
> Josef,
>
> Could you share the performance numbers along with the setup
> (configuration, etc.) you used to measure SQLite performance inline to
> the data path?
>
> -Venky
>
> On Thu, Dec 4, 2014 at 3:23 PM, Kotresh Hiremath Ravishankar
> <khiremat at redhat.com> wrote:
>> Hi,
>>
>> As of now, geo-replication is the only consumer of the changelog.
>> Going forward bitrot and tiering also will join as consumers.
>> The current format of the changelog can be found in below links.
>>
>> http://www.gluster.org/community/documentation/index.php/Arch/Change_Logging_Translator_Design
>> https://github.com/gluster/glusterfs/blob/master/doc/features/geo-replication/libgfchangelog.md
>>
>>
>> Current Design:
>>
>> 1. Every changelog.rollover-time secs (configurable), a new changelog file is generated:
>>
>> 2. Geo-replication history API, designed as part of Snapshot requirement, maintains
>> an HTIME file listing the changelog filenames generated. It is guaranteed that there is
>> no breakage between all the changelogs within one HTIME file i.e., changelog is not
>> enabled/disabled in between.
>>
>> Proposed changes for changelog as part of bitrot and tiering:
>>
>> 1. Add timestamp for each fop record in changelog.
>>
>> Rationale: Tiering requires timestamp of each fop.
>> Implication on Geo-rep: NO
>>
>>
>> 2. Make one big changelog per day or so and do not rollover the changelog every rollover-time.
>>
>> Rationale: Changing changelog.rollover-time is going to affect all three consumers, hence
>> decoupling is required.
>>
>> Geo-replication: Is fine with changing rollover time.
>> Tiering : Not fine as per the input I got from Joseph (Joseph, please comment),
>> as this adds to the delay before tiering gets the change
>> notification from changelog.
>> Bitrot : It should be fine. (Venky, please comment).
>>
>> Implications on current Geo-replication Design:
>>
>> 1. Breaks History API: Needs redesign.
>> 2. Changes to geo-replication changelog consumption logic ??
>> 3. libgfchangelog API changes.
>> 4. Effort to handle upgrade scenarios.
>>
>> Bitrot and Tiering guys, Please add any more changes expected which I have missed.
>>
>> Point to discuss: considering the implications on geo-replication, are there any other
>> approaches with which we can solve this problem without much impact on the current
>> geo-replication logic?
>>
>>
>> Thanks and Regards,
>> Kotresh H R
>>
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel