[Gluster-devel] Discussion: Implications on geo-replication due to bitrot and tiering changes!!!!

Joseph Fernandes josferna at redhat.com
Sat Dec 6 05:13:42 UTC 2014


Answers INLINE JOE>>

----- Original Message -----
From: "Venky Shankar" <yknev.shankar at gmail.com>
To: "Joseph Fernandes" <josferna at redhat.com>
Cc: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>, "Gluster Devel" <gluster-devel at gluster.org>, dlambrig at redhat.com, "Vijay Bellur" <vbellur at redhat.com>
Sent: Friday, December 5, 2014 11:10:10 PM
Subject: Re: [Gluster-devel] Discussion: Implications on geo-replication due to bitrot and tiering changes!!!!

On Thu, Dec 4, 2014 at 10:42 PM, Joseph Fernandes <josferna at redhat.com> wrote:
> On performance on the data path, I have seen a 3% dip with the initial implementation, which is not finalized.
> Testing is still in progress, as we are trying to reduce the dip as much as possible through optimization of the implementation and SQLite tunables.
> We will publish the final results once we are done.

Sure.
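The kind of SQLite tunables under evaluation would plausibly look like this (a minimal Python sketch; the pragmas, schema, and names here are illustrative assumptions, not the actual implementation being tested):

```python
import sqlite3

# Hypothetical file-activity store; the schema and path are illustrative,
# not Gluster's actual DB layout.
conn = sqlite3.connect("/tmp/file_activity_demo.db")

# Typical knobs one would evaluate when the DB update sits inline with
# the IO path: trade some durability for lower write latency.
conn.execute("PRAGMA journal_mode = WAL")    # concurrent readers, fast commits
conn.execute("PRAGMA synchronous = NORMAL")  # fsync less aggressively
conn.execute("PRAGMA cache_size = -2000")    # ~2 MB page cache

conn.execute("DROP TABLE IF EXISTS file_heat")  # keep the demo idempotent
conn.execute("""CREATE TABLE file_heat (
    gfid        TEXT PRIMARY KEY,
    write_count INTEGER DEFAULT 0,
    last_write  REAL
)""")

# Per-write upsert: bump the counter and refresh the timestamp.
conn.execute(
    "INSERT INTO file_heat (gfid, write_count, last_write) VALUES (?, 1, ?) "
    "ON CONFLICT(gfid) DO UPDATE SET "
    "write_count = write_count + 1, last_write = excluded.last_write",
    ("0001-hypothetical-gfid", 1417845222.0),
)
conn.commit()
```

The interesting trade-off is in the pragmas: WAL plus `synchronous = NORMAL` keeps the per-fop commit cost low, at the price of potentially losing the last few transactions on a crash, which (per point 3 below in this thread) tiering can tolerate.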

>
> Venky,
>
> Could you please let us know what is the performance impact on the IO path with changelog's

Sure, numbers should be out soon.


JOE>> 
Thanks. This will give us a fair idea of the delays that would be introduced in the IO path if changelog (not the xlator 
itself, but the recording done by the xlator) is made a dependency for Data Tiering. 



> "15 seconds by default and has proved to provide a good balance between replication performance (geo-rep) and IOPS rate"
>  configuration ?
>
> Plus on the 15 sec delay the tiering team needs to discuss on the impact on the freshness of data.
>
> As discussed in person and iterated MANY times in discussions with the changelog team,

I fail to understand why you bring up this point and detail the
approach "now". If this was discussed "many" times, it should have
been in this mailing list long back.

> 1) When geo-rep is not ON, i.e. when changelog is not ON, we will populate the DB inline with the IO path
> (and we are progressively working on reducing the IO-path performance hit).
> 2) When changelog is ON, we will have the DB fed by the libchangelog API. To remove the freshness issue we
> can do in-memory updates on an LRU, as we are not looking for sequential updates. Plus, we would need this in-memory
> data structure anyway, as changelog does NOT provide read statistics, which are required for tiering and are a VERY crucial
> part of detecting the HOTNESS of a file!
> 3) As far as tiering is concerned, we are not worried about crash consistency, because for
>    a. Files which are COLD, the data is safe on disk.
>    b. Files which are HOT, even though the in-memory data is lost, these files will get HOT again and we will move them later.
>       If they don't get HOT again, the crash has no impact.

Probably something that might have been discussed but I cannot recall:
could the objects that got evicted from the LRU/LFU be fed to the DB
(or any data store)?
Wouldn't that guarantee data freshness in the datastore with the cache
providing the list of "hot" files? That way you have data store
freshness (what you'd get from feeding via I/O path) and the LRU/LFU
sits there as usual.

Thoughts?
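The evicted-objects-feed-the-datastore idea above can be sketched roughly as follows (Python pseudocode for the shape of it; the capacity, names, and store callback are illustrative assumptions, not the actual design):

```python
from collections import OrderedDict

class HeatTracker:
    """In-memory LRU of per-file access counters. It counts reads as well
    as writes (since changelog records no read statistics), and flushes
    entries evicted from the LRU to a persistent store, so the store
    stays reasonably fresh while the cache holds the hot set."""

    def __init__(self, capacity, flush_to_store):
        self.capacity = capacity
        self.flush_to_store = flush_to_store   # callback: (gfid, stats) -> None
        self.entries = OrderedDict()           # gfid -> {"reads": n, "writes": n}

    def record(self, gfid, op):
        stats = self.entries.pop(gfid, {"reads": 0, "writes": 0})
        stats["reads" if op == "read" else "writes"] += 1
        self.entries[gfid] = stats             # re-insert as most recently used
        if len(self.entries) > self.capacity:
            # The least recently used entry has "cooled off": persist it.
            cold_gfid, cold_stats = self.entries.popitem(last=False)
            self.flush_to_store(cold_gfid, cold_stats)

# Toy usage: a dict stands in for the DB/datastore.
store = {}
tracker = HeatTracker(capacity=2,
                      flush_to_store=lambda g, s: store.update({g: s}))
tracker.record("gfid-a", "read")
tracker.record("gfid-b", "write")
tracker.record("gfid-a", "read")
tracker.record("gfid-c", "read")   # evicts gfid-b, the least recently used
```

Under this sketch the cache answers "what is hot right now", while every eviction pushes cooled-off counters into the datastore, so neither freshness nor read statistics depend on changelog.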

JOE>> 

Well, if you recall the multiple internal discussions we had, we agreed on this long ago, from the beginning (though not recorded),
and as a result of those discussions we have the approach for the infrastructure: https://gist.github.com/vshankar/346843ea529f3af35339 
AFAIK, though the doc doesn't discuss the above in detail, it was always the plan to do it this way.
The use of an LRU/LFU is definitely the way to go, both with and without changelog recording, as it boosts recording performance.
And this is mentioned at the end of https://gist.github.com/vshankar/346843ea529f3af35339. Well, you know it best, as you are the author :)
(Kotresh and I contributed over discussions, though not recorded; thanks for mentioning it in the gluster-devel mail :) )
 
As I have mentioned, feeding the DB in the IO path is still work in progress. We (Dan and I) are making it more and more performant. We are
also taking guidance from Ben England on testing it in parallel with development cycles so that we have the best approach and implementation. That is where we are getting the numbers from (this is recorded in mails; I will forward them to you). Plus, we have kept Vijay Bellur in sync with the approach we are taking on a weekly basis (though not recorded :) ).

On the point of these discussions not being recorded on gluster-devel: they happened frequently and in an ad hoc way. Well, you know this best, as you were part of all of them :). 

As we move forward we will certainly have more internal discussions; let's make sure they are recorded, so that we don't keep running 
around the same bush again and again ;).

And thanks for all the help in the form of discussions/thoughts. Looking forward to more as we go along.

~Joe


    Venky

>
> ~ Joseph ( NOT Josef :) )
>
> ----- Original Message -----
> From: "Venky Shankar" <yknev.shankar at gmail.com>
> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>, dlambrig at redhat.com, josferna at redhat.com, "Vijay Bellur" <vbellur at redhat.com>
> Sent: Thursday, December 4, 2014 8:53:43 PM
> Subject: Re: [Gluster-devel] Discussion: Implications on geo-replication due to bitrot and tiering changes!!!
>
> [Adding Dan/Josef/Vijay]
>
> As of now, "rollover-time" is global to the changelog translator, hence
> tuning it would affect all consumers subscribing to updates. It's 15
> seconds by default and has proved to provide a good balance between
> replication performance (geo-rep) and IOPS rate. Tuning it to a lower
> value would imply doing a round of perf testing for geo-rep to be safe.
>
> The question is if data tiering can compromise on data freshness. If
> yes, is there a hard limit? For BitRot, it should be OK as the policy
> for checksum calculation is lazy. Adding a bit more lag would not hurt
> much.
>
> Josef,
>
> Could you share the performance numbers along with the setup
> (configuration, etc.) you used to measure SQLite performance inline to
> the data path?
>
> -Venky
>
> On Thu, Dec 4, 2014 at 3:23 PM, Kotresh Hiremath Ravishankar
> <khiremat at redhat.com> wrote:
>> Hi,
>>
>> As of now, geo-replication is the only consumer of the changelog.
>> Going forward bitrot and tiering also will join as consumers.
>> The current format of the changelog can be found in the links below.
>>
>> http://www.gluster.org/community/documentation/index.php/Arch/Change_Logging_Translator_Design
>> https://github.com/gluster/glusterfs/blob/master/doc/features/geo-replication/libgfchangelog.md
>>
>>
>> Current Design:
>>
>> 1. Every changelog.rollover-time seconds (configurable), a new changelog file is generated.
>>
>> 2. The geo-replication history API, designed as part of the Snapshot requirement, maintains
>>    an HTIME file with the generated changelog filenames. It is guaranteed that there is
>>    no breakage between the changelogs within one HTIME file, i.e., changelog is not
>>    enabled/disabled in between.
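The rollover plus HTIME bookkeeping described above can be modeled as follows (a simplified Python sketch; the file names and on-disk layout here are illustrative, not the exact format the changelog translator uses):

```python
import os
import time

def rollover(changelog_dir, htime_path, now=None):
    """Rename the live CHANGELOG to a timestamped file and append its
    name to the HTIME index, mimicking what the changelog translator
    does every rollover-time seconds (15s by default)."""
    now = int(now if now is not None else time.time())
    live = os.path.join(changelog_dir, "CHANGELOG")
    rolled = os.path.join(changelog_dir, "CHANGELOG.%d" % now)
    if os.path.exists(live):
        os.rename(live, rolled)
        # HTIME keeps an unbroken, ordered list of rolled-over changelogs,
        # which is what the history API walks to serve a time range.
        with open(htime_path, "a") as htime:
            htime.write(rolled + "\n")
    return rolled

# Toy usage with a fixed timestamp for reproducibility.
demo = "/tmp/changelog_demo"
os.makedirs(demo, exist_ok=True)
with open(os.path.join(demo, "CHANGELOG"), "w") as f:
    f.write("fop records")
rolled = rollover(demo, os.path.join(demo, "HTIME"), now=1417845222)
```

This also makes the coupling visible: anything that changes when or whether rollover happens (proposed change 2 below) changes what lands in HTIME, which is exactly why the history API would need a redesign.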
>>
>> Proposed changes for changelog as part of bitrot and tiering:
>>
>> 1. Add a timestamp to each fop record in the changelog.
>>
>>    Rationale             : Tiering requires a timestamp for each fop.
>>    Implication on Geo-rep: NO
>>
>>
>> 2. Make one big changelog per day or so and do not roll over the changelog every rollover-time.
>>
>>    Rationale: Changing changelog.rollover-time is going to affect all three consumers, hence
>>              decoupling is required.
>>
>>                 Geo-replication: Is fine with changing rollover time.
>>                 Tiering        : Not fine, as per the input I got from Joseph (Joseph, please comment),
>>                                  as this adds to the delay before tiering gets the change
>>                                  notification from changelog.
>>                 Bitrot         : It should be fine. (Venky, please comment).
>>
>>    Implications on current Geo-replication Design:
>>
>>              1. Breaks History API: Needs redesign.
>>              2. Changes to geo-replication changelog consumption logic ??
>>              3. libgfchangelog API changes.
>>              4. Effort to handle upgrade scenarios.
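To make proposed change (1) concrete: a record that today identifies only the fop and the GFID would additionally carry the fop time. A rough illustration (the encoding below is invented for readability; the real changelog record format is described in the links at the top of this mail):

```python
import time

def encode_record(fop, gfid, with_timestamp=False, now=None):
    """Illustrative (NOT the actual) changelog record encoding. Today a
    record might look like 'E <gfid> <fop>'; with change (1) it would
    also carry the fop time, so tiering knows *when* a file changed."""
    fields = ["E", gfid, fop]
    if with_timestamp:
        fields.append(str(int(now if now is not None else time.time())))
    return " ".join(fields)

# 'deadbeef-gfid' is a placeholder, not a real GFID.
old = encode_record("MKNOD", "deadbeef-gfid")
new = encode_record("MKNOD", "deadbeef-gfid", with_timestamp=True,
                    now=1417845222)
```

Since existing consumers would have to parse both shapes across an upgrade, this is where the record-level versioning / upgrade effort noted in point 4 above comes from.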
>>
>> Bitrot and Tiering guys, please add any other expected changes that I may have missed.
>>
>> A point to discuss: considering the implications on geo-replication, are there any other
>> approaches with which we can solve this problem without much impact on the current
>> geo-replication logic?
>>
>>
>> Thanks and Regards,
>> Kotresh H R
>>
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel

