[Gluster-devel] Discussion: Implications on geo-replication due to bitrot and tiering changes!!!!

Kotresh Hiremath Ravishankar khiremat at redhat.com
Thu Dec 11 10:14:18 UTC 2014


Hi All,

As per the discussions among the Data Tiering, BitRot and Geo-Rep teams, the following points were discussed.

1. For Data Tiering to use changelog, an in-memory LRU/LFU implementation is required to capture reads,
   as the changelog journal doesn't capture reads. But given the commitments each team has,
   it might not be possible to implement the in-memory LRU/LFU by the 3.7 timeline.
   As per the testing done so far by the Tiering team, feeding the database in the I/O path does not cause a
   noticeable performance hit, as crash consistency is not expected. Hence for 3.7, the logic for feeding the
   database will be in the changelog translator or in a new crt translator. When the LRU/LFU implementation is
   available down the line, the database can be fed from the LRU/LFU cache, and only in the changelog translator.
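A minimal sketch of the in-memory LRU idea in point 1, written in Python for brevity (the actual translators are C code in the brick's I/O path). It assumes a hypothetical `HeatCache` class and a hypothetical SQLite table `heat(gfid, reads, writes, last_access)`; neither is part of GlusterFS. Reads and writes are counted in memory only, and a file's counters reach the database when its entry is evicted, which is the "feed the data store at cache expiry" approach discussed further down in this thread.

```python
import sqlite3
import time
from collections import OrderedDict


def create_heat_table(db):
    # Hypothetical schema: one row of aggregated counters per file (gfid).
    db.execute(
        "CREATE TABLE IF NOT EXISTS heat ("
        "gfid TEXT PRIMARY KEY, reads INTEGER NOT NULL, "
        "writes INTEGER NOT NULL, last_access REAL NOT NULL)"
    )


class HeatCache:
    """LRU of per-file access counters; the DB is written only on eviction."""

    def __init__(self, db, capacity=1024):
        self.db = db
        self.capacity = capacity
        self.entries = OrderedDict()  # gfid -> [reads, writes, last_access]

    def record(self, gfid, is_write, now=None):
        # Called for every read/write fop; touches memory only.
        now = time.time() if now is None else now
        entry = self.entries.pop(gfid, [0, 0, now])
        entry[1 if is_write else 0] += 1
        entry[2] = now
        self.entries[gfid] = entry  # re-insert as most recently used
        if len(self.entries) > self.capacity:
            self._flush_oldest()

    def _flush_oldest(self):
        gfid, (reads, writes, last) = self.entries.popitem(last=False)
        self._write(gfid, reads, writes, last)

    def flush_all(self):
        # e.g. on a timer or at shutdown, so resident entries are not lost
        while self.entries:
            self._flush_oldest()
        self.db.commit()

    def _write(self, gfid, reads, writes, last):
        # Accumulate onto any counters persisted by an earlier eviction.
        self.db.execute(
            "INSERT OR IGNORE INTO heat VALUES (?, 0, 0, ?)", (gfid, last))
        self.db.execute(
            "UPDATE heat SET reads = reads + ?, writes = writes + ?, "
            "last_access = ? WHERE gfid = ?",
            (reads, writes, last, gfid),
        )
```

With `capacity=2`, touching a third file pushes the least recently used file's counters into the table, so the database sees one aggregated write per eviction instead of one write per fop.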

2. Since BitRot or any other consumer might decide to use the database, the query and initialization APIs will be
   exposed as a library.
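Point 2 can be pictured as a small library that owns schema initialization and the common queries, so BitRot or any other consumer does not embed SQL of its own. The sketch is hypothetical: `init_store`, `query_hot` and `query_cold` are illustrative names, and the `heat` table layout is an assumption, not the schema the Tiering team actually uses.

```python
import sqlite3


def init_store(path=":memory:"):
    """Open (or create) the store and make sure the assumed schema exists."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS heat ("
        "gfid TEXT PRIMARY KEY, reads INTEGER NOT NULL, "
        "writes INTEGER NOT NULL, last_access REAL NOT NULL)"
    )
    return db


def query_hot(db, since, min_accesses=1):
    """Files accessed at or after `since` with enough accesses, hottest first."""
    rows = db.execute(
        "SELECT gfid FROM heat "
        "WHERE last_access >= ? AND reads + writes >= ? "
        "ORDER BY reads + writes DESC, gfid",
        (since, min_accesses),
    )
    return [gfid for (gfid,) in rows]


def query_cold(db, before):
    """Files not accessed since `before`, coldest first (e.g. demotion candidates)."""
    rows = db.execute(
        "SELECT gfid FROM heat WHERE last_access < ? ORDER BY last_access, gfid",
        (before,),
    )
    return [gfid for (gfid,) in rows]
```

A consumer then needs only `init_store()` plus whichever query it cares about; the table layout stays private to the library and can change without touching callers.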

Please add anything I have missed, along with any corrections.

Thanks and Regards,
Kotresh H R

----- Original Message -----
From: "Venky Shankar" <yknev.shankar at gmail.com>
To: "Joseph Fernandes" <josferna at redhat.com>
Cc: "Gluster Devel" <gluster-devel at gluster.org>, "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>, "Vijay Bellur" <vbellur at redhat.com>, "Dan Lambright" <dlambrig at redhat.com>, "Ben England" <bengland at redhat.com>, "Ric Wheeler" <rwheeler at redhat.com>, "Nagaprasad Sathyanarayana" <nsathyan at redhat.com>, "Vivek Agarwal" <vagarwal at redhat.com>
Sent: Saturday, December 6, 2014 1:53:16 PM
Subject: Re: [Gluster-devel] Discussion: Implications on geo-replication due to bitrot and tiering changes!!!!

[snip]
>
> Well, if you recall the multiple internal discussions we had, we agreed upon this a long time ago, right from the beginning (though not recorded).

Agreed. In that case, changelog changes to feed an alternate data store
are unneeded, correct?

> and as a result of those discussions we have the approach for the infrastructure: https://gist.github.com/vshankar/346843ea529f3af35339
> AFAIK, though the doc doesn't describe the above in detail, it was always the plan to do it that way.

Absolutely, the document tries to solve things in a more generic way
and does not cover feeding the data store from the cache. Thinking about
it more, feeding the data store at the time of cache expiry seems a
neat approach.

> Using LRU/LFU is definitely the way to go, with or without changelog recording, as it boosts recording performance.
> It is mentioned at the end of https://gist.github.com/vshankar/346843ea529f3af35339. Well, you know best, as you are the author :)
> (Kotresh and I contributed through discussions, though not recorded; thanks for mentioning it in the gluster-devel mail :) )

Correct me here: if the data store is fed from the cache (on expiry), is the
alternate feed from changelog (either inline or asynchronous to the
data path) needed?

>
> As I have mentioned, feeding the DB in the IO path is still a work in progress. We (Dan & I) are making it more and more performant. We have
> also taken guidance from Ben England on testing it in parallel with the development cycles, so that we have the best approach & implementation. That is where we are getting the numbers from (this is recorded in mails; I will forward them to you). Plus, we have kept Vijay Bellur in sync with the approach we are taking on a weekly basis (though not recorded :) )

That's nice, but my previous comment is still a concern.

>
> On the point of the discussions not being recorded on gluster-devel: these discussions happened frequently and in an ad hoc way. Well, you know best, as you were part of all of them :).

Hmmm, not all.

>
> As we move forward we will surely have more internal discussions, so let's make sure that they are recorded and we don't keep running
> around the same bush again and again ;).
>
> And thanks for all the help in the form of discussions/thoughts. Looking forward to more as we go along.

Anytime.

>
> ~Joe
>
>
>     Venky
>
>>
>> ~ Joseph ( NOT Josef :) )
>>
>> ----- Original Message -----
>> From: "Venky Shankar" <yknev.shankar at gmail.com>
>> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
>> Cc: "Gluster Devel" <gluster-devel at gluster.org>, dlambrig at redhat.com, josferna at redhat.com, "Vijay Bellur" <vbellur at redhat.com>
>> Sent: Thursday, December 4, 2014 8:53:43 PM
>> Subject: Re: [Gluster-devel] Discussion: Implications on geo-replication due to bitrot and tiering changes!!!
>>
>> [Adding Dan/Josef/Vijay]
>>
>> As of now, "rollover-time" is global to changelog translator, hence
>> tuning that would affect all consumers subscribing to updates. It's 15
>> seconds by default and has proved to provide a good balance between
>> replication performance (geo-rep) and IOPS rate. Tuning to a lower
>> value would imply doing a round of perf test for geo-rep to be safe.
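To see why "rollover-time" couples all consumers, here is a toy model (a hypothetical `RollingJournal` class, not libgfchangelog code): records accumulate in the open file and become visible to subscribers only when the file rolls over, so with the default of 15 seconds a record written just after a rollover waits almost 15 seconds, and every subscriber receives exactly the same batches. The `CHANGELOG.<timestamp>` naming is an assumption made for illustration.

```python
class RollingJournal:
    """Toy changelog with one global rollover interval shared by all consumers."""

    def __init__(self, rollover_time=15, start=0.0):
        self.rollover_time = rollover_time  # one value for every subscriber
        self.opened_at = start
        self.current = []      # records in the open, not-yet-published file
        self.subscribers = []  # geo-rep, tiering, bitrot ...

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def append(self, record, now):
        self.tick(now)
        self.current.append(record)

    def tick(self, now):
        """Publish the open file if rollover_time has elapsed (timer-driven in practice)."""
        if now - self.opened_at < self.rollover_time:
            return
        name = "CHANGELOG.%d" % int(now)
        for callback in self.subscribers:
            callback(name, list(self.current))
        self.current = []
        self.opened_at = now
```

Lowering `rollover_time` to cut tiering's latency would equally make geo-rep process more, smaller files, which is why the perf round for geo-rep mentioned above would be needed.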
>>
>> The question is if data tiering can compromise on data freshness. If
>> yes, is there a hard limit? For BitRot, it should be OK as the policy
>> for checksum calculation is lazy. Adding a bit more lag would not hurt
>> much.
>>
>> Josef,
>>
>> Could you share the performance numbers along with the setup
>> (configuration, etc.) you used to measure SQLite performance inline in
>> the data path?
>>
>> -Venky
>>
>> On Thu, Dec 4, 2014 at 3:23 PM, Kotresh Hiremath Ravishankar
>> <khiremat at redhat.com> wrote:
>>> Hi,
>>>
>>> As of now, geo-replication is the only consumer of the changelog.
>>> Going forward bitrot and tiering also will join as consumers.
>>> The current format of the changelog can be found in the links below.
>>>
>>> http://www.gluster.org/community/documentation/index.php/Arch/Change_Logging_Translator_Design
>>> https://github.com/gluster/glusterfs/blob/master/doc/features/geo-replication/libgfchangelog.md
>>>
>>>
>>> Current Design:
>>>
>>> 1. Every changelog.rollover-time seconds (configurable), a new changelog file is generated.
>>>
>>> 2. The geo-replication history API, designed as part of the Snapshot requirement, maintains
>>>    an HTIME file listing the generated changelog filenames. It is guaranteed that there is
>>>    no break between the changelogs within one HTIME file, i.e., changelog is not
>>>    enabled/disabled in between.
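The HTIME guarantee in point 2 is what lets a history request be answered by slicing a list. A sketch, assuming (hypothetically) that the HTIME file yields changelog names in order, that each name is `CHANGELOG.<epoch>` with the epoch marking when that file was rolled over, and that `htime_start` is when changelog was enabled; `history_range` is an illustrative name, not the libgfchangelog history call.

```python
import bisect


def history_range(htime_entries, htime_start, start, end):
    """Return the changelogs that cover the interval (start, end].

    Each CHANGELOG.<ts> is assumed to hold the records from the previous
    rollover up to ts. Because changelog was never disabled within one
    HTIME file, consecutive files are contiguous, so the slice has no gaps.
    """
    stamps = [int(name.rsplit(".", 1)[1]) for name in htime_entries]
    if not stamps or start < htime_start or end > stamps[-1] or start >= end:
        raise ValueError("range not fully covered by this HTIME file")
    first = bisect.bisect_right(stamps, start)  # first file rolled over after start
    last = bisect.bisect_left(stamps, end)      # first file rolled over at/after end
    return htime_entries[first:last + 1]
```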
>>>
>>> Proposed changes for changelog as part of bitrot and tiering:
>>>
>>> 1. Add timestamp for each fop record in changelog.
>>>
>>>    Rationale             : Tiering requires the timestamp of each fop.
>>>    Implication on Geo-rep: NO
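Proposed change 1 amounts to one more field per record. The sketch uses a made-up space-separated layout (`<type> <gfid> <fop> <timestamp>`), with the E/D/M record types modelled on the changelog's entry, data and metadata records; it is not the on-disk changelog encoding. The hypothetical `latest_by_gfid` shows the kind of per-file recency tiering could derive from the timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FopRecord:
    kind: str   # "E" entry, "D" data, "M" metadata
    gfid: str
    fop: str    # e.g. "WRITE", "SETATTR"
    ts: float   # proposed per-fop timestamp (epoch seconds)

    def encode(self):
        return f"{self.kind} {self.gfid} {self.fop} {self.ts:.6f}"

    @classmethod
    def decode(cls, line):
        kind, gfid, fop, ts = line.split(" ")
        return cls(kind, gfid, fop, float(ts))


def latest_by_gfid(records):
    """Most recent fop time per file, as tiering might consume it."""
    latest = {}
    for rec in records:
        latest[rec.gfid] = max(rec.ts, latest.get(rec.gfid, rec.ts))
    return latest
```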
>>>
>>>
>>> 2. Make one big changelog per day or so, and do not roll over the changelog every rollover-time.
>>>
>>>    Rationale: Changing changelog.rollover-time is going to affect all three consumers, hence
>>>              decoupling is required.
>>>
>>>                 Geo-replication: Is fine with changing rollover time.
>>>                 Tiering        : Not fine, as per the input I got from Joseph (Joseph, please comment),
>>>                                  as this adds to the delay before tiering gets the change
>>>                                  notification from changelog.
>>>                 Bitrot         : It should be fine. (Venky, please comment).
>>>
>>>    Implications on current Geo-replication Design:
>>>
>>>              1. Breaks History API: Needs redesign.
>>>              2. Changes to geo-replication changelog consumption logic ??
>>>              3. libgfchangelog API changes.
>>>              4. Effort to handle upgrade scenarios.
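One way to read the "decoupling" in proposed change 2: if the journal is a single per-day file whose records carry timestamps, each consumer can keep its own read cursor and poll at its own interval, so tiering's latency no longer depends on a rollover that geo-rep also sees. The sketch is purely illustrative (`DailyJournal` and `ConsumerCursor` are hypothetical names) and does not address the history API, libgfchangelog or upgrade implications listed above.

```python
class DailyJournal:
    """One append-only journal for the whole day; records carry timestamps."""

    def __init__(self):
        self.records = []  # (timestamp, payload) in append order

    def append(self, ts, payload):
        self.records.append((ts, payload))


class ConsumerCursor:
    """Per-consumer read position and polling interval over a shared journal."""

    def __init__(self, journal, interval, start=0.0):
        self.journal = journal
        self.interval = interval  # chosen by each consumer independently
        self.offset = 0           # index of the first record not yet consumed
        self.last_poll = start

    def poll(self, now):
        """Return records appended since the previous poll, if the interval has elapsed."""
        if now - self.last_poll < self.interval:
            return []
        self.last_poll = now
        batch = self.journal.records[self.offset:]
        self.offset += len(batch)
        return batch
```

Here tiering could poll every second while geo-rep keeps its 15-second cadence, both reading the same file.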
>>>
>>> Bitrot and Tiering guys, please add any other expected changes that I have missed.
>>>
>>> Point to discuss: considering the implications on geo-replication, are there any other
>>> approaches with which we can solve this problem without much impact on the current
>>> geo-replication logic?
>>>
>>>
>>> Thanks and Regards,
>>> Kotresh H R
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel

