[Gluster-devel] Discussion: Implications on geo-replication due to bitrot and tiering changes!!!!

Joseph Fernandes josferna at redhat.com
Fri Dec 5 01:29:51 UTC 2014


Typo corrections, and copying Jeff on this mail.

Also, the Data Tiering team had a discussion late last night, and we have decided that the 15-second delay won't kill us either.
However, recording READ FOPs for data is a MUST, as stated in the earlier reply.

~Joe

----- Original Message -----
From: "Joseph Fernandes" <josferna at redhat.com>
To: "Venky Shankar" <yknev.shankar at gmail.com>
Cc: "Gluster Devel" <gluster-devel at gluster.org>
Sent: Thursday, December 4, 2014 10:42:56 PM
Subject: Re: [Gluster-devel] Discussion: Implications on geo-replication due to bitrot and tiering changes!!!!

On the data path, I have seen a 3% dip in performance with the initial implementation, which is not finalized.
Testing is still in progress, as we are trying to reduce the hit as much as possible through implementation optimizations and SQLite tunables.
We will publish the final results once we are done.

Venky,

Could you please let us know the performance impact on the IO path of the changelog's
"15 seconds by default and has proved to provide a good balance between replication performance (geo-rep) and IOPS rate"
configuration?

Also, regarding the 15-second delay, the tiering team needs to discuss its impact on data freshness.

As discussed in person, and as reiterated MANY! times in discussions with the changelog team:
1) When geo-rep is not ON, i.e. when changelog is not ON, we will populate the DB inline in the IO path
(we are progressively working on reducing the performance hit on the IO path).
2) When changelog is ON, the DB will be fed through the libgfchangelog API. To remove the freshness issue, we
can do in-memory updates on an LRU, as we are not looking for sequential updates. We would need this in-memory
data structure anyway, as changelog DOES NOT provide read statistics, which are required for tiering and are a VERY crucial part
of detecting the HOTNESS of a file!
3) As far as tiering is concerned, we are not worried about crash consistency, because:
   a. For files that are COLD, the data is safe on disk.
   b. For files that are HOT, even if the in-memory data is lost, these files will get HOT again and we will move them later.
      If they don't get HOT again, the crash has no impact.
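Points 2 and 3 can be made concrete with a small sketch: an in-memory LRU that counts READ as well as WRITE FOPs per GFID and spills the least-recently-used counters to SQLite. The class name HeatCache, the table layout and the spill-on-eviction policy are hypothetical illustrations, not the tiering team's implementation; only the ideas (an in-memory LRU, read statistics kept outside changelog, SQLite as the store, tolerance for losing the cache on a crash) come from the mail above.

```python
import sqlite3
import time
from collections import OrderedDict
from typing import Optional


class HeatCache:
    """Hypothetical in-memory LRU of per-file access counters.

    READ and WRITE FOPs are counted per GFID (changelog does not carry reads).
    When the cache is full, the least-recently-used entry is spilled to SQLite;
    flush() writes out the rest. Losing the cache on a crash only loses
    counters, which files that are really hot will rebuild.
    """

    def __init__(self, db_path: str = ":memory:", capacity: int = 1024):
        self.capacity = capacity
        self.entries = OrderedDict()  # gfid -> [reads, writes, last_access]
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS heat ("
            "gfid TEXT PRIMARY KEY, reads INTEGER, writes INTEGER, last_access REAL)"
        )

    def record(self, gfid: str, fop: str, ts: Optional[float] = None) -> None:
        ts = time.time() if ts is None else ts
        entry = self.entries.pop(gfid, None) or [0, 0, ts]
        if fop == "READ":
            entry[0] += 1
        elif fop == "WRITE":
            entry[1] += 1
        entry[2] = max(entry[2], ts)
        self.entries[gfid] = entry  # re-insert as most recently used
        if len(self.entries) > self.capacity:
            old_gfid, old_entry = self.entries.popitem(last=False)
            self._spill(old_gfid, old_entry)

    def _spill(self, gfid: str, entry: list) -> None:
        """Add cached counters to the DB row for gfid (insert or update)."""
        reads, writes, last = entry
        row = self.db.execute(
            "SELECT reads, writes, last_access FROM heat WHERE gfid = ?", (gfid,)
        ).fetchone()
        if row is None:
            self.db.execute(
                "INSERT INTO heat VALUES (?, ?, ?, ?)", (gfid, reads, writes, last)
            )
        else:
            self.db.execute(
                "UPDATE heat SET reads = ?, writes = ?, last_access = ? WHERE gfid = ?",
                (row[0] + reads, row[1] + writes, max(row[2], last), gfid),
            )

    def flush(self) -> None:
        """Write every cached counter to the DB and empty the cache."""
        for gfid, entry in self.entries.items():
            self._spill(gfid, entry)
        self.entries.clear()
        self.db.commit()

    def hottest(self, n: int = 1) -> list:
        """Flush, then return the n GFIDs with the most recorded FOPs."""
        self.flush()
        rows = self.db.execute(
            "SELECT gfid FROM heat ORDER BY reads + writes DESC, gfid LIMIT ?", (n,)
        )
        return [r[0] for r in rows]


if __name__ == "__main__":
    cache = HeatCache(capacity=2)
    for gfid, fop in [("a", "READ"), ("b", "WRITE"), ("a", "READ"), ("c", "READ"), ("a", "WRITE")]:
        cache.record(gfid, fop, ts=0.0)
    print(cache.hottest(1))  # ['a']
```

Because the cache is keyed by GFID rather than ordered by time, updates need not be applied sequentially, which is what makes an LRU a reasonable fit here.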
 
~ Joseph ( NOT Josef :) ) 

----- Original Message -----
From: "Venky Shankar" <yknev.shankar at gmail.com>
To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
Cc: "Gluster Devel" <gluster-devel at gluster.org>, dlambrig at redhat.com, josferna at redhat.com, "Vijay Bellur" <vbellur at redhat.com>
Sent: Thursday, December 4, 2014 8:53:43 PM
Subject: Re: [Gluster-devel] Discussion: Implications on geo-replication due to bitrot and tiering changes!!!

[Adding Dan/Josef/Vijay]

As of now, "rollover-time" is global to the changelog translator, hence
tuning it would affect all consumers subscribing to updates. It's 15
seconds by default and has proved to provide a good balance between
replication performance (geo-rep) and IOPS rate. Tuning it to a lower
value would imply doing a round of perf tests for geo-rep to be safe.

The question is whether data tiering can compromise on data freshness. If
yes, is there a hard limit? For BitRot, it should be OK, as the policy
for checksum calculation is lazy. Adding a bit more lag would not hurt
much.
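The freshness question can be put in numbers with a simple model: if a record reaches consumers only once the changelog it was written to is rolled over, a FOP waits at most rollover-time seconds (15 by default) before any consumer, tiering included, can see it. This is an assumed model, not measured behaviour: it ignores the consumer's own polling and processing delay, and the function names are hypothetical.

```python
def visible_at(fop_time: float, rollover_time: float = 15.0, start: float = 0.0) -> float:
    """Time at which a FOP recorded at fop_time first reaches a changelog consumer.

    Assumed model: records are published only when the changelog they were
    written to is rolled over, and rollovers happen at start + k * rollover_time.
    Consumer polling/processing delay is ignored.
    """
    periods = int((fop_time - start) // rollover_time) + 1
    return start + periods * rollover_time


def staleness(fop_time: float, rollover_time: float = 15.0, start: float = 0.0) -> float:
    """Seconds between a FOP and the moment a consumer can first see it."""
    return visible_at(fop_time, rollover_time, start) - fop_time


if __name__ == "__main__":
    # A FOP right after a rollover waits the full interval; one just before waits little.
    for t in (0.0, 7.0, 14.0, 30.0):
        print(t, staleness(t))
```

Under this model the worst-case lag equals rollover-time, so the only way to shorten it is to lower that one global setting, which is exactly the knob geo-rep is tuned around.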

Josef,

Could you share the performance numbers along with the setup
(configuration, etc.) you used to measure SQLite performance inline in
the data path?

-Venky

On Thu, Dec 4, 2014 at 3:23 PM, Kotresh Hiremath Ravishankar
<khiremat at redhat.com> wrote:
> Hi,
>
> As of now, geo-replication is the only consumer of the changelog.
> Going forward, bitrot and tiering will also join as consumers.
> The current format of the changelog can be found at the links below.
>
> http://www.gluster.org/community/documentation/index.php/Arch/Change_Logging_Translator_Design
> https://github.com/gluster/glusterfs/blob/master/doc/features/geo-replication/libgfchangelog.md
>
>
> Current Design:
>
> 1. A new changelog file is generated every changelog.rollover-time seconds (configurable).
>
> 2. The geo-replication history API, designed as part of the Snapshot requirement, maintains
>    an HTIME file listing the generated changelog filenames. It is guaranteed that there are
>    no gaps between the changelogs within one HTIME file, i.e., changelog is not
>    disabled and re-enabled in between.
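The no-gap guarantee is what lets a history lookup work with a plain binary search over one HTIME file. The sketch below assumes changelogs are named CHANGELOG.<timestamp>, where the timestamp is the rollover time and the file holds the FOPs from the interval ending there; the function names are hypothetical and this is not the libgfchangelog history API.

```python
import bisect
from typing import List


def rollover_ts(name: str) -> int:
    # Assumption: changelogs are named CHANGELOG.<unix-timestamp of rollover>,
    # possibly with a leading directory path.
    return int(name.rsplit(".", 1)[1])


def changelogs_in_range(htime_entries: List[str], start: int, end: int) -> List[str]:
    """Pick the changelogs from one HTIME file that cover FOPs in [start, end].

    htime_entries lists changelog names in the order they were generated.
    Because changelog is never disabled within one HTIME file, the list is
    an unbroken, ascending sequence, so binary search is sufficient.
    """
    if start > end:
        return []
    stamps = [rollover_ts(n) for n in htime_entries]
    first = bisect.bisect_left(stamps, start)  # first file rolled over at/after start
    last = bisect.bisect_left(stamps, end)     # first file rolled over at/after end
    return htime_entries[first:last + 1]


if __name__ == "__main__":
    entries = ["CHANGELOG.15", "CHANGELOG.30", "CHANGELOG.45", "CHANGELOG.60"]
    print(changelogs_in_range(entries, 20, 40))  # ['CHANGELOG.30', 'CHANGELOG.45']
```

This kind of lookup relies on many small, contiguous files, which is presumably why moving to one big per-day changelog is listed as breaking the History API.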
>
> Proposed changes for changelog as part of bitrot and tiering:
>
> 1. Add timestamp for each fop record in changelog.
>
>    Rationale             : Tiering requires the timestamp of each fop.
>    Implication on Geo-rep: NO
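Adding a per-fop timestamp can be pictured as one extra trailing field on each record. The space-separated text record below is purely illustrative and is not the actual on-disk changelog encoding; the D/M/E record-type letters and the nanosecond unit are assumptions. Appending the field last, and treating it as optional when decoding, is one way older consumers could keep parsing records unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FopRecord:
    rtype: str                   # record type, e.g. "D" (data), "M" (metadata), "E" (entry); assumed
    gfid: str                    # GFID of the file the FOP touched
    fop: str                     # FOP name, e.g. "WRITE"
    ts_ns: Optional[int] = None  # proposed new field: time of the FOP, in nanoseconds

    def encode(self) -> str:
        fields = [self.rtype, self.gfid, self.fop]
        if self.ts_ns is not None:
            fields.append(str(self.ts_ns))  # appended last so existing fields keep their positions
        return " ".join(fields)

    @classmethod
    def decode(cls, line: str) -> "FopRecord":
        fields = line.split(" ")
        # Tolerate records written before the timestamp field existed.
        ts_ns = int(fields[3]) if len(fields) > 3 else None
        return cls(fields[0], fields[1], fields[2], ts_ns)


if __name__ == "__main__":
    rec = FopRecord("D", "gfid-1", "WRITE", 1417680000000000000)
    print(rec.encode())  # D gfid-1 WRITE 1417680000000000000
    print(FopRecord.decode("D gfid-1 WRITE").ts_ns)  # None
```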
>
>
> 2. Make one big changelog per day or so, and do not roll over the changelog every rollover-time.
>
>    Rationale: Changing changelog.rollover-time will affect all three consumers, hence
>               decoupling is required.
>
>                 Geo-replication: Is fine with changing rollover time.
>                 Tiering        : Not fine, as per the input I got from Joseph (Joseph, please comment),
>                                  as this adds to the delay before tiering gets the change
>                                  notification from changelog.
>                 Bitrot         : It should be fine. (Venky, please comment).
>
>    Implications on current Geo-replication Design:
>
>              1. Breaks History API: Needs redesign.
>              2. Changes to geo-replication changelog consumption logic ??
>              3. libgfchangelog API changes.
>              4. Effort to handle upgrade scenarios.
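With one big per-day changelog, consumers could no longer simply wait for a closed file. One way to picture the decoupling is for each consumer to keep its own read offset into the growing file and poll at its own interval, so tiering could poll often while geo-rep keeps its cadence. The class below is a hypothetical sketch of that idea (line-oriented records, a byte-offset cursor per consumer); it is not the libgfchangelog API and does not address the History API or upgrade issues listed above.

```python
class TailingConsumer:
    """Hypothetical consumer of a single long-lived (e.g. per-day) changelog.

    Each consumer remembers the byte offset it has processed up to, so
    different consumers can poll the same file at different intervals.
    Only complete (newline-terminated) records are returned, so a record
    that is still being appended is picked up on a later poll.
    """

    def __init__(self, path: str):
        self.path = path
        self.offset = 0

    def poll(self) -> list:
        """Return the complete records appended since the last poll."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        end = data.rfind(b"\n")
        if end == -1:
            return []  # no complete record yet
        self.offset += end + 1
        return data[: end + 1].decode().splitlines()
```

For example, a tiering consumer and a geo-rep consumer can each wrap the same file path; whichever polls first simply advances its own offset without affecting the other.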
>
> Bitrot and Tiering folks, please add any other expected changes that I have missed.
>
> Point to discuss: considering the implications on geo-replication, are there any other
> approaches with which we can solve this problem without much impact on the current
> geo-replication logic?
>
>
> Thanks and Regards,
> Kotresh H R
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel