<div dir="auto">Yes but the biggest issue is how to recover<div dir="auto">You&#39;ll need to recover the whole storage not a single snapshot and this can last for days</div></div><div class="gmail_extra"><br><div class="gmail_quote">Il 23 mar 2017 9:24 PM, &quot;Alvin Starr&quot; &lt;<a href="mailto:alvin@netvel.net">alvin@netvel.net</a>&gt; ha scritto:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
For volume backups you need something like snapshots.

If you take a snapshot A of a live volume L, that snapshot stays at that moment in time, and you can rsync it to another system or use something like deltacp.pl to copy it.

The usual process is to delete the snapshot once it's copied and then repeat the process when the next backup is required.

That process does require rsync/deltacp to read the complete volume on both systems, which can take a long time.

I was kicking around the idea of handling snapshot deltas better.

The idea is that you take your initial snapshot A, then sync that snapshot to your backup system.

At a later point you take another snapshot B.

Because a snapshot contains copies of the original data as it was at the time of the snapshot, while unmodified data still points to the live volume, it is possible to tell which blocks have changed since the snapshot was taken.

Now that you have a second snapshot, you can in essence perform a diff of the A and B snapshots to get only the blocks that changed up to the time B was taken.

Those blocks can be copied into the backup image, and you should end up with a clone of the B snapshot.

You would not have to read the whole volume image, just the changed blocks, which dramatically improves the speed of the backup.

At this point you can delete the A snapshot and promote the B snapshot to be the A snapshot for the next backup round.
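A minimal sketch of that diff-and-patch step in Python (the device paths, backup path and block size are made up for illustration; note this naive version still reads both snapshots locally to find the differences, whereas a real tool would get the changed-block list from the snapshot's copy-on-write metadata):

BLOCK = 1024 * 1024                 # 1 MiB compare/copy granularity (arbitrary)
SNAP_A = "/dev/vg0/snap_a"          # older snapshot, already mirrored to the backup
SNAP_B = "/dev/vg0/snap_b"          # newer snapshot to propagate
BACKUP = "/backup/volume.img"       # existing copy of snapshot A

def changed_offsets(a_path, b_path, block=BLOCK):
    """Yield the byte offsets of blocks that differ between the two snapshots."""
    with open(a_path, "rb") as a, open(b_path, "rb") as b:
        offset = 0
        while True:
            buf_a = a.read(block)
            buf_b = b.read(block)
            if not buf_a and not buf_b:
                break
            if buf_a != buf_b:
                yield offset
            offset += block

def patch_backup(backup_path, b_path, offsets, block=BLOCK):
    """Copy only the changed blocks from snapshot B into the backup image."""
    with open(b_path, "rb") as src, open(backup_path, "r+b") as dst:
        for offset in offsets:
            src.seek(offset)
            dst.seek(offset)
            dst.write(src.read(block))

if __name__ == "__main__":
    patch_backup(BACKUP, SNAP_B, changed_offsets(SNAP_A, SNAP_B))

After the patch the backup image should match snapshot B, at which point B can be promoted to take A's place as described above.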
On 03/23/2017 03:53 PM, Gandalf Corvotempesta wrote:
      <div dir="auto">Are backup consistent?
        <div dir="auto">What happens if the header on shard0 is synced
          referring to some data on shard450 and when rsync parse
          shard450 this data is changed by subsequent writes?</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Header would be backupped  of sync respect the
          rest of the image</div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">Il 23 mar 2017 8:48 PM, &quot;Joe Julian&quot;
          &lt;<a href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>&gt;
          ha scritto:<br type="attribution">
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000">
The rsync protocol only passes blocks that have actually changed, and raw images change fewer bits. You're right, though, that it still has to check the entire file for those changes.
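As a toy illustration of that delta idea (a simplified fixed-block version; real rsync uses a rolling checksum over a sliding window so it can also handle insertions that shift data):

import hashlib

BLOCK = 4096  # fixed block size for this toy example

def signatures(path):
    """Checksum every block of the old copy held by the receiver."""
    with open(path, "rb") as f:
        return [hashlib.md5(chunk).hexdigest()
                for chunk in iter(lambda: f.read(BLOCK), b"")]

def delta(new_path, old_sigs):
    """Return (block_index, data) only for the blocks whose checksum differs."""
    changes = []
    with open(new_path, "rb") as f:
        for i, chunk in enumerate(iter(lambda: f.read(BLOCK), b"")):
            if i >= len(old_sigs) or hashlib.md5(chunk).hexdigest() != old_sigs[i]:
                changes.append((i, chunk))
    return changes

# Only the blocks in `changes` cross the wire, but the sender still reads
# new_path end to end to build them, just as rsync scans the whole file.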
On 03/23/17 12:47, Gandalf Corvotempesta wrote:
                <div dir="auto">Raw or qcow doesn&#39;t change anything
                  about the backup.
                  <div dir="auto">Georep always have to sync the whole
                    file</div>
                  <div dir="auto"><br>
                  </div>
                  <div dir="auto">Additionally, raw images has much less
                    features than qcow</div>
                </div>
                <div class="gmail_extra"><br>
                  <div class="gmail_quote">Il 23 mar 2017 8:40 PM, &quot;Joe
                    Julian&quot; &lt;<a href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>&gt;
                    ha scritto:<br type="attribution">
                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div bgcolor="#FFFFFF" text="#000000">
I always use raw images. And yes, sharding would also be good.
                        <div class="m_8694824072006468141m_2071367206087675765m_-8052554343169692798moz-cite-prefix">On
                          03/23/17 12:36, Gandalf Corvotempesta wrote:<br>
                        </div>
                        <blockquote type="cite">
                          <div dir="auto">Georep expose to another
                            problem:
                            <div dir="auto">When using gluster as
                              storage for VM, the VM file is saved as
                              qcow. Changes are inside the qcow, thus
                              rsync has to sync the whole file every
                              time</div>
                            <div dir="auto"><br>
                            </div>
                            <div dir="auto">A little workaround would be
                              sharding, as rsync has to sync only the
                              changed shards, but I don&#39;t think this is
                              a good solution</div>
                          </div>
                          <div class="gmail_extra"><br>
                            <div class="gmail_quote">Il 23 mar 2017 8:33
                              PM, &quot;Joe Julian&quot; &lt;<a href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>&gt;
                              ha scritto:<br type="attribution">
                              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                <div bgcolor="#FFFFFF" text="#000000">
In many cases a full backup set is just not feasible. Georep to the same or a different DC may be an option if the bandwidth can keep up with the change set. If not, maybe break the data up into smaller, more manageable volumes, where you only keep a smaller set of critical data and just back that up. Perhaps an object store (Swift?) might handle fault-tolerant distribution better for some workloads.

There's no one right answer.
                                  <div class="m_8694824072006468141m_2071367206087675765m_-8052554343169692798m_-3599642909736746536moz-cite-prefix">On
                                    03/23/17 12:23, Gandalf
                                    Corvotempesta wrote:<br>
                                  </div>
                                  <blockquote type="cite">
                                    <div dir="auto">Backing up from
                                      inside each VM doesn&#39;t solve the
                                      problem
                                      <div dir="auto">If you have to
                                        backup 500VMs you just need more
                                        than 1 day and what if you have
                                        to restore the whole gluster
                                        storage?</div>
                                      <div dir="auto"><br>
                                      </div>
                                      <div dir="auto">How many days do
                                        you need to restore 1PB?</div>
                                      <div dir="auto"><br>
                                      </div>
                                      <div dir="auto">Probably the only
                                        solution should be a georep in
                                        the same datacenter/rack with a
                                        similiar cluster, </div>
                                      <div dir="auto">ready to became
                                        the master storage.</div>
                                      <div dir="auto">In this case you
                                        don&#39;t need to restore anything
                                        as data are already there, </div>
                                      <div dir="auto">only a little bit
                                        back in time but this double the
                                        TCO</div>
                                    </div>
                                    <div class="gmail_extra"><br>
                                      <div class="gmail_quote">Il 23 mar
                                        2017 6:39 PM, &quot;Serkan Çoban&quot;
                                        &lt;<a href="mailto:cobanserkan@gmail.com" target="_blank">cobanserkan@gmail.com</a>&gt;
                                        ha scritto:<br type="attribution">
                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Assuming
                                          a backup window of 12 hours,
                                          you need to send data at
                                          25GB/s<br>
                                          to backup solution.<br>
                                          Using 10G Ethernet on hosts
                                          you need at least 25 host to
                                          handle 25GB/s.<br>
                                          You can create an EC gluster
                                          cluster that can handle this
                                          rates, or<br>
                                          you just backup valuable data
                                          from inside VMs using open
                                          source backup<br>
                                          tools like borg,attic,restic ,
                                          etc...<br>
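Roughly, the arithmetic behind those figures (the ~1 GB/s of usable throughput per 10GbE host is an assumption to allow for protocol overhead):

# Back-of-the-envelope check for backing up 1 PB in a 12-hour window.
DATA_BYTES = 1e15                    # 1 PB of VM images
WINDOW_SECONDS = 12 * 3600           # 12-hour backup window

rate = DATA_BYTES / WINDOW_SECONDS   # required throughput in bytes/s
print(f"required rate: {rate / 1e9:.1f} GB/s")        # ~23 GB/s, call it 25 GB/s

USABLE_PER_HOST = 1e9                # assume ~1 GB/s usable per 10GbE link
print(f"hosts needed: {rate / USABLE_PER_HOST:.0f}")  # ~23, so 25 hosts is a safe floor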
On Thu, Mar 23, 2017 at 7:48 PM, Gandalf Corvotempesta <gandalf.corvotempesta@gmail.com> wrote:
> Let's assume a 1 PB storage full of VM images, with each brick on ZFS,
> replica 3, sharding enabled.
>
> How do you backup/restore that amount of data?
>
> Backing up daily is impossible: you'll never finish one backup before the
> following one starts (in other words, you need more than 24 hours).
>
> Restoring is even worse: you need more than 24 hours with the whole cluster
> down.
>
> You can't rely on ZFS snapshots because of sharding (the snapshot taken from
> one node is useless without all the other nodes holding the related shards),
> and you still have the same restore speed.
>
> How do you back this up?
>
> Even georep isn't enough if you have to restore the whole storage in case
> of disaster.
<pre class="m_8694824072006468141moz-signature" cols="72">-- 
Alvin Starr                   ||   voice: <a href="tel:(905)%20513-7688" value="+19055137688" target="_blank">(905)513-7688</a>
Netvel Inc.                   ||   Cell:  <a href="tel:(416)%20806-0133" value="+14168060133" target="_blank">(416)806-0133</a>
<a class="m_8694824072006468141moz-txt-link-abbreviated" href="mailto:alvin@netvel.net" target="_blank">alvin@netvel.net</a>              ||
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users