<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><tt>That's true, and it can last much longer than days.</tt></p>
<p><tt>I have a client with data-sets that take months to copy,
and they are not the biggest data user in the world.</tt></p>
<p><tt><br>
The biggest problem with backups is that some day you may need
to restore them.</tt></p>
<p><tt></tt><br>
</p>
<br>
<div class="moz-cite-prefix">On 03/23/2017 04:29 PM, Gandalf
Corvotempesta wrote:<br>
</div>
<blockquote
cite="mid:CAJH6TXgeFM_s_LfupqstAC0oh9kHkWckyUtjhmnsN3ekmdnGEA@mail.gmail.com"
type="cite">
<div dir="auto">Yes but the biggest issue is how to recover
<div dir="auto">You'll need to recover the whole storage not a
single snapshot and this can last for days</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">Il 23 mar 2017 9:24 PM, "Alvin Starr"
<<a moz-do-not-send="true" href="mailto:alvin@netvel.net">alvin@netvel.net</a>>
ha scritto:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p><tt>For volume backups you need something like
snapshots.</tt></p>
<p><tt>If you take a snapshot A of a live volume L, that
snapshot stays fixed at that moment in time, and you can
rsync it to another system or use something like <a
moz-do-not-send="true" href="http://deltacp.pl"
target="_blank">deltacp.pl</a> to copy it.</tt></p>
<p><tt>The usual process is to delete the snapshot once
it is copied and then repeat the process when the
next backup is required.</tt></p>
<p><tt>That process does require rsync/deltacp to read the
complete volume on both systems, which can take a long
time.<br>
</tt></p>
<p><tt>I have been kicking around an idea for handling
snapshot deltas better.</tt></p>
<p><tt>The idea is that you could take your initial
snapshot A and then sync that snapshot to your backup
system.</tt></p>
<p><tt>At a later point you could take another snapshot B.</tt></p>
<p><tt>Because a snapshot contains copies of the
original data as it was at the time of the snapshot,
while unmodified data still points to the live volume,
it is possible to tell which blocks of data have changed
since the snapshot was taken.</tt></p>
<p><tt>Now that you have a second snapshot, you can in
essence perform a diff of the A and B snapshots to get
only the blocks that changed up to the time that B was
taken.</tt></p>
<p><tt>Those blocks could be copied to the backup image,
and you would then have a clone of the B snapshot.</tt></p>
<p><tt>You would not have to read the whole volume image,
only the changed blocks, dramatically improving the
speed of the backup.<br>
</tt></p>
<p><tt>At this point you can delete the A snapshot and
promote the B snapshot to be the A snapshot for the
next backup round.<br>
</tt></p>
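<p><tt>A minimal sketch of that diff-and-apply step (the 4 MiB
extent size, the device paths, and the changed_extents() helper
are made up for illustration; a real version would consult the
snapshot's copy-on-write metadata rather than scanning both
snapshots end to end):</tt></p>
<pre>
#!/usr/bin/env python3
# Sketch only: update a backup image that already matches snapshot A
# by copying over just the extents that differ in the newer snapshot B.

EXTENT = 4 * 1024 * 1024  # illustrative 4 MiB extent size


def changed_extents(snap_a, snap_b):
    """Brute-force fallback: compare the two snapshots extent by
    extent.  A real implementation would read the snapshot metadata
    instead of scanning both volumes."""
    extents = []
    with open(snap_a, "rb") as a, open(snap_b, "rb") as b:
        offset = 0
        while True:
            ea, eb = a.read(EXTENT), b.read(EXTENT)
            if not eb:
                break
            if ea != eb:
                extents.append((offset, len(eb)))
            offset += len(eb)
    return extents


def apply_delta(snap_b, backup_img, extents):
    """Copy only the changed extents from snapshot B into the
    backup image, which already matches snapshot A."""
    with open(snap_b, "rb") as src, open(backup_img, "r+b") as dst:
        for offset, length in extents:
            src.seek(offset)
            dst.seek(offset)
            dst.write(src.read(length))


if __name__ == "__main__":
    deltas = changed_extents("/dev/vg0/snapA", "/dev/vg0/snapB")
    apply_delta("/dev/vg0/snapB", "/backup/volume.img", deltas)
    # afterwards, delete snapshot A and promote B as the new baseline
</pre>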
<br>
<div class="m_8694824072006468141moz-cite-prefix">On
03/23/2017 03:53 PM, Gandalf Corvotempesta wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">Are backup consistent?
<div dir="auto">What happens if the header on shard0
is synced referring to some data on shard450 and
when rsync parse shard450 this data is changed by
subsequent writes?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Header would be backupped of sync
respect the rest of the image</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">Il 23 mar 2017 8:48 PM, "Joe
Julian" <<a moz-do-not-send="true"
href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>>
ha scritto:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>The rsync protocol only transfers the blocks that
have actually changed, and raw changes fewer bits
than qcow. You're right, though, that it still has to
check the entire file for those changes.<br>
</p>
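<p>(For example, something like "rsync -a --inplace
--no-whole-file" on a raw image keeps the delta-transfer
algorithm in play and rewrites only the changed regions of
the destination file, but rsync still reads the whole file
on both ends to find them.)</p>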
<br>
<div
class="m_8694824072006468141m_2071367206087675765moz-cite-prefix">On
03/23/17 12:47, Gandalf Corvotempesta wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">Raw or qcow doesn't change
anything about the backup.
<div dir="auto">Georep always have to sync
the whole file</div>
<div dir="auto"><br>
</div>
<div dir="auto">Additionally, raw images has
much less features than qcow</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">Il 23 mar 2017 8:40
PM, "Joe Julian" <<a
moz-do-not-send="true"
href="mailto:joe@julianfamily.org"
target="_blank">joe@julianfamily.org</a>>
ha scritto:<br type="attribution">
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>I always use raw images. And yes,
sharding would also be good.<br>
</p>
<br>
<div
class="m_8694824072006468141m_2071367206087675765m_-8052554343169692798moz-cite-prefix">On
03/23/17 12:36, Gandalf
Corvotempesta wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">Georep expose to
another problem:
<div dir="auto">When using gluster
as storage for VM, the VM file
is saved as qcow. Changes are
inside the qcow, thus rsync has
to sync the whole file every
time</div>
<div dir="auto"><br>
</div>
<div dir="auto">A little
workaround would be sharding, as
rsync has to sync only the
changed shards, but I don't
think this is a good solution</div>
</div>
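<div dir="auto">(As a rough illustration: with the default 64 MB
shard size, a 100 GB image becomes about 1600 shard files, and
only the shards that were actually touched need to be re-synced.)</div>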
<div class="gmail_extra"><br>
<div class="gmail_quote">Il 23 mar
2017 8:33 PM, "Joe Julian" <<a
moz-do-not-send="true"
href="mailto:joe@julianfamily.org"
target="_blank">joe@julianfamily.org</a>>
wrote:<br
type="attribution">
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div bgcolor="#FFFFFF"
text="#000000">
<p>In many cases, a full
backup set is just not
feasible. Georep to the
same or a different DC may
be an option if the
bandwidth can keep up with
the change set. If not,
maybe break the data up
into smaller, more
manageable volumes where
you keep only a smaller
set of critical data and
just back that up. Perhaps
an object store (swift?)
might handle fault
tolerance and distribution
better for some workloads.</p>
<p>There's no one right
answer.</p>
<br>
<div
class="m_8694824072006468141m_2071367206087675765m_-8052554343169692798m_-3599642909736746536moz-cite-prefix">On
03/23/17 12:23, Gandalf
Corvotempesta wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">Backing up
from inside each VM
doesn't solve the
problem
<div dir="auto">If you
have to backup 500VMs
you just need more
than 1 day and what if
you have to restore
the whole gluster
storage?</div>
<div dir="auto"><br>
</div>
<div dir="auto">How many
days do you need to
restore 1PB?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Probably
the only solution
should be a georep in
the same
datacenter/rack with a
similiar cluster, </div>
<div dir="auto">ready to
became the master
storage.</div>
<div dir="auto">In this
case you don't need to
restore anything as
data are already
there, </div>
<div dir="auto">only a
little bit back in
time but this double
the TCO</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">Il
23 mar 2017 6:39 PM,
"Serkan Çoban" <<a
moz-do-not-send="true" href="mailto:cobanserkan@gmail.com"
target="_blank">cobanserkan@gmail.com</a>>
wrote:<br
type="attribution">
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">Assuming
a backup window of
12 hours, you need
to send data at
25GB/s<br>
to backup solution.<br>
Using 10G Ethernet
on hosts you need at
least 25 host to
handle 25GB/s.<br>
You can create an EC
gluster cluster that
can handle this
rates, or<br>
you just backup
valuable data from
inside VMs using
open source backup<br>
tools like
borg,attic,restic ,
etc...<br>
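(Rough arithmetic: 1 PB over a
12-hour window is about 23 GB/s;
a 10GbE link carries at most
~1.25 GB/s, closer to ~1 GB/s in
practice, hence roughly 25 hosts.)<br>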
<br>
On Thu, Mar 23, 2017
at 7:48 PM, Gandalf
Corvotempesta<br>
<<a
moz-do-not-send="true"
href="mailto:gandalf.corvotempesta@gmail.com" target="_blank">gandalf.corvotempesta@gmail.c<wbr>om</a>>
wrote:<br>
> Let's assume 1PB of
storage full of VM images,
with each brick on ZFS,<br>
> replica 3,
sharding enabled.<br>
><br>
> How do you
backup/restore that
amount of data?<br>
><br>
> Backing up
daily is impossible:
you'll never finish
one backup before the<br>
> following one
starts (in other words,
you need more than 24
hours).<br>
><br>
> Restoring is
even worse. You need
more than 24 hours
with the whole
cluster<br>
> down.<br>
><br>
> You can't rely
on ZFS snapshots due
to sharding (a
snapshot taken on one<br>
> node is useless
without all the other
nodes holding the
related shards), and you<br>
> still have the
same restore speed.<br>
><br>
> How do you
back this up?<br>
><br>
> Even georep
isn't enough if you
have to restore the
whole storage in
case<br>
> of disaster.<br>
</blockquote>
</div>
</div>
<br>
</blockquote>
</div>
</blockquote></div></div>
</blockquote>
</div></blockquote></div></div>
</blockquote>
</div></blockquote></div></div>
</blockquote>
<pre class="m_8694824072006468141moz-signature" cols="72">--
Alvin Starr || voice: <a moz-do-not-send="true" href="tel:%28905%29%20513-7688" value="+19055137688" target="_blank">(905)513-7688</a>
Netvel Inc. || Cell: <a moz-do-not-send="true" href="tel:%28416%29%20806-0133" value="+14168060133" target="_blank">(416)806-0133</a>
<a moz-do-not-send="true" class="m_8694824072006468141moz-txt-link-abbreviated" href="mailto:alvin@netvel.net" target="_blank">alvin@netvel.net</a> ||
</pre></div>
</blockquote></div></div>
</blockquote>
<pre class="moz-signature" cols="72">--
Alvin Starr || voice: (905)513-7688
Netvel Inc. || Cell: (416)806-0133
<a class="moz-txt-link-abbreviated" href="mailto:alvin@netvel.net">alvin@netvel.net</a> ||
</pre></body></html>