[Gluster-users] Backups

Gandalf Corvotempesta gandalf.corvotempesta at gmail.com
Thu Mar 23 20:29:40 UTC 2017


Yes, but the biggest issue is how to recover.
You'll need to recover the whole storage, not a single snapshot, and that can
take days.

On 23 Mar 2017 at 9:24 PM, "Alvin Starr" <alvin at netvel.net> wrote:

> For volume backups you need something like snapshots.
>
> If you take a snapshot A of a live volume L, that snapshot stays frozen at
> that moment in time, and you can rsync it to another system or use something
> like deltacp.pl to copy it.
>
> The usual process is to delete the snapshot once it's copied and then
> repeat the process when the next backup is required.
>
> That process does require rsync/deltacp to read the complete volume on
> both systems, which can take a long time.
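>
> As a rough illustration, that snapshot/copy/delete cycle could be scripted
> something like this (a sketch only; the volume name, mount point and rsync
> target are made-up examples):
>
> #!/usr/bin/env python3
> # Rough sketch of the snapshot -> copy -> delete cycle described above.
> # Volume name, mount point and backup target are examples only.
> import datetime
> import subprocess
>
> VOLUME = "vmstore"                        # hypothetical gluster volume
> SNAP_MOUNT = "/mnt/snap-backup"           # hypothetical mount point
> TARGET = "backuphost:/backups/vmstore/"   # hypothetical rsync target
>
> def run(cmd):
>     subprocess.run(cmd, check=True)
>
> snap = "backup-" + datetime.datetime.now().strftime("%Y%m%d%H%M")
>
> # --mode=script skips the CLI's interactive y/n prompts,
> # no-timestamp keeps the snapshot name predictable.
> run(["gluster", "--mode=script", "snapshot", "create", snap, VOLUME, "no-timestamp"])
> run(["gluster", "--mode=script", "snapshot", "activate", snap])
> # An activated snapshot can be mounted like a volume via snaps/<snap>/<vol>.
> run(["mount", "-t", "glusterfs",
>      "localhost:/snaps/{}/{}".format(snap, VOLUME), SNAP_MOUNT])
> try:
>     # The snapshot is frozen, so rsync sees a consistent image, but it
>     # still has to read the whole volume on both sides.
>     run(["rsync", "-a", SNAP_MOUNT + "/", TARGET])
> finally:
>     run(["umount", SNAP_MOUNT])
>     run(["gluster", "--mode=script", "snapshot", "deactivate", snap])
>     run(["gluster", "--mode=script", "snapshot", "delete", snap])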
>
> I was kicking around an idea for handling snapshot deltas better.
>
> The idea is that you could take your initial snapshot A then sync that
> snapshot to your backup system.
>
> At a later point you could take another snapshot B.
>
> Because a snapshot keeps copies of the original data as it was at the time
> of the snapshot, while unmodified data still points to the live volume, it
> is possible to tell which blocks of data have changed since the snapshot
> was taken.
>
> Now that you have a second snapshot you can in essence perform a diff on
> the A and B snapshots to get only the blocks that changed up to the time
> that B was taken.
>
> These blocks could be copied to the backup image and you should have a
> clone of the B snapshot.
>
> You would not have to read the whole volume image, just the changed
> blocks, dramatically improving the speed of the backup.
>
> At this point you can delete the A snapshot and promote the B snapshot to
> be the A snapshot for the next backup round.
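>
> A minimal sketch of that idea (purely illustrative: changed_extents() is a
> placeholder for whatever the snapshot's copy-on-write metadata would
> provide, and the paths are made up):
>
> #!/usr/bin/env python3
> # Conceptual sketch of the snapshot-delta backup described above.
> # changed_extents() is a placeholder: in a real tool the list of changed
> # blocks would come from the snapshot's copy-on-write metadata, so the
> # whole volume never has to be read.
> from typing import Iterator, Tuple
>
> def changed_extents(snap_a: str, snap_b: str) -> Iterator[Tuple[int, int]]:
>     """Yield (offset, length) extents that changed between snapshots A and B."""
>     raise NotImplementedError("would be driven by snapshot/CoW metadata")
>
> def apply_delta(snap_b: str, backup_img: str, extents) -> None:
>     # Copy only the changed extents from snapshot B into the backup image,
>     # turning the backup into a clone of B.
>     with open(snap_b, "rb") as src, open(backup_img, "r+b") as dst:
>         for offset, length in extents:
>             src.seek(offset)
>             dst.seek(offset)
>             dst.write(src.read(length))
>
> # Hypothetical usage: afterwards the backup matches snapshot B, A can be
> # deleted and B becomes the baseline for the next round.
> # apply_delta("/dev/snaps/B", "/backups/vm.img",
> #             changed_extents("/dev/snaps/A", "/dev/snaps/B"))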
>
> On 03/23/2017 03:53 PM, Gandalf Corvotempesta wrote:
>
> Are the backups consistent?
> What happens if the header on shard0 is synced while it refers to some data
> on shard450, and by the time rsync reaches shard450 that data has been
> changed by subsequent writes?
>
> The header would be backed up out of sync with respect to the rest of the
> image.
>
> On 23 Mar 2017 at 8:48 PM, "Joe Julian" <joe at julianfamily.org> wrote:
>
>> The rsync protocol only passes blocks that have actually changed, and a raw
>> image changes fewer bits. You're right, though, that it still has to check
>> the entire file for those changes.
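>>
>> For a big image the exact flags matter. A sketch (paths are made up):
>> --inplace stops rsync from rewriting a whole temporary copy on the
>> receiving side, and --no-whole-file keeps the delta algorithm on when the
>> destination is a locally mounted path:
>>
>> #!/usr/bin/env python3
>> # Sketch: rsync a large disk image so only the changed regions are written.
>> # Source and destination paths are made-up examples.
>> import subprocess
>>
>> subprocess.run([
>>     "rsync", "-a",
>>     "--inplace",        # update the destination file in place
>>     "--no-whole-file",  # keep delta transfer on for local destinations
>>     "/mnt/snap-backup/images/vm001.raw",
>>     "/backups/images/vm001.raw",   # e.g. a mounted backup volume
>> ], check=True)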
>>
>> On 03/23/17 12:47, Gandalf Corvotempesta wrote:
>>
>> Raw or qcow doesn't change anything about the backup.
>> Georep always has to sync the whole file.
>>
>> Additionally, raw images have far fewer features than qcow.
>>
>> On 23 Mar 2017 at 8:40 PM, "Joe Julian" <joe at julianfamily.org> wrote:
>>
>>> I always use raw images. And yes, sharding would also be good.
>>>
>>> On 03/23/17 12:36, Gandalf Corvotempesta wrote:
>>>
>>> Georep exposes another problem:
>>> when using Gluster as storage for VMs, the VM disk is saved as a qcow file.
>>> Changes happen inside the qcow, so rsync has to sync the whole file every
>>> time.
>>>
>>> A partial workaround would be sharding, as rsync would only have to sync
>>> the changed shards, but I don't think this is a good solution.
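>>>
>>> Just to illustrate that point (brick path and timestamp file are made up;
>>> with sharding, the pieces after the first are stored on the bricks under
>>> .shard as <gfid>.<index>), a tool could pick out only the pieces touched
>>> since the last run:
>>>
>>> #!/usr/bin/env python3
>>> # Sketch: list shard pieces modified since the previous backup run.
>>> import os
>>>
>>> BRICK = "/data/brick1/vmstore"        # hypothetical brick path
>>> STAMP = "/var/tmp/last-backup-stamp"  # mtime of this file = last run
>>>
>>> last_run = os.path.getmtime(STAMP) if os.path.exists(STAMP) else 0.0
>>>
>>> changed = []
>>> for root, _dirs, files in os.walk(os.path.join(BRICK, ".shard")):
>>>     for name in files:
>>>         path = os.path.join(root, name)
>>>         if os.path.getmtime(path) > last_run:
>>>             changed.append(path)
>>>
>>> print("shards changed since last run:", len(changed))
>>> # A backup job would copy only these pieces (plus the base files) and
>>> # then touch STAMP to record the new baseline.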
>>>
>>> On 23 Mar 2017 at 8:33 PM, "Joe Julian" <joe at julianfamily.org> wrote:
>>>
>>>> In many cases, a full backup set is just not feasible. Georep to the
>>>> same or a different DC may be an option if the bandwidth can keep up with
>>>> the change set. If not, maybe break the data up into smaller, more
>>>> manageable volumes where you keep only a smaller set of critical data and
>>>> just back that up. Perhaps an object store (swift?) might handle
>>>> fault-tolerant distribution better for some workloads.
>>>>
>>>> There's no one right answer.
>>>>
>>>> On 03/23/17 12:23, Gandalf Corvotempesta wrote:
>>>>
>>>> Backing up from inside each VM doesn't solve the problem.
>>>> If you have to back up 500 VMs you need more than a day, and what if you
>>>> have to restore the whole Gluster storage?
>>>>
>>>> How many days do you need to restore 1 PB?
>>>>
>>>> Probably the only solution is georep in the same datacenter/rack to a
>>>> similar cluster, ready to become the master storage.
>>>> In that case you don't need to restore anything, as the data is already
>>>> there, only a little bit behind in time, but this doubles the TCO.
>>>>
>>>> On 23 Mar 2017 at 6:39 PM, "Serkan Çoban" <cobanserkan at gmail.com> wrote:
>>>>
>>>>> Assuming a backup window of 12 hours, you need to send data to the
>>>>> backup solution at about 25 GB/s.
>>>>> Using 10G Ethernet on the hosts, you need at least 25 hosts to handle
>>>>> 25 GB/s.
>>>>> You can build an EC Gluster cluster that can handle these rates, or you
>>>>> can just back up the valuable data from inside the VMs using open-source
>>>>> backup tools like borg, attic, restic, etc.
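>>>>>
>>>>> As a quick back-of-the-envelope check of those numbers:
>>>>>
>>>>> #!/usr/bin/env python3
>>>>> # Rough arithmetic behind the 25 GB/s and 25-host figures above.
>>>>> data_bytes = 1e15              # ~1 PB of VM images
>>>>> window_s = 12 * 3600           # 12-hour backup window
>>>>> print(data_bytes / window_s / 1e9)    # ~23 GB/s, call it 25
>>>>>
>>>>> nic_bytes_per_s = 10e9 / 8     # 10 GbE ~= 1.25 GB/s per host at line rate
>>>>> print(25e9 / nic_bytes_per_s)  # 20 hosts at line rate; ~25 with overhead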
>>>>>
>>>>> On Thu, Mar 23, 2017 at 7:48 PM, Gandalf Corvotempesta
>>>>> <gandalf.corvotempesta at gmail.com> wrote:
>>>>> > Let's assume a 1 PB storage full of VM images, with each brick on ZFS,
>>>>> > replica 3, sharding enabled.
>>>>> >
>>>>> > How do you back up/restore that amount of data?
>>>>> >
>>>>> > Backing up daily is impossible: you'd never finish one backup before
>>>>> > the following one has to start (in other words, you need more than 24
>>>>> > hours).
>>>>> >
>>>>> > Restoring is even worse: you need more than 24 hours with the whole
>>>>> > cluster down.
>>>>> >
>>>>> > You can't rely on ZFS snapshots because of sharding (a snapshot taken
>>>>> > on one node is useless without all the other nodes holding the same
>>>>> > shards), and you still have the same restore speed.
>>>>> >
>>>>> > How do you back this up?
>>>>> >
>>>>> > Even georep isn't enough if you have to restore the whole storage in
>>>>> > case of disaster.
>>>>> >
>>>>>
>>>>
>>>>
>>>
>
> --
> Alvin Starr                   ||   voice: (905) 513-7688
> Netvel Inc.                   ||   Cell:  (416) 806-0133
> alvin at netvel.net           ||
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>

