[Gluster-devel] GlusterFS Volume backup API
Aravinda
avishwan at redhat.com
Thu Dec 18 17:08:20 UTC 2014
Hi,
Today we discussed the GlusterFS backup API. Our plan is to provide a
tool/API to get the list of changed files (full or incremental).
Participants: Me, Kotresh, Ajeet, Shilpa
Thanks to Paul Cuzner for providing inputs about pre and post hooks
available in backup utilities like NetBackup.
*Initial draft*
==============
Case 1 - Registered Consumer
----------------------------
The consumer application has to register by giving a session name.
glusterbackupapi register <sessionname> <host> <volume>
When the following command is run for the first time, it does a full scan.
From the next run onwards it does an incremental scan. The start time for
an incremental scan is the last backup time; the end time is the current time.
glusterbackupapi <sessionname> --out-file=out.txt
--out-file is an optional argument; the default output file name is
`output.txt`. The output file will contain file paths.
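To make the registered-session behaviour concrete, here is a minimal Python sketch of how the start/end window could be chosen. It is only an illustration of the idea above, not the planned implementation: the function name `next_backup_window`, the JSON session file, and its `last_backup_time` key are all hypothetical.

```python
import json
import os
import time


def next_backup_window(session_file, now=None):
    """Return (mode, start, end) for one run of a registered session.

    Hypothetical helper: the first run has no saved state, so it is a
    full scan; later runs are incremental from the last backup time.
    """
    now = int(time.time()) if now is None else int(now)
    if os.path.exists(session_file):
        with open(session_file) as f:
            last = json.load(f)["last_backup_time"]
        window = ("incremental", last, now)
    else:
        window = ("full", None, now)
    # Remember the end time so the next run starts where this one stopped.
    with open(session_file, "w") as f:
        json.dump({"last_backup_time": now}, f)
    return window
```

A real tool would probably save the new time only after the file list has been generated successfully, so that a failed run does not lose changes.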
Case 2 - Unregistered Consumer
-----------------------------
Start and end times are not remembered; the consumer has to send the
start time and end time on every incremental run.
For full backup,
glusterbackupapi full <host> <volume> --out-file=out.txt
For incremental backup,
glusterbackupapi inc <host> <volume> <STARTTIME> <ENDTIME> --out-file=out.txt
where STARTTIME and ENDTIME are Unix timestamps.
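As an illustration of what an unregistered consumer would have to do, the Python sketch below converts two UTC date strings to Unix timestamps and builds the argument list for the incremental command proposed above. The helper names, the host `node1` and the volume `vol1` are made up for the example, and `glusterbackupapi` is only the proposed command name, so the sketch builds the arguments without running anything.

```python
from datetime import datetime, timezone


def to_unix(ts):
    """Convert a 'YYYY-MM-DD HH:MM:SS' string, taken as UTC, to a Unix timestamp."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def incremental_args(host, volume, start, end, out_file="out.txt"):
    """Build the argument list for the proposed incremental command."""
    return ["glusterbackupapi", "inc", host, volume,
            str(start), str(end), "--out-file=" + out_file]


# Changes made during one day, 17 Dec 2014 (UTC).
start = to_unix("2014-12-17 00:00:00")
end = to_unix("2014-12-18 00:00:00")
args = incremental_args("node1", "vol1", start, end)
```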
*Technical overview*
==================
1. Using the host and volume name arguments, it fetches volume info and
volume status to get the list of bricks/nodes that are up.
2. Executes a brick/node agent to get the required details from each brick. (TBD:
communication via RPC/SSH/gluster system:: execute)
3. For a full scan, the brick/node agent gets the list of files from that
brick's backend and generates an output file.
4. For an incremental scan, it calls the Changelog History API, gets the list
of distinct GFIDs, and then converts each GFID to a path.
5. The output files generated on each brick node are copied to the
initiator node.
6. Merges all the output files from the bricks and removes duplicates.
7. For session-based access, session information is saved by
each brick/node agent.
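Step 6 above (merge and de-duplicate) could look roughly like the following Python sketch. It assumes each brick's output file holds one path per line, which the draft does not specify, and `merge_outputs` is a hypothetical name. Duplicates are expected because replicated bricks report the same files.

```python
def merge_outputs(brick_files, merged_file):
    """Merge per-brick path lists into one file, dropping duplicate paths.

    Assumes one path per line. Keeps the first occurrence of each path
    and returns the number of unique paths written.
    """
    seen = set()
    with open(merged_file, "w") as out:
        for name in brick_files:
            with open(name) as f:
                for line in f:
                    path = line.rstrip("\n")
                    if path and path not in seen:
                        seen.add(path)
                        out.write(path + "\n")
    return len(seen)
```

The set keeps every path in memory, which is fine for a sketch; for very large volumes an external sort of the combined files may be a better fit.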
*Issues/Challenges*
=================
1. Timestamps may differ between gluster nodes. We are assuming that
clocks are in sync across the cluster.
2. How to handle a brick that is down? We are assuming that all the bricks
should be up to initiate a backup (at least one from each replica).
3. If the changelog is not available, or is broken between the start time
and end time, how do we get the incremental file list? As a prerequisite,
changelog should be enabled before backup.
4. GFID to path conversion: using `find -samefile` or the
`glusterfs.pathinfo` xattr on an aux-gfid-mount.
5. Deleted files: if we get the GFID of a deleted file from the changelog,
how do we find its path? Does the backup API require a list of deleted files?
6. Storing session info on each brick node.
7. Communication channel between nodes: RPC/SSH/gluster system::
execute, etc.?
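For point 4, the `find -samefile` approach can be sketched in Python as below. The sketch relies on the brick backend keeping a hard link for each regular file at `.glusterfs/<first two hex chars>/<next two>/<gfid>`; it then walks the brick looking for entries with the same device and inode, which is the same test `find -samefile` applies. The function name is hypothetical and directories are not handled. It also shows the problem from point 5: for a deleted file the handle is gone, so nothing is returned.

```python
import os


def gfid_to_paths(brick_root, gfid):
    """Return brick-relative paths of the regular file with this GFID.

    Sketch only: does a full walk per GFID, so it would be slow on a
    large brick. Returns [] if the GFID handle no longer exists.
    """
    handle = os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)
    try:
        target = os.lstat(handle)
    except FileNotFoundError:
        return []  # e.g. the file was deleted after the changelog entry
    matches = []
    for dirpath, dirnames, filenames in os.walk(brick_root):
        if dirpath == brick_root and ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")  # skip the handle directory itself
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if st.st_ino == target.st_ino and st.st_dev == target.st_dev:
                matches.append(os.path.relpath(full, brick_root))
    return matches
```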
Kotresh, Ajeet, please add any points I missed.
--
regards
Aravinda