[Gluster-devel] GlusterFS Volume backup API

Thu Dec 18 17:08:20 UTC 2014

Hi,

Today we discussed the GlusterFS backup API. Our plan is to provide a 
tool/API that returns the list of changed files (full or incremental).

Participants: Me, Kotresh, Ajeet, Shilpa

Thanks to Paul Cuzner for providing inputs about pre and post hooks 
available in backup utilities like NetBackup.

*Initial draft*
===============

Case 1 - Registered Consumer
----------------------------

The consumer application has to register by giving a session name.

glusterbackupapi register <sessionname> <host> <volume>

When the following command is run for the first time, it does a full 
scan. From the next run onwards it does an incremental scan. The start 
time for an incremental scan is the last backup time, and the end time 
is the current time.

glusterbackupapi <sessionname> --out-file=out.txt

--out-file is an optional argument; the default output file name is 
`output.txt`. The output file will contain file paths.
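
The full-then-incremental behaviour of a registered session can be 
sketched as below. This is only an illustration of the rule described 
above, not the tool's implementation: the `session` dict and the 
`last_backup_time` key are hypothetical stand-ins for whatever state the 
tool ends up persisting.

```python
def next_backup_window(session, now):
    """Return (mode, start, end) for one run and advance the session.

    `session` is a plain dict standing in for persisted session state;
    `last_backup_time` is a hypothetical key name.
    """
    last = session.get("last_backup_time")
    mode = "full" if last is None else "incremental"
    # The end of this window becomes the start of the next one.
    session["last_backup_time"] = now
    return mode, last, now


session = {}
first = next_backup_window(session, 1418922500)   # first run: full scan
second = next_backup_window(session, 1419008900)  # later run: incremental
```

A real tool would probably record the new time only after the output 
file has been generated successfully, so that a failed run is retried 
over the same window.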

Case 2 - Unregistered Consumer
------------------------------

Start time and end time are not remembered; the consumer has to send 
the start time and end time with every incremental request.

For Full backup,

     glusterbackupapi full <host> <volume> --out-file=out.txt

For Incremental backup,

     glusterbackupapi inc <host> <volume> <STARTTIME> <ENDTIME> --out-file=out.txt

where STARTTIME and ENDTIME are Unix timestamps (seconds since the epoch).
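
As an illustration of what an unregistered consumer has to compute, the 
sketch below builds the argument list for an incremental request covering 
the last N hours. It follows the command syntax proposed above and does 
not run anything; the helper name, `node1` and `vol1` are placeholders.

```python
from datetime import datetime, timedelta, timezone


def incremental_command(host, volume, end, hours, out_file="out.txt"):
    """Build argv for an incremental request covering the last `hours` hours."""
    start = end - timedelta(hours=hours)
    return [
        "glusterbackupapi", "inc", host, volume,
        str(int(start.timestamp())),  # STARTTIME as a Unix timestamp
        str(int(end.timestamp())),    # ENDTIME as a Unix timestamp
        "--out-file=" + out_file,
    ]


# Timezone-aware datetimes avoid surprises from the local timezone.
end = datetime(2014, 12, 18, 17, 8, 20, tzinfo=timezone.utc)
argv = incremental_command("node1", "vol1", end, hours=24)
```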

*Technical overview*
==================
1. Using the host and volume name arguments, it fetches volume info and 
volume status to get the list of bricks/nodes that are up.
2. It executes a brick/node agent to get the required details from each 
brick. (TBD: communication via RPC/SSH/gluster system:: execute)
3. For a full scan, the brick/node agent gets the list of files from 
that brick's backend and generates an output file.
4. For an incremental scan, it calls the Changelog History API, gets the 
list of distinct GFIDs, and then converts each GFID to a path.
5. The output files generated on each brick node are copied to the 
initiator node.
6. It merges all the output files from the bricks and removes duplicates.
7. In case of session-based access, session information is saved by 
each brick/node agent.
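
Step 6 (merge and de-duplicate) could look roughly like the sketch 
below. It assumes each brick's output is a list of paths, one per line, 
as described for the output file; the function name is made up for this 
example.

```python
def merge_paths(per_brick_lines):
    """Merge path listings from several bricks, dropping duplicates.

    Each element of `per_brick_lines` is an iterable of lines, for example
    an open output file copied from one brick. First-seen order is kept.
    """
    seen = set()
    merged = []
    for lines in per_brick_lines:
        for line in lines:
            path = line.rstrip("\n")
            if path and path not in seen:
                seen.add(path)
                merged.append(path)
    return merged


brick1 = ["dir1/a.txt\n", "dir1/b.txt\n"]
brick2 = ["dir1/a.txt\n", "dir2/c.txt\n"]  # replica copy of a.txt
merged = merge_paths([brick1, brick2])
```

Duplicates are expected mainly because replica bricks report the same 
files. For very large volumes an external sort of the files may be a 
better fit than holding every path in memory.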

*Issues/Challenges*
=================
1. Timestamps may differ between gluster nodes. We are assuming that 
timestamps remain the same across a cluster.
2. How do we handle a brick that is down? We are assuming that all the 
bricks should be up to initiate a backup (at least one from each replica).
3. If the changelog is not available, or is broken between the start 
time and end time, how do we get the incremental files list? As a 
prerequisite, changelog should be enabled before backup.
4. GFID to path conversion: using `find -samefile` or using the 
`glusterfs.pathinfo` xattr on an aux-gfid-mount.
5. Deleted files: if we get the GFID of a deleted file from the 
changelog, how do we find its path? Does the backup API require a list 
of deleted files?
6. Storing session info on each brick node.
7. Communication channel between nodes: RPC/SSH/gluster system:: 
execute, etc.?
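
Points 2 and 4 can be made a little more concrete with the sketch below. 
The first helper checks the "at least one brick up from each replica" 
assumption. The second builds the path that `find -samefile` would be 
pointed at, assuming the brick backend keeps a hard link for each regular 
file under `.glusterfs/<first two hex chars>/<next two>/<gfid>`. Both 
function names, the brick names and the GFID are made up for illustration.

```python
import posixpath


def missing_replica_sets(replica_sets, up_bricks):
    """Return the replica sets that have no brick in `up_bricks`."""
    up = set(up_bricks)
    return [rs for rs in replica_sets if not up.intersection(rs)]


def gfid_backend_path(brick_root, gfid):
    """Path of the .glusterfs entry for `gfid` on a brick backend.

    Assumes the <brick>/.glusterfs/aa/bb/<gfid> backend layout.
    """
    return posixpath.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


replica_sets = [["n1:/b1", "n2:/b1"], ["n1:/b2", "n2:/b2"]]
ok = missing_replica_sets(replica_sets, ["n1:/b1", "n2:/b2"])
blocked = missing_replica_sets(replica_sets, ["n1:/b1", "n2:/b1"])
link = gfid_backend_path("/bricks/b1", "a1b2c3d4-0000-4000-8000-000000000001")
```

An empty result from `missing_replica_sets` means the backup can start. 
The path from `gfid_backend_path` does not help for deleted files 
(point 5), since the link is gone along with the file.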

  Kotresh, Ajeet, please add any points I missed.

  --
  regards
  Aravinda