[Gluster-devel] GlusterFS Volume backup API

Joseph Fernandes josferna at redhat.com
Fri Dec 19 10:16:57 UTC 2014


Replies inline JOE>>

----- Original Message -----
From: "Aravinda" <avishwan at redhat.com>
To: "Joseph Fernandes" <josferna at redhat.com>
Cc: "gluster Devel" <gluster-devel at gluster.org>
Sent: Friday, December 19, 2014 3:39:28 PM
Subject: Re: [Gluster-devel] GlusterFS Volume backup API

Thanks Joseph, added comments inline.

On 12/19/2014 10:23 AM, Joseph Fernandes wrote:
> Few concerns inline JOE>>
>
> ----- Original Message -----
> From: "Aravinda" <avishwan at redhat.com>
> To: "gluster Devel" <gluster-devel at gluster.org>
> Sent: Thursday, December 18, 2014 10:38:20 PM
> Subject: [Gluster-devel] GlusterFS Volume backup API
>
> Hi,
>
>
> Today we discussed the GlusterFS backup API. Our plan is to provide a tool/API to get the list of changed files (full/incremental).
>
> Participants: Me, Kotresh, Ajeet, Shilpa
>
> Thanks to Paul Cuzner for providing inputs about pre and post hooks available in backup utilities like NetBackup.
>
>
> Initial draft:
> ==============
>
> Case 1 - Registered Consumer
> ----------------------------
>
> Consumer application has to register by giving a session name.
>
> glusterbackupapi register <sessionname> <host> <volume>
>
>
>
> When the following command is run for the first time, it does a full scan; from the next run onwards it does an incremental scan. The start time for the incremental is the last backup time, and the end time is the current time.
>
> glusterbackupapi <sessionname> --out-file=out.txt
>
> --out-file is an optional argument; the default output file name is `output.txt`. The output file will contain file paths.
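>
> A rough sketch of how a registered session could remember its last backup time, so the next run's start time is derived automatically (the file layout and names below are only illustrative, not part of the tool):
>
>     # Illustrative sketch only: a registered session remembers the last
>     # backup time, so the next run is incremental (start = last run,
>     # end = now). Paths and names are assumptions, not the actual tool.
>     import json
>     import os
>     import time
>
>     SESSION_DIR = "/var/lib/glusterbackupapi/sessions"   # assumed location
>
>     def register(session, host, volume):
>         os.makedirs(SESSION_DIR, exist_ok=True)
>         info = {"host": host, "volume": volume, "last_backup_time": None}
>         with open(os.path.join(SESSION_DIR, session + ".json"), "w") as f:
>             json.dump(info, f)
>
>     def get_time_range(session):
>         path = os.path.join(SESSION_DIR, session + ".json")
>         with open(path) as f:
>             info = json.load(f)
>         start = info["last_backup_time"]        # None => do a full scan
>         end = int(time.time())
>         info["last_backup_time"] = end
>         with open(path, "w") as f:
>             json.dump(info, f)
>         return start, end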
>
>
>
> Case 2 - Unregistered Consumer
> -----------------------------
>
> Start time and end time information will not be remembered; for an incremental backup the consumer has to send the start time and end time every time.
>
> For Full backup,
>
> glusterbackupapi full <host> <volume> --out-file=out.txt
>
> For Incremental backup,
>
> glusterbackupapi inc <host> <volume> <STARTTIME> <ENDTIME> --out-file=out.txt
>
> where STARTTIME and ENDTIME are in unix timestamp format.
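>
> For example, values in that format could be generated like this (just an illustration of the expected Unix timestamp format, not part of the tool):
>
>     # Illustration of the Unix timestamp format expected for STARTTIME/ENDTIME.
>     from datetime import datetime, timezone
>
>     start = int(datetime(2014, 12, 18, tzinfo=timezone.utc).timestamp())
>     end   = int(datetime(2014, 12, 19, tzinfo=timezone.utc).timestamp())
>     print(start, end)   # 1418860800 1418947200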
>
>
> Technical overview
> ==================
> 1. Using the host and volume name arguments, it fetches volume info and volume status to get the list of up bricks/nodes.
> 2. Executes a brick/node agent to get the required details from each brick. (TBD: communication via RPC/SSH/gluster system:: execute)
> 3. For a full scan, the brick/node agent gets the list of files from that brick backend and generates an output file.
> 4. For an incremental, it calls the Changelog History API, gets the list of distinct GFIDs, and then converts each GFID to a path.
> 5. The generated output files from each brick node will be copied to the initiator node.
> 6. Merges all the output files from the bricks and removes duplicates (see the sketch after this list).
> 7. In case of session-based access, session information will be saved by each brick/node agent.
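>
> A minimal sketch of step 6, merging the per-brick output files on the initiator node and dropping duplicate paths (file names are placeholders):
>
>     # Minimal sketch of step 6: merge per-brick output files on the
>     # initiator node and remove duplicate paths. Names are placeholders.
>     def merge_brick_outputs(brick_files, merged_file):
>         seen = set()
>         with open(merged_file, "w") as out:
>             for brick_file in brick_files:
>                 with open(brick_file) as f:
>                     for line in f:
>                         path = line.rstrip("\n")
>                         if path and path not in seen:
>                             seen.add(path)
>                             out.write(path + "\n")
>
>     merge_brick_outputs(["brick1.out", "brick2.out"], "output.txt")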
>
>
> Issues/Challenges
> =================
> 1. What if timestamps differ across Gluster nodes? We are assuming the timestamps in a cluster will remain in sync.
> 2. If a brick is down, how do we handle it? We are assuming all the bricks should be up to initiate a backup (at least one from each replica set).
> 3. If the changelog is not available, or is broken between the start time and end time, how do we get the incremental file list? As a prerequisite, the changelog should be enabled before backup.
>
> JOE >> Performance overhead on the IO path when the changelog is switched on. I think getting numbers or a performance matrix here would be very crucial,
> as it is not desirable to sacrifice file IO performance to support the backup API or any data maintenance activity.
We are also evaluating using an FS crawl even for incrementals, instead of 
the changelog.

            JOE >> A BIG NO! to FS crawl, as it is the worst option for spindle-based storage. Your performance will deteriorate even further.

>
> 4. GFID to path conversion, using `find -samefile` or using the `glusterfs.pathinfo` xattr on an aux-gfid-mount (see the sketch after this list).
> 5. Deleted files: if we get the GFID of a deleted file from the changelog, how do we find its path? Does the backup API require a list of deleted files?
>
> JOE >>
> 1) "find" would not be a good option here, as you have to traverse the whole namespace. That takes a toll on spindle-based media.
> 2) The "glusterfs.pathinfo" xattr is a feasible approach but has its own problems:
>      a. This xattr comes only with quota, so you need to decouple it from quota.
>      b. This xattr should be enabled from the beginning of the namespace, i.e. if enabled later you will have some files which have this xattr and some which won't. This issue is true for any metadata-storing approach in Gluster, e.g. DB, changelog, etc.
>      c. I am not sure if this xattr has support for multiple hard links, or whether you (the backup scenario) would require that. Just food for thought.
>      d. This xattr is not crash-consistent across power failures. That means you may end up in a state where some inodes have the xattr and some won't.
Makes sense. These points came out of the initial discussion; we have to figure 
out how to get the path efficiently using the GFID. (If we use a multithreaded 
FS crawl even for incrementals, then we don't need this conversion step.)

            JOE >> Same comment as previous. Even if you do a multi-threaded crawl, your threads will be contending for spindle movements!

> 3) Agreed on the delete problem. It gets worse with multiple hard links, if some hard links are recorded and some are not.
>
> 6. Storing session info on each brick node.
> 7. Communication channel between nodes: RPC/SSH/gluster system:: execute, etc.?
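>
> For point 4 above, one possible approach is to read the pathinfo xattr through a volume mounted with the aux-gfid-mount option; the sketch below assumes that mechanism, and the xattr key, mount path and example GFID are illustrative only:
>
>     # Sketch for point 4: resolve a GFID to its path(s) by reading the
>     # pathinfo xattr through a volume mounted with aux-gfid-mount.
>     # The xattr key, mount path and GFID below are assumptions.
>     import os
>
>     MOUNT = "/mnt/glustervol"        # volume mounted with aux-gfid-mount
>
>     def gfid_to_pathinfo(gfid):
>         gfid_path = os.path.join(MOUNT, ".gfid", gfid)
>         try:
>             return os.getxattr(gfid_path, "glusterfs.pathinfo").decode()
>         except OSError:
>             return None              # e.g. deleted file, or xattr missing
>
>     print(gfid_to_pathinfo("6a3c2dd4-7d4f-46b5-ab73-7e8f3f2c1f10"))  # example GFID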
>
>
> Kotresh, Ajeet, Please add if I missed any points.
>
>

--
regards
Aravinda

