[Gluster-devel] snapshot restore and USS

Raghavendra Bhat rabhat at redhat.com
Mon Dec 1 12:06:42 UTC 2014


On Monday 01 December 2014 04:51 PM, Raghavendra G wrote:
>
>
> On Fri, Nov 28, 2014 at 6:48 PM, RAGHAVENDRA TALUR 
> <raghavendra.talur at gmail.com> wrote:
>
>     On Thu, Nov 27, 2014 at 2:59 PM, Raghavendra Bhat
>     <rabhat at redhat.com> wrote:
>     > Hi,
>     >
>     > With USS to access snapshots, we depend on the last (i.e. the
>     > latest) snapshot of the volume to resolve some issues.
>     > Ex:
>     > Say there is a directory called "dir" within the root of the
>     > volume and USS is enabled. Now when .snaps is accessed from "dir"
>     > (i.e. /dir/.snaps), first a lookup is sent on /dir, which the
>     > snapview-client xlator passes on to the normal graph till the
>     > posix xlator of the brick. Next the lookup comes on /dir/.snaps.
>     > The snapview-client xlator now redirects this call to the snap
>     > daemon (since .snaps is a virtual directory to access the
>     > snapshots). The lookup comes to the snap daemon with the parent
>     > gfid set to the gfid of "/dir" and the basename set to ".snaps".
>     > The snap daemon will first try to resolve the parent gfid by
>     > trying to find the inode for that gfid. But since that gfid was
>     > not looked up before in the snap daemon, it will not be able to
>     > find the inode. So now, to resolve it, the snap daemon depends
>     > upon the latest snapshot, i.e. it tries to look up the gfid of
>     > /dir in the latest snapshot, and if it can get the gfid, then the
>     > lookup on /dir/.snaps is also successful.
>
>     From the user's point of view, I would like to be able to enter
>     .snaps from anywhere.
>     To be able to do that, we can turn the dependency upside down:
>     instead of listing all snaps in the .snaps dir, let's just show
>     whichever snapshots have that dir.
>
>
> Currently readdir in snapview-server lists _all_ the snapshots. 
> However, if you try to do "ls" on a snapshot which doesn't contain this 
> directory (say dir/.snaps/snap3), I think it returns ESTALE/ENOENT. 
> So, to get what you've explained above, readdir(p) should filter out 
> those snapshots which don't contain this directory (to do that, it has 
> to look up the directory on each of the snapshots).
>
> Raghavendra Bhat explained the problem and also a possible solution to 
> me in person. There are some pieces missing in the problem description 
> as explained in the mail (but not in the discussion we had). The 
> problem explained here occurs when you restore a snapshot (say snap3) 
> where the directory got created, but was deleted before the next 
> snapshot. So, the directory doesn't exist in snap2 and snap4, but 
> exists only in snap3. Now, when you restore snap3, "ls" on dir/.snaps 
> should show nothing. Now, what should the result of lookup 
> (gfid-of-dir, ".snaps") be?
>
> 1. We can blindly return a virtual inode, assuming at least one 
> snapshot contains dir. If fops come on specific snapshots (e.g., 
> dir/.snaps/snap4), they'll fail anyway with ENOENT (since dir is not 
> present on any snaps).
> 2. We can choose to return ENOENT if we figure out that dir is not 
> present on any snaps.
>
> The problem we are trying to solve here is how to achieve 2. One 
> simple solution is to look up <gfid-of-dir> on all the snapshots, and 
> if every lookup fails with ENOENT, we can return ENOENT. The other 
> solution is to look up only in the snapshots before and after the 
> restored one (if both are present, otherwise just in the latest 
> snapshot). If both fail, then we can be sure that no snapshot contains 
> that directory.
>
> Rabhat, Correct me if I've missed out anything :).
>


If a readdir on .snaps entered from a non-root directory has to show the 
list of only those snapshots where the directory (or rather the gfid of 
the directory) is present, then achieving that will be a bit costly.

When readdir comes on .snaps entered from a non-root directory (say "ls 
/dir/.snaps"), the following operations have to be performed (a sketch 
follows below):
1) We have the names of all the snapshots in an array. So, do a nameless 
lookup on the gfid of /dir on all the snapshots.
2) Based on which snapshots returned success for the above lookup, build 
a new array or list of snapshots.
3) Then send the above new list as the readdir entries.
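
A minimal standalone sketch of steps 1-3 (plain C; the per-snapshot
nameless lookup is stubbed out rather than using real xlator or gfapi
calls, and all names here are illustrative):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define UUID_LEN 16

/* Stub standing in for a nameless (gfid-based) lookup sent to one
 * snapshot volume. In the real snap daemon this is a network round
 * trip per snapshot. */
static bool
snap_has_gfid (const char *snap_name, const unsigned char gfid[UUID_LEN])
{
        (void) gfid;
        /* Pretend only snap3 contains the directory. */
        return (strcmp (snap_name, "snap3") == 0);
}

/* Build the filtered list of snapshot names to be returned as the
 * readdir entries for dir/.snaps: only those snapshots where the
 * gfid resolves. */
static int
filter_snaps_for_readdir (const char *all_snaps[], int n_snaps,
                          const unsigned char gfid[UUID_LEN],
                          const char *out[])
{
        int n_out = 0;

        for (int i = 0; i < n_snaps; i++)  /* one lookup per snapshot */
                if (snap_has_gfid (all_snaps[i], gfid))
                        out[n_out++] = all_snaps[i];

        return n_out;
}

int
main (void)
{
        const char *snaps[] = {"snap1", "snap2", "snap3", "snap4", "snap5"};
        const char *visible[5];
        unsigned char gfid[UUID_LEN] = {0};

        int n = filter_snaps_for_readdir (snaps, 5, gfid, visible);
        for (int i = 0; i < n; i++)
                printf ("%s\n", visible[i]);  /* entries shown under .snaps */
        return 0;
}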

But the above operation is costly: just to serve one readdir request we 
have to make a lookup on each snapshot (if there are 256 snapshots, we 
have to make 256 lookup calls over the network).

One more concern is resource usage. As of now, a snapshot is initialized 
(i.e. via gfapi a connection is established with the corresponding 
snapshot volume, which is equivalent to a mounted volume) only when that 
snapshot is accessed (from a fops point of view, when a lookup comes on 
the snapshot entry, say "ls /dir/.snaps/snap1"). But to serve readdir as 
above, all the snapshots would be accessed and hence all of them 
initialized. This means there can be 256 instances of gfapi connections, 
each instance having its own inode table and other resources. After the 
readdir, even if a snapshot is never accessed again, all those resources 
still add up to the snap daemon's usage.
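
For reference, a rough sketch of what initializing one snapshot via
gfapi involves (these are real libgfapi calls; the volfile server
address and the volume name are placeholders). Each such context
carries its own inode table, memory pools and connections, which is
the per-snapshot cost described above:

#include <stdio.h>
#include <glusterfs/api/glfs.h>

/* Establish a gfapi connection to one snapshot volume; roughly what
 * "init" of a snapshot means for the snap daemon. */
static glfs_t *
init_snap_volume (const char *snap_volname)
{
        glfs_t *fs = glfs_new (snap_volname);
        if (!fs)
                return NULL;

        /* "localhost" is a placeholder for the volfile server. */
        if (glfs_set_volfile_server (fs, "tcp", "localhost", 24007) < 0 ||
            glfs_init (fs) < 0) {
                glfs_fini (fs);
                return NULL;
        }
        return fs;
}

int
main (void)
{
        /* Eagerly doing this for all 256 snapshots multiplies the
         * resource usage accordingly. */
        glfs_t *fs = init_snap_volume ("snap1");
        if (!fs) {
                fprintf (stderr, "init failed\n");
                return 1;
        }
        glfs_fini (fs);
        return 0;
}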

With the above points in mind, I was thinking about different approaches 
to handle this situation. We need the latest snapshot (and, as per the 
patch, the adjacent snapshots to handle restore) to resolve lookups 
coming on .snaps, mainly for resolving the parent gfid so that we can 
look it up somewhere. (If "ls /dir/.snaps" is done, the lookup comes 
with the parent gfid set to the gfid of /dir and the name set to 
".snaps". But since /dir has not been looked up yet in the snap daemon, 
it has to first resolve the parent gfid, for which it looks at the 
latest snapshot.)
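
For what it's worth, the snapshot-selection order described for the
patch reduces to something like the following standalone sketch (the
fallback order is as described in this thread, but this is my
illustration, not code from the patch):

#include <stdio.h>

/* Pick which snapshots to consult for gfid resolution after a restore,
 * given the position the restored snapshot occupied in the ordered
 * list: previous snapshot first, then the next one; if there is no
 * previous, fall back to the next. */
static void
pick_adjacent (const char *snaps[], int n, int restored_idx,
               const char **first, const char **second)
{
        *first = *second = NULL;
        if (restored_idx > 0)
                *first = snaps[restored_idx - 1];   /* previous */
        if (restored_idx + 1 < n)
                *second = snaps[restored_idx + 1];  /* next */
        if (!*first) {          /* no previous snapshot: use the next */
                *first = *second;
                *second = NULL;
        }
}

int
main (void)
{
        const char *snaps[] = {"snap1", "snap2", "snap3", "snap4", "snap5"};
        const char *first = NULL, *second = NULL;

        pick_adjacent (snaps, 5, 2, &first, &second);  /* snap3 restored */
        printf ("consult %s, then %s\n", first, second ? second : "(none)");
        return 0;
}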

What we can do is this: while sending the lookup on .snaps (again, say 
"ls /dir/.snaps"), add a key within the dict which snapview-server can 
look for. That key is a kind of hint from snapview-client to 
snapview-server that the parent gfid of this particular lookup call 
exists and is a valid one. When snapview-server gets the lookup on the 
parent gfid as part of resolution from protocol/server, it can look at 
the dict for the key. If the key is set, it simply returns success for 
that lookup.
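
A minimal sketch of the hint, with a stand-in struct instead of the
real dict_t (in actual xlator code this would be dict_set_int32() /
dict_get_int32() on the lookup's xdata dict; the key name below is
hypothetical):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical key name for the hint. */
#define HINT_KEY "snapview.parent-is-valid"

/* Stand-in for the xdata dict travelling with the lookup. */
struct xdata {
        char key[64];
        int  value;
};

/* snapview-client side: while winding the lookup on
 * (gfid-of-dir, ".snaps"), tag the request to say the parent gfid
 * exists and is valid on the regular volume. */
static void
svc_tag_lookup (struct xdata *xd)
{
        snprintf (xd->key, sizeof (xd->key), "%s", HINT_KEY);
        xd->value = 1;
}

/* snapview-server side: when protocol/server asks it to resolve the
 * parent gfid, trust the hint and return success instead of looking
 * the gfid up in the latest snapshot. */
static bool
svs_parent_hinted (const struct xdata *xd)
{
        return (strcmp (xd->key, HINT_KEY) == 0 && xd->value == 1);
}

int
main (void)
{
        struct xdata xd = {"", 0};

        svc_tag_lookup (&xd);
        if (svs_parent_hinted (&xd))
                printf ("parent gfid resolved via hint, "
                        "no snapshot lookup needed\n");
        return 0;
}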

This way we can handle many situations, such as entering .snaps from a 
directory which was created after the latest snapshot was taken.

Please provide feedback on the above approach (the hint being set in the 
dict).

Regards,
Raghavendra Bhat



>
>
>     Maybe it is good enough if we resolve the parent on the main
>     volume and rely on that in snapview client and server.
>
>     >
>     > But there can be some confusion in the case of snapshot restore.
>     > Say there are 5 snapshots (snap1, snap2, snap3, snap4, snap5) for
>     > a volume vol. Now say the volume is restored to snap3. If there
>     > was a directory called "/a" at the time of taking snap3 which was
>     > later removed, then after the snapshot restore, accessing .snaps
>     > from that directory (in fact, from all the directories which were
>     > present while taking snap3) might cause problems. Because the
>     > original volume is now nothing but snap3, and when the snap
>     > daemon gets the lookup on "/a/.snaps", it tries to find the gfid
>     > of "/a" in the latest snapshot (which is snap5); if "/a" was
>     > removed after taking snap3, then the lookup of "/a" in snap5
>     > fails, and thus the lookup of "/a/.snaps" will also fail.
>
>
>     >
>     > Possible Solution:
>     > One possible solution that can be helpful in this case is:
>     > whenever glusterd sends the list of snapshots to the snap daemon
>     > after a snapshot restore, send the list in such a way that the
>     > snapshot previous to the restored snapshot is sent as the latest
>     > snapshot (in the example above, since snap3 is restored, glusterd
>     > should send snap2 as the latest snapshot to the snap daemon).
>     >
>     > But the above solution also has a problem. If there are only 2
>     > snapshots (snap1, snap2) and the volume is restored to the first
>     > snapshot (snap1), there is no previous snapshot to look at. And
>     > glusterd will send only one name in the list, which is snap2, but
>     > it is in a state later than the volume's.
>     >
>     > A patch has been submitted for review to handle this
>     > (http://review.gluster.org/#/c/9094/).
>     > In the patch, because of the above confusion, snapd tries to
>     > consult the adjacent snapshots of the restored snapshot to
>     > resolve the gfids. As per the 5-snapshot example, it looks into
>     > snap2 first, and if that fails it looks into snap4. If there is
>     > no previous snapshot, it looks at the next snapshot (the
>     > 2-snapshot example). If there is no next snapshot, it looks at
>     > the previous snapshot.
>     >
>     > Please provide feedback about how this issue can be handled.
>     >
>     > Regards,
>     > Raghavendra Bhat
>     > _______________________________________________
>     > Gluster-devel mailing list
>     > Gluster-devel at gluster.org
>     > http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>
>
>
>     --
>     Raghavendra Talur
>     _______________________________________________
>     Gluster-devel mailing list
>     Gluster-devel at gluster.org
>     http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>
>
>
>
> -- 
> Raghavendra G
