[Gluster-devel] snapshot restore and USS

Thu Nov 27 09:29:43 UTC 2014

Hi,

With USS used to access snapshots, we depend on the latest snapshot of 
the volume to resolve gfids that the snap daemon has not seen before.
Example:
Say there is a directory called "dir" within the root of the volume and 
USS is enabled. Now when .snaps is accessed from "dir" (i.e. 
/dir/.snaps), a lookup is first sent on /dir, which the snapview-client 
xlator passes on to the normal graph, down to the posix xlator of the 
brick. Next, the lookup comes on /dir/.snaps. The snapview-client xlator 
redirects this call to the snap daemon (since .snaps is a virtual 
directory used to access the snapshots). The lookup reaches the snap 
daemon with the parent gfid set to the gfid of "/dir" and the basename 
set to ".snaps". The snap daemon first tries to resolve the parent gfid 
by finding the inode for that gfid. But since that gfid was never looked 
up in the snap daemon before, it cannot find the inode. To resolve it, 
the snap daemon depends on the latest snapshot, i.e. it tries to look up 
the gfid of /dir in the latest snapshot, and if that succeeds, the 
lookup on /dir/.snaps also succeeds.
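
The resolution path described above can be sketched roughly as follows. 
This is a toy Python model, not snapd's actual code: the Snapshot class, 
the resolve_parent and lookup_snaps functions, the dictionary used as an 
inode table, and the caching of a resolved gfid are all assumptions made 
for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    """Toy stand-in for a snapshot: maps gfid -> path (hypothetical model)."""
    name: str
    entries: Dict[str, str] = field(default_factory=dict)


def resolve_parent(gfid: str,
                   inode_table: Dict[str, str],
                   snapshots: List[Snapshot]) -> Optional[str]:
    """Resolve a parent gfid, falling back to the latest snapshot.

    1. Try the in-memory inode table (gfids already looked up in snapd).
    2. Otherwise look the gfid up in the latest snapshot, which is
       assumed here to be the last element of the list.
    Returns the path if resolved, or None if the lookup fails.
    """
    if gfid in inode_table:
        return inode_table[gfid]
    if not snapshots:
        return None
    latest = snapshots[-1]
    path = latest.entries.get(gfid)
    if path is not None:
        # Assumption: remember the result so later lookups hit the table.
        inode_table[gfid] = path
    return path


def lookup_snaps(parent_gfid: str,
                 inode_table: Dict[str, str],
                 snapshots: List[Snapshot]) -> bool:
    """A lookup of <parent>/.snaps succeeds only if the parent resolves."""
    return resolve_parent(parent_gfid, inode_table, snapshots) is not None
```

In this model, a lookup of /dir/.snaps succeeds only when the gfid of 
/dir is present in the last snapshot of the list, which is the 
dependency the rest of this mail is about.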

But there can be some confusion in the case of snapshot restore. Say 
there are 5 snapshots (snap1, snap2, snap3, snap4, snap5) for a volume 
vol. Now say the volume is restored to snap3. If there was a directory 
called "/a" at the time of taking snap3 and it was later removed, then 
after the snapshot restore, accessing .snaps from that directory (in 
fact, from any of the directories which were present while taking 
snap3) might cause problems. The original volume is now nothing but 
snap3, and when the snap daemon gets the lookup on "/a/.snaps", it tries 
to find the gfid of "/a" in the latest snapshot (which is snap5). If 
"/a" was removed after taking snap3, the lookup of "/a" in snap5 fails, 
and thus the lookup of "/a/.snaps" also fails.

Possible Solution:
One possible solution in this case is that whenever glusterd sends the 
list of snapshots to the snap daemon after a snapshot restore, it sends 
the list in such a way that the snapshot previous to the restored 
snapshot is sent as the latest snapshot (in the example above, since 
snap3 is restored, glusterd should send snap2 as the latest snapshot to 
the snap daemon).

But this solution also has a problem. If there are only 2 snapshots 
(snap1, snap2) and the volume is restored to the first snapshot (snap1), 
there is no previous snapshot to look at. glusterd will send only one 
name in the list, snap2, but that snapshot is in a later state than the 
volume.
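
The proposal and its edge case can be illustrated with a small Python 
sketch. The function name latest_after_restore is hypothetical, and the 
sketch assumes the snapshot list is given oldest first, as it was before 
the restore. It is not glusterd code.

```python
from typing import List, Optional


def latest_after_restore(snapshots: List[str], restored: str) -> Optional[str]:
    """Pick the snapshot to report as "latest" after a restore.

    Follows the proposal above: use the snapshot just before the
    restored one. Returns None when there is no previous snapshot,
    which is the problematic case.
    """
    idx = snapshots.index(restored)  # raises ValueError for unknown names
    if idx == 0:
        return None
    return snapshots[idx - 1]
```

With the five snapshots from the example, restoring snap3 gives snap2. 
With only snap1 and snap2, restoring snap1 gives None, which is the gap 
described above.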

A patch has been submitted for review to handle this 
(http://review.gluster.org/#/c/9094/).
Because of the confusion described above, in the patch snapd consults 
the snapshots adjacent to the restored snapshot to resolve the gfids. 
In the 5-snapshot example, it looks at snap2 and snap4 (i.e. it looks 
into snap2 first, and if that fails, into snap4). If there is no 
previous snapshot, it looks at the next snapshot (the 2-snapshot 
example). If there is no next snapshot, it looks at the previous 
snapshot.

Please provide feedback about how this issue can be handled.

Regards,
Raghavendra Bhat