[Gluster-devel] snapshot restore and USS

Vijaikumar M vmallika at redhat.com
Mon Dec 1 12:25:59 UTC 2014


On Monday 01 December 2014 05:36 PM, Raghavendra Bhat wrote:
> On Monday 01 December 2014 04:51 PM, Raghavendra G wrote:
>>
>>
>> On Fri, Nov 28, 2014 at 6:48 PM, RAGHAVENDRA TALUR
>> <raghavendra.talur at gmail.com> wrote:
>>
>>     On Thu, Nov 27, 2014 at 2:59 PM, Raghavendra Bhat
>>     <rabhat at redhat.com> wrote:
>>     > Hi,
>>     >
>>     > With USS to access snapshots, we depend on the last (i.e. the
>>     > latest) snapshot of the volume to resolve some issues.
>>     > Ex:
>>     > Say there is a directory called "dir" within the root of the volume
>>     > and USS is enabled. Now when .snaps is accessed from "dir" (i.e.
>>     > /dir/.snaps), first a lookup is sent on /dir, which the
>>     > snapview-client xlator passes on to the normal graph till the posix
>>     > xlator of the brick. Next the lookup comes on /dir/.snaps. The
>>     > snapview-client xlator now redirects this call to the snap daemon
>>     > (since .snaps is a virtual directory to access the snapshots). The
>>     > lookup comes to the snap daemon with the parent gfid set to the gfid
>>     > of "/dir" and the basename set to ".snaps". The snap daemon will
>>     > first try to resolve the parent gfid by trying to find the inode for
>>     > that gfid. But since that gfid was not looked up before in the snap
>>     > daemon, it will not be able to find the inode. So, to resolve it,
>>     > the snap daemon depends upon the latest snapshot: it tries to look
>>     > up the gfid of /dir in the latest snapshot, and if it can get the
>>     > gfid, then the lookup on /dir/.snaps is also successful.
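To make the resolution step above concrete, here is a rough sketch assuming
the snap daemon holds a gfapi instance for the latest snapshot. The helper
name is made up; glfs_h_create_from_handle() and GFAPI_HANDLE_LENGTH are the
gfapi handle-API symbols such a nameless lookup would likely rely on:

    /* Rough sketch: resolve an unknown parent gfid by doing a nameless
     * (gfid-based) lookup on the latest snapshot's gfapi instance.
     * Returns NULL (errno ENOENT/ESTALE) if the gfid is absent there. */
    #include <uuid/uuid.h>
    #include <glusterfs/api/glfs.h>
    #include <glusterfs/api/glfs-handles.h>

    struct glfs_object *
    resolve_gfid_in_latest_snap (struct glfs *latest_fs, uuid_t gfid)
    {
            struct stat buf = {0, };

            return glfs_h_create_from_handle (latest_fs, (unsigned char *) gfid,
                                              GFAPI_HANDLE_LENGTH, &buf);
    }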
>>
>>     From the user's point of view, I would like to be able to enter
>>     .snaps from anywhere. To be able to do that, we can turn the
>>     dependency upside down: instead of listing all snapshots in the
>>     .snaps dir, let's just show whichever snapshots have that dir.
>>
>>
>> Currently readdir in snapview-server lists _all_ the snapshots.
>> However, if you try to do "ls" on a snapshot which doesn't contain
>> this directory (say dir/.snaps/snap3), I think it returns
>> ESTALE/ENOENT. So, to get what you've explained above, readdir(p)
>> should filter out those snapshots which don't contain this
>> directory (to do that, it has to look up dir on each of the snapshots).
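A minimal sketch of that filtering, reusing the handle-based lookup from the
sketch above; the snap list structure and helper name here are illustrative,
not the actual snapview-server types:

    /* Rough sketch: keep only the snapshots in which the directory's gfid
     * actually exists, and report just those as .snaps entries.
     * Note: this forces every snapshot's gfapi instance to be up. */
    typedef struct {
            char        *name;   /* snapshot (volume) name                */
            struct glfs *fs;     /* gfapi instance, NULL until accessed   */
    } snap_entry_t;

    int
    filter_snaps_for_dir (snap_entry_t *snaps, int count, uuid_t dir_gfid,
                          const char **names_out)
    {
            struct stat buf = {0, };
            int         n   = 0;

            for (int i = 0; i < count; i++) {
                    struct glfs_object *obj =
                            glfs_h_create_from_handle (snaps[i].fs,
                                                       (unsigned char *) dir_gfid,
                                                       GFAPI_HANDLE_LENGTH, &buf);
                    if (!obj)
                            continue;        /* dir absent in this snapshot */

                    glfs_h_close (obj);      /* only existence matters here */
                    names_out[n++] = snaps[i].name;
            }

            return n;   /* entries to return from readdir(p) on .snaps */
    }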
>>
>> Raghavendra Bhat explained the problem and also a possible solution
>> to me in person. There are some pieces missing in the problem
>> description as explained in the mail (but not in the discussion we
>> had). The problem explained here occurs when you restore a snapshot
>> (say snap3) in which the directory was created, but the directory was
>> deleted before the next snapshot. So the directory doesn't exist in
>> snap2 and snap4, but exists only in snap3. Now, when you restore snap3,
>> "ls" on dir/.snaps should show nothing. Then, what should the result of
>> lookup (gfid-of-dir, ".snaps") be?
>>
>> 1. We can blindly return a virtual inode, assuming at least one
>> snapshot contains dir. If fops then come on specific snapshots (e.g.,
>> dir/.snaps/snap4), they'll fail with ENOENT anyway (since dir is not
>> present on any snapshot).
>> 2. We can choose to return ENOENT if we figure out that dir is not
>> present on any snapshot.
>>
>> The problem we are trying to solve here is how to achieve 2. One
>> simple solution is to look up <gfid-of-dir> on all the snapshots,
>> and if every lookup fails with ENOENT, we can return ENOENT. The
>> other solution is to look up only in the snapshots before and after the
>> restored one (if both are present, otherwise just in the latest
>> snapshot). If both fail, then we can be sure that no snapshot contains
>> that directory.
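In terms of the filtering sketch above, the decision for approach 2 could be
as simple as the following; this reuses the hypothetical snap_entry_t and
filter_snaps_for_dir() from that sketch, so it is illustrative only:

    /* Rough sketch: decide the lookup result for (gfid-of-dir, ".snaps").
     * ENOENT is from <errno.h>. */
    int
    dot_snaps_lookup_errno (snap_entry_t *snaps, int count, uuid_t dir_gfid)
    {
            const char *names[256];   /* scratch; 256 = max snapshots */

            /* approach 2: ENOENT unless at least one snapshot has the dir */
            return (filter_snaps_for_dir (snaps, count, dir_gfid, names) > 0)
                       ? 0 : ENOENT;
    }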
>>
>> Rabhat, correct me if I've missed anything :).
>>
>
>
> If a readdir on .snaps entered from a non-root directory has to show
> only those snapshots where the directory (or rather the gfid of the
> directory) is present, then achieving that will be a bit costly.
>
> When readdir comes on .snaps entered from a non-root directory (say ls
> /dir/.snaps), the following operations have to be performed:
> 1) We have the names of all the snapshots in an array. So, do a
> nameless lookup on the gfid of /dir on all the snapshots.
> 2) Based on which snapshots have returned success for the above lookup,
> build a new array or list of snapshots.
> 3) Then send the above new list as the readdir entries.
>
> But the above operation is costly. Because, just to serve one
> readdir request, we have to make a lookup on each snapshot (if there
> are 256 snapshots, then we have to make 256 lookup calls over the network).
>
> One more thing is resource usage. As of now, a snapshot is only
> initialized (i.e. via gfapi a connection is established with the
> corresponding snapshot volume, which is equivalent to a mounted
> volume) when that snapshot is accessed (from the fops point of view, a
> lookup comes on the snapshot entry, say "ls /dir/.snaps/snap1"). Now,
> to serve readdir, all the snapshots will be accessed and therefore all
> the snapshots get initialized. This means there can be 256 instances of
> gfapi connections, each instance having its own inode table and
> other resources. Even if a snapshot is never accessed again after that
> readdir, its resources will keep adding to the snap daemon's usage.
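For readers following along, the lazy initialization described above could
look roughly like this. The wrapper and the snap_entry_t type are the
hypothetical ones from the earlier sketch; glfs_new(), glfs_set_volfile_server(),
glfs_init() and glfs_fini() are the standard gfapi calls, and 24007 is assumed
to be the glusterd port:

    /* Rough sketch: bring up the gfapi instance for a snapshot only on its
     * first access, since each instance is as heavy as a mounted volume. */
    struct glfs *
    get_or_init_snap_fs (snap_entry_t *snap, const char *glusterd_host)
    {
            if (snap->fs)
                    return snap->fs;              /* already initialized */

            struct glfs *fs = glfs_new (snap->name);
            if (!fs)
                    return NULL;

            /* Fetch the snapshot volume's volfile from glusterd and build
             * the client graph; this is the expensive, mount-like step. */
            if (glfs_set_volfile_server (fs, "tcp", glusterd_host, 24007) != 0 ||
                glfs_init (fs) != 0) {
                    glfs_fini (fs);
                    return NULL;
            }

            snap->fs = fs;
            return fs;
    }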
>
From an NFS mount point, if we do 'ls /dir/.snaps/', the NFS client will
send a stat on all the entries and only after this return to the
application. This will also initialize gfapi instances for all the
snapshots, right?

Thanks,
Vijay

> With the above points in mind, I was thinking about different
> approaches to handle this situation. We need the latest snapshot (and,
> as per the patch, the adjacent snapshots to handle restore) to resolve
> lookups coming on .snaps, mainly for resolving the parent gfid, so that
> we can look it up somewhere (if "ls /dir/.snaps" is done, then the
> lookup comes with the parent gfid set to the gfid of /dir and the name
> set to ".snaps". But since /dir has not been looked up yet in the snap
> daemon, it has to first resolve the parent gfid, for which it looks at
> the latest snapshot).
>
> What we can do is, while sending the lookup on .snaps (again, say "ls
> /dir/.snaps"), add a key within the dict which snapview-server can
> look for. That key is a kind of hint from snapview-client to
> snapview-server that the parent gfid of this particular lookup call
> exists and is a valid one. When snapview-server gets a lookup on the
> parent gfid as part of resolution from protocol/server, it can look at
> the dict for the key. If the key is set, then simply return success to
> that lookup.
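A minimal sketch of that hint, assuming a made-up xdata key name and the
standard libglusterfs dict API (dict_set_int8()/dict_get()); the real key
name and call sites would be whatever the patch settles on:

    #include "dict.h"   /* libglusterfs: dict_t, dict_set_int8, dict_get */

    /* Hypothetical key name; the actual one would be defined by the patch. */
    #define SVC_PARENT_VALID_KEY "snapview.parent-is-valid"

    /* snapview-client side: mark the lookup it redirects to the snap daemon,
     * saying "the parent gfid has already been validated on the volume". */
    static int
    svc_mark_parent_valid (dict_t *xdata)
    {
            return dict_set_int8 (xdata, SVC_PARENT_VALID_KEY, 1);
    }

    /* snapview-server side: if the hint is present while resolving the
     * parent gfid, return success instead of probing the latest snapshot. */
    static int
    svs_parent_marked_valid (dict_t *xdata)
    {
            return (xdata && dict_get (xdata, SVC_PARENT_VALID_KEY) != NULL);
    }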
>
> In this way we can handle many situations, such as this one:
> entering .snaps from a directory that was created after taking the
> latest snapshot.
>
> Please provide feedback on the above approach (the hint being set in 
> the dict).
>
> Regards,
> Raghavendra Bhat
>
>
>
>>
>>
>>     Maybe it is good enough if we resolve the parent on the main volume
>>     and rely on that in snapview client and server.
>>
>>     >
>>     > But there can be some confusion in the case of snapshot restore.
>>     > Say there are 5 snapshots (snap1, snap2, snap3, snap4, snap5) for a
>>     > volume vol. Now say the volume is restored to snap3. If there was a
>>     > directory called "/a" at the time of taking snap3 and it was later
>>     > removed, then after the snapshot restore, accessing .snaps from that
>>     > directory (in fact from all the directories which were present while
>>     > taking snap3) might cause problems. Because now the original volume
>>     > is nothing but snap3, and when the snap daemon gets the lookup on
>>     > "/a/.snaps", it tries to find the gfid of "/a" in the latest
>>     > snapshot (which is snap5); if /a was removed after taking snap3,
>>     > then the lookup of "/a" in snap5 fails and thus the lookup of
>>     > "/a/.snaps" will also fail.
>>
>>
>>     >
>>     > Possible solution:
>>     > One possible solution that can be helpful in this case is: whenever
>>     > glusterd sends the list of snapshots to the snap daemon after a
>>     > snapshot restore, send the list in such a way that the snapshot
>>     > previous to the restored snapshot is sent as the latest snapshot
>>     > (in the example above, since snap3 is restored, glusterd should send
>>     > snap2 as the latest snapshot to the snap daemon).
>>     >
>>     > But the above solution also has a problem. If there are only 2
>>     > snapshots (snap1, snap2) and the volume is restored to the first
>>     > snapshot (snap1), there is no previous snapshot to look at. And
>>     > glusterd will send only one name in the list, which is snap2, but
>>     > that snapshot represents a state later than the restored volume.
>>     >
>>     > A patch has been submitted for review to handle this
>>     > (http://review.gluster.org/#/c/9094/).
>>     > In the patch, because of the above confusion, snapd tries to consult
>>     > the adjacent snapshots of the restored snapshot to resolve the
>>     > gfids. As per the 5-snapshot example, it tries to look at snap2 and
>>     > snap4 (i.e. look into snap2 first; if that fails, then look into
>>     > snap4). If there is no previous snapshot, then look at the next
>>     > snapshot (the 2-snapshot example). If there is no next snapshot,
>>     > then look at the previous snapshot.
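A rough sketch of that consultation order, using the same gfapi handle lookup
as in the earlier sketches; prev and next stand for the gfapi instances of the
snapshots adjacent to the restored one, either of which may be NULL, and the
function name is illustrative rather than taken from the patch:

    /* Rough sketch: after a restore, resolve a gfid by consulting the
     * previous snapshot first, then the next one, as the patch describes. */
    struct glfs_object *
    resolve_gfid_after_restore (struct glfs *prev, struct glfs *next,
                                uuid_t gfid)
    {
            struct stat         buf = {0, };
            struct glfs_object *obj = NULL;

            if (prev)
                    obj = glfs_h_create_from_handle (prev, (unsigned char *) gfid,
                                                     GFAPI_HANDLE_LENGTH, &buf);
            if (!obj && next)
                    obj = glfs_h_create_from_handle (next, (unsigned char *) gfid,
                                                     GFAPI_HANDLE_LENGTH, &buf);

            return obj;   /* NULL: neither adjacent snapshot has this gfid */
    }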
>>     >
>>     > Please provide feedback on how this issue can be handled.
>>     >
>>     > Regards,
>>     > Raghavendra Bhat
>>     > _______________________________________________
>>     > Gluster-devel mailing list
>>     > Gluster-devel at gluster.org
>>     > http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>>
>>     --
>>     Raghavendra Talur
>>
>>
>>
>>
>> -- 
>> Raghavendra G
>
>
>
