[Gluster-devel] snapshot restore and USS
Vijaikumar M
vmallika at redhat.com
Mon Dec 1 12:25:59 UTC 2014
On Monday 01 December 2014 05:36 PM, Raghavendra Bhat wrote:
> On Monday 01 December 2014 04:51 PM, Raghavendra G wrote:
>>
>>
>> On Fri, Nov 28, 2014 at 6:48 PM, RAGHAVENDRA TALUR
>> <raghavendra.talur at gmail.com> wrote:
>>
>> On Thu, Nov 27, 2014 at 2:59 PM, Raghavendra Bhat
>> <rabhat at redhat.com> wrote:
>> > Hi,
>> >
>> > With USS to access snapshots, we depend on the last snapshot of the
>> > volume (or the latest snapshot) to resolve some issues.
>> > Ex:
>> > Say there is a directory called "dir" within the root of the volume
>> > and USS is enabled. Now when .snaps is accessed from "dir" (i.e.
>> > /dir/.snaps), first a lookup is sent on /dir, which snapview-client
>> > xlator passes on to the normal graph till the posix xlator of the
>> > brick. Next the lookup comes on /dir/.snaps. snapview-client xlator
>> > now redirects this call to the snap daemon (since .snaps is a
>> > virtual directory to access the snapshots). The lookup comes to the
>> > snap daemon with the parent gfid set to the gfid of "/dir" and the
>> > basename set to ".snaps". The snap daemon will first try to resolve
>> > the parent gfid by trying to find the inode for that gfid. But since
>> > that gfid was not looked up before in the snap daemon, it will not
>> > be able to find the inode. So now, to resolve it, the snap daemon
>> > depends upon the latest snapshot, i.e. it tries to look up the gfid
>> > of /dir in the latest snapshot, and if it can get the gfid, then the
>> > lookup on /dir/.snaps is also successful.
>>
>> From the user's point of view, I would like to be able to enter
>> .snaps anywhere. To be able to do that, we can turn the dependency
>> upside down: instead of listing all snaps in the .snaps dir, let's
>> just show whatever snapshots had that dir.
>>
>>
>> Currently readdir in snapview-server lists _all_ the snapshots.
>> However, if you try to do "ls" on a snapshot which doesn't contain
>> this directory (say dir/.snaps/snap3), I think it returns
>> ESTALE/ENOENT. So, to get what you've explained above, readdir(p)
>> should filter out those snapshots which don't contain this directory
>> (to do that, it has to look up dir on each of the snapshots).
>>
>> Raghavendra Bhat explained the problem and also a possible solution
>> to me in person. There are some pieces missing in the problem
>> description as explained in the mail (but not in the discussion we
>> had). The problem explained here occurs when you restore a snapshot
>> (say snap3) in which the directory got created, but it was deleted
>> before the next snapshot. So, the directory doesn't exist in snap2
>> and snap4, but exists only in snap3. Now, when you restore snap3,
>> "ls" on dir/.snaps should show nothing. Now, what should the result
>> of lookup (gfid-of-dir, ".snaps") be?
>>
>> 1. we can blindly return a virtual inode, assuming at least one
>> snapshot contains dir. If fops come on specific snapshots (e.g.,
>> dir/.snaps/snap4), they'll anyway fail with ENOENT (since dir is not
>> present on any snaps).
>> 2. we can choose to return ENOENT if we figure out that dir is not
>> present on any snaps.
>>
>> The problem we are trying to solve here is how to achieve 2. One
>> simple solution is to look up <gfid-of-dir> on all the snapshots,
>> and if every lookup fails with ENOENT, we can return ENOENT. The
>> other solution is to just look up in the snapshots before and after
>> the restored one (if both are present, otherwise just in the latest
>> snapshot). If both fail, then we can be sure that no snapshot
>> contains that directory.
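(A small sketch of achieving 2., assuming a hypothetical helper that
does a nameless lookup of the gfid on one snapshot. The exhaustive
variant is shown; the "adjacent snapshots only" variant would just
restrict the loop to those candidates.)

    #include <errno.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef unsigned char gfid_t[16];

    /* hypothetical: nameless lookup of a gfid on one snapshot */
    bool snapshot_has_gfid(const char *snapname, const gfid_t gfid);

    /* Decide what lookup (gfid-of-dir, ".snaps") should return. */
    int lookup_dot_snaps(const char *snaps[], size_t nsnaps,
                         const gfid_t dir_gfid)
    {
        for (size_t i = 0; i < nsnaps; i++) {
            if (snapshot_has_gfid(snaps[i], dir_gfid))
                return 0;          /* some snapshot has dir: virtual inode */
        }
        return -ENOENT;            /* no snapshot contains dir */
    }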
>>
>> Rabhat, Correct me if I've missed out anything :).
>>
>
>
> If a readdir on .snaps entered from a non-root directory has to show
> the list of only those snapshots where the directory (or rather the
> gfid of the directory) is present, then the way to achieve it will be
> a bit costly.
>
> When readdir comes on .snaps entered from a non-root directory (say ls
> /dir/.snaps), the following operations have to be performed:
> 1) We have the names of all the snapshots in an array. So, do a
> nameless lookup on the gfid of /dir on all the snapshots.
> 2) Based on which snapshots have sent success to the above lookup,
> build a new array or list of snapshots.
> 3) Then send the above new list as the readdir entries.
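(A sketch of the three steps above, again with a hypothetical
snapshot_has_gfid() standing in for the nameless lookup over the
network; it just builds the filtered list that readdir would return.)

    #include <stdbool.h>
    #include <stddef.h>

    typedef unsigned char gfid_t[16];

    /* hypothetical: nameless lookup of a gfid on one snapshot */
    bool snapshot_has_gfid(const char *snapname, const gfid_t gfid);

    /* Build the readdir entries for /dir/.snaps; 'out' must hold nsnaps
     * pointers.  Returns the number of snapshots that contain /dir. */
    size_t filter_snaps_for_readdir(const char *all_snaps[], size_t nsnaps,
                                    const gfid_t dir_gfid, const char *out[])
    {
        size_t n = 0;

        for (size_t i = 0; i < nsnaps; i++) {
            /* steps 1 and 2: one lookup per snapshot, keep the successes */
            if (snapshot_has_gfid(all_snaps[i], dir_gfid))
                out[n++] = all_snaps[i];
        }
        return n;                  /* step 3: these become the entries */
    }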
>
> But the above operation is costlier, because just to serve one
> readdir request we have to make a lookup on each snapshot (if there
> are 256 snapshots, then we have to make 256 lookup calls over the
> network).
>
> One more thing is resource usage. As of now, a snapshot is initialized
> (i.e. via gfapi a connection is established with the corresponding
> snapshot volume, which is equivalent to a mounted volume) only when
> that snapshot is accessed (from the fops point of view, when a lookup
> comes on the snapshot entry, say "ls /dir/.snaps/snap1"). Now, to
> serve readdir, all the snapshots will be accessed and hence all the
> snapshots get initialized. This means there can be 256 instances of
> gfapi connections, with each instance having its own inode table and
> other resources. Even if a snapshot is never accessed after the
> readdir, all those resources add up to the snap daemon's usage.
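(For reference, "initializing" one snapshot from snapd is essentially a
full gfapi mount, something like the sketch below; the volume name and
host are placeholders. Each such instance carries its own client graph,
inode table and caches, which is why doing this for all 256 snapshots
just to answer a readdir is expensive.)

    #include <glusterfs/api/glfs.h>
    #include <stddef.h>

    glfs_t *init_snap_volume(const char *snap_volname, const char *host)
    {
        glfs_t *fs = glfs_new(snap_volname);    /* per-snapshot instance */
        if (!fs)
            return NULL;

        if (glfs_set_volfile_server(fs, "tcp", host, 24007) != 0 ||
            glfs_init(fs) != 0) {               /* builds the whole client graph */
            glfs_fini(fs);
            return NULL;
        }
        return fs;
    }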
>
From an NFS mount point, if we do 'ls /dir/.snaps/', the NFS client
will send a stat on all the entries and only after this return to the
application. This will also initialize gfapi instances for all the
snapshots, right?
Thanks,
Vijay
> With the above points in mind, I was thinking about different
> approaches to handle this situation. We need the latest snapshot (and,
> as per the patch, the adjacent snapshots to handle restore) to resolve
> lookups coming on .snaps, mainly for resolving the parent gfid so that
> we can look it up somewhere (if "ls /dir/.snaps" is done, then the
> lookup comes with the parent gfid set to the gfid of /dir and the name
> set to ".snaps". But since /dir has not been looked up yet in the snap
> daemon, it has to first resolve the parent gfid, for which it looks at
> the latest snapshot).
>
> What we can do is, while sending the lookup on .snaps (again, say "ls
> /dir/.snaps"), add a key within the dict which snapview-server can
> look for. That key is a kind of hint from snapview-client to
> snapview-server that the parent gfid of this particular lookup call
> exists and is a valid one. When snapview-server gets the lookup on the
> parent gfid as part of resolution from protocol/server, it can look at
> the dict for the key. If the key is set, then it simply returns
> success to that lookup.
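(A sketch of the hint; the key name is made up and the two functions
are simplified stand-ins for the real lookup paths in snapview-client
and snapview-server -- this just shows the idea, using the dict API
from the glusterfs source tree.)

    #include "dict.h"

    /* hypothetical key name */
    #define SVC_PARENT_VALID_HINT "glusterfs.svc.parent-gfid-is-valid"

    /* snapview-client: before winding the lookup on ".snaps" to snapd,
     * record that the parent gfid was already resolved on the main
     * volume. */
    int svc_mark_parent_valid(dict_t *xdata)
    {
        return dict_set_int8(xdata, SVC_PARENT_VALID_HINT, 1);
    }

    /* snapview-server: while resolving the parent gfid, trust the hint
     * and return success instead of consulting the latest snapshot. */
    int svs_parent_is_valid(dict_t *xdata)
    {
        return (xdata != NULL &&
                dict_get(xdata, SVC_PARENT_VALID_HINT) != NULL);
    }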
>
> In the above way we can handle many situations such as this one:
> entering .snaps from a directory which was created after taking the
> latest snapshot.
>
> Please provide feedback on the above approach (the hint being set in
> the dict).
>
> Regards,
> Raghavendra Bhat
>
>
>
>>
>>
>> Maybe it is good enough if we resolve the parent on the main volume
>> and rely on that in snapview client and server.
>>
>> >
>> > But, there can be some confusion in the case of snapshot restore.
>> > Say there are 5 snapshots (snap1, snap2, snap3, snap4, snap5) for a
>> > volume vol. Now say the volume is restored to snap3. If there was a
>> > directory called "/a" at the time of taking snap3 and it was later
>> > removed, then after the snapshot restore, accessing .snaps from
>> > that directory (in fact from all the directories which were present
>> > while taking snap3) might cause problems. Because now the original
>> > volume is nothing but snap3, and when the snap daemon gets the
>> > lookup on "/a/.snaps", it tries to find the gfid of "/a" in the
>> > latest snapshot (which is snap5); if /a was removed after taking
>> > snap3, then the lookup of "/a" in snap5 fails and thus the lookup
>> > of "/a/.snaps" will also fail.
>>
>>
>> >
>> > Possible Solution:
>> > One possible solution that can be helpful in this case is: whenever
>> > glusterd sends the list of snapshots to the snap daemon after a
>> > snapshot restore, send the list in such a way that the snapshot
>> > previous to the restored snapshot is sent as the latest snapshot
>> > (in the example above, since snap3 is restored, glusterd should
>> > send snap2 as the latest snapshot to the snap daemon).
>> >
>> > But in the above solution also, there is a problem. If there are
>> > only 2 snapshots (snap1, snap2) and the volume is restored to the
>> > first snapshot (snap1), there is no previous snapshot to look at.
>> > And glusterd will send only one name in the list, which is snap2,
>> > but it is in a more recent state than the volume.
>> >
>> > A patch has been submitted for review to handle this
>> > (http://review.gluster.org/#/c/9094/).
>> > In the patch, because of the above confusions, snapd tries to
>> > consult the adjacent snapshots of the restored snapshot to resolve
>> > the gfids. As per the 5 snapshots example, it tries to look at
>> > snap2 and snap4 (i.e. look into snap2 first; if that fails, then
>> > look into snap4). If there is no previous snapshot, then look at
>> > the next snapshot (the 2 snapshots example). If there is no next
>> > snapshot, then look at the previous snapshot.
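(A sketch of that consultation order, simplified: 'nsnaps' is the
number of snapshots remaining after the restore, and 'gap' is the
position the restored snapshot used to occupy in that creation-ordered
list. Index handling is illustrative, not lifted from the patch.)

    #include <stddef.h>

    /* Fill 'order' with up to two indices to try, previous snapshot
     * first, then the next one.  Returns how many candidates exist. */
    size_t pick_snaps_to_consult(size_t nsnaps, size_t gap, size_t order[2])
    {
        size_t n = 0;

        if (gap > 0)
            order[n++] = gap - 1;   /* previous snapshot, e.g. snap2 */
        if (gap < nsnaps)
            order[n++] = gap;       /* next snapshot, e.g. snap4 */

        return n;                   /* 0 only when no snapshots remain */
    }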
>> >
>> > Please provide feedback about how this issue can be handled.
>> >
>> > Regards,
>> > Raghavendra Bhat
>>
>>
>>
>> --
>> Raghavendra Talur
>>
>>
>>
>>
>> --
>> Raghavendra G
>
>
>