[Gluster-devel] Implementing Flat Hierarchy for trashed files

Anoop C S anoopcs at redhat.com
Tue Aug 18 12:34:21 UTC 2015


On Tue, 2015-08-18 at 02:29 -0400, Prashanth Pai wrote:
> ----- Original Message -----
> > From: "Anoop C S" <anoopcs at redhat.com>
> > To: gluster-devel at gluster.org
> > Sent: Monday, August 17, 2015 6:20:50 PM
> > Subject: [Gluster-devel] Implementing Flat Hierarchy for trashed files
> > 
> > Hi all,
> > 
> > As we move forward, in order to fix the limitations of the current
> > trash translator, we are planning to replace the existing layout of
> > trashed files inside the trash directory with a general flat
> > hierarchy, as described in the following sections. Please share your
> > thoughts on the following design considerations.
> > 
> > Current implementation
> > ======================
> > * Trash translator resides on the glusterfs server stack, just above
> >   posix.
> > * Trash directory (.trashcan) is created during volume start and is
> >   visible under the root of the volume.
> > * Each trashed file is moved (renamed) to the trash directory with an
> >   appended time stamp in the file name.
> 
> Do these files get moved during rebalance due to the name change, or do
> you choose the file name according to the DHT regex magic to avoid that?
> 

Actually, we had put up http://review.gluster.org/#/c/9865/ to address
this issue. With that change we can have this xattr set on trashed files
so as to mask them from the rebalance process.

> > * Exact directory hierarchy (w.r.t. the root of the volume) is
> >   maintained inside the trash directory whenever a file is
> >   deleted/truncated from a directory.
> > 
> > Outstanding issues
> > ==================
> > * Since renaming occurs on the server side, the client side is unaware
> >   of trash doing rename or create operations.
> > * As a result, files/directories may not be visible from the mount
> >   point.
> > * Files/directories created from the trash translator will not have a
> >   gfid associated with them until a lookup is performed.
> > 
> > Proposed Flat hierarchy
> > =======================
> > * Instead of creating the whole directory hierarchy under trash, we
> >   will rename the file and place it directly under the trash directory
> >   (of course with an appended time stamp).
> 
> The .trashcan directory might not scale with millions of such files
> placed under one directory. We faced the same problem in the
> gluster-swift project for the object expiration feature and decided
> to distribute our files across multiple directories in a
> deterministic way. And, personally, I'd prefer storing an absolute
> timestamp, for example as returned by the `date +%s` command.
> 
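
To make the suggestion above concrete, here is a minimal sketch in C of
one deterministic way to spread trashed files over sub-directories of
.trashcan. The hash (32-bit FNV-1a), the bucket count of 256 and the
helper names fnv1a32() and trash_bucket_path() are assumptions made for
illustration only; they are not taken from gluster-swift or from the
trash translator.

```c
#include <stdint.h>
#include <stdio.h>

#define TRASH_BUCKETS 256u /* assumed bucket count, illustration only */

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t
fnv1a32(const char *s)
{
    uint32_t h = 2166136261u; /* FNV offset basis */

    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u; /* FNV prime */
    }
    return h;
}

/*
 * Hypothetical helper: build ".trashcan/<bucket>/<name>", where the
 * bucket is derived only from the name, so the same name always maps
 * to the same sub-directory. Returns 0 on success, -1 if out is too
 * small.
 */
static int
trash_bucket_path(const char *name, char *out, size_t outlen)
{
    unsigned bucket = fnv1a32(name) % TRASH_BUCKETS;
    int n = snprintf(out, outlen, ".trashcan/%02x/%s", bucket, name);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this sketch a file named "a" always lands at .trashcan/2c/a, so
the name alone is enough to find the bucket again.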

In glusterfs we use the strftime() library call to format date and time
strings. We can use the gf_timefmt_s format inside gluster, which is a
wrapper for the %s format exposed by strftime(), to get the number of
seconds since the Epoch. But the problem here is that it depends on
TZ (timezone). For a more detailed explanation, see the commit message
from http://review.gluster.org/#/c/11930/.
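
As a rough illustration of a TZ-independent alternative, the sketch
below takes the seconds since the Epoch directly from a time_t (as
returned by time()) and appends them to the file name, without going
through struct tm or strftime(). The helper name
trash_name_with_epoch() and the "_" separator are hypothetical and not
part of glusterfs.

```c
#include <inttypes.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: write "<name>_<seconds since the Epoch>" into
 * out. The time_t value is printed as-is, so no timezone conversion
 * is involved. Returns 0 on success, -1 if out is too small.
 */
static int
trash_name_with_epoch(const char *name, time_t now, char *out,
                      size_t outlen)
{
    int n = snprintf(out, outlen, "%s_%" PRIdMAX, name, (intmax_t)now);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

A caller would pass time(NULL) as the second argument.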

> > * Directory hierarchy can be stored via either of the following two
> >   approaches:
> > 	(a) File name will contain the whole path with time stamp
> > 	    appended
> 
> If this approach is taken, you might have trouble choosing a
> "magic letter" to represent slashes.
> > 	(b) Store whole hierarchy as an xattr
> > 
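
One possible way around the "magic letter" problem in approach (a),
sketched here under assumptions, is to escape rather than substitute:
'/' becomes %2F and a literal '%' becomes %25, so the original path can
be recovered unambiguously. The helper name trash_escape_path() and the
percent-style encoding are illustrative choices, not part of the
proposal.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy path into out, replacing '/' with "%2F"
 * and '%' with "%25" so the result holds no slashes and can be
 * decoded back without ambiguity. Returns 0 on success, -1 if out is
 * too small.
 */
static int
trash_escape_path(const char *path, char *out, size_t outlen)
{
    size_t o = 0;

    for (; *path; path++) {
        const char *rep = NULL;

        if (*path == '/')
            rep = "%2F";
        else if (*path == '%')
            rep = "%25";

        if (rep) {
            /* need 3 bytes plus room for the trailing NUL */
            if (o + 3 >= outlen)
                return -1;
            out[o++] = rep[0];
            out[o++] = rep[1];
            out[o++] = rep[2];
        } else {
            /* need 1 byte plus room for the trailing NUL */
            if (o + 1 >= outlen)
                return -1;
            out[o++] = *path;
        }
    }
    if (o >= outlen)
        return -1;
    out[o] = '\0';
    return 0;
}
```

A flat name built this way is still subject to the file name length
limit of the underlying filesystem, which a long path can exceed.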
> > Other enhancements
> > ==================
> > * Create the trash directory only when trash xlator is enabled.
> 
> This is a needed enhancement. Upgrading to 3.7.* from older glusterfs
> versions caused undesired results in the gluster-swift integration
> because .trashcan was visible by default on all glusterfs volumes.
> 
> > * Operations such as unlink, rename etc. will be prevented on the
> >   trash directory only when trash xlator is enabled.
> > * A new trash helper translator on the client side (loaded only when
> >   trash is enabled) to resolve split-brain issues with truncation of
> >   files.
> > * Restore files from trash with the help of an explicit setfattr
> >   call.
> 
> You have to be very careful with the races involved in re-creating the
> path while clients are accessing the volume, and also with overwriting
> if the path already exists.
> It's way easier (from an implementer's perspective) if this is a manual
> process.
> 
> > 
> > Thanks & Regards,
> > -Anoop C S
> > -Jiffin Tony Thottan
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> > 

