[Gluster-devel] Implementing Flat Hierarchy for trashed files

Prashanth Pai ppai at redhat.com
Tue Aug 18 06:29:09 UTC 2015


----- Original Message -----
> From: "Anoop C S" <anoopcs at redhat.com>
> To: gluster-devel at gluster.org
> Sent: Monday, August 17, 2015 6:20:50 PM
> Subject: [Gluster-devel] Implementing Flat Hierarchy for trashed files
> 
> Hi all,
> 
> As we move forward, in order to fix the limitations of the current trash
> translator, we are planning to replace the existing layout of trashed
> files inside the trash directory with a general flat hierarchy, as
> described in the following sections. Please share your thoughts on the
> following design considerations.
> 
> Current implementation
> ======================
> * Trash translator resides on glusterfs server stack just above posix.
> * Trash directory (.trashcan) is created during volume start and is
>   visible under root of the volume.
> * Each trashed file is moved (renamed) to trash directory with an
>   appended time stamp in the file name.

Do these files get moved during rebalance because of the name change, or do you choose the file name according to the DHT regex magic to avoid that?

> * Exact directory hierarchy (w.r.t the root of volume) is maintained
>   inside trash directory whenever a file is deleted/truncated from a
>   directory
> 
> Outstanding issues
> ==================
> * Since renaming occurs on the server side, the client side is unaware
>   of trash doing rename or create operations.
> * As a result, files/directories may not be visible from the mount point.
> * Files/directories created by the trash translator will not have a
>   gfid associated with them until a lookup is performed.
> 
> Proposed Flat hierarchy
> =======================
> * Instead of recreating the whole directory hierarchy under trash, we
>   will rename the file and place it directly under the trash directory
>   (of course with an appended time stamp).

The .trashcan directory might not scale with millions of such files placed under one directory. We faced the same problem in the gluster-swift project with the object expiration feature and decided to distribute our files across multiple directories in a deterministic way. Personally, I'd also prefer storing an absolute timestamp, for example the one returned by the `date +%s` command.
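
To make the bucketing idea concrete, here is a minimal sketch in Python. It is not what gluster-swift or the trash xlator actually does; the bucket count, the SHA-1 hash, and the name `trash_path` are all assumptions picked for illustration. It only shows that the bucket can be derived from the original path, so the same file always lands in the same sub-directory, and that the suffix is an epoch timestamp.

```python
import hashlib
import time

TRASH_ROOT = ".trashcan"  # directory name taken from the thread
NUM_BUCKETS = 256         # assumed value; any fixed count works


def trash_path(original_path, now=None):
    """Return a bucketed location for a trashed file (hypothetical helper)."""
    # Absolute timestamp in seconds since the epoch, like `date +%s`.
    ts = int(time.time() if now is None else now)
    # The bucket depends only on the original path, so it is deterministic.
    digest = hashlib.sha1(original_path.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % NUM_BUCKETS
    basename = original_path.rstrip("/").rsplit("/", 1)[-1]
    return "%s/%02x/%s_%d" % (TRASH_ROOT, bucket, basename, ts)
```

With 256 buckets, a million trashed files would average about 4000 entries per directory instead of a million in one.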

> * Directory hierarchy can be stored via either of the following two
>   approaches:
> 	(a) File name will contain the whole path with time stamp
> 	    appended

If this approach is taken, you might have trouble choosing a "magic letter" to represent slashes.
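
One way around picking a single magic letter is an escaping scheme that also escapes its own escape character, so that decoding is unambiguous. The sketch below uses percent-encoding from the Python standard library purely as an illustration; the helper names are made up, and nothing here is a proposal for the actual on-disk format. Note that the encoded name grows with path depth, so a file name length limit could still be hit with deep hierarchies.

```python
from urllib.parse import quote, unquote


def encode_trash_name(path, ts):
    """Flatten a path into one file name component (hypothetical helper)."""
    # safe="" makes quote() escape "/" as %2F; "%" itself becomes %25,
    # so a literal "%2F" in a file name cannot be confused with a slash.
    return "%s_%d" % (quote(path, safe=""), ts)


def decode_trash_name(name):
    """Recover (original path, timestamp) from an encoded name."""
    # The timestamp is everything after the last underscore.
    encoded, _, ts = name.rpartition("_")
    return unquote(encoded), int(ts)
```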

> 	(b) Store whole hierarchy as an xattr
> 
> Other enhancements
> ==================
> * Create the trash directory only when trash xlator is enabled.

This is a needed enhancement. Upgrading to 3.7.* from older glusterfs versions caused undesired results in the gluster-swift integration because .trashcan was visible by default on all glusterfs volumes.

> * Operations such as unlink, rename etc will be prevented on trash
>   directory only when trash xlator is enabled.
> * A new trash helper translator on client side (loaded only when trash
>   is enabled) to resolve split brain issues with truncation of files.
> * Restore files from trash with the help of an explicit setfattr call.

You have to be very careful about the races involved in re-creating the path while clients are accessing the volume, and about overwriting if the path already exists.
It's way easier (from the implementer's perspective) if this is a manual process.
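
To illustrate the overwrite concern, here is a rough sketch of a restore step on a plain local filesystem, in Python. It is not how the xlator would do it, and the `restore` helper is hypothetical. The point is only that a hard link fails atomically when the destination already exists, whereas a plain rename would silently replace a file that a client created in the meantime.

```python
import os


def restore(trashed, original):
    """Move a trashed file back without overwriting (hypothetical helper).

    Returns True on success, False if something already exists at `original`.
    """
    parent = os.path.dirname(original)
    if parent:
        # Re-create the missing hierarchy; tolerate a concurrent creator.
        os.makedirs(parent, exist_ok=True)
    try:
        # link() raises if the destination exists, so nothing is clobbered.
        os.link(trashed, original)
    except FileExistsError:
        return False
    # The data is reachable at the original path; drop the trash entry.
    os.unlink(trashed)
    return True
```

This still leaves open what to do when the destination exists, which is part of why a manual process is simpler.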

> 
> Thanks & Regards,
> -Anoop C S
> -Jiffin Tony Thottan
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 

