[Gluster-devel] Filesystem backed cache feature on GlusterFS

Angel clist at uah.es
Wed Jan 16 12:13:35 UTC 2008


Hi all
Some time ago I saw previous messages related to caching files locally on GlusterFS, as opposed to keeping the cache in the memory-based io-cache translator.

I must admit I don't yet have the coding skills to help much in achieving this; writing a new performance xlator from scratch is still
hard for me.

So I was just thinking about which features we would need to get this up and running, and how to obtain them with minimal effort.

I've written a little memo about achieving this; perhaps someone can give it a look and comment on the feasibility and cost of incorporating
it into the current development schedule.

Perhaps elaborating this memo a bit more will allow us to get there in the near future, assuming the devs have enough time and agree on the
correctness of the design.

In short, this approach merely requires:
	One minor modification to the filter xlator.
	A new performance translator, auto-prune.
	One or more new schedulers ("prune-big", "prune-lru", etc.).
	Minor (?) modifications to the afr xlator.

Please tell me what you think about this proposal.


Regards Angel 
-- 
----------------------------
Clister UAH
----------------------------
-------------- next part --------------

TITLE: Proposal for a local filesystem cache in the GlusterFS architecture
AUTHOR: Angel Alvarez <clist at uah.es>

Requirements for a local filesystem cache with minimal code changes and improved modularity.

The purpose of this memo is to define a possible roadmap to a local-filesystem-backed cache, built from
current GlusterFS components with minimal changes and a model consistent with ongoing development.

Details on the proposed modifications

1- Max available space control feature for the filter or posix xlator
Definition:
This feature limits the available storage space by intercepting size calculations (statvfs?)

Implementation:
If the underlying module reports Size > Max, then set Size = Max (see the sketch below).
Option A: implemented as an additional option of the posix storage translator
Option B: implemented as an additional option of the filter xlator

Benefits:
	Improved QA for the rest of the modules, as we can now test the behavior of upper modules under size constraints.
	Control of the space exported on every node.
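
A minimal standalone sketch of that Size = Max clamp, using plain statvfs() rather than the real xlator hooks; clamp_statvfs and the 10GB cap are illustrative assumptions, not existing GlusterFS code:

#include <stdio.h>
#include <sys/statvfs.h>

/* Cap the totals a statvfs reply reports at max_bytes, so upper
 * layers never see more than the configured maximum. */
static void clamp_statvfs(struct statvfs *buf, unsigned long long max_bytes)
{
        unsigned long long max_blocks = max_bytes / buf->f_frsize;
        unsigned long long used = buf->f_blocks - buf->f_bfree;

        if (buf->f_blocks > max_blocks) {
                buf->f_blocks = max_blocks;
                /* Free space is whatever remains of the capped total. */
                buf->f_bfree  = (used < max_blocks) ? max_blocks - used : 0;
                buf->f_bavail = buf->f_bfree;
        }
}

int main(void)
{
        struct statvfs buf;

        if (statvfs("/tmp", &buf) != 0)
                return 1;
        clamp_statvfs(&buf, 10ULL * 1024 * 1024 * 1024);   /* 10GB cap */
        printf("total: %llu bytes, free: %llu bytes\n",
               (unsigned long long)buf.f_blocks * buf.f_frsize,
               (unsigned long long)buf.f_bfree * buf.f_frsize);
        return 0;
}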

2- Auto-pruning xlator

Definition:
The auto-pruning xlator always reports the maximum space as available to upper modules.
An auto-pruning xlator atop a 50GB posix brick always shows 50GB of available storage.
When upper modules try to store a file on this xlator and space is scarce, it prunes files based on a policy.
A policy can be something like "delete the biggest file on the underlying storage" and is implemented as a scheduler that controls the deletion of files.
You keep storing files and this xlator auto-prunes to keep storage available. On reads it returns an error if the file doesn't exist, and acts normally otherwise.
Upper modules must not complain about a non-existent-file error on this volume.

Benefits: REAL UNLIMITED STORAGE!! This xlator never gives up on new file creations, as it prunes old files as needed to accommodate new ones.

Implementation:
Always report the maximum space available to upper modules' requests.
On reads, proceed if possible; otherwise return the proper error.
On creation or write, prune files as needed to recover space when the underlying storage fills up (see the sketch after this list).
Deletion candidates are chosen by means of pluggable schedulers (prune-big, etc.).
Only follow scheduler candidates whose files are not currently open.
Other operations proceed as usual.
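
A minimal sketch of that prune-on-write loop, assuming a scheduler callback that names the next victim; make_room, file_is_open and the demo scheduler are hypothetical names for this memo, not GlusterFS code:

#include <stdio.h>
#include <unistd.h>
#include <sys/statvfs.h>

/* Scheduler interface: return the next deletion candidate, or NULL
 * when no more candidates are available. */
typedef const char *(*next_victim_fn)(void);

/* Stub for the sketch: a real xlator would consult its open-fd table. */
static int file_is_open(const char *path) { (void)path; return 0; }

static unsigned long long free_bytes(const char *dir)
{
        struct statvfs buf;

        if (statvfs(dir, &buf) != 0)
                return 0;
        return (unsigned long long)buf.f_bavail * buf.f_frsize;
}

/* Before a create/write needing "needed" bytes, prune scheduler
 * victims until the backing directory has room. Returns 0 on success,
 * -1 if the scheduler runs out of candidates. */
static int make_room(const char *backing_dir, unsigned long long needed,
                     next_victim_fn next_victim)
{
        while (free_bytes(backing_dir) < needed) {
                const char *victim = next_victim();

                if (victim == NULL)
                        return -1;      /* nothing left to prune */
                if (file_is_open(victim))
                        continue;       /* never prune an open file */
                unlink(victim);
        }
        return 0;
}

/* Demo scheduler: walks a fixed list of victims. */
static const char *demo_victims[] = {
        "/tmp/glusterfs-cache/big-file-1",
        "/tmp/glusterfs-cache/big-file-2",
        NULL,
};
static int demo_idx;
static const char *demo_next_victim(void)
{
        return demo_victims[demo_idx] ? demo_victims[demo_idx++] : NULL;
}

int main(void)
{
        /* Ask for 1MB of room in the (hypothetical) cache directory. */
        if (make_room("/tmp/glusterfs-cache", 1024 * 1024, demo_next_victim) != 0)
                fprintf(stderr, "cache full and no prunable victims left\n");
        return 0;
}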


3- New scheduler definitions
- Scheduler prune-big
Definition:
This scheduler maintains a list of the n biggest files, updating its internal info upon close() operations.
The auto-prune module invokes this scheduler to choose deletion victims on the filesystem cache (a sketch of its bookkeeping follows these definitions).
- Scheduler prune-lru
Definition:
This scheduler maintains a list of the least recently used files.
The auto-prune module invokes this scheduler to choose deletion victims on the filesystem cache.
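
A minimal sketch of the prune-big bookkeeping under the definition above; prune_big_on_close and prune_big_next_victim are illustrative names, and a real scheduler would also have to handle renames, deletes and persistence across restarts:

#include <stdio.h>
#include <string.h>

#define N_BIGGEST    100
#define PATH_MAX_LEN 4096

struct entry {
        char path[PATH_MAX_LEN];
        unsigned long long size;
};

static struct entry biggest[N_BIGGEST];
static int count;

/* Called from the close() path with the file's final size. */
static void prune_big_on_close(const char *path, unsigned long long size)
{
        int i, smallest = 0;

        /* Already tracked: just refresh the size. */
        for (i = 0; i < count; i++) {
                if (strcmp(biggest[i].path, path) == 0) {
                        biggest[i].size = size;
                        return;
                }
        }
        if (count < N_BIGGEST) {
                snprintf(biggest[count].path, PATH_MAX_LEN, "%s", path);
                biggest[count].size = size;
                count++;
                return;
        }
        /* Table full: replace the smallest entry if this file is bigger. */
        for (i = 1; i < N_BIGGEST; i++)
                if (biggest[i].size < biggest[smallest].size)
                        smallest = i;
        if (size > biggest[smallest].size) {
                snprintf(biggest[smallest].path, PATH_MAX_LEN, "%s", path);
                biggest[smallest].size = size;
        }
}

/* Called by the auto-prune xlator: remove and return the biggest
 * tracked entry, or NULL when nothing is tracked. */
static const char *prune_big_next_victim(void)
{
        static char victim[PATH_MAX_LEN];
        int i, big = 0;

        if (count == 0)
                return NULL;
        for (i = 1; i < count; i++)
                if (biggest[i].size > biggest[big].size)
                        big = i;
        snprintf(victim, PATH_MAX_LEN, "%s", biggest[big].path);
        biggest[big] = biggest[--count];
        return victim;
}

int main(void)
{
        prune_big_on_close("/tmp/glusterfs-cache/a", 10);
        prune_big_on_close("/tmp/glusterfs-cache/b", 30);
        prune_big_on_close("/tmp/glusterfs-cache/c", 20);
        printf("first victim: %s\n", prune_big_next_victim());  /* .../b */
        return 0;
}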

4- AFR modifications
Definition:
The current AFR provides the mechanism to ensure proper replication across subvolumes,
and it tries to favor the local volume for reading.
The new AFR knows that the local volume is an (auto-pruning) cache: it must not complain about files missing from the local cache,
and it validates local cache files against timestamps from the remote volumes (see the sketch below).
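
A minimal sketch of that read decision, assuming a simple mtime comparison as the validation rule; pick_read_source and the negative-mtime convention for "missing" are illustrative assumptions for this memo, not AFR code:

#include <stdio.h>

enum source { FROM_CACHE, FROM_REMOTE };

/* cache_mtime < 0 means the file is missing from the local cache;
 * remote_mtime < 0 means the remote copy could not be stat'ed. */
static enum source pick_read_source(long cache_mtime, long remote_mtime)
{
        if (cache_mtime < 0)
                return FROM_REMOTE;   /* missing in cache: no error, no self-heal */
        if (remote_mtime < 0)
                return FROM_CACHE;    /* remote unreachable: serve the cached copy */
        if (remote_mtime > cache_mtime)
                return FROM_REMOTE;   /* cached copy is stale */
        return FROM_CACHE;            /* cached copy is valid */
}

int main(void)
{
        printf("%s\n", pick_read_source(-1, 100) == FROM_CACHE ? "cache" : "remote");
        printf("%s\n", pick_read_source(50, 100) == FROM_CACHE ? "cache" : "remote");
        printf("%s\n", pick_read_source(100, 100) == FROM_CACHE ? "cache" : "remote");
        return 0;
}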


GLUSTER SETUP TO USE ONE LOCAL VOLUME AS A FILESYSTEM-BACKED CACHE

Server setup
##########################################################################
volume server-posix
  type storage/posix               # POSIX FS translator
  option directory /home/export    # Export this directory
end-volume

volume server
  type protocol/server
  option transport-type tcp/server       # For TCP/IP transport
  option bind-address <server ip>        # Default is to listen on all interfaces
  subvolumes server-posix
  option auth.ip.server-posix.allow *    # Allow access to the "server-posix" volume
end-volume

Client setup
###########################################################################

# REMOTE VOLUME ON SERVER
volume client
  type protocol/client
  option transport-type tcp/client       # for TCP/IP transport
  option remote-host <server ip>         # IP address of the remote brick
  option remote-subvolume server-posix   # name of the remote volume
end-volume

# LOCAL VOLUME TO USE AS A TEMPORARY CACHE
volume temporal-local-posix
  type storage/posix                       # POSIX FS translator
  option directory /tmp/glusterfs-cache    # Export this directory
end-volume


# LIMIT CACHE SIZE TO 10GB Max
volume limited-size-temporal
  type features/filter
  subvolumes temporal-local-posix
  option maxsize 10GB
  option read-only no
end-volume

# AUTOPRUNE MODULE FOR CACHE MAINTENANCE
# prunes the n biggest files to make room when needed
volume local-cache
   type performance/autoprune
   subvolumes limited-size-temporal
   option scheduler prune-big:100    # remember the 100 biggest files from prior close ops; select victims from these
end-volume

# AFR SPECIAL CASE: REMOTE + LOCAL, USING LOCAL FOR READS WHEN POSSIBLE
# replicates the remote volume onto the local cache; doesn't try to self-heal files missing from the cache volume
# reads from local first when possible, checking the modification time on the remote volumes
# doesn't complain about files that don't exist in the cache
volume local-cache-example
  type cluster/afr
  option self-heal off                 # Don't try to self-heal the cache volume
  option read-subvolume local-cache    # Always try to read from this volume, validate from other volumes
  option volume-is-cache local-cache   # don't complain about non-existent files
  subvolumes client local-cache
end-volume


END OF MEMO

