[Gluster-devel] lookup caching

Raghavendra G raghavendra at gluster.com
Mon Apr 12 03:39:27 UTC 2010


On Sun, Apr 11, 2010 at 12:42 PM, Olivier Le Cam <
Olivier.LeCam at crdp.ac-versailles.fr> wrote:

> Hi -
>
>
> Raghavendra G wrote:
>
>> You can do that by sending the cached stats (here, the stat of the file and
>> the stat of its parent directory) through STACK_UNWIND.
>>
>> STACK_UNWIND_STRICT (lookup, frame, 0, 0, loc->inode, cached_stat, NULL,
>> parent_stat);
>>
>> You can look into libglusterfs/src/defaults.c for the default definitions of
>> each fop (file operation) and its callback.
>>
>
> Thank you. I have been able to get a (quick and dirty) stat lookup
> caching translator working. I still do not fully understand the
> GlusterFS internal library, so most of the caching is done by my own code.
>
> Anyway, it is enough at this stage to run some benchmarks and to
> see whether performance can be improved significantly.
>
> My first impression is rather mixed. I can indeed see some improvement
> when accessing small files: stat caching does its job. But for some reason,
> io-cache still talks to the servers before delivering a file, even if that
> file is available in its cache.
>
> I can see three protocol calls:
>        - client_open()         (to both servers)
>        - client_stat()         (to one server only: load balancing?)
>        - client_flush()        (to both servers)
>
>
io-cache does indeed send open and flush to the servers; this is needed for
io-cache to work correctly. As you note below, your translator overrides only
the lookup call, so stat is still sent to the server (stat and lookup are two
different calls).
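
To illustrate why stat still reaches the server, here is a minimal,
self-contained model in plain C. It is not GlusterFS code: struct fops, the
server_* functions and the one-entry cache are all hypothetical stand-ins. A
call that has no override simply falls through to the server.

```c
#include <stddef.h>

/* Hypothetical, much-simplified model of a translator's fop table. */
struct fops {
        int (*lookup) (int ino);
        int (*stat)   (int ino);
};

static int server_calls;        /* requests that reached the "server" */
static int cached_ino = -1;     /* one-entry cache, -1 means empty */

static int server_lookup (int ino) { (void) ino; server_calls++; return 0; }
static int server_stat   (int ino) { (void) ino; server_calls++; return 0; }

/* lookup override: answer from the cache, else ask the server and cache. */
static int
cached_lookup (int ino)
{
        if (ino == cached_ino)
                return 0;
        cached_ino = ino;
        return server_lookup (ino);
}

/* stat override: same cache, but a miss is forwarded without caching. */
static int
cached_stat (int ino)
{
        if (ino == cached_ino)
                return 0;
        return server_stat (ino);
}

/* Dispatch: use the override if there is one, else go to the server. */
static int
do_lookup (const struct fops *f, int ino)
{
        return f->lookup ? f->lookup (ino) : server_lookup (ino);
}

static int
do_stat (const struct fops *f, int ino)
{
        return f->stat ? f->stat (ino) : server_stat (ino);
}
```

With only lookup set in the table, do_stat() always increments server_calls,
even for an inode that lookup has just cached. In the real translator the
equivalent step would presumably be a .stat entry in the fops table that
unwinds with the cached stat, in the same way mdc_lookup does.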



> This might be a problem with the implementation of my translator, which only
> overrides "lookup" calls for now.
>
> My source code is attached (please go easy on it: as said before, it's a
> quick and dirty hack of the rot-13 translator).
>
> I'd like to get rid of any traffic with the servers when the file is
> available in the io-cache. That way I could really see if such a translator
> can be of any interest.
>
> Do you have any idea for achieving this goal?
>
>
> Thanks and best regards,
> --
> Olivier
>


Since lookup is unwound from the cache whenever the stat for an inode is
cached, you also have to implement the calls that delete files or directories
(such as unlink and rmdir) and flush the cache entry for the inode there.
Otherwise lookup will keep succeeding for unlinked files, while the actual
operations (open, chmod, chown, etc.) will fail.
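
A rough sketch of the per-inode invalidation, as a standalone C model rather
than translator code. The names (struct entry, cache_add, cache_has,
cache_invalidate, NBUCKETS) are made up for the example, and the stat fields
are left out to keep it short.

```c
#include <stdlib.h>
#include <sys/types.h>

#define NBUCKETS 16     /* illustrative; the translator below uses HASH_POS */

/* Hypothetical, stripped-down cache entry (stats omitted for brevity). */
struct entry {
        ino_t         ino;
        struct entry *next;
};

static struct entry *buckets[NBUCKETS];

/* Add `ino` to its bucket. Returns 0 on success, -1 if out of memory. */
static int
cache_add (ino_t ino)
{
        struct entry *e = calloc (1, sizeof (*e));

        if (e == NULL)
                return -1;
        e->ino  = ino;
        e->next = buckets[ino % NBUCKETS];
        buckets[ino % NBUCKETS] = e;
        return 0;
}

/* Return 1 if `ino` is cached, 0 otherwise. */
static int
cache_has (ino_t ino)
{
        struct entry *e = NULL;

        for (e = buckets[ino % NBUCKETS]; e != NULL; e = e->next) {
                if (e->ino == ino)
                        return 1;
        }
        return 0;
}

/* Drop the entry for `ino`. Returns 1 if one was removed, 0 if not cached. */
static int
cache_invalidate (ino_t ino)
{
        struct entry **pp = &buckets[ino % NBUCKETS];

        while (*pp != NULL) {
                if ((*pp)->ino == ino) {
                        struct entry *dead = *pp;

                        *pp = dead->next;   /* unlink from the chain */
                        free (dead);
                        return 1;
                }
                pp = &(*pp)->next;
        }
        return 0;
}
```

In the translator, the unlink and rmdir fops would call something like
cache_invalidate() for the inode in question before winding the call to the
child, so that the next lookup goes back to the server.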

You should also handle calls which can change the stat of a file or
directory (like write/chmod/chown). As a simple implementation you can just
flush the cache.
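
One possible shape for this, again as a standalone C model and not GlusterFS
code: classify each call by whether it can change the stat, and drop the
cached entry before forwarding the ones that can. The enum fop values, the
direct-mapped slots[] cache and the helper names are assumptions made for the
example.

```c
#include <sys/types.h>

#define SLOTS 64

/* Hypothetical fop identifiers; GlusterFS has its own enumeration. */
enum fop { FOP_LOOKUP, FOP_STAT, FOP_WRITE, FOP_CHMOD, FOP_CHOWN };

/* Direct-mapped cache: each slot holds an inode number, 0 means empty. */
static ino_t slots[SLOTS];

static void
cache_put (ino_t ino)
{
        slots[ino % SLOTS] = ino;
}

static int
cache_valid (ino_t ino)
{
        return ino != 0 && slots[ino % SLOTS] == ino;
}

/* Return 1 for calls that can change the stat of a file or directory. */
static int
fop_changes_stat (enum fop op)
{
        switch (op) {
        case FOP_WRITE:
        case FOP_CHMOD:
        case FOP_CHOWN:
                return 1;
        default:
                return 0;
        }
}

/* Call before forwarding `op`: drop the entry if its stat will go stale. */
static void
before_fop (enum fop op, ino_t ino)
{
        if (fop_changes_stat (op) && cache_valid (ino))
                slots[ino % SLOTS] = 0;
}
```

Dropping the entry is cruder than updating it from the stat returned in the
callback, but it cannot serve stale data: the next lookup simply misses and
refills the cache from the server.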


>
> /*
>  Copyright (c) 2006-2009 Gluster, Inc. <http://www.gluster.com>
>  This file is part of GlusterFS.
>
>  GlusterFS is free software; you can redistribute it and/or modify
>  it under the terms of the GNU General Public License as published
>  by the Free Software Foundation; either version 3 of the License,
>  or (at your option) any later version.
>
>  GlusterFS is distributed in the hope that it will be useful, but
>  WITHOUT ANY WARRANTY; without even the implied warranty of
>  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>  General Public License for more details.
>
>  You should have received a copy of the GNU General Public License
>  along with this program.  If not, see
>  <http://www.gnu.org/licenses/>.
> */
>
> #include <ctype.h>
> #include <sys/uio.h>
>
> #ifndef _CONFIG_H
> #define _CONFIG_H
> #include "config.h"
> #endif
>
> #include "glusterfs.h"
> #include "xlator.h"
> #include "logging.h"
> #include <sys/time.h>
>
> #include "rot-13.h"
>
> /*
>  * This is a metadata-caching (mdc) xlator derived from the rot-13
>  * example. It caches the stats returned by lookup and answers later
>  * lookups from that cache until the entry times out.
>  * Quick and dirty, NOT FOR PRODUCTION USE (hence no error-checking).
>  */
>
> mdc_inode_cache_t *
> mdc_inode_cache_delete(mdc_private_t *priv, mdc_inode_cache_t *cache)
> {
>        mdc_inode_cache_t *next = cache->next;
>
>        if (cache->previous)
>                cache->previous->next = cache->next;
>        if (cache->next)
>                cache->next->previous = cache->previous;
>        FREE (cache);
>
>        priv->count--;
>        return next;
> }
>
> int32_t
> mdc_inode_cache_set(xlator_t *this, ino_t ino, const struct stat *stbuf,
>                     const struct stat *postparent)
> {
>        mdc_private_t           *priv   = (mdc_private_t*) this->private;
>        mdc_inode_cache_t       *cache  = priv->inode_cache_head[ino % HASH_POS];
>        mdc_inode_cache_t       *new    = NULL;
>
>        /* cache is NULL if init() failed to allocate this bucket's head */
>        if (ino == 0 || cache == NULL || stbuf == NULL || postparent == NULL)
>                return 0;
>
>        if (cache->next) {
>                do {
>                        cache = cache->next;
>                        if (cache->ino == ino) {
>

Instead of just returning here, you can update the cached stat with the one
passed as an argument to this function.
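
Something along these lines, shown as a standalone sketch so it can be
compiled on its own. struct entry is a simplified stand-in for
mdc_inode_cache_t, entry_upsert is a hypothetical name, and plain calloc is
used in place of the CALLOC macro.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for mdc_inode_cache_t. */
struct entry {
        ino_t           ino;
        struct stat     stbuf;
        struct stat     postparent;
        struct timeval  tv;         /* time the entry was last refreshed */
        struct entry   *next;
};

/*
 * Insert or refresh the entry for `ino` in the list starting at *head.
 * Returns 0 on success, -1 on allocation failure.
 */
static int
entry_upsert (struct entry **head, ino_t ino,
              const struct stat *stbuf, const struct stat *postparent)
{
        struct entry *e = NULL;

        for (e = *head; e != NULL; e = e->next) {
                if (e->ino == ino)
                        break;
        }

        if (e == NULL) {
                e = calloc (1, sizeof (*e));
                if (e == NULL)
                        return -1;
                e->ino  = ino;
                e->next = *head;
                *head   = e;
        }

        /* New or existing: overwrite the stats and restart the timeout. */
        e->stbuf      = *stbuf;
        e->postparent = *postparent;
        gettimeofday (&e->tv, NULL);

        return 0;
}
```

Refreshing tv on every update means an entry that keeps being looked up on
the server never expires while stale; whether that is what you want depends
on how you intend cache-timeout to behave.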


>                                return 0;       /* already in */
>                        }
>                } while(cache->next);
>        }
>
>        new = CALLOC (1, sizeof(mdc_inode_cache_t));
>        if (new == NULL) {
>                return -1;
>        }
>
>        new->ino = ino;
>        memcpy(&(new->stbuf), stbuf, sizeof(struct stat));
>        memcpy(&(new->postparent), postparent, sizeof(struct stat));
>        gettimeofday (&(new->tv), NULL);
>        new->previous = cache;
>        new->next = NULL;
>
>        cache->next = new;
>        priv->count++;
>
>        return 0;
> }
>
> mdc_inode_cache_t *
> mdc_inode_cache_get(xlator_t *this, ino_t ino)
> {
>        mdc_private_t           *priv = (mdc_private_t*) this->private;
>        mdc_inode_cache_t       *cache = priv->inode_cache_head[ino % HASH_POS];
>        struct timeval          now = {0,};
>        time_t                  timeout = 0;
>
>        if (ino == 0)
>                return NULL;
>
>        gettimeofday(&now, NULL);
>        timeout = now.tv_sec - priv->cache_timeout;
>
>        while (cache) {
>                if (cache->tv.tv_sec < timeout && cache->ino) {
>                        cache = mdc_inode_cache_delete (priv, cache);
>                        continue;
>                }
>                if (cache->ino == ino) {
>                        return cache;
>                }
>                cache = cache->next;
>        }
>
>        return NULL;
> }
>
> int32_t
> mdc_lookup_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
>                int32_t op_ret, int32_t op_errno, inode_t *inode,
>                struct stat *stbuf, dict_t *dict, struct stat *postparent)
> {
>        // char *path;
>        // inode_path(inode, NULL, &path);
>
>        if (inode == NULL)
>                goto out;
>
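>        /* only cache the result of a successful lookup */
>        if (op_ret == 0 && stbuf && stbuf->st_ino) {
>                int32_t ret = 0;  /* mdc_inode_cache_set() returns -1 on failure */
>
>                ret = mdc_inode_cache_set(this, stbuf->st_ino, stbuf,
>                                          postparent);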
>                if (ret != 0) {
>                         gf_log (this->name, GF_LOG_WARNING,
>                                "Could not cache metadata (ino=%"PRIu64")",
>                                inode->ino);
>                }
>        }
>
> out :
>        STACK_UNWIND_STRICT (lookup, frame, op_ret, op_errno, inode, stbuf,
>                dict, postparent);
>        return 0;
> }
>
> int32_t
> mdc_lookup (call_frame_t *frame, xlator_t *this, loc_t *loc,
>            dict_t *xattr_req)
> {
>        mdc_inode_cache_t       *cache          = NULL;
>
>        if (loc == NULL || loc->inode == NULL) {
>                goto out;
>        }
>
>        cache = mdc_inode_cache_get(this, loc->inode->ino);
>
>        if (cache) {
>                STACK_UNWIND_STRICT (lookup, frame, 0, 0, loc->inode,
>                        &cache->stbuf, NULL, &cache->postparent);
>                return 0;
>        }
>
> out :
>        STACK_WIND (frame, mdc_lookup_cbk, FIRST_CHILD (this),
>                FIRST_CHILD (this)->fops->lookup, loc, xattr_req);
>        return 0;
> }
>
> int32_t
> init (xlator_t *this)
> {
>        int             i       = 0;
>        data_t          *data   = NULL;
>        mdc_private_t   *priv   = NULL;
>
>        if (!this->children || this->children->next) {
>                gf_log ("mdc-cache", GF_LOG_ERROR,
>                        "FATAL: mdc-cache should have exactly one child");
>                return -1;
>        }
>
>        if (!this->parents) {
>                gf_log (this->name, GF_LOG_WARNING,
>                        "dangling volume. check volfile ");
>        }
>
>        priv = CALLOC (1, sizeof (mdc_private_t));
>        ERR_ABORT (priv);
>        LOCK_INIT (&priv->lock);
>
>        for (i = 0; i < HASH_POS; i++) {
>                /* sentinel head for bucket i; its ino stays 0 */
>                priv->inode_cache_head[i] = CALLOC (1, sizeof (mdc_inode_cache_t));
>                if (priv->inode_cache_head[i]) {
>                        priv->inode_cache_head[i]->ino = 0;
>                        priv->inode_cache_head[i]->previous = NULL;
>                        priv->inode_cache_head[i]->next = NULL;
>                }
>        }
>
>        priv->cache_timeout = 1;
>        data = dict_get (this->options, "cache-timeout");
>        if (data) {
>                priv->cache_timeout = data_to_uint32 (data);
>                gf_log (this->name, GF_LOG_TRACE,
>                        "Using %u seconds to revalidate cache",
>                        priv->cache_timeout);
>        }
>
>        priv->count = 0;
>        this->private = priv;
>
>        gf_log ("mdc-cache", GF_LOG_WARNING,
>                "metadata caching (mdc-cache) xlator loaded");
>        return 0;
> }
>
> void
> fini (xlator_t *this)
> {
>        mdc_private_t *priv = this->private;
>
>        FREE (priv);
>
>        return;
> }
>
> struct xlator_fops fops = {
>        .lookup       = mdc_lookup
> };
>
> struct xlator_mops mops = {
> };
>
> struct xlator_cbks cbks = {
> };
>
> struct volume_options options[] = {
>        { .key  = {"cache-timeout"},
>          .type = GF_OPTION_TYPE_INT,
>          .min  = 1,
>          .max  = 900
>        },
>        { .key  = {NULL} }
> };
>
> /*
>   Copyright (c) 2006-2009 Gluster, Inc. <http://www.gluster.com>
>   This file is part of GlusterFS.
>
>   GlusterFS is free software; you can redistribute it and/or modify
>   it under the terms of the GNU General Public License as published
>   by the Free Software Foundation; either version 3 of the License,
>   or (at your option) any later version.
>
>   GlusterFS is distributed in the hope that it will be useful, but
>   WITHOUT ANY WARRANTY; without even the implied warranty of
>   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>   General Public License for more details.
>
>   You should have received a copy of the GNU General Public License
>   along with this program.  If not, see
>   <http://www.gnu.org/licenses/>.
> */
>
> #ifndef __ROT_13_H__
> #define __ROT_13_H__
>
> #ifndef _CONFIG_H
> #define _CONFIG_H
> #include "config.h"
> #endif
>
> #include <sys/uio.h>
> #include "call-stub.h"
>
> #define HASH_POS        1699
>
> struct mdc_inode_cache {
>        ino_t           ino;
>        struct stat     stbuf;
>        struct stat     postparent;
>        struct timeval  tv;
>        struct mdc_inode_cache  *previous;
>        struct mdc_inode_cache  *next;
> };
> typedef struct mdc_inode_cache mdc_inode_cache_t;
>
> struct mdc_private {
>        uint32_t        cache_timeout;
>        uint32_t        max_entries;
>        uint32_t        count;
>        struct mdc_inode_cache *inode_cache_head[HASH_POS];
>        gf_lock_t       lock;
> };
> typedef struct mdc_private mdc_private_t;
>
> #endif /* __ROT_13_H__ */
>

regards,
-- 
Raghavendra G