[Gluster-devel] 3.5.1qa4 performances

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Dec 26 05:47:32 UTC 2013


Emmanuel,
   There was a change in the posix xlator: if the gfid itself is not present on the backend, it now returns ESTALE instead of ENOENT. With the information you gave, I checked the code and found that the following code path does not handle ESTALE. The patch below should fix it.
diff --git a/xlators/cluster/afr/src/afr-self-heald.c b/xlators/cluster/afr/src/afr-self-heald.c
index 8dbb9c6..259c049 100644
--- a/xlators/cluster/afr/src/afr-self-heald.c
+++ b/xlators/cluster/afr/src/afr-self-heald.c
@@ -541,7 +541,8 @@ _crawl_post_sh_action (xlator_t *this, loc_t *parent, loc_t *child,
         priv = this->private;
         shd  = &priv->shd;
         if (crawl_data->crawl == INDEX) {
-                if ((op_ret < 0) && (op_errno == ENOENT)) {
+                if ((op_ret < 0) &&
+                    ((op_errno == ENOENT) || (op_errno == ESTALE))) {
                         _remove_stale_index (this, crawl_data->readdir_xl,
                                              parent, uuid_utoa_r (child->gfid,
                                                                   gfid_str));
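
For context, a minimal sketch of the posix-side change described above (hypothetical code with an invented helper name, not the actual posix xlator): when the backend handle file for a gfid is missing, a nameless (gfid-only) lookup now reports ESTALE instead of ENOENT. Since the index crawl above only removed stale index entries on ENOENT, glustershd kept retrying the same gfid forever.

#include <errno.h>
#include <sys/stat.h>

/* Nameless lookups resolve a gfid through its handle file on the
 * backend. If that handle file is gone, the gfid itself is stale. */
static int
lookup_gfid_handle (const char *handle_path)
{
        struct stat st;

        if (stat (handle_path, &st) < 0 && errno == ENOENT)
                return ESTALE; /* previously reported as ENOENT */

        return 0;
}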


Is it possible for you to confirm that this is indeed the issue?

Pranith.

----- Original Message -----
> From: "Emmanuel Dreyfus" <manu at netbsd.org>
> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> Cc: gluster-devel at nongnu.org
> Sent: Thursday, December 26, 2013 11:09:36 AM
> Subject: Re: [Gluster-devel] 3.5.1qa4 performances
> 
> Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
> 
> >     For some reason self-heal on those files must be failing. Wonder why?
> > Could you find the corresponding file on the bricks for each of those
> > entries (using find -inum) and give me the getfattr output of those files
> > and their parent directories, please?
> 
> Here is the setup:
> silo:/export/wd2a
> hangar:/export/wd1a
> hangar:/export/wd3a
> debacle:/export/wd1a
> 
> Example: hangar glustershd loops on:
> 
> [2013-12-26 05:24:17.113217] W
> [client-rpc-fops.c:1103:client3_3_getxattr_cbk] 0-gfs351-client-1:
> remote operation failed: Stale NFS file handle. Path:
> <gfid:cf1bdf4f-b71c-4fda-963d-b7e4547e1b7c>
> (cf1bdf4f-b71c-4fda-963d-b7e4547e1b7c). Key: glusterfs.gfid2path
> 
> I can see the silo:/export/wd2a brick looping on:
> [2013-12-26 05:25:39.441290] I [server-rpc-fops.c:154:server_lookup_cbk]
> 0-gfs351-server: 12899558: LOOKUP (null)
> (cf1bdf4f-b71c-4fda-963d-b7e4547e1b7c) ==> (Stale NFS file handle)
> 
> I searched for .glusterfs/cf/1b/cf1bdf4f-b71c-4fda-963d-b7e4547e1b7c on
> each brick: it does not exist anywhere. I tried with other "Stale NFS
> file handle" messages, and the file never exists in the glusterfs index
> tree.
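> 
> For reference, GlusterFS keeps one handle file per gfid on each brick
> under .glusterfs/<first two hex chars>/<next two>/<full gfid>. A
> standalone sketch (not GlusterFS code) that derives the handle path:
> 
> #include <stdio.h>
> 
> int
> main (void)
> {
>         const char *brick = "/export/wd2a";
>         const char *gfid  = "cf1bdf4f-b71c-4fda-963d-b7e4547e1b7c";
> 
>         /* <brick>/.glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid> */
>         printf ("%s/.glusterfs/%.2s/%.2s/%s\n",
>                 brick, gfid, gfid + 2, gfid);
>         return 0;
> }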
> 
> 
> 
> 
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> manu at netbsd.org
> 



