[Gluster-devel] upstream: Symbolic link not getting healed
Pranith Kumar Karampuri
pkarampu at redhat.com
Thu Dec 19 02:28:22 UTC 2013
hi,
I used the following test to figure out the bad commit.
#!/bin/bash
. $(dirname $0)/../include.rc
. $(dirname $0)/../volume.rc

function trigger_mount_self_heal {
        find $M0 | xargs stat
}

cleanup;

TEST glusterd
TEST pidof glusterd
TEST $CLI volume create $V0 replica 2 $H0:$B0/${V0}{0,1}
TEST $CLI volume set $V0 cluster.background-self-heal-count 0
TEST $CLI volume start $V0
# Mount with caching disabled so every lookup reaches the bricks.
TEST glusterfs --volfile-id=/$V0 --volfile-server=$H0 $M0 --use-readdirp=no --attribute-timeout=0 --entry-timeout=0
TEST touch $M0/a

# Take one brick down, then create the symlink: it must exist only
# on the surviving brick.
TEST kill_brick $V0 $H0 $B0/${V0}0
TEST ln -s $M0/a $M0/s
TEST ! stat $B0/${V0}0/s
TEST stat $B0/${V0}1/s

# Bring the brick back, wait for the self-heal daemon, and trigger
# both a daemon-side full heal and a mount-side heal.
TEST $CLI volume start $V0 force
EXPECT_WITHIN 20 "Y" glustershd_up_status
EXPECT_WITHIN 20 "1" afr_child_up_status_in_shd $V0 0
TEST $CLI volume heal $V0 full
TEST trigger_mount_self_heal

# The symlink should now exist on both bricks.
TEST stat $B0/${V0}0/s
TEST stat $B0/${V0}1/s

cleanup;
According to git bisect run, the commit which introduced this problem is:
837422858c2e4ab447879a4141361fd382645406
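For reference, the bisect itself can be automated. Below is a self-contained sketch of how "git bisect run" narrows down a bad commit, using a throwaway repository and a trivial check script standing in for the real regression test; every name here is illustrative, not taken from the gluster tree:

```shell
# Throwaway repository: two good commits, one commit that introduces a
# "marker" file (our stand-in for the regression), and one commit on top.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email test@example.com
git config user.name test
git commit -q --allow-empty -m "good 1"
git commit -q --allow-empty -m "good 2"
echo broken > marker
git add marker
git commit -q -m "bad: introduces the regression"
git commit -q --allow-empty -m "later work"

# The check script plays the role of the .t test: exit non-zero when
# the regression (here, the marker file) is present.
cat > check.sh <<'EOF'
#!/bin/sh
test ! -f marker
EOF
chmod +x check.sh

# bad = HEAD, good = HEAD~3; "git bisect run" drives the search and
# reports which commit first made the check fail.
git bisect start HEAD HEAD~3
result=$(git bisect run ./check.sh)
echo "$result" | grep "is the first bad commit"
```

With the real test, the check script would simply be the .t file above, and bisect converges on the offending commit in O(log n) test runs.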
commit 837422858c2e4ab447879a4141361fd382645406
Author: Anand Avati <avati at redhat.com>
Date: Thu Nov 21 06:48:17 2013 -0800
core: fix errno for non-existent GFID
When clients refer to a GFID which does not exist, the errno to
be returned is ESTALE (and not ENOENT). Even though ENOENT might
look "proper" most of the time, as the application eventually expects
ENOENT even if a parent directory does not exist, not returning
ESTALE results in the resolvers (FUSE and GFAPI) not retrying
resolution in uncached mode. This can result in spurious ENOENTs
during concurrent path-modification operations.
Change-Id: I7a06ea6d6a191739f2e9c6e333a1969615e05936
BUG: 1032894
Signed-off-by: Anand Avati <avati at redhat.com>
Reviewed-on: http://review.gluster.org/6322
Tested-by: Gluster Build System <jenkins at build.gluster.com>
Affected branches: master, 3.5, 3.4.
Will be working with Venkatesh to get a fix for this on all these branches.
Good catch, Venkatesh! Thanks a lot for a simple case to re-create the issue :-).
Vijay,
Do you think we need this patch for 3.4 as well? Has it had enough baking time? The change seems delicate, in the sense that every place that expects ENOENT needs to be carefully examined; even if we miss one place, we have a potential bug.
Example:
In 3.4:
pk at pranithk-laptop - ~/workspace/gerrit-repo/xlators/cluster/afr/src ((detached from v3.4.0))
07:53:47 :) ⚡ git show 837422858c2e4ab447879a4141361fd382645406 --stat | grep afr <<--- On the commit which introduced this change only one afr file is changed
xlators/cluster/afr/src/afr-self-heald.c | 2 +-
Whereas there are quite a few files handling ENOENT in afr:
pk at pranithk-laptop - ~/workspace/gerrit-repo/xlators/cluster/afr/src ((detached from v3.4.0))
07:53:51 :) ⚡ git grep -l ENOENT
afr-common.c
afr-self-heal-common.c
afr-self-heal-entry.c
afr-self-heald.c
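That audit can be sketched in shell: given some fake source files (made up purely for illustration, not real afr code), list the ones that check ENOENT but never mention ESTALE, since those are the call sites that would need examination:

```shell
# Illustrative only: two fake "source files", one checking only ENOENT
# and one already handling ESTALE alongside it.
dir=$(mktemp -d)
cat > "$dir/afr-old.c" <<'EOF'
if (op_errno == ENOENT) goto missing;
EOF
cat > "$dir/afr-new.c" <<'EOF'
if (op_errno == ENOENT || op_errno == ESTALE) goto missing;
EOF

# Files that mention ENOENT but never mention ESTALE are the ones
# that still need to be looked at.
needs_fix=$(grep -l 'ENOENT' "$dir"/*.c | xargs grep -L 'ESTALE')
echo "$needs_fix"    # prints the path of afr-old.c only
```

In a real tree the same idea would be a `git grep` over the files listed above, with the caveat that mentioning ESTALE somewhere in a file does not prove every ENOENT check in it was fixed.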
Pranith
----- Original Message -----
> From: "Venkatesh Somyajulu" <vsomyaju at redhat.com>
> To: gluster-devel at nongnu.org
> Sent: Tuesday, December 17, 2013 4:14:57 PM
> Subject: [Gluster-devel] upstream: Symbolic link not getting healed
>
> Hi,
>
> For the upstream master branch, I found that a symbolic link is not
> getting healed.
>
> How I reproduced:
> -----------------
> 1. Created replicate volume with 2 bricks in a replica.
> 2. Created file from the mount point.
> 3. Killed one of the brick of replica.
> 4. Created symbolic link to that file from mount point and then brought the
> killed brick back up.
>
> Tried to heal by both a) Mount Process and b) Self heal Daemon
>
> a) When self heal daemon is off:
> 6. Gave ls at mount point.
> Observation: rather than the link getting healed, I get this output:
> ls: cannot read symbolic link Link: No such file or directory
> File Link
>
> b) When self heal daemon is on:
> 6. "gluster volume heal volumename full" fails to heal and the output
> includes:
> [2013-12-17 10:21:49.863960] I
> [afr-self-heal-entry.c:1502:afr_sh_entry_impunge_readlink_sink_cbk]
> 0-volume1-replicate-0: readlink of /Link on volume1-client-1 failed
> (Stale file handle)
>
>
> Still root-causing the issue. Seems like the ESTALE error needs to be
> handled properly.
>
> Regards,
> Venkatesh
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>