[Bugs] [Bug 1235964] Disperse volume: FUSE I/O error after self healing the failed disk files
bugzilla at redhat.com
Wed Aug 5 01:02:46 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1235964
Pranith Kumar K <pkarampu at redhat.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |pkarampu at redhat.com
--- Comment #6 from Pranith Kumar K <pkarampu at redhat.com> ---
Fang Huang,
Thanks a ton for the test script.
Xavi,
I think I found the root cause of this problem. Even after the heal completes,
the 'bad' value in the inode's inode-ctx is not updated, so the inode still
thinks there are not enough good subvolumes, and that leads to EIO. Let's
talk about this today.
[2015-08-05 00:56:52.530667] E [MSGID: 122034]
[ec-common.c:546:ec_child_select] 0-patchy-disperse-0: Insufficient available
childs for this request (have 3, need 4)
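To make the numbers in that log line concrete, here is a rough sketch of the
arithmetic only, not the actual ec code. The mask names are made up for
illustration, and it assumes the stale 'bad' value still marks bricks 0 and 1
(the ones killed first in the test script below) at the point where bricks 2
and 3 are down:

```shell
#!/bin/bash
# Illustrative arithmetic only; variable names are hypothetical and this is
# not how ec-common.c is written.

bricks=7
redundancy=3
need=$((bricks - redundancy))   # fragments needed to serve a request: 4

# Count the set bits in a mask (bit i stands for brick i).
count_bits() {
    mask=$1
    n=0
    while [ "$mask" -ne 0 ]; do
        n=$((n + (mask & 1)))
        mask=$((mask >> 1))
    done
    echo "$n"
}

# Bricks 2 and 3 are down, so bricks 0, 1, 4, 5 and 6 are up.
up_mask=$(( (1 << 0) | (1 << 1) | (1 << 4) | (1 << 5) | (1 << 6) ))

# Assumption: the cached inode-ctx still flags bricks 0 and 1 as bad,
# even though they have been healed.
stale_bad=$(( (1 << 0) | (1 << 1) ))

# After a fresh mount nothing is flagged as bad.
fresh_bad=0

have_stale=$(count_bits $((up_mask & ~stale_bad)))
have_fresh=$(count_bits $((up_mask & ~fresh_bad)))

echo "stale inode-ctx: have $have_stale, need $need"
echo "fresh inode-ctx: have $have_fresh, need $need"
```

With the stale value only bricks 4, 5 and 6 count as usable, which is below
the 4 needed for a 7-brick volume with redundancy 3, while a fresh inode-ctx
sees all 5 online bricks.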
I modified the script to umount and mount again so that inode-ctx will be
updated afresh and the test passes.
#!/bin/bash
. $(dirname $0)/../../include.rc
. $(dirname $0)/../../volume.rc
cleanup
ec_test_dir=$M0/test
function ec_test_generate_src()
{
    mkdir -p $ec_test_dir
    for i in $(seq 0 19)
    do
        dd if=/dev/zero of=$ec_test_dir/$i.c bs=1024 count=2
    done
}
function ec_test_make()
{
    # Copy every .c file in the current directory to a matching .o file.
    for i in *.c
    do
        file=$(basename "$i")
        filename=${file%.*}
        cp "$i" "$filename.o"
    done
}
## step 1
TEST glusterd
TEST pidof glusterd
TEST $CLI volume create $V0 disperse 7 redundancy 3 $H0:$B0/${V0}{0..6}
TEST $CLI volume start $V0
TEST glusterfs --entry-timeout=0 --attribute-timeout=0 -s $H0 --volfile-id $V0 $M0
EXPECT_WITHIN $CHILD_UP_TIMEOUT "7" ec_child_up_count $V0 0
## step 2
TEST ec_test_generate_src
cd $ec_test_dir
TEST ec_test_make
## step 3
TEST kill_brick $V0 $H0 $B0/${V0}0
TEST kill_brick $V0 $H0 $B0/${V0}1
EXPECT '5' online_brick_count
TEST rm -f *.o
TEST ec_test_make
## step 4
TEST $CLI volume start $V0 force
EXPECT '7' online_brick_count
# active heal
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "[0-9][0-9]*" get_shd_process_pid
TEST $CLI volume heal $V0 full
TEST rm -f *.o
TEST ec_test_make
## step 5
TEST kill_brick $V0 $H0 $B0/${V0}2
TEST kill_brick $V0 $H0 $B0/${V0}3
EXPECT '5' online_brick_count
cd -
EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0
TEST glusterfs --entry-timeout=0 --attribute-timeout=0 -s $H0 --volfile-id $V0 $M0
EXPECT_WITHIN $CHILD_UP_TIMEOUT "5" ec_child_up_count $V0 0
cd -
TEST rm -f *.o
TEST ec_test_make
EXPECT '5' online_brick_count
## step 6
TEST $CLI volume start $V0 force
EXPECT '7' online_brick_count
cd -
EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0
TEST glusterfs --entry-timeout=0 --attribute-timeout=0 -s $H0 --volfile-id $V0 $M0
EXPECT_WITHIN $CHILD_UP_TIMEOUT "7" ec_child_up_count $V0 0
cd -
# self-healing
TEST rm -f *.o
TEST ec_test_make
TEST pidof glusterd
EXPECT "$V0" volinfo_field $V0 'Volume Name'
EXPECT 'Started' volinfo_field $V0 'Status'
EXPECT '7' online_brick_count
## cleanup
cd
EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0
TEST $CLI volume stop $V0
TEST $CLI volume delete $V0
TEST rm -rf $B0/*
cleanup;