[Bugs] [Bug 1235964] Disperse volume: FUSE I/O error after self healing the failed disk files

bugzilla at redhat.com bugzilla at redhat.com
Wed Aug 5 01:02:46 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1235964

Pranith Kumar K <pkarampu at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pkarampu at redhat.com



--- Comment #6 from Pranith Kumar K <pkarampu at redhat.com> ---
Fang Huang,
      Thanks a ton for the test script.
Xavi,
     I think I found the root cause of this problem. After the heal happens,
the 'bad' mask in the inode-ctx is still not updated, because of which EC
thinks enough good subvolumes are not present, and that leads to EIO. Let's
talk about this today.

[2015-08-05 00:56:52.530667] E [MSGID: 122034]
[ec-common.c:546:ec_child_select] 0-patchy-disperse-0: Insufficient available
childs for this request (have 3, need 4)
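
For reference, the check that logs this message is essentially a count of
usable subvolumes against the number of fragments needed. Here is a minimal
sketch of that logic (illustrative only, not the actual ec-common.c code; the
function and variable names are assumptions):

/* 'up' is the bitmask of currently-up subvolumes; 'bad' is the per-inode
 * bitmask of subvolumes believed to hold stale fragments. If stale 'bad'
 * bits are never cleared after a heal, 'have' stays below 'need' and the
 * fop fails with EIO. */
#include <errno.h>
#include <stdint.h>

static int count_bits(uint64_t v)
{
    int n = 0;
    for (; v != 0; v &= v - 1)
        n++;
    return n;
}

/* need = number of data fragments = bricks - redundancy (7 - 3 = 4 here) */
static int can_serve(uint64_t up, uint64_t bad, int need)
{
    int have = count_bits(up & ~bad);
    return (have < need) ? -EIO : 0;
}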

I modified the script to umount and mount again so that the inode-ctx is
updated afresh, and with that the test passes.

#!/bin/bash

. $(dirname $0)/../../include.rc
. $(dirname $0)/../../volume.rc

cleanup

ec_test_dir=$M0/test

function ec_test_generate_src()
{
   mkdir -p $ec_test_dir
   for i in `seq 0 19`
   do
      dd if=/dev/zero of=$ec_test_dir/$i.c bs=1024 count=2
   done
}

function ec_test_make()
{
   for i in `ls *.c`
   do
     file=`basename $i`
     filename=${file%.*}
     cp $i $filename.o
   done
}

## step 1
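# bring up glusterd, create a 7-brick (4 data + 3 redundancy) disperse volume and FUSE-mount it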
TEST glusterd
TEST pidof glusterd
TEST $CLI volume create $V0 disperse 7 redundancy 3 $H0:$B0/${V0}{0..6}
TEST $CLI volume start $V0
TEST glusterfs --entry-timeout=0 --attribute-timeout=0 -s $H0 --volfile-id $V0 $M0
EXPECT_WITHIN $CHILD_UP_TIMEOUT "7" ec_child_up_count $V0 0

## step 2
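# create 20 small source files on the mount and build the matching .o files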
TEST ec_test_generate_src

cd $ec_test_dir
TEST ec_test_make

## step 3
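# take two bricks down, then delete and rebuild the .o files with only 5 of 7 bricks online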
TEST kill_brick $V0 $H0 $B0/${V0}0
TEST kill_brick $V0 $H0 $B0/${V0}1
EXPECT '5' online_brick_count

TEST rm -f *.o
TEST ec_test_make

## step 4
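# bring the killed bricks back with 'start force', trigger a full self-heal, then rebuild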
TEST $CLI volume start $V0 force
EXPECT '7' online_brick_count

# active heal
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "[0-9][0-9]*" get_shd_process_pid
TEST $CLI volume heal $V0 full

TEST rm -f *.o
TEST ec_test_make


## step 5
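# kill two different bricks so that reads must use the fragments healed in step 4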
TEST kill_brick $V0 $H0 $B0/${V0}2
TEST kill_brick $V0 $H0 $B0/${V0}3
EXPECT '5' online_brick_count
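# modification: unmount and mount again so that the inode-ctx is updated afresh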
cd -
EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0
TEST glusterfs --entry-timeout=0 --attribute-timeout=0 -s $H0 --volfile-id $V0 $M0
EXPECT_WITHIN $CHILD_UP_TIMEOUT "5" ec_child_up_count $V0 0
cd -

TEST rm -f *.o
TEST ec_test_make

EXPECT '5' online_brick_count

## step 6
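# restart all bricks and remount once more before the final rebuild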
TEST $CLI volume start $V0 force
EXPECT '7' online_brick_count
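# modification: remount again so that the inode-ctx is updated afresh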
cd -
EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0
TEST glusterfs --entry-timeout=0 --attribute-timeout=0 -s $H0 --volfile-id $V0 $M0
EXPECT_WITHIN $CHILD_UP_TIMEOUT "7" ec_child_up_count $V0 0
cd -

# self-healing
TEST rm -f *.o
TEST ec_test_make

TEST pidof glusterd
EXPECT "$V0" volinfo_field $V0 'Volume Name'
EXPECT 'Started' volinfo_field $V0 'Status'
EXPECT '7' online_brick_count

## cleanup
cd
EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0
TEST $CLI volume stop $V0
TEST $CLI volume delete $V0
TEST rm -rf $B0/*

cleanup;
