[Bugs] [Bug 1181048] lockless lookup cause disk to be kicked out

bugzilla at redhat.com bugzilla at redhat.com
Tue Jan 20 06:22:18 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1181048



--- Comment #17 from lidi <lidi at perabytes.com> ---
The two test scenarios above are both hard to reproduce. I used another test
case that reproduces an error similar to the one in comment 5.

I used the latest master for this test.

1. Create a disperse volume with 3 disks (all disks in the same virtual
machine) and mount it at /cluster2/test.
2. From console 1, execute:
   for ((;;)); do dd if=/dev/zero of=/cluster2/test/aa bs=1M count=20; done
3. Kill one brick.
4. From console 2, execute 'ls /cluster2/test' many times.
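
Step 4 can be scripted so the failure rate is easier to see. The following is
only a minimal sketch, not part of the original test: the helper name
count_ls_failures is hypothetical, and it assumes that a failed lookup shows up
as a non-zero exit status from 'ls -l' (the -l flag forces a stat of every
entry).

```shell
#!/bin/sh
# Hypothetical helper: run `ls -l` on a directory a given number of times
# and print how many of those runs exited with a non-zero status.
count_ls_failures() {
    dir=$1
    runs=$2
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # -l stats every entry, so an entry that cannot be accessed
        # makes ls exit non-zero.
        if ! ls -l -- "$dir" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example, using the mount point from the steps above:
# count_ls_failures /cluster2/test 1000
```

A non-zero count while the dd loop is running and one brick is down would
correspond to the intermittent error described here.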

Actual results:
Sometimes console 2 reports the following:

[root@node-1 ~]# ls /cluster2/test/
ls: cannot access /cluster2/test/aa: Input/output error
aa

Additional info:
[2015-01-20 05:41:48.881650] W [ec-common.c:397:ec_child_select] 0-test-disperse-0: Executing operation with some subvolumes unavailable (4)
[2015-01-20 05:41:48.882630] W [ec-common.c:397:ec_child_select] 0-test-disperse-0: Executing operation with some subvolumes unavailable (4)
[2015-01-20 05:41:48.883332] W [ec-combine.c:801:ec_combine_check] 0-test-disperse-0: Mismatching xdata in answers of 'LOOKUP'
[2015-01-20 05:41:48.883404] I [dht-layout.c:682:dht_layout_normalize] 0-test-dht: Found anomalies in /aa (gfid = a9463c35-e6c2-4a54-9dfc-4c7d68c78096). Holes=1 overlaps=0
[2015-01-20 05:41:48.883517] W [fuse-resolve.c:147:fuse_resolve_gfid_cbk] 0-fuse: a9463c35-e6c2-4a54-9dfc-4c7d68c78096: failed to resolve (Input/output error)
[2015-01-20 05:41:48.883547] E [fuse-bridge.c:809:fuse_getattr_resume] 0-glusterfs-fuse: 30887: GETATTR 15520940 (a9463c35-e6c2-4a54-9dfc-4c7d68c78096) resolution failed
[2015-01-20 05:41:49.212596] W [ec-common.c:397:ec_child_select] 0-test-disperse-0: Executing operation with some subvolumes unavailable (4)
[2015-01-20 05:41:49.217569] W [ec-common.c:397:ec_child_select] 0-test-disperse-0: Executing operation with some subvolumes unavailable (4)
[2015-01-20 05:41:49.624787] W [ec-common.c:397:ec_child_select] 0-test-disperse-0: Executing operation with some subvolumes unavailable (4)
[2015-01-20 05:41:50.491950] W [ec-common.c:397:ec_child_select] 0-test-disperse-0: Executing operation with some subvolumes unavailable (4)


If the lookup were executed as part of another operation such as writev or
readv, I suppose it would cause that operation to fail.
