[Gluster-users] Issue for replicate translator
Zhuo Yin
zhuoyin at gmail.com
Fri Oct 23 21:05:23 UTC 2009
Hi, All:
I've run into a problem while unit testing the replicate translator, and
this fundamental problem has bothered me for 2 weeks.
My setup keeps 4 copies on 4 machines (A, B, C, D). Each of A, B, C and D
acts as both server and client (the mount point is /home/).
There is another machine, E, which doesn't hold any copy. It acts purely
as a client and uses the disks of A, B, C and D.
My failure simulation procedure is:
Step 1. Randomly choose 1 machine and take down the NIC glusterfs is
listening on (3 copies are still online).
Step 2. Sleep for a while (e.g. 60 seconds).
Step 3. Bring the failed NIC back up (4 copies are online again).
Step 4. Run "ls -laR /home" on the machine that was just failed.
Step 5. Go to step 1.
At the same time, I repeatedly run `ls -laR /home > test_result.txt` on
machine E.
I've found the following problems:
1. Files are missing from a directory, or the same name appears twice in
the same directory, in the ls output.
Here is a small part of the vimdiff output comparing two `ls -laR`
results:
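The five steps above can be sketched as a small driver script. This is only an illustration under stated assumptions: the interface name `eth0`, the host names, the use of `ip link set ... down/up` to fail the NIC, and a separate management path for ssh are all hypothetical and are not taken from my actual setup.

```python
import random
import subprocess
import time

# Hypothetical values: the post does not name the hosts or the interface.
MACHINES = ["A", "B", "C", "D"]
INTERFACE = "eth0"


def run_on(host, command, runner=subprocess.run):
    """Run a shell command on a host over ssh.

    Assumes ssh reaches the host through a management interface other
    than the one being taken down.
    """
    return runner(["ssh", host, command], check=False)


def one_round(host, run, sleep=time.sleep, downtime=60):
    """Perform steps 1 to 4 once against the given host."""
    run(host, f"ip link set dev {INTERFACE} down")  # Step 1: fail the NIC
    sleep(downtime)                                 # Step 2: wait
    run(host, f"ip link set dev {INTERFACE} up")    # Step 3: restore the NIC
    run(host, "ls -laR /home")                      # Step 4: walk the mount


def main(rounds):
    """Step 5: repeat with a randomly chosen machine each time."""
    for _ in range(rounds):
        one_round(random.choice(MACHINES), run_on)
```

Calling `main(10)` would run ten rounds. Passing `run` and `sleep` as parameters lets the sequence be dry-run without touching a real network.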
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file84 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file83
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file84 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file84
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file85 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file85
------------------------------------------------------ | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file86
------------------------------------------------------ | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file87
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file88 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file88
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file89 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file89
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file89 | ------------------------------------------------------
-rw-r--r-- 1 root root 5004454 2009-10-16 19:20 file9  | -rw-r--r-- 1 root root 5004454 2009-10-16 19:20 file9
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file90 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file90
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file91 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file91
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file91 | ------------------------------------------------------
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file92 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file92
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file93 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file93
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file94 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file94
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file95 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file95
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file97 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file96
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file97 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file97
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file98 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file98
-rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file99 | -rw-r--r-- 1 root root 5004454 2009-10-16 19:21 file99
Some files are clearly missing and some names are duplicated, all in the
same directory.
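Rather than comparing the listings by eye in vimdiff, the duplicate and missing names could be found automatically. The following is a minimal sketch, assuming the `ls -laR` output uses the 8-column layout shown above (permissions, links, owner, group, size, date, time, name). The function names are made up for illustration, and a directory path containing seven or more spaces would be misread.

```python
from collections import Counter, defaultdict


def parse_ls_lar(text):
    """Map each directory header to the entry names listed under it."""
    listing = defaultdict(list)
    current = None
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("total "):
            continue
        fields = line.split(None, 7)
        # A header looks like "/home/dir1:" and has fewer than 8 columns.
        if line.endswith(":") and len(fields) < 8:
            current = line[:-1]
            continue
        if current is not None and len(fields) == 8:
            name = fields[7]
            if name not in (".", ".."):
                listing[current].append(name)
    return listing


def duplicates(listing):
    """Return {directory: [names listed more than once]}."""
    result = {}
    for directory, names in listing.items():
        repeated = sorted(n for n, c in Counter(names).items() if c > 1)
        if repeated:
            result[directory] = repeated
    return result


def missing(reference, observed):
    """Return {directory: [names in reference but absent from observed]}."""
    result = {}
    for directory, names in reference.items():
        gone = sorted(set(names) - set(observed.get(directory, [])))
        if gone:
            result[directory] = gone
    return result
```

A listing taken while all 4 copies are healthy can serve as `reference`, and each `test_result.txt` from machine E as `observed`.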
2. Occasionally, ls reports:
ls: reading directory /home/dir1/dir21: File descriptor in bad state
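For reference, "File descriptor in bad state" is the glibc message for the Linux errno EBADFD, which is distinct from EBADF ("Bad file descriptor"). A test script could log this case separately from other failures. The sketch below is only an illustration, and `list_dir` is a hypothetical helper, not part of any GlusterFS tooling.

```python
import errno
import os

# EBADFD is Linux-specific, so look it up defensively.
EBADFD = getattr(errno, "EBADFD", None)


def list_dir(path):
    """Return (names, error_label); error_label is None on success."""
    try:
        return sorted(os.listdir(path)), None
    except OSError as exc:
        if EBADFD is not None and exc.errno == EBADFD:
            return [], "EBADFD (File descriptor in bad state)"
        # Any other failure is reported by its symbolic errno name.
        return [], errno.errorcode.get(exc.errno, str(exc.errno))
```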
I would really appreciate it if someone could help solve this basic and
essential problem.
The glusterfsd.vol I'm using on all 5 machines is:
=========================================================================================
# THIS IS THE SERVER-END CONFIGURATION
# Brick 1
volume posix
type storage/posix
option directory /mnt/disk1
end-volume
volume locks
type features/locks
subvolumes posix
end-volume
volume brick
type performance/io-threads
option thread-count 16
subvolumes locks
end-volume
# Server
volume server
type protocol/server
option transport-type tcp/server
option transport.socket.bind-address `ifconfig -a | grep "10.106.105." | awk '{print $2}' | awk 'BEGIN {FS=":"};{print $2}'`
option transport.socket.listen-port 6996
subvolumes brick
option auth.addr.brick.allow *
end-volume
# SERVER-END CONFIGURATION ENDS
# THIS IS THE CLIENT-END CONFIGURATION
# 3 Disks Machines
# Machine 1
volume cbrick1
type protocol/client
option transport-type tcp
option remote-host 10.106.105.150
option remote-port 6996
option remote-subvolume brick
end-volume
# Machine 2
volume cbrick4
type protocol/client
option transport-type tcp
option remote-host 10.106.105.151
option remote-port 6996
option remote-subvolume brick
end-volume
# Machine 3
volume cbrick7
type protocol/client
option transport-type tcp
option remote-host 10.106.105.152
option remote-port 6996
option remote-subvolume brick
end-volume
# Machine 4
volume cbrick10
type protocol/client
option transport-type tcp
option remote-host 10.106.105.153
option remote-port 6996
option remote-subvolume brick
end-volume
# All brick declarations finished
# Replicate part
volume rep1
type cluster/replicate
subvolumes cbrick1 cbrick4 cbrick7 cbrick10
end-volume
# CLIENT END CONFIGURATION ENDS
========================================================================================================
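The bind-address option in the server volume above is computed by a shell pipeline: pick the ifconfig line containing "10.106.105.", take its second whitespace-separated field, split that on ":" and keep the second part. This relies on the older net-tools output format ("inet addr:10.106.105.150 ..."). A rough Python equivalent, useful for checking what the pipeline would yield on a given machine, is sketched below. The function name is hypothetical. Unlike the pipeline it returns only the first match, and it uses a plain substring match where grep treats "." as a wildcard.

```python
def bind_address(ifconfig_output, prefix="10.106.105."):
    """Mimic the grep/awk pipeline on captured `ifconfig -a` text."""
    for line in ifconfig_output.splitlines():
        if prefix in line:                    # grep "10.106.105."
            fields = line.split()             # awk '{print $2}'
            if len(fields) >= 2:
                parts = fields[1].split(":")  # awk 'BEGIN {FS=":"};{print $2}'
                if len(parts) >= 2 and parts[1]:
                    return parts[1]
    return None
```

With newer ifconfig output of the form "inet 10.106.105.150  netmask ..." the second field has no ":" and the function returns None, which mirrors the pipeline producing an empty address.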
Regards,
Zhuo Yin (917)215-8740
Gentoo Linux Fan - int (*(*(*pFile)())[10])();