[Gluster-devel] gluster 3.0.0 catastrophic crash during basic file creation test
dma+gluster at witbe.net
Thu Feb 4 15:43:29 UTC 2010
I managed to crash Gluster 3.0.0 severely during a simple file creation
test. Not only did the crash result in the standard « transport
endpoint not connected » problem, but the servers in question had to be
hard-reset in order to make them operational again.
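On a FUSE client, this condition reaches applications as the `ENOTCONN` errno, whose Linux message is "Transport endpoint is not connected". It typically appears once the glusterfs client process has died or can no longer reach its servers. Below is a minimal sketch, assuming Python 3, of how a test harness could tell this case apart from ordinary I/O errors. `is_transport_disconnected` and `probe_mount` are hypothetical helper names, not part of Gluster:

```python
import errno
import os


def is_transport_disconnected(exc: OSError) -> bool:
    """Return True if the error is the 'transport endpoint not connected' case."""
    return exc.errno == errno.ENOTCONN


def probe_mount(path: str) -> str:
    """Hypothetical probe: stat the mount point and classify the outcome."""
    try:
        os.stat(path)
    except OSError as exc:
        if is_transport_disconnected(exc):
            return "disconnected"
        return "error: " + os.strerror(exc.errno)
    return "ok"


if __name__ == "__main__":
    # /opt/gluster is the client mount point used in this report.
    print(probe_mount("/opt/gluster"))
```

In the state described above, running the probe against the client mount point would be expected to print `disconnected`.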
So, here goes...
4 nodes, two servers, two clients, client-side replication. Clients are
Fedora 8, servers are Fedora 9. Stock FUSE used throughout.
The configurations were generated with the volgen tool, the servers
started and the clients mounted, using the following commands:
# glusterfs-volgen --name replicated --raid 1 s01:/opt/gluster
# service glusterfsd start
# mount -t glusterfs /etc/glusterfs/replicated-tcp.vol /opt/gluster/
The following Python script was used to run the file creation test:
The script was edited only to point its target directory at the
Gluster mount. Each client was told to use a different sub-directory
within the Gluster mount point.
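The original test_files.py is not reproduced in this copy of the message. As a rough stand-in only (not the script used in this test), the sketch below creates a batch of small files in a per-client sub-directory. It then writes a one-line summary to test_files_orw, overwriting it on each run as the report describes. The file count, file size, file naming scheme and summary format are all assumptions:

```python
#!/usr/bin/env python3
import os
import sys
import time

# Hypothetical parameters; the settings of the real test_files.py are unknown.
NUM_FILES = 1000
FILE_SIZE = 4096  # bytes written to each file
OUTPUT_FILE = "test_files_orw"  # summary file, overwritten on every run


def create_files(target_dir: str, num_files: int, size: int) -> float:
    """Create num_files files of `size` bytes in target_dir; return elapsed seconds."""
    os.makedirs(target_dir, exist_ok=True)
    payload = b"x" * size
    start = time.monotonic()
    for i in range(num_files):
        path = os.path.join(target_dir, "file_%06d" % i)
        with open(path, "wb") as f:
            f.write(payload)
    return time.monotonic() - start


def main(target_dir: str) -> None:
    elapsed = create_files(target_dir, NUM_FILES, FILE_SIZE)
    # Mode "w" truncates the file, so each run replaces the previous summary.
    with open(OUTPUT_FILE, "w") as out:
        out.write("created %d files of %d bytes in %s in %.3f s\n"
                  % (NUM_FILES, FILE_SIZE, target_dir, elapsed))


if __name__ == "__main__":
    # Each client is given its own sub-directory of the Gluster mount.
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        print("usage: test_files.py TARGET_DIR", file=sys.stderr)
```

For example, `./test_files.py /opt/gluster/client1` on one client and `./test_files.py /opt/gluster/client2` on the other; these sub-directory names are illustrative.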
This script was run inside a bash loop, as follows:
LOOP=0
while [ $LOOP -lt 1000 ]; do
    time ./test_files.py | tee -a go_test_files.log
    cat ./test_files_orw | tee -a go_test_files.log
    LOOP=$((LOOP + 1))
done
« test_files_orw » is the file that test_files.py writes its output to.
It is overwritten on each run, hence appending it to the log after
every iteration.
The script made it through 20 or so iterations before Gluster crashed.
The servers responded to ping requests, but no new SSH connections could
be made. Existing sessions open via SSH were frozen. On the local
console, keyboard interactions were still possible, but no new actions
could be taken. The servers were hard-reset at this point.
I'll be happy to provide any further information as needed; just let
me know.
Daniel Maher <dma+gluster AT witbe DOT net>