[Gluster-devel] gluster 3.0.0 catastrophic crash during basic file creation test

Tejas N. Bhise tejas at gluster.com
Thu Feb 4 17:35:54 UTC 2010


Hi Daniel,

Thank you for actively using GlusterFS. You seemed to be on 2.0.x a little while back. Did you migrate to 3.0.0 recently? There was a defect that could sometimes cause a crash when a 2.x client talked to a 3.x server, or a 3.x client to a 2.x server. I just want to confirm that all your clients and servers have been upgraded.
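
One way to confirm that no 2.x/3.x mix remains is to note the version each node reports (for example from `glusterfs --version`) and compare the major numbers. The following is only a minimal sketch: `check_versions` is a hypothetical helper, not part of GlusterFS, and it assumes you have already gathered one "host version" pair per line by hand.

```shell
#!/bin/sh
# check_versions: hypothetical helper, not a GlusterFS tool.
# Reads "host version" pairs on stdin (e.g. "s01 3.0.0") and prints
# MIXED (exit status 1) if more than one major version appears,
# otherwise OK (exit status 0).
check_versions() {
    awk '
        NF >= 2 {
            # The major version is the part before the first dot.
            split($2, v, ".")
            seen[v[1]] = 1
        }
        END {
            n = 0
            for (m in seen) n++
            if (n > 1) {
                print "MIXED"
                exit 1
            }
            print "OK"
        }
    '
}
```

For example, `printf '%s\n' "s01 3.0.0" "s02 3.0.0" "c01 2.0.9" | check_versions` prints `MIXED` (the host name `c01` is made up for illustration).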

Besides that, if you have recently upgraded to 3.0.0, please consider 3.0.2, which will be out very soon (you can even try 3.0.2rc1). It also performs much better than previous versions.

Feel free to ping us if you need any assistance while upgrading.

Regards,
Tejas.

----- Original Message -----
From: "Vijay Bellur" <vijay at gluster.com>
To: "Daniel Maher" <dma+gluster at witbe.net>
Cc: "Gluster List" <gluster-devel at nongnu.org>
Sent: Thursday, February 4, 2010 10:12:07 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
Subject: Re: [Gluster-devel] gluster 3.0.0 catastrophic crash during basic file creation test

Hello Daniel,

Do you notice anything in dmesg when the server freeze happens?

If you can salvage dmesg and /var/log/messages from the console, along 
with the glusterfsd core, that would help.
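
As a rough sketch of gathering those three items into one place, something like the following could be run on a server while its console still responds, or after the reboot for whatever survived in /var/log/messages. `collect_diag` is a hypothetical helper, and the core file paths are passed in by the caller because their location depends on the system's core dump settings.

```shell
#!/bin/sh
# collect_diag OUTDIR [COREFILE...]
# Hypothetical helper: copies the kernel log, the syslog file and any
# named core files into OUTDIR. Missing pieces are skipped, not fatal.
collect_diag() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1

    # Kernel ring buffer. This may need root on some systems, so a
    # failure is recorded in the output file instead of aborting.
    dmesg > "$outdir/dmesg.txt" 2>/dev/null ||
        echo "dmesg unavailable" > "$outdir/dmesg.txt"

    # System log, if present and readable.
    if [ -r /var/log/messages ]; then
        cp /var/log/messages "$outdir/messages"
    fi

    # Core files named by the caller; nonexistent paths are ignored.
    for core in "$@"; do
        if [ -f "$core" ]; then
            cp "$core" "$outdir/"
        fi
    done
}
```

Usage would look like `collect_diag /var/tmp/gluster-diag /path/to/core`, where both paths are placeholders.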

Regards,
Vijay


Daniel Maher wrote:
>
> Hello,
>
> I managed to crash Gluster 3.0.0 severely during a simple file 
> creation test.  Not only did the crash result in the standard « 
> transport endpoint not connected » problem, but the servers in 
> question had to be hard-reset in order to make them operational again.
>
> So, here goes...
>
> 4 nodes, two servers, two clients, client-side replication.  Clients 
> are Fedora 8, servers are Fedora 9.  Stock FUSE used throughout. 
> Configurations generated with the volgen tool using the following 
> commandline :
>
> # glusterfs-volgen --name replicated --raid 1 s01:/opt/gluster 
> s02:/opt/gluster
>
> Servers :
> # service glusterfsd start
>
> Clients :
> # mount -t glusterfs /etc/glusterfs/replicated-tcp.vol /opt/gluster/
>
> The following Python script was used to run the file creation test :
> http://nfsv4.bullopensource.org/tools/tests_tools/test_files.py
>
> The Python script was edited only to point the target directory to the 
> Gluster mount.  Each client was told to use a different sub-directory 
> within the Gluster mount point.
>
> This script was used in the context of a bash looping script, which is 
> as follows :
> #!/bin/bash
> LOOP=0
> while [ $LOOP -lt 1000 ]
> do
>     time ./test_files.py | tee -a go_test_files.log
>     cat ./test_files_orw | tee -a go_test_files.log
>     let LOOP=$LOOP+1
> done
>
> « test_files_orw » is the file that test_files.py outputs to.  It is 
> over-written on each run (hence the redirect).
>
> The script made it through 20 or so iterations before Gluster crashed. 
> The servers responded to ping requests, but no new SSH connections 
> could be made.  Existing sessions open via SSH were frozen.  On the 
> local console, keyboard interactions were still possible, but no new 
> actions could be taken.  The servers were hard-reset at this point.
>
> I'll be happy to provide any further information as is deemed 
> necessary - just let me know.
>
>
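
When reproducing this, it may help to know exactly which iteration was running when the servers froze. The following is a sketch of a variant of the loop in Daniel's report that timestamps every iteration and stops at the first failing run. `run_loop` is a hypothetical helper, and the log path should point somewhere outside the Gluster mount so that the log survives a hang.

```shell
#!/bin/sh
# run_loop MAX LOGFILE CMD [ARGS...]
# Hypothetical helper: runs CMD up to MAX times, appends its output and
# a UTC timestamp per iteration to LOGFILE, and stops at the first
# non-zero exit status.
run_loop() {
    max=$1
    log=$2
    shift 2
    i=1
    while [ "$i" -le "$max" ]; do
        echo "iteration $i start $(date -u '+%Y-%m-%dT%H:%M:%SZ')" >> "$log"
        if ! "$@" >> "$log" 2>&1; then
            echo "iteration $i FAILED" >> "$log"
            echo "failed at iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $max iterations"
}
```

With the script from the report, this would be called as `run_loop 1000 /var/tmp/go_test_files.log ./test_files.py` (the log location is an assumption).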



_______________________________________________
Gluster-devel mailing list
Gluster-devel at nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel




