[Gluster-devel] Crash on HA config when restoring a server
Kevan Benson
kbenson at a-1networks.com
Wed Aug 8 18:02:27 UTC 2007
When running an HA config with 2 servers and 2 clients, I can consistently
crash the active server after failing the other. This is on TLA version
patched to 440.
System configs at http://glusterfs.pastebin.com/m52564c56
Server A: 172.16.1.81
Server B: 172.16.1.82
Client A: 172.16.1.85
Client B: 172.16.1.86
Note: transport-timeout was set to 10 on both clients and servers for the first
two crashes. For the last one it was set to 30 on Clients A and B (the servers
still had it set to 10).
For the first crash, I fail server B (ifdown eth1), and then try to ls the
mount point with the client (time ls -l /mnt/glusterfs) from both clients. I
generally get a "ls: /mnt/glusterfs/: Transport endpoint is not connected"
error once or twice, and then the active server's (A) glusterfsd will either
start responding or crash (about a 50% chance of each). In this case, I had
restored network connectivity to server B and run a few more ls's from the clients.
The glusterfsd.log (including backtrace) is at
http://glusterfs.pastebin.com/m15d7f914
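For anyone trying to reproduce this, the client-side ls step described above can be sketched as a small shell loop. This is only a rough sketch: the function name probe_mount and the default attempt count are illustrative assumptions, not part of the original test, and failing the server (ifdown eth1) still has to be done by hand on the server itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): list a directory repeatedly
# and report how many of the attempts failed.
# Usage: probe_mount DIR [ATTEMPTS]
probe_mount() {
    dir=$1
    attempts=${2:-5}   # assumed default; the original report does not give a count
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # A failed ls (for example "Transport endpoint is not connected")
        # is counted rather than treated as fatal, so probing continues.
        if ! ls -l "$dir" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    # Print the number of failed attempts.
    echo "$failures"
}
```

Run on each client as, for example, probe_mount /mnt/glusterfs 5 after failing one server; a non-zero count means some of the listings failed.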
Upon restarting glusterfs on server A and restoring the network connection to
server B, I initiated the above ls from the clients and crashed server A's
glusterfsd again. Glusterfsd on Server B was never restarted; it had only been
failed by cutting its network connectivity.
The glusterfsd.log (including backtrace) for THIS crash is at
http://glusterfs.pastebin.com/m28ee8e5a
Here's a crash from doing an ls with one server failed, after restarting one
of the servers a few times.
The glusterfsd.log (including backtrace):
http://glusterfs.pastebin.com/m2ee6c471
All logs shown are from the crashing server, Server A. I can just as easily
crash server B by failing A. Let me know if you need more logs from other
hosts and I'll re-run whichever scenarios you like.
--
- Kevan Benson
- A-1 Networks