[Gluster-devel] client reconnect
Brent A Nelson
brent at phys.ufl.edu
Fri May 25 22:21:45 UTC 2007
I mentioned in a previous email that client reconnection may not be 100% reliable.
I encountered this again in the following scenario: one of my servers (in
a multiserver unify/afr) was trying to format a bad drive, and this
knocked out access to all my 3ware disks which were being exported by
GlusterFS from that machine. While in this condition, a couple of clients
tried to ls directories on a filesystem that uses this server (and its
mirror). I suspect they were able to contact the glusterfsd of the "bad"
machine, but glusterfsd deadlocked trying to access the disk. I ended up
rebooting the server, but the clients that were trying to ls never
returned and had to be killed. The mountpoints had to be unmounted and
the filesystem remounted.
It seems to me (you will probably come up with something much better)
that if the client successfully communicates a request to a server but the
server doesn't complete the request, the client needs to time out the I/O
request that it was waiting on and try again. In the case of afr, it
should also check whether the mirror host can satisfy the request
instead.
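
To make the idea concrete, here is a rough sketch in Python of what I mean
by "time out and try the mirror". It is not GlusterFS code and does not
reflect how the client translators actually work; the names
call_with_failover, hung_server and mirror_server are made up for
illustration, and the worker threads only stand in for outstanding network
requests.

```python
import concurrent.futures
import time


def call_with_failover(replicas, request, timeout_s=1.0):
    """Send `request` to each replica in order, abandoning any replica
    that does not reply within `timeout_s` seconds.

    `replicas` is a list of (name, handler) pairs; the handlers are
    hypothetical stand-ins for "send this request to that server".
    Returns (name, reply) from the first replica that answers.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(replicas)))
    try:
        for name, handler in replicas:
            future = pool.submit(handler, request)
            try:
                return name, future.result(timeout=timeout_s)
            except concurrent.futures.TimeoutError:
                # The server accepted the request but never completed it.
                continue
            except OSError:
                # The connection failed outright; try the next replica.
                continue
        raise TimeoutError("no replica completed the request")
    finally:
        # Do not wait for stuck workers, or the caller would hang as well.
        pool.shutdown(wait=False)


def hung_server(request):
    # Hypothetical server whose disk access stalls before it replies.
    time.sleep(0.5)
    return "late reply"


def mirror_server(request):
    # Hypothetical healthy mirror.
    return "listing of " + request
```

With replicas given as [("bad", hung_server), ("mirror", mirror_server)]
and a short timeout, the call gives up on the first server and returns the
mirror's reply instead of blocking. A real client would also have to cancel
or discard the abandoned request, which this sketch leaves out.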
Thanks,
Brent