[Gluster-devel] AFR: machine crash hangs other mounts or transport endpoint not connected
Christopher Hawkins
chawkins at veracitynetworks.com
Sat Apr 26 01:31:00 UTC 2008
I am having the same issue. I'm working on a diskless
node cluster and assumed the problem was related to that,
since AFR seems to fail over cleanly for everyone else.
But it seems I am not alone, so what can I do to help troubleshoot?
I have two servers exporting one brick each, and a client mounting
both of them with AFR and no unify. Transport timeout settings
don't seem to make a difference: the client simply hangs if I power
off a server or just stop glusterfsd. Nothing is logged on the server side.
I'll use a USB thumb drive for client-side logging, since any logs in
the ramdisk disappear after the reboot that clears the hang.
If I get any insight from this, I'll report it ASAP.
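Because two different symptoms are reported (a hung mount versus "transport endpoint is not connected"), it may help to record which one a client actually sees after a server goes down. The following is a minimal sketch, assuming a Linux client with Python 3; probe_mount is a hypothetical helper, not part of GlusterFS. It runs roughly what df and ls do (os.statvfs and os.listdir) in a daemon thread and reports "hung" if they do not return within the timeout. ENOTCONN is the errno behind the "Transport endpoint is not connected" message.

```python
import errno
import os
import threading


def probe_mount(path: str, timeout: float = 5.0) -> str:
    """Classify a mount point as 'ok', 'hung', 'not-connected' or 'error: ...'.

    Hypothetical troubleshooting helper, not a GlusterFS tool.
    """
    result = {}

    def worker() -> None:
        try:
            os.statvfs(path)  # roughly what `df` asks the filesystem for
            os.listdir(path)  # roughly what `ls` asks the filesystem for
            result["status"] = "ok"
        except OSError as exc:
            if exc.errno == errno.ENOTCONN:
                # "Transport endpoint is not connected"
                result["status"] = "not-connected"
            else:
                result["status"] = f"error: {exc.strerror}"

    # A daemon thread lets the caller give up on a blocked filesystem call;
    # Python does not wait for daemon threads when the interpreter exits.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return "hung"  # the call is still blocked after `timeout` seconds
    return result.get("status", "error: unknown")
```

For example, probe_mount("/glusterfsmount") could be run on the client just before and after powering off one server, with the result written to the USB drive alongside the client log.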
Thanks,
Chris
> Real simple, two bricks on ext3 with user_xattr.
> It is storage for a mailstore. The issue that I've been
> battling is that when one of the machines crashes, the other
> machine loses the mailstore with either a "transport endpoint
> is not connected" error or a hung glusterfs filesystem. You
> cannot do anything with it: 'ls' it, 'df' it, ... nothing.
> If I try to kill glusterfs/glusterfsd, it just gives me
> "/glusterfsmount busy". The only recovery at this point is to
> reboot the good machine as well as the failed one. Having to
> do that sort of defeats my purpose in creating this array.
> Is there no way for glusterfs to recover from the crash so
> that things are still good on the other bricks and on mounts
> on other machines?
>
> Thanks,
> Gerry