[Gluster-devel] AFR: machine crash hangs other mounts or transport endpoint not connected

Krishna Srinivas krishna at zresearch.com
Tue Apr 29 08:02:30 UTC 2008


Gerry, Christopher,

Here is what I tried: two servers, one client, a simple setup with AFR on the
client side. I ran "ls" on the client mount point and it worked. Then I ran
"ifconfig eth0 down"
on one of the servers and ran "ls" on the client again. It hung for 10 seconds
(the timeout value), then failed over and kept working without any problem.

I gather a few users are running into the same problem you are. Can you send
us your setup details and the exact steps to reproduce it? Also, please try
to come up with a minimal configuration that still reproduces the
problem.
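One way to make a report like this more precise is to time the "ls" step on
the client instead of judging it by eye. The Python sketch below is a
hypothetical helper, not part of GlusterFS: it runs os.listdir() on the mount
point in a background thread and reports whether the listing succeeded,
failed with an error such as ENOTCONN ("Transport endpoint is not
connected"), or was still blocked after a chosen limit. The default mount
path /mnt/glusterfs and the 30-second limit are assumptions; pick a limit
comfortably above the configured timeout.

```python
#!/usr/bin/env python3
"""Hypothetical probe: time a directory listing on a (GlusterFS) mount point."""
import errno
import os
import sys
import threading
import time


def probe_listing(path, limit=30.0):
    """List `path` in a daemon thread and wait at most `limit` seconds.

    Returns (status, elapsed_seconds, errno_or_None), where status is
    "ok", "error" (os.listdir raised OSError) or "hung" (still blocked).
    """
    result = {}

    def worker():
        try:
            os.listdir(path)
            result["status"], result["errno"] = "ok", None
        except OSError as exc:
            # ENOTCONN corresponds to "Transport endpoint is not connected".
            result["status"], result["errno"] = "error", exc.errno

    start = time.monotonic()
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(limit)
    elapsed = time.monotonic() - start
    if thread.is_alive():
        # The call is still blocked; Python cannot cancel it, we only report it.
        return "hung", elapsed, None
    return result["status"], elapsed, result["errno"]


def main(argv):
    # Assumed defaults: mount point and limit can be overridden on the command line.
    path = argv[1] if len(argv) > 1 else "/mnt/glusterfs"
    limit = float(argv[2]) if len(argv) > 2 else 30.0
    status, elapsed, err = probe_listing(path, limit)
    detail = f" ({errno.errorcode.get(err, err)})" if err is not None else ""
    print(f"{path}: {status}{detail} after {elapsed:.1f}s")
    return 0 if status == "ok" else 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

For example, saved as probe.py, it could be run as
"python3 probe.py /mnt/glusterfs 30" once before and once after taking a
server's interface down. An "ok" result after roughly the timeout matches the
failover described above, while "hung" or "error (ENOTCONN)" matches the
symptoms reported below. Note that a listing stuck in the kernel may also
delay the script's own exit.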

Thanks!
Krishna

On Sat, Apr 26, 2008 at 7:01 AM, Christopher Hawkins
<chawkins at veracitynetworks.com> wrote:
> I am having the same issue. I'm working on a diskless
>  node cluster and figured the issue was related to that
>  since AFR seems to fail over nicely for everyone else...
>  But it seems I am not alone, so what can I do to help troubleshoot?
>
>  I have two servers exporting a brick each, and a client mounting
>  them both with AFR and no unify. Transport timeout settings
>  don't seem to make a difference; the client just hangs if I power off
>  a server or simply stop glusterfsd. Nothing is logged on the server side.
>  I'll use a USB thumb drive for client-side logging, since any logs in
>  the ramdisk obviously disappear after the reboot that clears the hang...
>  If I get any insight from this I'll report it asap.
>
>  Thanks,
>  Chris
>
>  > Real simple, two bricks on ext3 with user_xattr.
>  > It is storage for a mailstore.  The issue that I've been
>  > battling is that when one of the machines crashes, the other
>  > machine loses the mailstore: either I get "transport endpoint
>  > is not connected" or the glusterfs filesystem hangs.  You
>  > cannot do anything with it: 'ls' it, 'df' it, ... nothing.
>  > If I try to kill glusterfs/glusterfsd, it just reports
>  > /glusterfsmount as busy.  The only way to recover at this point
>  > is to reboot the good machine as well as the failed one.  Having
>  > to do that sort of defeats the purpose of creating this array.
>  > Is there no way for glusterfs to recover from the crash so
>  > that things are still good on the other bricks and on mounts
>  > on other machines?
>  >
>  > Thanks,
>  > Gerry