[Gluster-devel] AFR: machine crash hangs other mounts or transport endpoint not connected
Krishna Srinivas
krishna at zresearch.com
Thu May 8 10:35:23 UTC 2008
Chris,
Do you see clues in the log files?
Krishna
On Wed, Apr 30, 2008 at 8:22 PM, Anand Avati <avati at zresearch.com> wrote:
> Chris,
> can you get the glusterfs client logs from your ramdisk, covering the time
> when the servers are pulled out and you try to access the mount point?
>
>
>
> avati
>
> 2008/4/30 Christopher Hawkins <chawkins at veracitynetworks.com>:
>
> > Without. All that is removed...
> >
> >
> > _____
> >
> > From: anand.avati at gmail.com [mailto:anand.avati at gmail.com] On Behalf Of
> > Anand Avati
> > Sent: Wednesday, April 30, 2008 10:24 AM
> > To: Christopher Hawkins
> > Cc: gluster-devel at nongnu.org
> > Subject: Re: [Gluster-devel] AFR: machine crash hangs other
> > mountsortransportendpoint not connected
> >
> >
> > Chris,
> > is this hang with IP failover in place or without?
> >
> > avati
> >
> >
> > 2008/4/30 Christopher Hawkins <chawkins at veracitynetworks.com>:
> >
> >
> >
> > Gluster devs,
> >
> > I am still not able to keep the client from hanging on a diskless cluster
> > node. When I fail a server, the client becomes unresponsive and does not
> > read from the other AFR volume. I first moved the entire /lib, /bin and
> > /sbin directories into the ramdisk which runs the nodes to rule out the
> > simple loss of an odd binary or library... An lsof | grep gluster on the
> > client (pre-failover test) shows:
> >
> > [root at node1 ~]# lsof |grep gluster
> > glusterfs 2195 root cwd DIR 0,1 0 2 /
> > glusterfs 2195 root rtd DIR 0,1 0 2 /
> > glusterfs 2195 root txt REG 0,1 55592 3863 /bin/glusterfs
> > glusterfs 2195 root mem REG 0,1 341068 2392 /lib/libfuse.so.2.7.2
> > glusterfs 2195 root mem REG 0,1 118096 2505 /lib/glusterfs/1.3.8pre6/xlator/mount/fuse.so
> > glusterfs 2195 root mem REG 0,1 164703 2514 /lib/glusterfs/1.3.8pre6/xlator/protocol/client.so
> > glusterfs 2195 root mem REG 0,1 112168 77 /lib/ld-2.3.4.so
> > glusterfs 2195 root mem REG 0,1 1529120 2483 /lib/tls/libc-2.3.4.so
> > glusterfs 2195 root mem REG 0,1 16732 70 /lib/libdl-2.3.4.so
> > glusterfs 2195 root mem REG 0,1 107800 2485 /lib/tls/libpthread-2.3.4.so
> > glusterfs 2195 root mem REG 0,1 43645 2533 /lib/glusterfs/1.3.8pre6/transport/tcp/client.so
> > glusterfs 2195 root mem REG 0,1 427763 2456 /lib/libglusterfs.so.0.0.0
> > glusterfs 2195 root mem REG 0,1 50672 2474 /lib/tls/librt-2.3.4.so
> > glusterfs 2195 root mem REG 0,1 245686 2522 /lib/glusterfs/1.3.8pre6/xlator/cluster/afr.so
> > glusterfs 2195 root 0u CHR 1,3 3393 /dev/null
> > glusterfs 2195 root 1u CHR 1,3 3393 /dev/null
> > glusterfs 2195 root 2u CHR 1,3 3393 /dev/null
> > glusterfs 2195 root 3w REG 0,1 102 4495 /var/log/glusterfs/glusterfs.log
> > glusterfs 2195 root 4u CHR 10,229 3494 /dev/fuse
> > glusterfs 2195 root 5r 0000 0,8 0 4498 eventpoll
> > glusterfs 2195 root 6u IPv4 4499 TCP 192.168.20.155:1023->master1:6996 (ESTABLISHED)
> > glusterfs 2195 root 7u IPv4 4500 TCP 192.168.20.155:1022->master2:6996 (ESTABLISHED)
> >
> > All files listed here are local, and the gluster binary has access to
> > them during failover. Can you help me troubleshoot by explaining what
> > exactly gluster is doing when it loses a connection? Does it depend on
> > something I have missed? This failover test uses the same config files and
> > binaries that my earlier tests use (which succeeded, but were not run on a
> > diskless node). Is there something else in the filesystem that
> > glusterfs requires to fail over successfully?
> >
> > Thanks,
> > Chris
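
One way to double-check the "everything is local" claim above is to stat each path named in the lsof listing and compare its device with that of the ramdisk root. The following is only a rough sketch under a few assumptions: the helper names open_paths and not_on_device are made up for illustration (they are not part of glusterfs or lsof), the input is plain lsof output with NAME as the last column, and no path contains spaces.

```python
import os


def open_paths(lsof_text):
    """Collect the absolute paths named in plain lsof output.

    Assumption: NAME is the last whitespace-separated field of each line,
    so paths containing spaces are not handled.
    """
    paths = []
    for line in lsof_text.splitlines():
        fields = line.split()
        if fields and fields[-1].startswith("/") and fields[-1] not in paths:
            paths.append(fields[-1])
    return paths


def not_on_device(paths, reference="/"):
    """Report paths that cannot be stat'ed or live on another device than `reference`."""
    ref_dev = os.stat(reference).st_dev
    report = []
    for path in paths:
        try:
            if os.stat(path).st_dev != ref_dev:
                report.append((path, "other device"))
        except OSError as exc:
            report.append((path, "stat failed: %s" % exc))
    return report
```

Feeding the listing above to open_paths() and the result to not_on_device() should return an empty list if every file really sits on the root ramdisk. Entries under /dev may show up as "other device" on systems where /dev is a separate filesystem, which is expected and not a sign of a problem.
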
> >
> >
> > > > >
> > > > >
> > > > > > Gerry, Christopher,
> > > > > >
> > > > > > > Here is what I tried to do. Two servers, one client, simple setup, afr
> > > > > > > on the client side. I did "ls" on client mount point, it works, now I
> > > > > > > do "ifconfig eth0 down" on the server, next I do "ls" on client, it
> > > > > > > hangs for 10 secs (timeout value) and fails over and starts working
> > > > > > > again without any problem.
> > > > > > >
> > > > > > > I guess few users are facing the problem you guys are facing.
> > > > > > > Can you give us your setup details and mention the exact steps to
> > > > > > > reproduce. Also try to come up with minimal config details which can
> > > > > > > still reproduce the problem
> > > > > >
> > > > > > Thanks!
> > > > > > Krishna
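
To tell a short stall (the roughly 10 second timeout described above) from a permanent hang, the "ls" step can be timed from a small script instead of an interactive shell. This is a minimal sketch, not something from the glusterfs tree: probe_listing is a hypothetical helper, and the mount point and timeout passed to it are whatever the test setup uses.

```python
import os
import threading
import time


def probe_listing(path, timeout_s):
    """List `path` in a worker thread and report ok / error / hung.

    The listing runs in a daemon thread so that the caller gets control
    back after `timeout_s` seconds even if the filesystem never answers.
    """
    result = {}

    def worker():
        start = time.monotonic()
        try:
            result["entries"] = os.listdir(path)
        except OSError as exc:
            # e.g. ENOTCONN, "Transport endpoint is not connected"
            result["error"] = exc
        result["elapsed"] = time.monotonic() - start

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        return {"status": "hung", "elapsed": None}
    if "error" in result:
        return {"status": "error", "elapsed": result["elapsed"],
                "error": result["error"]}
    return {"status": "ok", "elapsed": result["elapsed"]}
```

Calling probe_listing() on the client mount point with a timeout comfortably above the transport timeout, once before and once after taking a server down, gives three distinguishable outcomes: "ok" with an elapsed time near the timeout suggests failover worked, "error" shows the errno returned, and "hung" matches the behaviour reported in this thread. A worker stuck in the kernel cannot be cancelled, so after a "hung" result the script itself may not exit cleanly.
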
> > > > > >
> > > > > > On Sat, Apr 26, 2008 at 7:01 AM, Christopher Hawkins
> > > > > > <chawkins at veracitynetworks.com> wrote:
> > > > > > > > I am having the same issue. I'm working on a diskless node cluster
> > > > > > > > and figured the issue was related to that since AFR seems to fail
> > > > > > > > over nicely for everyone else...
> > > > > > > > But it seems I am not alone, so what can I do to help troubleshoot?
> > > > > > > >
> > > > > > > > I have two servers exporting a brick each, and a client mounting
> > > > > > > > them both with AFR and no unify. Transport timeout settings don't
> > > > > > > > seem to make a difference - client is just hung if I power off or
> > > > > > > > just stop glusterfsd. There is nothing logged on the server side.
> > > > > > > > I'll use a usb thumb drive for client side logging since any logs in
> > > > > > > > the ramdisk obviously disappear after the reboot which fixes the hang...
> > > > > > > > If I get any insight from this I'll report it asap.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Chris
> > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Gluster-devel mailing list
> > > > > Gluster-devel at nongnu.org
> > > > > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > > >
> > --
> > If I traveled to the end of the rainbow
> > As Dame Fortune did intend,
> > Murphy would be there to tell me
> > The pot's at the other end.