[Gluster-devel] AFR: machine crash hangs other mounts or transport endpoint not connected
Christopher Hawkins
chawkins at veracitynetworks.com
Thu May 8 11:03:51 UTC 2008
When I set up a minimal ramdisk environment with no fancy directory
re-mapping going on, the failover works. I had to work on other things for a
few days and have not resolved it 100% yet, but it appears that my setup is
creating the issue, not glusterfs. But thank you for the follow up! If I
suspect a gluster issue I will reopen the thread.
Chris
> -----Original Message-----
> From: krishna.zresearch at gmail.com
> [mailto:krishna.zresearch at gmail.com] On Behalf Of Krishna Srinivas
> Sent: Thursday, May 08, 2008 6:35 AM
> To: Anand Avati
> Cc: Christopher Hawkins; gluster-devel at nongnu.org
> Subject: Re: [Gluster-devel] AFR: machine crash hangs other
> mounts or transport endpoint not connected
>
> Chris,
> Do you see clues in the log files?
> Krishna
>
> On Wed, Apr 30, 2008 at 8:22 PM, Anand Avati
> <avati at zresearch.com> wrote:
> > Chris,
> > can you get the glusterfs client logs from your ramdisk from when
> > the servers are pulled out and the mount point is accessed?
> >
> >
> >
> > avati
> >
> > 2008/4/30 Christopher Hawkins <chawkins at veracitynetworks.com>:
> >
> > > Without. All that is removed...
> > >
> > >
> > > _____
> > >
> > > From: anand.avati at gmail.com [mailto:anand.avati at gmail.com]
> > > On Behalf Of Anand Avati
> > > Sent: Wednesday, April 30, 2008 10:24 AM
> > > To: Christopher Hawkins
> > > Cc: gluster-devel at nongnu.org
> > > Subject: Re: [Gluster-devel] AFR: machine crash hangs other
> > > mounts or transport endpoint not connected
> > >
> > > Chris,
> > > is this hang with IP failover in place or without?
> > >
> > > avati
> > >
> > >
> > > 2008/4/30 Christopher Hawkins <chawkins at veracitynetworks.com>:
> > >
> > >
> > >
> > > Gluster devs,
> > >
> > > I am still not able to keep the client from hanging in a diskless
> > > cluster node. When I fail a server, the client becomes unresponsive
> > > and does not read from the other AFR volume. I first moved the
> > > entire /lib, /bin, and /sbin directories into the ramdisk which
> > > runs the nodes, to rule out the simple loss of an odd binary or
> > > library. An lsof | grep gluster on the client (pre-failover test)
> > > shows:
> > >
> > > [root at node1 ~]# lsof | grep gluster
> > > glusterfs 2195 root  cwd  DIR     0,1        0     2  /
> > > glusterfs 2195 root  rtd  DIR     0,1        0     2  /
> > > glusterfs 2195 root  txt  REG     0,1    55592  3863  /bin/glusterfs
> > > glusterfs 2195 root  mem  REG     0,1   341068  2392  /lib/libfuse.so.2.7.2
> > > glusterfs 2195 root  mem  REG     0,1   118096  2505  /lib/glusterfs/1.3.8pre6/xlator/mount/fuse.so
> > > glusterfs 2195 root  mem  REG     0,1   164703  2514  /lib/glusterfs/1.3.8pre6/xlator/protocol/client.so
> > > glusterfs 2195 root  mem  REG     0,1   112168    77  /lib/ld-2.3.4.so
> > > glusterfs 2195 root  mem  REG     0,1  1529120  2483  /lib/tls/libc-2.3.4.so
> > > glusterfs 2195 root  mem  REG     0,1    16732    70  /lib/libdl-2.3.4.so
> > > glusterfs 2195 root  mem  REG     0,1   107800  2485  /lib/tls/libpthread-2.3.4.so
> > > glusterfs 2195 root  mem  REG     0,1    43645  2533  /lib/glusterfs/1.3.8pre6/transport/tcp/client.so
> > > glusterfs 2195 root  mem  REG     0,1   427763  2456  /lib/libglusterfs.so.0.0.0
> > > glusterfs 2195 root  mem  REG     0,1    50672  2474  /lib/tls/librt-2.3.4.so
> > > glusterfs 2195 root  mem  REG     0,1   245686  2522  /lib/glusterfs/1.3.8pre6/xlator/cluster/afr.so
> > > glusterfs 2195 root  0u   CHR     1,3           3393  /dev/null
> > > glusterfs 2195 root  1u   CHR     1,3           3393  /dev/null
> > > glusterfs 2195 root  2u   CHR     1,3           3393  /dev/null
> > > glusterfs 2195 root  3w   REG     0,1      102  4495  /var/log/glusterfs/glusterfs.log
> > > glusterfs 2195 root  4u   CHR  10,229           3494  /dev/fuse
> > > glusterfs 2195 root  5r   0000    0,8        0  4498  eventpoll
> > > glusterfs 2195 root  6u   IPv4   4499            TCP  192.168.20.155:1023->master1:6996 (ESTABLISHED)
> > > glusterfs 2195 root  7u   IPv4   4500            TCP  192.168.20.155:1022->master2:6996 (ESTABLISHED)
> > >
> > > Everything listed here is a local file, and the gluster binary has
> > > access to them during failover. Can you help me troubleshoot by
> > > explaining what exactly gluster is doing when it loses a
> > > connection? Does it depend on something I have missed? This
> > > failover test uses the same config files and binaries as my earlier
> > > tests (which succeeded, but were not run on a diskless node). There
> > > must be something else in the filesystem that glusterfs requires to
> > > fail over successfully?
> > >
> > > Thanks,
> > > Chris
> > >
> > >
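[Editor's note: since every file in the listing above is supposed to live inside the ramdisk, one quick sanity check is to pull the path column back out of the lsof output and test each entry for existence. A minimal sketch; the two sample lines are abbreviated from the listing above, and on the node itself you would pipe in the real lsof output for the glusterfs pid instead:]

```shell
# Extract the path column (last field) from lsof-style lines, then
# report whether each path exists on the local filesystem. The excerpt
# here stands in for `lsof -p 2195` output on the actual node.
lsof_excerpt='glusterfs 2195 root txt REG 0,1 55592 3863 /bin/glusterfs
glusterfs 2195 root mem REG 0,1 341068 2392 /lib/libfuse.so.2.7.2'

printf '%s\n' "$lsof_excerpt" | awk '{print $NF}' | while read -r p; do
  if [ -e "$p" ]; then
    echo "present: $p"
  else
    echo "missing: $p"
  fi
done
```

Anything reported missing mid-failover would point at a file the ramdisk environment dropped rather than a glusterfs problem.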
> > > > > > > Gerry, Christopher,
> > > > > > >
> > > > > > > Here is what I tried to do. Two servers, one client, simple
> > > > > > > setup, afr on the client side. I did "ls" on the client
> > > > > > > mount point; it works. Now I do "ifconfig eth0 down" on the
> > > > > > > server, next I do "ls" on the client; it hangs for 10 secs
> > > > > > > (the timeout value), then fails over and starts working
> > > > > > > again without any problem.
> > > > > > >
> > > > > > > I guess a few users are facing the problem you guys are
> > > > > > > facing. Can you give us your setup details and mention the
> > > > > > > exact steps to reproduce? Also try to come up with minimal
> > > > > > > config details which can still reproduce the problem.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Krishna
> > > > > > >
> > > > > > > On Sat, Apr 26, 2008 at 7:01 AM, Christopher Hawkins
> > > > > > > <chawkins at veracitynetworks.com> wrote:
> > > > > > > > I am having the same issue. I'm working on a diskless
> > > > > > > > node cluster and figured the issue was related to that,
> > > > > > > > since AFR seems to fail over nicely for everyone else...
> > > > > > > > But it seems I am not alone, so what can I do to help
> > > > > > > > troubleshoot?
> > > > > > > >
> > > > > > > > I have two servers exporting a brick each, and a client
> > > > > > > > mounting them both with AFR and no unify. Transport
> > > > > > > > timeout settings don't seem to make a difference - the
> > > > > > > > client is just hung if I power off or just stop
> > > > > > > > glusterfsd. There is nothing logged on the server side.
> > > > > > > > I'll use a USB thumb drive for client-side logging, since
> > > > > > > > any logs in the ramdisk obviously disappear after the
> > > > > > > > reboot which fixes the hang... If I get any insight from
> > > > > > > > this I'll report it asap.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Chris
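[Editor's note: the setup described above (two servers each exporting a brick, client-side AFR, no unify) would correspond to a client volfile roughly along these lines. This is a sketch of 1.3-era syntax, not Chris's actual config: the volume names are invented, the hostnames are taken from the lsof output earlier in the thread, and the 10-second transport-timeout matches the failover delay Krishna observed:]

```
# Hypothetical client volfile for the setup described above; volume
# names and option values are illustrative assumptions.
volume remote1
  type protocol/client
  option transport-type tcp/client
  option remote-host master1          # first server, per the lsof output
  option remote-subvolume brick
  option transport-timeout 10         # seconds before a dead server is given up
end-volume

volume remote2
  type protocol/client
  option transport-type tcp/client
  option remote-host master2
  option remote-subvolume brick
  option transport-timeout 10
end-volume

volume afr0
  type cluster/afr
  subvolumes remote1 remote2
end-volume
```

If the client hangs well past the transport-timeout, the client log (here /var/log/glusterfs/glusterfs.log, per the lsof output) is where the protocol/client disconnect messages should first appear.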
> > > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > Gluster-devel mailing list
> > > > > > Gluster-devel at nongnu.org
> > > > > > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > > > >
> > > --
> > > If I traveled to the end of the rainbow
> > > As Dame Fortune did intend,
> > > Murphy would be there to tell me
> > > The pot's at the other end.