[Gluster-devel] pre6 hanging problems

August R. Wohlt glusterfs at isidore.net
Tue Jul 31 21:41:55 UTC 2007


Hi, memory (3g/4g used) and cpu are normal (load of ~1.5) when this happens
and when i run the server in gdb, it is not captured. I have to suspend it
to get a backtrace. Similarly, when run outside of gdb, it doesn't crash.
the server just ends up not responding to either of the clients.

It's too bad it's not an obviously traceable segv, but it is very consistent
for me.  I did a tcpdump on the traffic between the two when it was in this
"slow happening" state and nothing is going across the wire. I have a
feeling the tcp reconnect code between the two is getting confused thinking
something is still open when really it's not any more .

I have resorted to restarting the gluserfsd server via cron every 6 hours
and that keeps it alive long enough for it not to crash (for now). I'll
start sticking debugging statements in I guess to see what's getting stuck
where in the next few days when I get a chance, but the code is all greek to
me at this point so it might be a while.

thanks.
g

On 7/31/07, Anand Avati <avati at zresearch.com> wrote:
>
> August,
> the backtraces indicate that things were actually "happening" (but most
> likely very slowly). In the event of a freeze, the protocol/client has
> timeout intelligence and bails out the connection out of the hanging. Do you
> have observations about the memory usage/cpu usage when glusterfsd is in
> this state? That should throw more light. The client log's connection timed
> out seems fishy though.. was it that glusterfsd was 'captured' in gdb while
> the client was trying to reconnect? (this can produce such a log) apart from
> that the client logs seem normal.
>
> avati
>
> 2007/7/30, August R. Wohlt <glusterfs at isidore.net>:
> >
> > Hi -
> >
> > So still the same problem with pre6 hanging regularly with or without
> > gdb:
> >
> > To summarize:
> >
> > After starting 2 physically distinct clients (separate clients on gigE
> > switch with server) on centos 5 with pre6 against a server on centos 5
> > pre
> > 6, doing a heavy rsync to these mounts will regularly freeze up after a
> > few
> > hours. The clients only have writebehind enabled, and the server has no
> > other performance enhancements in it.
> >
> > When the clients' mounts hang, nothing at all shows up in the server log
> >
> > with debug turned on. The log simple goes along as usual then stops:
> >
> > ...
> > 2007-07-30 05:18:36 D [inode.c:285:__destroy_inode] brick_1/inode:
> > destroy
> > inode(0)
> > 2007-07-30 05:18:36 D [inode.c:534:__create_inode] brick_1/inode: create
> >
> > inode(8979319)
> > 2007-07-30 05:18:36 D [inode.c:327:__active_inode] brick_1/inode:
> > activating
> > inode(8979319), lru=1024/1024
> > 2007-07-30 05:18:36 D [inode.c:357:__passive_inode] brick_1/inode:
> > passivating inode(8979319), lru=1025/1024
> > 2007-07-30 05:18:36 D [inode.c:285:__destroy_inode] brick_1/inode:
> > destroy
> > inode(5031361)
> > 2007-07-30 05:18:36 D [inode.c:285:__destroy_inode] brick_1/inode:
> > destroy
> > inode(0)
> >
> >
> > Both of the clients have disconnected it seems, because their logs show:
> >
> >
> > ...
> > 2007-07-29 21:45:18 D [inode.c:285:__destroy_inode] fuse/inode: destroy
> > inode(46470909)
> > 2007-07-29 21:45:18 D [inode.c:327:__active_inode] fuse/inode:
> > activating
> > inode(46470877), lru=236831/1024
> > 2007-07-29 21:45:18 D [inode.c:285:__destroy_inode] fuse/inode: destroy
> > inode(46470877)
> > 2007-07-29 21:45:18 E [fuse-bridge.c:2017:fuse_transport_notify]
> > glusterfs-fuse: got GF_EVENT_POLLERR
> > 2007-07-29 21:45:18 D [ fuse-bridge.c:1857:fuse_transport_disconnect]
> > glusterfs-fuse: cleaning up fuse transport in disconnect handler
> >
> > ...
> > 2007-07-29 21:45:13 D [inode.c:285:__destroy_inode] fuse/inode: destroy
> > inode(13058343)
> > 2007-07-29 21:45:13 D [inode.c:327:__active_inode] fuse/inode:
> > activating
> > inode(13058335), lru=330169/1024
> > 2007-07-29 21:45:13 D [inode.c:285:__destroy_inode] fuse/inode: destroy
> > inode(13058335)
> > 2007-07-29 21:45:13 E [ fuse-bridge.c:2017:fuse_transport_notify]
> > glusterfs-fuse: got GF_EVENT_POLLERR
> > 2007-07-29 21:45:13 D [fuse-bridge.c:1857:fuse_transport_disconnect]
> > glusterfs-fuse: cleaning up fuse transport in disconnect handler
> >
> > Both of them generate SIGCONT in gdb. But they have different behavior
> > when
> > I continue:
> >
> > The first one just loops with SIGCONT, the other a SIGPIPE:
> >
> > client1:
> >
> > (gdb) run
> > Starting program: /usr/sbin/glusterfs -N -LDEBUG
> > -l/var/log/glusterfs/glusterfs.log -f/etc/glusterfs/glusterfs-client.vol
> > /backups
> > [Thread debugging using libthread_db enabled]
> > [New Thread -1209063744 (LWP 12708)]
> > Detaching after fork from child process 12710.
> > [New Thread 16427936 (LWP 12711)]
> >
> > Program received signal SIGCONT, Continued.
> > [Switching to Thread 16427936 (LWP 12711)]
> > 0x0032e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> > (gdb) c
> > Continuing.
> >
> > Program received signal SIGCONT, Continued.
> > 0x0032e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> > (gdb) bt
> > #0  0x0032e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> > #1  0x005a3824 in raise () from /lib/tls/libpthread.so.0
> > #2  0x0013bb0c in tcp_bail (this=0x947b3d0) at
> > ../../../../transport/tcp/tcp.c:146
> > #3  0x00122bbc in transport_bail (this=0x947b3d0) at transport.c:192
> > #4  0x0012ea16 in call_bail (trans=0x947b3d0) at client-protocol.c :220
> > #5  0x00123870 in gf_timer_proc (ctx=0xbfe97140) at timer.c:119
> > #6  0x0059d3cc in start_thread () from /lib/tls/libpthread.so.0
> > #7  0x00414c3e in clone () from /lib/tls/libc.so.6
> > (gdb)
> >
> >
> > client2:
> >
> > Starting program: /usr/sbin/glusterfs -f/usr/etc/glusterfs/glusterfs-
> > client.vol -N -LDEBUG -l/var/log/glusterfs/glusterfs.log /backups
> > [Thread debugging using libthread_db enabled]
> > [New Thread -1208265024 (LWP 7022)]
> > Detaching after fork from child process 7023.
> > [New Thread 26753968 (LWP 7024)]
> >
> > Program received signal SIGCONT, Continued.
> > [Switching to Thread 26753968 (LWP 7024)]
> > 0x007167a2 in _dl_sysinfo_int80 () from /lib/ld- linux.so.2
> > (gdb) c
> > Continuing.
> >
> > Program received signal SIGPIPE, Broken pipe.
> > [Switching to Thread -1208265024 (LWP 7022)]
> > 0x007167a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> > (gdb) bt
> > #0  0x007167a2 in _dl_sysinfo_int80 () from /lib/ld- linux.so.2
> > #1  0x007ef544 in writev () from /lib/tls/libc.so.6
> > #2  0x00f799a1 in full_rwv (fd=7, vector=0x20157, count=39, fn=0x7ef490
> > <writev>) at common-utils.c:100
> > #3  0x00962386 in tcp_client_writev (this=0x8ecf418, vector=0xbffe0fd0,
> > count=55) at tcp-client.c:251
> > #4  0x002a7b3b in client_protocol_xfer (frame=0xdd003b0, this=0x8e935c8,
> > type=GF_OP_TYPE_FOP_REQUEST, op=GF_FOP_WRITE,
> >     request=0xdea0028) at client-protocol.c:324
> > #5  0x002a8ec5 in client_writev (frame=0xdd003b0, this=0x8e935c8,
> > fd=0xc379bc8, vector=0xa8b05d0, count=32, offset=131072)
> >     at client-protocol.c:1126
> > #6  0x00efacf0 in wb_sync (frame=0xa8b057c, file=0xdb29698) at
> > write-behind.c:180
> > #7  0x00efb7eb in wb_writev (frame=0xdd003b0, this=0x8e97f80,
> > fd=0xc379bc8,
> > vector=0xbffe13a0, count=1, offset=258048)
> >     at write-behind.c:603
> > #8  0x0804d4e8 in fuse_write (req=0xc04dcd8, ino=53232572,
> >     buf=0xb7b97048 "d6450 at mypacs.net>; Sun, 29 Jul 2007 08:34:58
> > -0700\nFrom: \"Ebay Leader\" <DemetriusVenable at nosopoetic.com>\nTo:
> > fubbed6450 at mypacs.net\nSubject: Ebay for dummies for Fubbed6450\nDate:
> > Sun,
> > 29 Jul 2007 07:28"..., size=4096,
> >     off=258048, fi=0xffffffe0) at fuse-bridge.c:1235
> > #9  0x00320c1a in do_write (req=0xc04dcd8, nodeid=53232572,
> > inarg=0xb7b97030) at fuse_lowlevel.c:635
> > #10 0x00321bd0 in fuse_ll_process (data=0x8e93718, buf=0xb7b97008
> > "@\020",
> > len=4160, ch=0x8e936a8) at fuse_lowlevel.c:1171
> > #11 0x00322ea3 in fuse_session_process (se=0xffffffe0, buf=0xb7b97008
> > "@\020", len=4160, ch=0x8e936a8) at fuse_session.c:89
> > #12 0x0804efb9 in fuse_transport_notify (xl=0x8e98020, event=2,
> > data=0x8e933d0) at fuse-bridge.c:2050
> > #13 0x00f7af6c in transport_notify (this=0x8e933d0, event=1) at
> > transport.c
> > :154
> > #14 0x00f7b581 in sys_epoll_iteration (ctx=0xbffe1570) at epoll.c:54
> > #15 0x00f7b10b in poll_iteration (ctx=0xbffe1570) at transport.c:262
> > #16 0x0804a5e3 in main (argc=6, argv=0xbffe1644) at glusterfs.c:382
> > (gdb) c
> > Continuing.
> >
> > All commands running on the mountpoint are frozen when this happens, and
> > when I run glusterfs without gdb, it just freezes up, and does not die.
> >
> > If I suspend the server and do a backtrace on it, it shows:
> >
> > (gdb) run
> > Starting program: /usr/sbin/glusterfsd -N -LDEBUG
> > -f/etc/glusterfs/glusterfs-server.vol -l/var/log/glusterfs/glusterfs.log
> > [Thread debugging using libthread_db enabled]
> > [New Thread 46912496205696 (LWP 22070)]
> > glusterfsd: WARNING: ignoring stale pidfile for PID 22044
> > [New Thread 1084229952 (LWP 22071)]
> > [New Thread 1094719808 (LWP 22072)]
> > [New Thread 1105209664 (LWP 22073)]
> >
> > Program received signal SIGTSTP, Stopped (user).
> > [Switching to Thread 46912496205696 (LWP 22070)]
> > 0x000000358400cb5b in __read_nocancel () from /lib64/libpthread.so.0
> > (gdb) bt
> > #0  0x000000358400cb5b in __read_nocancel () from /lib64/libpthread.so.0
> >
> > #1  0x00000036a800ed08 in full_rw (fd=8, buf=0x2aaaac155c90
> > "00000007\n0000000b:00000006\nCALLER_PID", size=131287,
> >     op=0x358400cb10 <read>) at common-utils.c:51
> > #2  0x00000036a800ed7a in gf_full_read (fd=8, buf=0x2aaaac155c90
> > "00000007\n0000000b:00000006\nCALLER_PID", size=131287)
> >     at common-utils.c:71
> > #3  0x00002aaaaaedce8f in tcp_recieve (this=0xd9f13c0,
> > buf=0x2aaaac155c90
> > "00000007\n0000000b:00000006\nCALLER_PID",
> >     len=131287) at ../../../../transport/tcp/tcp.c:49
> > #4  0x00000036a80104a8 in gf_block_unserialize_transport
> > (trans=0xd9f13c0)
> > at protocol.c:340
> > #5  0x00002aaaaacd842c in notify (this=0xd9f7e10, event=2,
> > data=0xd9f13c0)
> > at server-protocol.c:5819
> > #6  0x00000036a8010df2 in transport_notify (this=0xd9f13c0, event=1) at
> > transport.c:154
> > #7  0x00000036a80114e9 in epoll_notify (eevent=1, data=0xd9f13c0) at
> > epoll.c
> > :54
> > #8  0x00000036a80117a7 in sys_epoll_iteration (ctx=0x7fff2458de70) at
> > epoll.c:146
> > #9  0x00000036a8010fc6 in poll_iteration (ctx=0x7fff2458de70) at
> > transport.c
> > :262
> > #10 0x0000000000401a58 in main (argc=5, argv=0x7fff2458dfc8) at
> > glusterfsd.c
> > :309
> > (gdb)
> >
> >
> > When I kill the server and restart it, the client will show Transport
> > not
> > connected the first time accessing the mount (ie, with "ls"), but the
> > 2nd
> > time it works and shows the directories therein.
> >
> > This happens with great regularity. Sometimes the message in the client
> > logs
> > is "connection on x is still in progress - try again later" but
> > sometimes
> > it's this disconnected message. Every time, the cure for me is the
> > same--
> > restart the server and the clients become happy and continue on their
> > merry
> > way again.
> >
> > I don't have any problem when I remove writebehind from the clients so
> > that
> > it's just plain vanilla client -> server, but my throughput drops from
> > 40MB/s to 150Kbps which is not useable for me and so maybe I just
> > haven't
> > let it go long enough. I have to use pre6 because I'm doing hardlinks on
> > the
> > mount, so I can't drop to a lower version for stability.
> >
> > Because this happens with great regularity (~ every 6hrs), I'm more than
> >
> > happy to do any tracing/debugging that anyone can suggest. Otherwise,
> > I'll
> > try to start some debugging in the code when I have a few spare cycles
> > (few
> > and far between!)
> >
> > Again, this is on a stock kernel on CentOS 5 installation, pre6
> > glusterfs
> > client and server, and fuse 2.7.0-glfs1.
> >
> > client configuration:
> >
> > ### Add client feature and attach to remote subvolume
> > volume brick
> >    type protocol/client
> >    option transport-type tcp/client     # for TCP/IP transport
> >    option remote-host 192.168.2.5       # IP address of the remote brick
> >    option remote-subvolume brick_1  # name of the remote volume
> > end-volume
> >
> > # #### Add writeback feature
> > volume brick-wb
> >    type performance/write-behind
> >    option aggregate-size 131072 # unit in bytes
> >    subvolumes brick
> > end-volume
> >
> > server configuration:
> >
> > ### Export volume "brick" with the contents of "/home/export" directory.
> >
> > volume brick_1
> >    type storage/posix
> >    option directory /home/vg_3ware1/vivalog/brick_1
> > end-volume
> > volume brick_2
> >    type storage/posix
> >    option directory /home/vg_3ware1/vivalog/brick_2
> > end-volume
> >
> > ### Add network serving capability to above brick.
> > volume server
> >    type protocol/server
> >    option transport-type tcp/server     # For TCP/IP transport
> >    option bind-address 192.168.2.5     # Default is to listen on all
> > interfaces
> >    subvolumes brick_1
> >    option auth.ip.brick_1.allow * # Allow access to "brick" volume
> > end-volume
> >
> > ### Add network serving capability to above brick.
> > volume server
> >    type protocol/server
> >    option transport-type tcp/server     # For TCP/IP transport
> >    option bind-address 192.168.2.5     # Default is to listen on all
> > interfaces
> >    subvolumes brick_2
> >    option listen-port 6997
> >    option auth.ip.brick_2.allow * # Allow access to "brick" volume
> > end-volume
> >
> > I tried a single "volume server" entry for both bricks, and I get the
> > same
> > thing.  Also, I have an ssh connection between the two servers open all
> > the
> > time which never drops, so I'm ruling out network problems across the
> > switch.
> >
> > Help!
> > :august
> >
> > On 7/26/07, August R. Wohlt < glusterfs at isidore.net> wrote:
> > >
> > > Hi avati,
> > >
> > > When I run it without gdb, it still has the same behavior. It'll run
> > fine
> > > for a few hours under load and then freeze. When it does, the client
> > spews
> > > these to the logs forever. When I kill glusterfs and remount the
> > directory,
> > > everything's fine again:
> > >
> > > 2007-07-26 12:21:31 D [fuse-bridge.c:344:fuse_entry_cbk]
> > glusterfs-fuse:
> > > ERR => -1 (107)
> > > 2007-07-26 12:21:31 D [inode.c:285:__destroy_inode] fuse/inode:
> > destroy
> > > inode(0)
> > > 2007-07-26 12:23:34 W [client-protocol.c:4158:client_protocol_reconnect]
> > > brick: attempting reconnect
> > > 2007-07-26 12:23:34 D [ tcp-client.c:178:tcp_connect] brick:
> > connection on
> > > 4 still in progress - try later
> > > 2007-07-26 12:29:51 W [client-protocol.c
> > :4158:client_protocol_reconnect]
> > > brick: attempting reconnect
> > > 2007-07-26 12:29:51 E [ tcp-client.c:170:tcp_connect] brick:
> > non-blocking
> > > connect() returned: 110 (Connection timed out)
> > >
> > > :g
> > >
> > > On 7/26/07, Anand Avati <avati at zresearch.com > wrote:
> > > >
> > > > August,
> > > >  It seems to me that you were running the client in GDB, and for
> > some
> > > > reason that particular client bailed out. While bailing out the
> > client
> > > > raises SIGCONT which has been caught by gdb (gdb catches all signals
> > before
> > > > letting the signal handlers take over). the backtrace you have
> > attached is
> > > > NOT a crash, you had to just 'c' (continue) at the gdb. And most
> > likely,
> > > > this is what has given the 'hung' effect as well.
> > > > Is this reproducible for you?
> > > >
> > > > thanks,
> > > > avati
> > > >
> > > > 2007/7/26, August R. Wohlt < glusterfs at isidore.net>:
> > > > >
> > > > > Hi all -
> > > > >
> > > > > I have client and server set up with the pre6 version of gluserfs.
> > > > > Several
> > > > > times a day the client mount will freeze up as does any command
> > that
> > > > > tries
> > > > > to read from the mountpoint. I have to kill the glusterfs process,
> > > > > unmount
> > > > > the directory and remount it to get it to work again.
> > > > >
> > > > > When this happens, there is another glusterfs client on other
> > machines
> > > > > connected to the same server that does not get disconnected. So
> > the
> > > > > timeout
> > > > > message in the logs is confusing to me. If it's really timing out
> > > > > wouldn't
> > > > > the other server be disconnected, too?
> > > > >
> > > > > This is on CentOS 5 with fuse 2.7.0-glfs.
> > > > >
> > > > > When it happens, here's what shows up in the client:
> > > > >
> > > > > ...
> > > > > 2007-07-25 09:45:59 D [inode.c:327:__active_inode] fuse/inode:
> > > > > activating
> > > > > inode(4210807), lru=0/1024
> > > > > 2007-07-25 09:45:59 D [ inode.c:285:__destroy_inode] fuse/inode:
> > > > > destroy
> > > > > inode(4210807)
> > > > > 2007-07-25 12:37:26 W [client-protocol.c:211:call_bail] brick:
> > > > > activating
> > > > > bail-out. pending frames = 1. last sent =
> > > > > 2007-07-25 12:33:42. last received = 2007-07-25 11:42:59
> > > > > transport-timeout =
> > > > > 120
> > > > > 2007-07-25 12:37:26 C [client-protocol.c:219:call_bail] brick:
> > bailing
> > > > > transport
> > > > > 2007-07-25 12:37:26 W [client-protocol.c
> > :4189:client_protocol_cleanup]
> > > > >
> > > > > brick: cleaning up state in transport object
> > > > > 0x80a03d0
> > > > > 2007-07-25 12:37:26 W [client-protocol.c
> > :4238:client_protocol_cleanup]
> > > > > brick: forced unwinding frame type(0) op(15)
> > > > > 2007-07-25 12:37:26 C [tcp.c:81:tcp_disconnect] brick: connection
> > > > > disconnected
> > > > >
> > > > > When it happens, here's what shows up in the server:
> > > > >
> > > > > 2007-07-25 15:37:40 E [protocol.c:346:gf_block_unserialize_transport]
> >
> > > > > libglusterfs/protocol: full_read of block failed: peer (
> > > > > 192.168.2.3:1023)
> > > > > 2007-07-25 15:37:40 C [tcp.c:81:tcp_disconnect] server: connection
> >
> > > > > disconnected
> > > > > 2007-07-25 15:37:40 E [ protocol.c
> > :251:gf_block_unserialize_transport]
> > > > > libglusterfs/protocol: EOF from peer ( 192.168.2.4:1023 )
> > > > > 2007-07-25 15:37:40 C [tcp.c:81:tcp_disconnect] server: connection
> > > > > disconnected
> > > > >
> > > > > And here's the client backtrace:
> > > > >
> > > > > (gdb) bt
> > > > > #0  0x0032e7a2 in _dl_sysinfo_int80 () from /lib/ld- linux.so.2
> > > > > #1  0x005a3824 in raise () from /lib/tls/libpthread.so.0
> > > > > #2  0x00655b0c in tcp_bail (this=0x80a03d0) at
> > > > > ../../../../transport/tcp/tcp.c:146
> > > > > #3  0x00695bbc in transport_bail (this=0x80a03d0) at transport.c:192
> > > > > #4  0x00603a16 in call_bail (trans=0x80a03d0) at client-protocol.c:220
> > > > > #5  0x00696870 in gf_timer_proc (ctx=0xbffeec30) at timer.c:119
> > > > > #6  0x0059d3cc in start_thread () from /lib/tls/libpthread.so.0
> > > > > #7  0x00414c3e in clone () from /lib/tls/libc.so.6
> > > > >
> > > > >
> > > > > client config:
> > > > >
> > > > > ### Add client feature and attach to remote subvolume
> > > > > volume brick
> > > > >    type protocol/client
> > > > >    option transport-type tcp/client     # for TCP/IP transport
> > > > >    option remote-host 192.168.2.5       # IP address of the remote
> > > > > brick
> > > > >    option remote-subvolume brick_1  # name of the remote volume
> > > > > end-volume
> > > > >
> > > > > # #### Add writeback feature
> > > > >   volume brick-wb
> > > > >     type performance/write-behind
> > > > >     option aggregate-size 131072 # unit in bytes
> > > > >     subvolumes brick
> > > > >   end-volume
> > > > >
> > > > > server config:
> > > > >
> > > > > ### Export volume "brick" with the contents of "/home/export"
> > > > > directory.
> > > > > volume brick_1
> > > > >    type storage/posix
> > > > >    option directory /home/vg_3ware1/vivalog/brick_1
> > > > > end-volume
> > > > >
> > > > > volume brick_2
> > > > >    type storage/posix
> > > > >    option directory /home/vg_3ware1/vivalog/brick_2
> > > > > end-volume
> > > > >
> > > > > ### Add network serving capability to above brick.
> > > > > volume server
> > > > >    type protocol/server
> > > > >    option transport-type tcp/server     # For TCP/IP transport
> > > > >    option bind-address 192.168.2.5     # Default is to listen on
> > all
> > > > > interfaces
> > > > >    subvolumes brick_1
> > > > >    option auth.ip.brick_2.allow * # Allow access to "brick" volume
> > > > >    option auth.ip.brick_1.allow * # Allow access to "brick" volume
> > > > > end-volume
> > > > >
> > > > > ps I have one server serving two volume bricks to two physically
> > > > > distinct
> > > > > clients.  I assume this is okay--that I don't need to have two
> > > > > separate
> > > > > server declarations.
> > > > > _______________________________________________
> > > > > Gluster-devel mailing list
> > > > > Gluster-devel at nongnu.org
> > > > > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Anand V. Avati
> > >
> > >
> > >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at nongnu.org
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >
>
>
>
> --
> Anand V. Avati



More information about the Gluster-devel mailing list