[Gluster-devel] pre6 hanging problems

Mon Jul 30 10:36:30 UTC 2007

Hi -

So still the same problem with pre6 hanging regularly with or without gdb:

To summarize:

After starting 2 physically distinct clients (separate clients on gigE
switch with server) on centos 5 with pre6 against a server on centos 5 pre
6, doing a heavy rsync to these mounts will regularly freeze up after a few
hours. The clients only have writebehind enabled, and the server has no
other performance enhancements in it.

When the clients' mounts hang, nothing at all shows up in the server log
with debug turned on. The log simple goes along as usual then stops:

...
2007-07-30 05:18:36 D [inode.c:285:__destroy_inode] brick_1/inode: destroy
inode(0)
2007-07-30 05:18:36 D [inode.c:534:__create_inode] brick_1/inode: create
inode(8979319)
2007-07-30 05:18:36 D [inode.c:327:__active_inode] brick_1/inode: activating
inode(8979319), lru=1024/1024
2007-07-30 05:18:36 D [inode.c:357:__passive_inode] brick_1/inode:
passivating inode(8979319), lru=1025/1024
2007-07-30 05:18:36 D [inode.c:285:__destroy_inode] brick_1/inode: destroy
inode(5031361)
2007-07-30 05:18:36 D [inode.c:285:__destroy_inode] brick_1/inode: destroy
inode(0)

Both of the clients have disconnected it seems, because their logs show:

...
2007-07-29 21:45:18 D [inode.c:285:__destroy_inode] fuse/inode: destroy
inode(46470909)
2007-07-29 21:45:18 D [inode.c:327:__active_inode] fuse/inode: activating
inode(46470877), lru=236831/1024
2007-07-29 21:45:18 D [inode.c:285:__destroy_inode] fuse/inode: destroy
inode(46470877)
2007-07-29 21:45:18 E [fuse-bridge.c:2017:fuse_transport_notify]
glusterfs-fuse: got GF_EVENT_POLLERR
2007-07-29 21:45:18 D [fuse-bridge.c:1857:fuse_transport_disconnect]
glusterfs-fuse: cleaning up fuse transport in disconnect handler

...
2007-07-29 21:45:13 D [inode.c:285:__destroy_inode] fuse/inode: destroy
inode(13058343)
2007-07-29 21:45:13 D [inode.c:327:__active_inode] fuse/inode: activating
inode(13058335), lru=330169/1024
2007-07-29 21:45:13 D [inode.c:285:__destroy_inode] fuse/inode: destroy
inode(13058335)
2007-07-29 21:45:13 E [fuse-bridge.c:2017:fuse_transport_notify]
glusterfs-fuse: got GF_EVENT_POLLERR
2007-07-29 21:45:13 D [fuse-bridge.c:1857:fuse_transport_disconnect]
glusterfs-fuse: cleaning up fuse transport in disconnect handler

Both of them generate SIGCONT in gdb. But they have different behavior when
I continue:

The first one just loops with SIGCONT, the other a SIGPIPE:

client1:

(gdb) run
Starting program: /usr/sbin/glusterfs -N -LDEBUG
-l/var/log/glusterfs/glusterfs.log -f/etc/glusterfs/glusterfs-client.vol/backups
[Thread debugging using libthread_db enabled]
[New Thread -1209063744 (LWP 12708)]
Detaching after fork from child process 12710.
[New Thread 16427936 (LWP 12711)]

Program received signal SIGCONT, Continued.
[Switching to Thread 16427936 (LWP 12711)]
0x0032e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) c
Continuing.

Program received signal SIGCONT, Continued.
0x0032e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0x0032e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x005a3824 in raise () from /lib/tls/libpthread.so.0
#2  0x0013bb0c in tcp_bail (this=0x947b3d0) at
../../../../transport/tcp/tcp.c:146
#3  0x00122bbc in transport_bail (this=0x947b3d0) at transport.c:192
#4  0x0012ea16 in call_bail (trans=0x947b3d0) at client-protocol.c:220
#5  0x00123870 in gf_timer_proc (ctx=0xbfe97140) at timer.c:119
#6  0x0059d3cc in start_thread () from /lib/tls/libpthread.so.0
#7  0x00414c3e in clone () from /lib/tls/libc.so.6
(gdb)

client2:

Starting program: /usr/sbin/glusterfs -f/usr/etc/glusterfs/glusterfs-
client.vol -N -LDEBUG -l/var/log/glusterfs/glusterfs.log /backups
[Thread debugging using libthread_db enabled]
[New Thread -1208265024 (LWP 7022)]
Detaching after fork from child process 7023.
[New Thread 26753968 (LWP 7024)]

Program received signal SIGCONT, Continued.
[Switching to Thread 26753968 (LWP 7024)]
0x007167a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) c
Continuing.

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread -1208265024 (LWP 7022)]
0x007167a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0x007167a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x007ef544 in writev () from /lib/tls/libc.so.6
#2  0x00f799a1 in full_rwv (fd=7, vector=0x20157, count=39, fn=0x7ef490
<writev>) at common-utils.c:100
#3  0x00962386 in tcp_client_writev (this=0x8ecf418, vector=0xbffe0fd0,
count=55) at tcp-client.c:251
#4  0x002a7b3b in client_protocol_xfer (frame=0xdd003b0, this=0x8e935c8,
type=GF_OP_TYPE_FOP_REQUEST, op=GF_FOP_WRITE,
    request=0xdea0028) at client-protocol.c:324
#5  0x002a8ec5 in client_writev (frame=0xdd003b0, this=0x8e935c8,
fd=0xc379bc8, vector=0xa8b05d0, count=32, offset=131072)
    at client-protocol.c:1126
#6  0x00efacf0 in wb_sync (frame=0xa8b057c, file=0xdb29698) at
write-behind.c:180
#7  0x00efb7eb in wb_writev (frame=0xdd003b0, this=0x8e97f80, fd=0xc379bc8,
vector=0xbffe13a0, count=1, offset=258048)
    at write-behind.c:603
#8  0x0804d4e8 in fuse_write (req=0xc04dcd8, ino=53232572,
    buf=0xb7b97048 "d6450 at mypacs.net>; Sun, 29 Jul 2007 08:34:58
-0700\nFrom: \"Ebay Leader\" <DemetriusVenable at nosopoetic.com>\nTo:
fubbed6450 at mypacs.net\nSubject: Ebay for dummies for Fubbed6450\nDate: Sun,
29 Jul 2007 07:28"..., size=4096,
    off=258048, fi=0xffffffe0) at fuse-bridge.c:1235
#9  0x00320c1a in do_write (req=0xc04dcd8, nodeid=53232572,
inarg=0xb7b97030) at fuse_lowlevel.c:635
#10 0x00321bd0 in fuse_ll_process (data=0x8e93718, buf=0xb7b97008 "@\020",
len=4160, ch=0x8e936a8) at fuse_lowlevel.c:1171
#11 0x00322ea3 in fuse_session_process (se=0xffffffe0, buf=0xb7b97008
"@\020", len=4160, ch=0x8e936a8) at fuse_session.c:89
#12 0x0804efb9 in fuse_transport_notify (xl=0x8e98020, event=2,
data=0x8e933d0) at fuse-bridge.c:2050
#13 0x00f7af6c in transport_notify (this=0x8e933d0, event=1) at transport.c
:154
#14 0x00f7b581 in sys_epoll_iteration (ctx=0xbffe1570) at epoll.c:54
#15 0x00f7b10b in poll_iteration (ctx=0xbffe1570) at transport.c:262
#16 0x0804a5e3 in main (argc=6, argv=0xbffe1644) at glusterfs.c:382
(gdb) c
Continuing.

All commands running on the mountpoint are frozen when this happens, and
when I run glusterfs without gdb, it just freezes up, and does not die.

If I suspend the server and do a backtrace on it, it shows:

(gdb) run
Starting program: /usr/sbin/glusterfsd -N -LDEBUG
-f/etc/glusterfs/glusterfs-server.vol -l/var/log/glusterfs/glusterfs.log
[Thread debugging using libthread_db enabled]
[New Thread 46912496205696 (LWP 22070)]
glusterfsd: WARNING: ignoring stale pidfile for PID 22044
[New Thread 1084229952 (LWP 22071)]
[New Thread 1094719808 (LWP 22072)]
[New Thread 1105209664 (LWP 22073)]

Program received signal SIGTSTP, Stopped (user).
[Switching to Thread 46912496205696 (LWP 22070)]
0x000000358400cb5b in __read_nocancel () from /lib64/libpthread.so.0
(gdb) bt
#0  0x000000358400cb5b in __read_nocancel () from /lib64/libpthread.so.0
#1  0x00000036a800ed08 in full_rw (fd=8, buf=0x2aaaac155c90
"00000007\n0000000b:00000006\nCALLER_PID", size=131287,
    op=0x358400cb10 <read>) at common-utils.c:51
#2  0x00000036a800ed7a in gf_full_read (fd=8, buf=0x2aaaac155c90
"00000007\n0000000b:00000006\nCALLER_PID", size=131287)
    at common-utils.c:71
#3  0x00002aaaaaedce8f in tcp_recieve (this=0xd9f13c0, buf=0x2aaaac155c90
"00000007\n0000000b:00000006\nCALLER_PID",
    len=131287) at ../../../../transport/tcp/tcp.c:49
#4  0x00000036a80104a8 in gf_block_unserialize_transport (trans=0xd9f13c0)
at protocol.c:340
#5  0x00002aaaaacd842c in notify (this=0xd9f7e10, event=2, data=0xd9f13c0)
at server-protocol.c:5819
#6  0x00000036a8010df2 in transport_notify (this=0xd9f13c0, event=1) at
transport.c:154
#7  0x00000036a80114e9 in epoll_notify (eevent=1, data=0xd9f13c0) at epoll.c
:54
#8  0x00000036a80117a7 in sys_epoll_iteration (ctx=0x7fff2458de70) at
epoll.c:146
#9  0x00000036a8010fc6 in poll_iteration (ctx=0x7fff2458de70) at transport.c
:262
#10 0x0000000000401a58 in main (argc=5, argv=0x7fff2458dfc8) at glusterfsd.c
:309
(gdb)

When I kill the server and restart it, the client will show Transport not
connected the first time accessing the mount (ie, with "ls"), but the 2nd
time it works and shows the directories therein.

This happens with great regularity. Sometimes the message in the client logs
is "connection on x is still in progress - try again later" but sometimes
it's this disconnected message. Every time, the cure for me is the same--
restart the server and the clients become happy and continue on their merry
way again.

I don't have any problem when I remove writebehind from the clients so that
it's just plain vanilla client -> server, but my throughput drops from
40MB/s to 150Kbps which is not useable for me and so maybe I just haven't
let it go long enough. I have to use pre6 because I'm doing hardlinks on the
mount, so I can't drop to a lower version for stability.

Because this happens with great regularity (~ every 6hrs), I'm more than
happy to do any tracing/debugging that anyone can suggest. Otherwise, I'll
try to start some debugging in the code when I have a few spare cycles (few
and far between!)

Again, this is on a stock kernel on CentOS 5 installation, pre6 glusterfs
client and server, and fuse 2.7.0-glfs1.

client configuration:

 ### Add client feature and attach to remote subvolume
 volume brick
   type protocol/client
   option transport-type tcp/client     # for TCP/IP transport
   option remote-host 192.168.2.5       # IP address of the remote brick
   option remote-subvolume brick_1  # name of the remote volume
 end-volume

# #### Add writeback feature
 volume brick-wb
   type performance/write-behind
   option aggregate-size 131072 # unit in bytes
   subvolumes brick
 end-volume

server configuration:

 ### Export volume "brick" with the contents of "/home/export" directory.
 volume brick_1
   type storage/posix
   option directory /home/vg_3ware1/vivalog/brick_1
 end-volume
 volume brick_2
   type storage/posix
   option directory /home/vg_3ware1/vivalog/brick_2
 end-volume

 ### Add network serving capability to above brick.
 volume server
   type protocol/server
   option transport-type tcp/server     # For TCP/IP transport
   option bind-address 192.168.2.5     # Default is to listen on all
interfaces
   subvolumes brick_1
   option auth.ip.brick_1.allow * # Allow access to "brick" volume
 end-volume

 ### Add network serving capability to above brick.
 volume server
   type protocol/server
   option transport-type tcp/server     # For TCP/IP transport
   option bind-address 192.168.2.5     # Default is to listen on all
interfaces
   subvolumes brick_2
   option listen-port 6997
   option auth.ip.brick_2.allow * # Allow access to "brick" volume
 end-volume

I tried a single "volume server" entry for both bricks, and I get the same
thing.  Also, I have an ssh connection between the two servers open all the
time which never drops, so I'm ruling out network problems across the
switch.

Help!
:august

On 7/26/07, August R. Wohlt <glusterfs at isidore.net> wrote:
>
> Hi avati,
>
> When I run it without gdb, it still has the same behavior. It'll run fine
> for a few hours under load and then freeze. When it does, the client spews
> these to the logs forever. When I kill glusterfs and remount the directory,
> everything's fine again:
>
> 2007-07-26 12:21:31 D [fuse-bridge.c:344:fuse_entry_cbk] glusterfs-fuse:
> ERR => -1 (107)
> 2007-07-26 12:21:31 D [inode.c:285:__destroy_inode] fuse/inode: destroy
> inode(0)
> 2007-07-26 12:23:34 W [client-protocol.c :4158:client_protocol_reconnect]
> brick: attempting reconnect
> 2007-07-26 12:23:34 D [tcp-client.c:178:tcp_connect] brick: connection on
> 4 still in progress - try later
> 2007-07-26 12:29:51 W [client-protocol.c:4158:client_protocol_reconnect]
> brick: attempting reconnect
> 2007-07-26 12:29:51 E [tcp-client.c:170:tcp_connect] brick: non-blocking
> connect() returned: 110 (Connection timed out)
>
> :g
>
> On 7/26/07, Anand Avati <avati at zresearch.com> wrote:
> >
> > August,
> >  It seems to me that you were running the client in GDB, and for some
> > reason that particular client bailed out. While bailing out the client
> > raises SIGCONT which has been caught by gdb (gdb catches all signals before
> > letting the signal handlers take over). the backtrace you have attached is
> > NOT a crash, you had to just 'c' (continue) at the gdb. And most likely,
> > this is what has given the 'hung' effect as well.
> > Is this reproducible for you?
> >
> > thanks,
> > avati
> >
> > 2007/7/26, August R. Wohlt < glusterfs at isidore.net>:
> > >
> > > Hi all -
> > >
> > > I have client and server set up with the pre6 version of gluserfs.
> > > Several
> > > times a day the client mount will freeze up as does any command that
> > > tries
> > > to read from the mountpoint. I have to kill the glusterfs process,
> > > unmount
> > > the directory and remount it to get it to work again.
> > >
> > > When this happens, there is another glusterfs client on other machines
> > > connected to the same server that does not get disconnected. So the
> > > timeout
> > > message in the logs is confusing to me. If it's really timing out
> > > wouldn't
> > > the other server be disconnected, too?
> > >
> > > This is on CentOS 5 with fuse 2.7.0-glfs.
> > >
> > > When it happens, here's what shows up in the client:
> > >
> > > ...
> > > 2007-07-25 09:45:59 D [inode.c:327:__active_inode] fuse/inode:
> > > activating
> > > inode(4210807), lru=0/1024
> > > 2007-07-25 09:45:59 D [inode.c:285:__destroy_inode] fuse/inode:
> > > destroy
> > > inode(4210807)
> > > 2007-07-25 12:37:26 W [client-protocol.c:211:call_bail] brick:
> > > activating
> > > bail-out. pending frames = 1. last sent =
> > > 2007-07-25 12:33:42. last received = 2007-07-25 11:42:59
> > > transport-timeout =
> > > 120
> > > 2007-07-25 12:37:26 C [client-protocol.c:219:call_bail] brick: bailing
> > > transport
> > > 2007-07-25 12:37:26 W [client-protocol.c:4189:client_protocol_cleanup]
> > >
> > > brick: cleaning up state in transport object
> > > 0x80a03d0
> > > 2007-07-25 12:37:26 W [client-protocol.c:4238:client_protocol_cleanup]
> > > brick: forced unwinding frame type(0) op(15)
> > > 2007-07-25 12:37:26 C [tcp.c:81:tcp_disconnect] brick: connection
> > > disconnected
> > >
> > > When it happens, here's what shows up in the server:
> > >
> > > 2007-07-25 15:37:40 E [protocol.c:346:gf_block_unserialize_transport]
> > > libglusterfs/protocol: full_read of block failed: peer (
> > > 192.168.2.3:1023)
> > > 2007-07-25 15:37:40 C [tcp.c:81:tcp_disconnect] server: connection
> > > disconnected
> > > 2007-07-25 15:37:40 E [ protocol.c:251:gf_block_unserialize_transport]
> > > libglusterfs/protocol: EOF from peer ( 192.168.2.4:1023)
> > > 2007-07-25 15:37:40 C [tcp.c:81:tcp_disconnect] server: connection
> > > disconnected
> > >
> > > And here's the client backtrace:
> > >
> > > (gdb) bt
> > > #0  0x0032e7a2 in _dl_sysinfo_int80 () from /lib/ld- linux.so.2
> > > #1  0x005a3824 in raise () from /lib/tls/libpthread.so.0
> > > #2  0x00655b0c in tcp_bail (this=0x80a03d0) at
> > > ../../../../transport/tcp/tcp.c:146
> > > #3  0x00695bbc in transport_bail (this=0x80a03d0) at transport.c :192
> > > #4  0x00603a16 in call_bail (trans=0x80a03d0) at client-protocol.c:220
> > > #5  0x00696870 in gf_timer_proc (ctx=0xbffeec30) at timer.c:119
> > > #6  0x0059d3cc in start_thread () from /lib/tls/libpthread.so.0
> > > #7  0x00414c3e in clone () from /lib/tls/libc.so.6
> > >
> > >
> > > client config:
> > >
> > > ### Add client feature and attach to remote subvolume
> > > volume brick
> > >    type protocol/client
> > >    option transport-type tcp/client     # for TCP/IP transport
> > >    option remote-host 192.168.2.5       # IP address of the remote
> > > brick
> > >    option remote-subvolume brick_1  # name of the remote volume
> > > end-volume
> > >
> > > # #### Add writeback feature
> > >   volume brick-wb
> > >     type performance/write-behind
> > >     option aggregate-size 131072 # unit in bytes
> > >     subvolumes brick
> > >   end-volume
> > >
> > > server config:
> > >
> > > ### Export volume "brick" with the contents of "/home/export"
> > > directory.
> > > volume brick_1
> > >    type storage/posix
> > >    option directory /home/vg_3ware1/vivalog/brick_1
> > > end-volume
> > >
> > > volume brick_2
> > >    type storage/posix
> > >    option directory /home/vg_3ware1/vivalog/brick_2
> > > end-volume
> > >
> > > ### Add network serving capability to above brick.
> > > volume server
> > >    type protocol/server
> > >    option transport-type tcp/server     # For TCP/IP transport
> > >    option bind-address 192.168.2.5     # Default is to listen on all
> > > interfaces
> > >    subvolumes brick_1
> > >    option auth.ip.brick_2.allow * # Allow access to "brick" volume
> > >    option auth.ip.brick_1.allow * # Allow access to "brick" volume
> > > end-volume
> > >
> > > ps I have one server serving two volume bricks to two physically
> > > distinct
> > > clients.  I assume this is okay--that I don't need to have two
> > > separate
> > > server declarations.
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel at nongnu.org
> > > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> > >
> >
> >
> >
> > --
> > Anand V. Avati
>
>
>