[Gluster-devel] Could this be a bug in GlusterFS? The file system is unstable and hangs
Shehjar Tikoo
shehjart at gluster.com
Tue Jun 2 05:25:44 UTC 2009
Hi
>
> Also, avoid using autoscaling in io-threads for now.
>
> -Shehjar
>
>
-Shehjar
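
For reference, turning autoscaling off just means dropping that option from each io-threads volume, along the lines of the sketch below (adapted from the brick1 volume in your config; I'd double-check the exact option names against the io-threads docs for 2.0.1):

```
volume brick1
  type performance/io-threads
  option min-threads 16        # fixed-size thread pool; no "option autoscaling on" line
  subvolumes brick1-locks
end-volume
```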
Alpha Electronics wrote:
> Thanks for looking into this. We do use io-threads. Here is the server
> config:
> volume brick1-posix
>   type storage/posix
>   option directory /mnt/brick1
> end-volume
>
> volume brick2-posix
>   type storage/posix
>   option directory /mnt/brick2
> end-volume
>
> volume brick1-locks
>   type features/locks
>   subvolumes brick1-posix
> end-volume
>
> volume brick2-locks
>   type features/locks
>   subvolumes brick2-posix
> end-volume
>
> volume brick1
>   type performance/io-threads
>   option min-threads 16
>   option autoscaling on
>   subvolumes brick1-locks
> end-volume
>
> volume brick2
>   type performance/io-threads
>   option min-threads 16
>   option autoscaling on
>   subvolumes brick2-locks
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp
>   option auth.addr.brick1.allow *
>   option auth.addr.brick2.allow *
>   subvolumes brick1 brick2
> end-volume
>
>
>
>
> On Sun, May 31, 2009 at 11:44 PM, Shehjar Tikoo <shehjart at gluster.com> wrote:
>
> Alpha Electronics wrote:
>
>         We are testing GlusterFS before recommending it to our
>         enterprise clients. We have found that the file system always
>         hangs after running for about two days. After killing the
>         server-side process and restarting it, everything goes back to normal.
>
>
>     What is the server config?
>     If you're not using io-threads on the server, I suggest you do,
>     because it does basic load balancing that helps avoid timeouts.
>
> Also, avoid using autoscaling in io-threads for now.
>
> -Shehjar
>
>
> Here is the spec and error logged:
> GlusterFS version: v2.0.1
>
> Client volume:
> volume brick_1
> type protocol/client
> option transport-type tcp/client
> option remote-port 7777 # Non-default port
> option remote-host server1
> option remote-subvolume brick
> end-volume
>
> volume brick_2
> type protocol/client
> option transport-type tcp/client
> option remote-port 7777 # Non-default port
> option remote-host server2
> option remote-subvolume brick
> end-volume
>
> volume bricks
> type cluster/distribute
> subvolumes brick_1 brick_2
> end-volume
>
> Error logged on client side through /var/log/glusterfs.log
> [2009-05-29 14:58:55] E [client-protocol.c:292:call_bail]
> brick_1: bailing out frame LK(28) frame sent = 2009-05-29
> 14:28:54. frame-timeout = 1800
> [2009-05-29 14:58:55] W [fuse-bridge.c:2284:fuse_setlk_cbk]
> glusterfs-fuse: 106850788: ERR => -1 (Transport endpoint is not
> connected)
>         Error logged on the server:
> [2009-05-29 14:59:15] E [client-protocol.c:292:call_bail]
> brick_2: bailing out frame LK(28) frame sent = 2009-05-29
> 14:29:05. frame-timeout = 1800
> [2009-05-29 14:59:15] W [fuse-bridge.c:2284:fuse_setlk_cbk]
> glusterfs-fuse: 106850860: ERR => -1 (Transport endpoint is not
> connected)
>
> There is error message logged on server side after 1 hour in
> /var/log/messages:
> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
> lib/util_sock.c:write_data(564)
> May 29 16:04:16 server2 winbindd[3649]: write_data: write
> failure. Error = Connection reset by peer
> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
> libsmb/clientgen.c:write_socket(158)
> May 29 16:04:16 server2 winbindd[3649]: write_socket: Error
> writing 104 bytes to socket 18: ERRNO = Connection reset by peer
> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
> libsmb/clientgen.c:cli_send_smb(188)
> May 29 16:04:16 server2 winbindd[3649]: Error writing 104
> bytes to client. -1 (Connection reset by peer)
> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
> libsmb/cliconnect.c:cli_session_setup_spnego(859)
> May 29 16:04:16 server2 winbindd[3649]: Kinit failed: Cannot
> contact any KDC for requested realm
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel