[Gluster-devel] Could be the bug of Glusterfs? The file system is unstable and hang

Shehjar Tikoo shehjart at gluster.com
Mon Jun 1 04:44:26 UTC 2009


Alpha Electronics wrote:
> We are testing GlusterFS before recommending it to enterprise 
> clients. We found that the file system always hangs after running for 
> about 2 days. After killing the server-side process and restarting it, 
> everything goes back to normal.
> 

What is the server config?
If you're not using io-threads on the server, I suggest you add it,
because it does basic load-balancing that helps avoid timeouts.

Also, avoid using autoscaling in io-threads for now.
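
For reference, here is a minimal sketch of what loading io-threads on the
server side could look like. The export directory, the thread count, and
the volume names other than "brick" are assumptions for illustration, not
taken from your setup, so adjust them to your actual server volfile. The
topmost volume is named "brick" only so that it matches the
remote-subvolume in the client spec quoted below.

```
volume posix
  type storage/posix
  option directory /data/export   # hypothetical export path
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  option thread-count 8           # assumed value, tune for your workload
  # autoscaling is deliberately not enabled here
  subvolumes locks
end-volume
```

The protocol/server volume that exports "brick" is omitted from the
sketch; it would stay as it is in the existing server config.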

-Shehjar


> Here are the spec and the errors logged:
> GlusterFS version:  v2.0.1
> 
> Client volume:
> volume brick_1
>   type protocol/client
>   option transport-type tcp/client
>   option remote-port 7777 # Non-default port
>   option remote-host server1
>   option remote-subvolume brick
> end-volume
> 
> volume brick_2
>   type protocol/client
>   option transport-type tcp/client
>   option remote-port 7777 # Non-default port
>   option remote-host server2
>   option remote-subvolume brick
> end-volume
> 
> volume bricks
>   type cluster/distribute
>   subvolumes brick_1 brick_2
> end-volume
> 
> Errors logged on the client side in /var/log/glusterfs.log:
> [2009-05-29 14:58:55] E [client-protocol.c:292:call_bail] brick_1: 
> bailing out frame LK(28) frame sent = 2009-05-29 14:28:54. frame-timeout 
> = 1800
> [2009-05-29 14:58:55] W [fuse-bridge.c:2284:fuse_setlk_cbk] 
> glusterfs-fuse: 106850788: ERR => -1 (Transport endpoint is not connected)
> Error logged on the server:
> [2009-05-29 14:59:15] E [client-protocol.c:292:call_bail] brick_2: 
> bailing out frame LK(28) frame sent = 2009-05-29 14:29:05. frame-timeout 
> = 1800
> [2009-05-29 14:59:15] W [fuse-bridge.c:2284:fuse_setlk_cbk] 
> glusterfs-fuse: 106850860: ERR => -1 (Transport endpoint is not connected)
> 
> There is an error message logged on the server side after 1 hour in 
> /var/log/messages:
> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0] 
> lib/util_sock.c:write_data(564)
> May 29 16:04:16 server2 winbindd[3649]:   write_data: write failure. 
> Error = Connection reset by peer
> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0] 
> libsmb/clientgen.c:write_socket(158)
> May 29 16:04:16 server2 winbindd[3649]:   write_socket: Error writing 
> 104 bytes to socket 18: ERRNO = Connection reset by peer
> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0] 
> libsmb/clientgen.c:cli_send_smb(188)
> May 29 16:04:16 server2 winbindd[3649]:   Error writing 104 bytes to 
> client. -1 (Connection reset by peer)
> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0] 
> libsmb/cliconnect.c:cli_session_setup_spnego(859)
> May 29 16:04:16 server2 winbindd[3649]:   Kinit failed: Cannot contact 
> any KDC for requested realm
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel





