[Gluster-devel] Could be the bug of Glusterfs? The file system is unstable and hang
Rodrigo Azevedo
rodrigoams at gmail.com
Tue Jun 2 12:38:08 UTC 2009
I have the same problem with replicate + autoscaling! Disabling
autoscaling makes things better. The main cause, I think, is files
that stay open for a long time without updates... the servers simply
lose their connection to *every* client.
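
For illustration, here is a minimal sketch of one io-threads volume from
the server config quoted below, with autoscaling turned off. It assumes
the autoscaling option accepts "off" as the counterpart of the "on" value
used in that config. min-threads is left out on the assumption that it
only matters while autoscaling is enabled. The same change would apply to
brick2.

```
volume brick1
type performance/io-threads
# autoscaling disabled, as suggested in this thread (assumes "off" is accepted)
option autoscaling off
subvolumes brick1-locks
end-volume
```
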
2009/6/2 Shehjar Tikoo <shehjart at gluster.com>:
>
> Hi
>
>>
>> Also, avoid using autoscaling in io-threads for now.
>>
>> -Shehjar
>>
>>
>
> -Shehjar
>
> Alpha Electronics wrote:
>>
>> Thanks for looking into this. We do use io-threads. Here is the server
>> config:
>> volume brick1-posix
>> type storage/posix
>> option directory /mnt/brick1
>> end-volume
>>
>> volume brick2-posix
>> type storage/posix
>> option directory /mnt/brick2
>> end-volume
>>
>>
>> volume brick1-locks
>> type features/locks
>> subvolumes brick1-posix
>> end-volume
>>
>> volume brick2-locks
>> type features/locks
>> subvolumes brick2-posix
>> end-volume
>>
>> volume brick1
>> type performance/io-threads
>> option min-threads 16
>> option autoscaling on
>> subvolumes brick1-locks
>> end-volume
>>
>> volume brick2
>> type performance/io-threads
>> option min-threads 16
>> option autoscaling on
>> subvolumes brick2-locks
>> end-volume
>>
>> volume server
>> type protocol/server
>> option transport-type tcp
>> option auth.addr.brick1.allow *
>> option auth.addr.brick2.allow *
>> subvolumes brick1 brick2
>> end-volume
>>
>>
>>
>>
>> On Sun, May 31, 2009 at 11:44 PM, Shehjar Tikoo <shehjart at gluster.com> wrote:
>>
>> Alpha Electronics wrote:
>>
>> We are testing GlusterFS before recommending it to
>> enterprise clients. We found that the file system always hangs
>> after running for about 2 days. After killing the server-side
>> process and restarting it, everything goes back to normal.
>>
>>
>> What is the server config?
>> If you're not using io-threads on the server, I suggest you do,
>> because it does basic load-balancing to avoid timeouts.
>>
>> Also, avoid using autoscaling in io-threads for now.
>>
>> -Shehjar
>>
>>
>> Here is the spec and error logged:
>> GlusterFS version: v2.0.1
>>
>> Client volume:
>> volume brick_1
>> type protocol/client
>> option transport-type tcp/client
>> option remote-port 7777 # Non-default port
>> option remote-host server1
>> option remote-subvolume brick
>> end-volume
>>
>> volume brick_2
>> type protocol/client
>> option transport-type tcp/client
>> option remote-port 7777 # Non-default port
>> option remote-host server2
>> option remote-subvolume brick
>> end-volume
>>
>> volume bricks
>> type cluster/distribute
>> subvolumes brick_1 brick_2
>> end-volume
>>
>> Error logged on client side through /var/log/glusterfs.log
>> [2009-05-29 14:58:55] E [client-protocol.c:292:call_bail]
>> brick_1: bailing out frame LK(28) frame sent = 2009-05-29
>> 14:28:54. frame-timeout = 1800
>> [2009-05-29 14:58:55] W [fuse-bridge.c:2284:fuse_setlk_cbk]
>> glusterfs-fuse: 106850788: ERR => -1 (Transport endpoint is not
>> connected)
>> Error logged on server:
>> [2009-05-29 14:59:15] E [client-protocol.c:292:call_bail]
>> brick_2: bailing out frame LK(28) frame sent = 2009-05-29
>> 14:29:05. frame-timeout = 1800
>> [2009-05-29 14:59:15] W [fuse-bridge.c:2284:fuse_setlk_cbk]
>> glusterfs-fuse: 106850860: ERR => -1 (Transport endpoint is not
>> connected)
>>
>> There is an error message logged on the server side after 1 hour in
>> /var/log/messages:
>> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
>> lib/util_sock.c:write_data(564)
>> May 29 16:04:16 server2 winbindd[3649]: write_data: write
>> failure. Error = Connection reset by peer
>> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
>> libsmb/clientgen.c:write_socket(158)
>> May 29 16:04:16 server2 winbindd[3649]: write_socket: Error
>> writing 104 bytes to socket 18: ERRNO = Connection reset by peer
>> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
>> libsmb/clientgen.c:cli_send_smb(188)
>> May 29 16:04:16 server2 winbindd[3649]: Error writing 104
>> bytes to client. -1 (Connection reset by peer)
>> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
>> libsmb/cliconnect.c:cli_session_setup_spnego(859)
>> May 29 16:04:16 server2 winbindd[3649]: Kinit failed: Cannot
>> contact any KDC for requested realm
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at nongnu.org
>> http://lists.nongnu.org/mailman/listinfo/gluster-devel
--
Rodrigo Azevedo Moreira da Silva
Departamento de Física
Universidade Federal de Pernambuco
http://www.df.ufpe.br