[Gluster-devel] Buggy writebehind translators !!!

Sun Jun 24 20:40:20 UTC 2007

Anand Avati wrote:
> Teo,
> If you are using glusterfs--mainline--2.4, please add
> 'option flush-behind off' to the write-behind volume and, if needed,
> 'option transport-timeout <secs>' to the protocol/client volumes
> (where <secs> is sufficiently large). The flush-behind option should
> most likely fix your error.
>
> Instead you could just tla update to the latest patch (patch-184) and the
> errors should disappear.

I added 'option flush-behind off' and 'option transport-timeout 20'.
This time it didn't crash right from the beginning: 2 updates and 1
vacuum were OK.
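
For reference, a minimal sketch of where these two options go in a client volume spec. The volume names, the host address, the remote subvolume name and the single-client layout are placeholders for illustration, not the actual configuration used here; in a replicated setup the write-behind volume would sit above the afr volume instead.

```
# Hypothetical client spec fragment; names and address are placeholders.
volume client1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.1        # placeholder server address
  option remote-subvolume brick         # placeholder exported volume name
  option transport-timeout 20           # seconds, as used in this test
end-volume

volume writebehind
  type performance/write-behind
  option flush-behind off               # the option suggested above
  subvolumes client1
end-volume
```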

glu=# update animal set observatii='ok1';
UPDATE 713268

glu=# update animal set observatii='ok2';
UPDATE 713268

glu=# vacuum;
VACUUM

glu=# update animal set observatii='ok3';
ERROR:  could not read block 206 of relation 534643271/534643272/534643273: File descriptor in bad state

Thank you for your patience!
Teo

LOGS
Client
[Jun 24 23:19:42] [DEBUG/afr.c:65/afr_get_num_copies()] afr:matched! pattern = *, filename = 534643273,
[Jun 24 23:19:42] [DEBUG/afr.c:65/afr_get_num_copies()] afr:matched! pattern = *, filename = 534643276,
[Jun 24 23:19:42] [DEBUG/afr.c:65/afr_get_num_copies()] afr:matched! pattern = *, filename = 534643278,
[Jun 24 23:19:42] [DEBUG/afr.c:65/afr_get_num_copies()] afr:matched! pattern = *, filename = 534643281,
[Jun 24 23:19:42] [DEBUG/afr.c:65/afr_get_num_copies()] afr:matched! pattern = *, filename = 534643285,
[Jun 24 23:19:42] [DEBUG/afr.c:65/afr_get_num_copies()] afr:matched! pattern = *, filename = 534643287,
[Jun 24 23:20:30] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 56359 bytes r/w instead of 131158 (errno=104)
[Jun 24 23:20:30] [DEBUG/protocol.c:331/gf_block_unserialize_transport()] libglusterfs/protocol:gf_block_unserialize_transport: full_read of block failed
[Jun 24 23:20:30] [DEBUG/client-protocol.c:2609/client_protocol_cleanup()] protocol/client:cleaning up state in transport object 0x86ca020
[Jun 24 23:20:30] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client1: connection to server disconnected
--------------------------------------------------
Server (I noticed that server2 and server3 didn't show any CRITICAL
errors in their logs; only server1 had problems)

[Jun 24 23:23:28] [ERROR/common-utils.c:110/full_rwv()] libglusterfs:full_rwv: 98464 bytes r/w instead of 131281 (Connection reset by peer)
[Jun 24 23:23:28] [ERROR/proto-srv.c:117/generic_reply()] protocol/server:transport_writev failed
[Jun 24 23:23:28] [ERROR/tcp.c:110/tcp_except()] transport/tcp:shutdown () - error: Transport endpoint is not connected
[Jun 24 23:23:28] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 129983 bytes r/w instead of 131299 (errno=107)
[Jun 24 23:23:28] [DEBUG/protocol.c:331/gf_block_unserialize_transport()] libglusterfs/protocol:gf_block_unserialize_transport: full_read of block failed
[Jun 24 23:23:28] [DEBUG/proto-srv.c:2826/open_file_cleanup_fn()] protocol/server:force releaseing file 0x8051d30
[Jun 24 23:23:28] [DEBUG/proto-srv.c:2826/open_file_cleanup_fn()] protocol/server:force releaseing file 0x8051c90
[Jun 24 23:23:28] [DEBUG/proto-srv.c:2867/proto_srv_cleanup()] protocol/server:cleaned up xl_private of 0x804b1d0
[Jun 24 23:23:28] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
[Jun 24 23:23:28] [DEBUG/tcp-server.c:229/gf_transport_fini()] tcp/server:destroying transport object for 29.72.76.22:1021 (fd=5)