[Gluster-devel] Random errors with "Transport endpoint is not connected"

Constantin Teodorescu teo at flex.ro
Sat Jun 23 19:33:20 UTC 2007


Hi all,
First of all, glusterfs is a nice idea and I like it a lot, but it needs to 
be a rock-solid product, so I ran some tests. I hope this bug report 
will help you.

Configuration:
3 identical servers and a client, connected over TCP/IP.
Servers: Mandrake 9.x (compiled and installed without problems). Client: 
CentOS-5 x86_64, fuse 2.6.5 compiled and installed from source; everything 
went OK.

The 3 servers provide 3 simple bricks, joined at the client in a full 
mirror (afr x 3) configuration, plus the read-ahead and write-behind translators.
I made a PostgreSQL tablespace (zone) on the mounted /mnt/gfs, copied 
a 50 MB table there, then stressed it with various operations.
10 reads and full updates of every row in the table succeeded.

After a while, a simple "vacuum full analyze" gave the error:
glu=# vacuum full analyze;
ERROR:  could not read block 43155 of relation 527933664/527933665/527933666: Transport endpoint is not connected

I repeated the tests many times. After 2-3 minutes of operation I got 
the same error each time, in a different place but mostly in WRITE operations.
I deleted everything on the mounted client disk and rebuilt it with the STRIPE 
translator instead of the AFR translator.
The behaviour is the same: after a couple of successful operations, I 
get a failure.
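
For anyone who wants to exercise the mount without PostgreSQL, here is a minimal sketch of a similar access pattern. It is not the test that was run above: the 8192-byte block size is an assumption (PostgreSQL's default), and the function name, file path and loop counts are made up for illustration. It rewrites and re-reads random blocks and reports the first ENOTCONN, which is the errno behind "Transport endpoint is not connected" on Linux.

```python
import errno
import random

BLOCK_SIZE = 8192  # assumption: PostgreSQL's default block size


def stress_file(path, blocks=1000, rounds=10, seed=0):
    """Rewrite and re-read random blocks of a file.

    Returns (round_number, error_text) for the first ENOTCONN seen,
    or None if every round completed.
    """
    rng = random.Random(seed)
    # Create the file at its full size first.
    with open(path, "wb") as f:
        f.write(b"\0" * (BLOCK_SIZE * blocks))
    for round_no in range(rounds):
        try:
            with open(path, "r+b") as f:
                for _ in range(blocks):
                    block_no = rng.randrange(blocks)
                    payload = bytes([rng.randrange(256)]) * BLOCK_SIZE
                    f.seek(block_no * BLOCK_SIZE)
                    f.write(payload)
                    f.flush()  # push the write down to the filesystem
                    f.seek(block_no * BLOCK_SIZE)
                    if f.read(BLOCK_SIZE) != payload:
                        raise IOError("block %d read back different data" % block_no)
        except OSError as exc:
            if exc.errno == errno.ENOTCONN:
                return (round_no, exc.strerror)
            raise  # any other error is unexpected; let it propagate
    return None
```

A call such as stress_file("/mnt/gfs/stress.bin") would run it against the mount point used above; whether it triggers the same failure as the database workload is untested.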

Those are the facts; now the logs and configuration files.

The client debug log shows these errors:
[1:48:22] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
[1:48:22] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to 
break on blocked socket (if any)
[1:48:22] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
[1:48:22] [ERROR/common-utils.c:110/full_rwv()] libglusterfs:full_rwv: 
6689 bytes r/w instead of 8539 (Broken pipe)
[1:48:22] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
protocol/client:transport_submit failed
[1:48:22] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to 
break on blocked socket (if any)
[1:48:22] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
...
...
[1:48:22] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to 
break on blocked socket (if any)
[1:48:22] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
[1:48:22] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to 
break on blocked socket (if any)
[1:48:22] [DEBUG/client-protocol.c:2708/client_protocol_interpret()] 
protocol/client:frame not found for blk with callid: 139893
[1:48:22] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] 
protocol/client:cleaning up state in transport object 0x16595730
[1:48:22] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client1: 
connection to server disconnected
[1:48:22] [CRITICAL/common-utils.c:215/gf_print_trace()] 
debug-backtrace:Got signal (11), printing backtrace
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] 
debug-backtrace:/usr/lib/libglusterfs.so.0(gf_print_trace+0x21) 
[0x2aaaaaccf4a1]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] 
debug-backtrace:/lib64/libc.so.6 [0x2aaaab53b070]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] 
debug-backtrace:/usr/lib/glusterfs/1.3.0-pre4/xlator/performance/read-ahead.so(ra_frame_return+0x142) 
[0x2aaaac4da2a2]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] 
debug-backtrace:/usr/lib/glusterfs/1.3.0-pre4/xlator/performance/read-ahead.so 
[0x2aaaac4d9daa]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] 
debug-backtrace:[glusterfs] [0x40910b]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] 
debug-backtrace:/usr/lib64/libfuse.so.2 [0x2aaaaaee3059]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] 
debug-backtrace:[glusterfs] [0x402f29]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] 
debug-backtrace:/usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xd4) 
[0x2aaaaacd0ef4]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] 
debug-backtrace:[glusterfs] [0x402898]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] 
debug-backtrace:/lib64/libc.so.6(__libc_start_main+0xf4) [0x2aaaab5288a4]
[1:48:22] [CRITICAL/common-utils.c:217/gf_print_trace()] 
debug-backtrace:[glusterfs] [0x4025f9]
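
To get an overview of a long client log like the one above, a small summariser can count entries per level and function. This is a sketch only: the regular expression is inferred from the lines shown here, not from any GlusterFS documentation, and the names LOG_LINE and summarize are made up. It also works on the log as wrapped by the mail client, because only the header part of each entry is matched.

```python
import re
from collections import Counter

# Matches entry headers such as:
# [1:48:22] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
LOG_LINE = re.compile(
    r"^\[(?P<time>[\d:]+)\] "
    r"\[(?P<level>[A-Z]+)/(?P<file>[^:]+):(?P<line>\d+)/(?P<func>\w+)\(\)\]\s*"
    r"(?P<message>.*)$"
)


def summarize(lines):
    """Count log entries per (level, function); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[(match.group("level"), match.group("func"))] += 1
    return counts
```

Calling summarize(open(path)) on a saved log and printing counts.most_common() shows at a glance how many call_bail() entries precede the disconnect and the signal 11 backtrace.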

The server configuration files look like this:
-------------------
volume brick
        type storage/posix
        option directory /var/gldata
end-volume

volume server
        type protocol/server
        option transport-type tcp/server
        option listen-port 6996
        option bind-address 29.11.276.x
        subvolumes brick
        option auth.ip.brick.allow *
end-volume
-------------------
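
A quick way to rule out a server that is simply not listening on port 6996 is a plain TCP connect from the client machine. This is only a sketch with a made-up helper name; it says nothing about why an established connection is bailed out in the middle of a transfer, it only checks that the listener accepts connections.

```python
import socket


def can_connect(host, port=6996, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name lookup failed
        return False
```

It would be called once per server, with the address each server binds to.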

The client configuration file is:
-------------------
volume clientX            #{1,2,3}
 type protocol/client
 option transport-type tcp/client
 option remote-host A.B.C.X
 option remote-port 6996
 option remote-subvolume brick
end-volume


### Add AFR feature to brick
volume afr
  type cluster/afr
  subvolumes client1 client2 client3
  option replicate *:3                 # All files 3 copies
end-volume

#volume stripe
#   type cluster/stripe
#   subvolumes client1 client2 client3
#   option block-size *:256kB
#end-volume

#volume trace
#  type debug/trace
#  subvolumes afr
#  option debug on
#end-volume

volume writebehind
   type performance/write-behind
   option aggregate-size 131072 # aggregate block size in bytes
   subvolumes afr
end-volume

volume readahead
   type performance/read-ahead
   option page-size 131072 ### size in bytes
   option page-count 16 ### page-size x page-count is the amount of read-ahead data per file
   subvolumes writebehind
end-volume
-------------------
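
The "volume clientX  #{1,2,3}" block above stands for three nearly identical volumes. A sketch of how it expands, with the server addresses left as a parameter because they are not given here (the template and function names are made up for illustration):

```python
CLIENT_TEMPLATE = """\
volume client{n}
 type protocol/client
 option transport-type tcp/client
 option remote-host {host}
 option remote-port 6996
 option remote-subvolume brick
end-volume
"""


def client_volumes(hosts):
    """Render one protocol/client volume per host, named client1, client2, ..."""
    return "\n".join(
        CLIENT_TEMPLATE.format(n=i, host=host)
        for i, host in enumerate(hosts, start=1)
    )
```

Passing the three server addresses, in order, yields the client1, client2 and client3 volumes that the afr (or stripe) volume lists as subvolumes.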






