[Gluster-devel] no such file of directory errors running find -exec on large trees

Anand Avati avati at zresearch.com
Mon Jun 4 12:44:54 UTC 2007


Harris,
  a bug fix in the timer code has been committed which used to cause
spurious transport timeout events causing false bailing. please try
with the latest tla checkout from glusterfs--mainline--2.4.

thanks,
avati

2007/5/26, Harris Landgarten <harrisl at lhjonline.com>:
> I have been running some tests on a gluster tree with over 100,000 files using find. When I run find /mnt/gluster -type f it completes without error. When I combine the find with -exec that causes the files to be read (ie find /mnt/gluster -type f -exec head -n1 {} \; ) The process fails to complete 100% of the time with "No such file or directory" errors. The errors do not occur until after at least 30,000 files have been processed. After the process fails the clients and servers still appear to be running properly. The errors seem to coincide with the following in the logs:
>
> Client running process:
>
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
> ...
> [May 25 10:12:34] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client2: connection to server disconnected
> [May 25 10:12:34] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
> [May 25 10:12:34] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
> [May 25 10:12:34] [ERROR/tcp.c:110/tcp_except()] transport/tcp:shutdown () - error: Transport endpoint is not connected
> [May 25 10:12:34] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=104)
> [May 25 10:12:34] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client2: connection to server disconnected
> [May 25 10:12:34] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=9)
> ...
>
> brick1:
>
> [May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=2)
> [May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
> [May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=2)
> [May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
> [May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=2)
> [May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
> [May 25 10:11:26] [ERROR/common-utils.c:110/full_rwv()] libglusterfs:full_rwv: 50680 bytes r/w instead of 124559 (Broken pipe)
> [May 25 10:11:26] [ERROR/proto-srv.c:117/generic_reply()] protocol/server:transport_writev failed
> [May 25 10:11:26] [ERROR/tcp.c:110/tcp_except()] transport/tcp:shutdown () - error: Transport endpoint is not connected
> [May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=107)
> [May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
> [May 25 10:11:43] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=9)
> ...
>
> brick2:
>
> [May 25 10:11:25] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=2)
> [May 25 10:11:25] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
> [May 25 10:11:25] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=9)
> [May 25 10:11:25] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
> [May 25 10:11:43] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=2)
> [May 25 10:11:43] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=2)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=2)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=2)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=9)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=9)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=9)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
> ...
>
> Client2 running (logging = debug)
>
> [May 25 10:12:06] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
> [May 25 10:12:06] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to break on blocked socket (if any)
> [May 25 10:12:06] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=115)
> [May 25 10:12:06] [DEBUG/protocol.c:244/gf_block_unserialize_transport()] libglusterfs/protocol:gf_block_unserialize_transport: full_read of header failed
> [May 25 10:12:06] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] protocol/client:cleaning up state in transport object 0x8077418
> [May 25 10:12:06] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client1: connection to server disconnected
> [May 25 10:12:06] [DEBUG/tcp-client.c:180/tcp_connect()] transport: tcp: :try_connect: socket fd = 4
> [May 25 10:12:06] [DEBUG/tcp-client.c:202/tcp_connect()] transport: tcp: :try_connect: finalized on port `1023'
> [May 25 10:12:06] [DEBUG/tcp-client.c:226/tcp_connect()] tcp/client:try_connect: defaulting remote-port to 6996
> [May 25 10:12:06] [DEBUG/tcp-client.c:262/tcp_connect()] tcp/client:connect on 4 in progress (non-blocking)
> [May 25 10:12:06] [DEBUG/tcp-client.c:301/tcp_connect()] tcp/client:connection on 4 still in progress - try later
> [May 25 10:12:06] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
> [May 25 10:12:06] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] protocol/client:cleaning up state in transport object 0x8077418
> [May 25 10:12:26] [DEBUG/tcp-client.c:310/tcp_connect()] tcp/client:connection on 4 success, attempting to handshake
> [May 25 10:12:26] [DEBUG/tcp-client.c:54/do_handshake()] transport/tcp-client:dictionary length = 50
> [May 25 10:12:26] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
> [May 25 10:12:26] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to break on blocked socket (if any)
> [May 25 10:13:11] [CRITICAL/client-protocol.c:218/call_bail()] client/protocol:bailing transport
> [May 25 10:13:11] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to break on blocked socket (if any)
> [May 25 10:13:11] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=115)
> [May 25 10:13:11] [DEBUG/protocol.c:244/gf_block_unserialize_transport()] libglusterfs/protocol:gf_block_unserialize_transport: full_read of header failed
> [May 25 10:13:11] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] protocol/client:cleaning up state in transport object 0x8077418
> [May 25 10:13:11] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client1: connection to server disconnected
> [May 25 10:13:11] [DEBUG/tcp-client.c:180/tcp_connect()] transport: tcp: :try_connect: socket fd = 4
> [May 25 10:13:11] [DEBUG/tcp-client.c:202/tcp_connect()] transport: tcp: :try_connect: finalized on port `1022'
> [May 25 10:13:11] [DEBUG/tcp-client.c:226/tcp_connect()] tcp/client:try_connect: defaulting remote-port to 6996
> [May 25 10:13:11] [DEBUG/tcp-client.c:262/tcp_connect()] tcp/client:connect on 4 in progress (non-blocking)
> [May 25 10:13:11] [DEBUG/tcp-client.c:301/tcp_connect()] tcp/client:connection on 4 still in progress - try later
> ...
>
> Both clients are running with read-ahead and write_behind on and stat_prefetch off. All machines are on current mainline-2.4 as of patch 174. Since both bricks are seeing the same error does that mean the client is causing the error? The errors do not alway start at the same file but always happen after a lot of files are processed.
>
> Are there any specific tests you would like run?
>
> Best,
>
> Harris
>
>
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>


-- 
Anand V. Avati





More information about the Gluster-devel mailing list