[Gluster-devel] 2 out of 4 bonnies failed :-((

Mon Jan 7 16:33:16 UTC 2008

Sascha,
 the logs say op_errno=28, which is ENOSPC (no space left on device). Were
you aware of that already?
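
For reference, the numeric codes in the quoted log can be decoded with
Python's standard errno module. This is only a sketch: the helper name
decode_op_errnos is made up for illustration, and errno numbering is
platform-specific (on Linux, 28 is ENOSPC, 107 is ENOTCONN, 5 is EIO and
2 is ENOENT).

```python
import errno
import os
import re

# Matches the "op_errno=<number>" field seen in the quoted client log.
OP_ERRNO = re.compile(r"op_errno=(\d+)")


def decode_op_errnos(log_text):
    """Return {number: (symbolic name, message)} for each op_errno found.

    Hypothetical helper, not part of GlusterFS. Names come from the
    errno table of the machine running the script, so run it on the
    same platform that produced the log.
    """
    decoded = {}
    for match in OP_ERRNO.finditer(log_text):
        number = int(match.group(1))
        name = errno.errorcode.get(number, "UNKNOWN")
        decoded[number] = (name, os.strerror(number))
    return decoded


# One entry from the quoted log, joined back onto a single line.
sample = (
    "2008-01-06 03:48:10 E [afr.c:3364:afr_close_setxattr_cbk] afr1: "
    "(path=/Bonnie.17791.027 child=fsc1) op_ret=-1 op_errno=28"
)

for number, (name, message) in decode_op_errnos(sample).items():
    print(number, name, message)
```

Feeding the whole client log through the same function lists every
distinct error code in one pass, which is quicker than looking each
number up by hand.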

avati

2008/1/7, Sascha Ottolski <ottolski at web.de>:
>
> Hi,
>
> I found a somewhat frustrating test result after the weekend. I started a
> bonnie on four different clients (so a total of four bonnies in parallel).
> I have two servers, each with two partitions, which are unified and afr'ed
> "over cross", so each server has a brick and a mirrored brick of the other,
> using tla patch-628.
>
> For one, the results do not seem too promising, as it took more than 48
> hours to complete. Doing a bonnie on only one client took "only" about 12
> hours (unfortunately, I don't have exact numbers about the runtime).
>
> But even worse, two of the bonnies didn't finish at all. The first client
> dropped out after approx. 8 hours, claiming "Can't open
> file ./Bonnie.17791.001". However, the file is (partly) there, also on the
> afr-mirror, but with different sizes. The log suggests that it was a
> timeout problem (if I interpret it correctly):
>
> 2008-01-06 03:48:10 E [afr.c:3364:afr_close_setxattr_cbk] afr1: (path=/Bonnie.17791.027 child=fsc1) op_ret=-1 op_errno=28
> 2008-01-06 03:50:34 W [client-protocol.c:209:call_bail] ns1: activating bail-out. pending frames = 1. last sent = 2008-01-06 03:48:17. last received = 2008-01-06 03:48:17 transport-timeout = 108
> 2008-01-06 03:50:34 C [client-protocol.c:217:call_bail] ns1: bailing transport
> 2008-01-06 03:50:34 W [client-protocol.c:4490:client_protocol_cleanup] ns1: cleaning up state in transport object 0x522e40
> 2008-01-06 03:50:34 E [client-protocol.c:4542:client_protocol_cleanup] ns1: forced unwinding frame type(1) op(5) reply=@0x2aaaab407a00
> 2008-01-06 03:50:34 E [afr.c:2573:afr_selfheal_lock_cbk] afrns: (path=/Bonnie.17791.001 child=ns1) op_ret=-1 op_errno=107
> 2008-01-06 03:50:34 E [afr.c:2744:afr_open] afrns: self heal failed, returning EIO
> 2008-01-06 03:50:34 C [tcp.c:81:tcp_disconnect] ns1: connection disconnected
> 2008-01-06 03:51:00 E [afr.c:1907:afr_selfheal_sync_file_writev_cbk] afr1: (path=/Bonnie.17791.001 child=fsc1) op_ret=-1 op_errno=28
> 2008-01-06 03:51:00 E [afr.c:1693:afr_error_during_sync] afr1: error during self-heal
> 2008-01-06 03:51:03 E [afr.c:2744:afr_open] afr1: self heal failed, returning EIO
> 2008-01-06 03:51:03 E [fuse-bridge.c:670:fuse_fd_cbk] glusterfs-fuse: 12276158: /Bonnie.17791.001 => -1 (5)
> 2008-01-07 04:40:17 E [fuse-bridge.c:431:fuse_entry_cbk] glusterfs-fuse: 15841600: /Bonnie.26672.026 => -1 (2)
>
>
> The second had a problem in creating / removing a dir:
>
> Create files in sequential order...Can't make directory ./Bonnie.26672
> Cleaning up test directory after error.
> Bonnie: drastic I/O error (rmdir): No such file or directory
>
> On this client, nothing is found in the logs. For both cases, nothing is
> in the server logs either (both servers and clients had no special debug
> level enabled).
>
> Now, the million-dollar question is: how would I debug this situation,
> preferably a bit quicker than 48 hours...
>
>
> Thanks,
>
> Sascha

-- 
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.