[Gluster-users] two stability glitches after continuous file operations for a month

Raghavendra G raghavendra.hg at gmail.com
Tue Dec 2 09:19:56 UTC 2008


Hi,

Please find my comments inline.

On Mon, Dec 1, 2008 at 8:54 PM, Manhong Dai <daimh at umich.edu> wrote:

> Hi,
>
>
>        After a month of file operations, which included copying 20 million
> small files and running about 20 thousand cluster jobs, I am overall
> satisfied, except for two stability glitches.
>
>
> 1. A small portion (about 1%?) of jobs failed with a "transport
> endpoint not connected" error, leaving the output file incomplete. The
> error happened on random compute nodes, and it did not affect subsequent
> jobs on the same node. An example error message from glusterfsd is:
> 2008-11-19 23:09:51 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (172.20.102.2:1022)
>
> The error from glusterfs is either (apparently caused by the brick):
> 2008-11-19 23:09:52 C [client-protocol.c:212:call_bail] muskie-brick: bailing transport
> 2008-11-19 23:09:52 E [client-protocol.c:4834:client_protocol_cleanup] muskie-brick: forced unwinding frame type(1) op(14) reply=@0x67e2150
> 2008-11-19 23:09:52 E [client-protocol.c:3254:client_write_cbk] muskie-brick: no proper reply from server, returning ENOTCONN
> 2008-11-19 23:09:56 E [write-behind.c:602:wb_writev] wb: delayed error : 107
>
> or (caused by the namespace):
> 2008-11-28 20:47:53 C [client-protocol.c:212:call_bail] muskie-ns: bailing transport
> 2008-11-28 20:47:53 E [client-protocol.c:4834:client_protocol_cleanup] muskie-ns: forced unwinding frame type(1) op(40) reply=@0x1b447cc0
> 2008-11-28 20:47:53 E [client-protocol.c:4613:client_checksum_cbk] muskie-ns: no proper reply from server, returning ENOTCONN
> 2008-11-28 20:47:53 E [client-protocol.c:325:client_protocol_xfer] muskie-ns: transport_submit failed
>
>
What transport timeout are you using? If transport-timeout is small and the
server is busy serving other requests, it is quite possible that operations
are bailing out before the server replies, which results in the ENOTCONN
errors.
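
To check whether the bail-outs cluster around busy periods, something like
the following Python sketch could tally the call_bail lines per volume and
hour. It assumes the log header format seen in the excerpts above; the
function name count_bails and the sample list are only illustrative and are
not part of GlusterFS.

```python
import re
from collections import Counter

# Header of a log line as it appears in the excerpts above, e.g.
# 2008-11-19 23:09:52 C [client-protocol.c:212:call_bail] muskie-brick: ...
# (assumed format; adjust the pattern if your logs differ)
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]) \[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<volume>[^:\s]+):"
)


def count_bails(lines):
    """Count call_bail events per (volume, date, hour)."""
    counts = Counter()
    for raw in lines:
        # Drop any "> " mail quoting and trailing whitespace.
        text = raw.lstrip("> ").rstrip()
        m = LINE_RE.match(text)
        if m and m.group("func") == "call_bail":
            hour = m.group("time")[:2]
            counts[(m.group("volume"), m.group("date"), hour)] += 1
    return counts


# Sample lines taken from the excerpts quoted above.
sample = [
    "2008-11-19 23:09:52 C [client-protocol.c:212:call_bail] muskie-brick: bailing transport",
    "2008-11-19 23:09:52 E [client-protocol.c:3254:client_write_cbk] muskie-brick: no proper reply from server, returning ENOTCONN",
    "> 2008-11-28 20:47:53 C [client-protocol.c:212:call_bail] muskie-ns: bailing transport",
    "2008-11-19 23:09:56 E [write-behind.c:602:wb_writev] wb: delayed error : 107",
]

counts = count_bails(sample)
for (volume, date, hour), n in sorted(counts.items()):
    print(f"{volume} {date} {hour}:00 bails={n}")
```

Running this over the full client log from each node, for example with
count_bails(open(path)), should show whether the bails line up with periods
of heavy load on the servers.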

Are you using io-threads on the server side? Could you send your
configuration files?


>
> 2. Right now the 'glusterfs' process takes 1785M VIRT and 1500 RES
> memory, according to top. I hope this is not a memory leak, or at least
> that there is a way to reduce memory usage without remounting.
>
>
>
> If somebody can shed some light on these issues, I would appreciate it.
> Just let me know if you need more detailed information.
>
>
> Best,
> Manhong
>



-- 
Raghavendra G