[Gluster-devel] gluster write million of lines: WRITE => -1 (Transport endpoint is not connected)

Tue Oct 28 09:00:42 UTC 2014

Hi Raghavendra and Ben,
thanks for your answers.
The volume is the backend for the nova instances of an OpenStack
infrastructure. As Raghavendra wrote, there were ongoing writes: I am
sure the compute node kept writing to the gluster volume after a
potential network problem. Our monitoring system did not show any
network problem, so if there was one it can only have lasted a short
while.
So the timeline could be:
- nova writes/reads to/from volume volume-nova-pp
- network problem lasting about 1 second
- the network problem is reported in the gluster log (first part of the log):

[2014-10-10 07:29:43.730792] W [socket.c:522:__socket_rwv] 0-glusterfs: 
readv on 192.168.61.100:24007 failed (No data available)
[2014-10-10 07:29:54.022608] E [socket.c:2161:socket_connect_finish] 
0-glusterfs: connection to 192.168.61.100:24007 failed (Connection refused)
[2014-10-10 07:30:05.271825] W 
[client-rpc-fops.c:866:client3_3_writev_cbk] 0-volume-nova-pp-client-0: 
remote operation failed: Input/output error
[2014-10-10 07:30:08.783145] W [fuse-bridge.c:2201:fuse_writev_cbk] 
0-glusterfs-fuse: 3661260: WRITE => -1 (Input/output error)

- nova keeps writing/reading to/from volume volume-nova-pp
- second part of the log: millions of lines like this:
[2014-10-15 14:41:15.895105] W [fuse-bridge.c:2201:fuse_writev_cbk] 
0-glusterfs-fuse: 951700230: WRITE => -1 (Transport endpoint is not 
connected)
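
To get a feel for how many of these lines are repeats, the log can be summarized per function and error text. This is a minimal Python sketch, assuming each record sits on a single line in the real log file (the wrapping above comes from the mail client). The names `summarize` and `LINE_RE` are invented here and are not gluster tooling.

```python
import re
from collections import Counter

# Assumed record layout, taken from the lines quoted in this mail:
# [timestamp] LEVEL [file.c:line:function] rest-of-message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<rest>.*)$"
)

# Error text is taken to be the last parenthesized group, if there is one.
ERR_RE = re.compile(r"\(([^()]*)\)\s*$")


def summarize(lines):
    """Count records per (level, function, error text); skip unparsable lines."""
    counts = Counter()
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue
        err = ERR_RE.search(m.group("rest"))
        key = (m.group("level"), m.group("func"), err.group(1) if err else "")
        counts[key] += 1
    return counts
```

Because `summarize` accepts any iterable of lines, an open file object can be passed directly, so a very large log is read line by line rather than loaded into memory.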

For Ben:
I'm using gluster 3.5.2, not gluster 3.6. Should I try gluster 3.6?

It would be very good if gluster had an option to rate-limit a
particular logging call, either per unit of time or once the log size
exceeds a predefined limit.

I think in this particular case the WARNING should be written once per
minute after the first 1000 similar lines.
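
The once-per-minute idea could look roughly like this. It is only a Python sketch of the policy, not gluster code: the class name `RateLimitedLog`, its parameters and the choice of key (for example function name plus error text) are assumptions made for illustration.

```python
import time


class RateLimitedLog:
    """Pass the first `burst` messages per key, then at most one per `interval` seconds."""

    def __init__(self, burst=1000, interval=60.0, clock=time.monotonic):
        self.burst = burst
        self.interval = interval
        self.clock = clock  # injectable so the policy can be tested without sleeping
        self.state = {}     # key -> [messages seen, time of last emit, suppressed count]

    def should_log(self, key):
        """Return (emit, suppressed), where suppressed counts messages dropped since the last emit."""
        now = self.clock()
        st = self.state.setdefault(key, [0, now, 0])
        st[0] += 1
        if st[0] <= self.burst:
            # Still inside the initial burst: always log.
            st[1] = now
            return True, 0
        if now - st[1] >= self.interval:
            # Interval elapsed: log once and report how many were dropped.
            suppressed = st[2]
            st[1] = now
            st[2] = 0
            return True, suppressed
        st[2] += 1
        return False, 0
```

A caller would log only when `emit` is true and could append something like "(N similar messages suppressed)" when `suppressed` is non-zero, so the information about the volume of errors is not lost.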

Cheers
Sergio

On 10/27/2014 05:32 PM, Raghavendra G wrote:
> Seems like there were on-going write operations. On errors we log and 
> network disconnect has resulted in these logs.
>
> On Mon, Oct 27, 2014 at 7:21 PM, Sergio Traldi 
> <sergio.traldi at pd.infn.it> wrote:
>
>     Hi all,
>     I have one Red Hat 6 server with this set of rpms:
>
>     [ ~]# rpm -qa | grep gluster | sort
>     glusterfs-3.5.2-1.el6.x86_64
>     glusterfs-api-3.5.2-1.el6.x86_64
>     glusterfs-cli-3.5.2-1.el6.x86_64
>     glusterfs-fuse-3.5.2-1.el6.x86_64
>     glusterfs-geo-replication-3.5.2-1.el6.x86_64
>     glusterfs-libs-3.5.2-1.el6.x86_64
>     glusterfs-server-3.5.2-1.el6.x86_64
>
>     I have a gluster volume with 1 server and 1 brick:
>
>     [ ~]# gluster volume info volume-nova-pp
>     Volume Name: volume-nova-pp
>     Type: Distribute
>     Volume ID: b5ec289b-9a54-4df1-9c21-52ca556aeead
>     Status: Started
>     Number of Bricks: 1
>     Transport-type: tcp
>     Bricks:
>     Brick1: 192.168.61.100:/brick-nova-pp/mpathc
>     Options Reconfigured:
>     storage.owner-gid: 162
>     storage.owner-uid: 162
>
>     There are four clients attached to this volume, with the same OS
>     and the same set of gluster fuse rpms:
>     [ ~]# rpm -qa | grep gluster | sort
>     glusterfs-3.5.0-2.el6.x86_64
>     glusterfs-api-3.5.0-2.el6.x86_64
>     glusterfs-fuse-3.5.0-2.el6.x86_64
>     glusterfs-libs-3.5.0-2.el6.x86_6
>
>     Last week (and it also happened two weeks ago) I found the disk
>     almost full, and the gluster log
>     /var/log/glusterfs/var-lib-nova-instances.log had grown to 68GB.
>     The log shows how the problem started:
>
>     [2014-10-10 07:29:43.730792] W [socket.c:522:__socket_rwv]
>     0-glusterfs: readv on 192.168.61.100:24007 failed (No data
>     available)
>     [2014-10-10 07:29:54.022608] E
>     [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to
>     192.168.61.100:24007 failed (Connection refused)
>     [2014-10-10 07:30:05.271825] W
>     [client-rpc-fops.c:866:client3_3_writev_cbk]
>     0-volume-nova-pp-client-0: remote operation failed: Input/output error
>     [2014-10-10 07:30:08.783145] W
>     [fuse-bridge.c:2201:fuse_writev_cbk] 0-glusterfs-fuse: 3661260:
>     WRITE => -1 (Input/output error)
>     [2014-10-10 07:30:08.783368] W
>     [fuse-bridge.c:2201:fuse_writev_cbk] 0-glusterfs-fuse: 3661262:
>     WRITE => -1 (Input/output error)
>     [2014-10-10 07:30:08.806553] W
>     [fuse-bridge.c:2201:fuse_writev_cbk] 0-glusterfs-fuse: 3661649:
>     WRITE => -1 (Input/output error)
>     [2014-10-10 07:30:08.844415] W
>     [fuse-bridge.c:2201:fuse_writev_cbk] 0-glusterfs-fuse: 3662235:
>     WRITE => -1 (Input/output error)
>
>     and a lot of these lines:
>
>     [2014-10-15 14:41:15.895105] W
>     [fuse-bridge.c:2201:fuse_writev_cbk] 0-glusterfs-fuse: 951700230:
>     WRITE => -1 (Transport endpoint is not connected)
>     [2014-10-15 14:41:15.896205] W
>     [fuse-bridge.c:2201:fuse_writev_cbk] 0-glusterfs-fuse: 951700232:
>     WRITE => -1 (Transport endpoint is not connected)
>
>     This second kind of log line, with a different "sector" number
>     each time, is written every millisecond, so in about 1 minute 1GB
>     is written to the OS disk.
>
>     I searched for a solution but did not find anybody with the same
>     problem.
>
>     I think there was a network problem, but why does gluster write
>     millions of these lines to the log:
>     [2014-10-15 14:41:15.895105] W
>     [fuse-bridge.c:2201:fuse_writev_cbk] 0-glusterfs-fuse: 951700230:
>     WRITE => -1 (Transport endpoint is not connected) ?
>
>     Thanks in advance.
>     Cheers
>     Sergio
>
> -- 
> Raghavendra G
