[Gluster-devel] gluster working, but error appearing every two seconds in logs - NEW INFO

Thu Feb 19 17:22:45 UTC 2009

Jordi Moles Blanco wrote:
> Anand Avati wrote:
>>> For several days I didn't pay attention to the gluster logs, as
>>> everything worked fine. However, today I was moving a 500MB file
>>> and the mount point went stale; I couldn't access the data from
>>> that particular client. Gluster itself didn't seem to be affected:
>>> the nodes didn't report any problem in their log files, and other
>>> clients kept the mount point without any problem.
>>> Then I decided to have a look at the log files:
>>>
>>>
>>> *************
>>> 2009-01-30 11:00:41 W [client-protocol.c:332:client_protocol_xfer] espai1: not connected at the moment to submit frame type(1) op(15)
>>> 2009-01-30 11:00:41 E [client-protocol.c:3891:client_statfs_cbk] espai1: no proper reply from server, returning ENOTCONN
>>> 2009-01-30 11:00:41 E [tcp-client.c:190:tcp_connect] espai5: non-blocking connect() returned: 111 (Connection refused)
>>>
>>> 2009-01-30 11:00:43 W [client-protocol.c:332:client_protocol_xfer] espai2: not connected at the moment to submit frame type(1) op(15)
>>> 2009-01-30 11:00:43 E [client-protocol.c:3891:client_statfs_cbk] espai2: no proper reply from server, returning ENOTCONN
>>> 2009-01-30 11:00:43 E [tcp-client.c:190:tcp_connect] espai6: non-blocking connect() returned: 111 (Connection refused)
>>> *************
>>>     
>>
>> A "connection refused" error occurs when a daemon is not running, or
>> when a packet filter is resetting connections. If the GlusterFS daemon
>> is running and other clients are able to access it normally, please
>> make sure there is no packet filtering of some sort happening. You can
>> try flushing all firewall rules if there are any. Based on the
>> description you give, it seems to be an issue outside GlusterFS.
>>
>> Avati
>>   
> Hi,
>
> thanks for the explanation about the origin of the error message.
>
> Well... it doesn't look like there is a problem with the network on
> which glusterfs runs; it would have shown up in the rrd graphs I'm
> keeping for net traffic. But I'll run a full test to see if there's
> the slightest problem that could generate this message.
>
>
> Thanks.
>
>

Hi,

Since the last time we were in contact I've been trying to track down
where the problem is. I've been monitoring almost everything related
to network traffic and, eventually, I found out what the problem is by
chance!

It turns out that when I run "df -h" on a machine that mounts gluster,
I get this:

***********
2009-02-19 17:15:43 E [tcp-client.c:190:tcp_connect] espai1: non-blocking connect() returned: 111 (Connection refused)
2009-02-19 17:15:43 W [client-protocol.c:332:client_protocol_xfer] espai1: not connected at the moment to submit frame type(1) op(15)
2009-02-19 17:15:43 E [client-protocol.c:3891:client_statfs_cbk] espai1: no proper reply from server, returning ENOTCONN
2009-02-19 17:15:43 E [tcp-client.c:190:tcp_connect] espai5: non-blocking connect() returned: 111 (Connection refused)
2009-02-19 17:15:43 W [client-protocol.c:332:client_protocol_xfer] espai5: not connected at the moment to submit frame type(1) op(15)
2009-02-19 17:15:43 E [client-protocol.c:3891:client_statfs_cbk] espai5: no proper reply from server, returning ENOTCONN
2009-02-19 17:15:43 E [tcp-client.c:190:tcp_connect] espai2: non-blocking connect() returned: 111 (Connection refused)
2009-02-19 17:15:43 W [client-protocol.c:332:client_protocol_xfer] espai2: not connected at the moment to submit frame type(1) op(15)
2009-02-19 17:15:43 E [client-protocol.c:3891:client_statfs_cbk] espai2: no proper reply from server, returning ENOTCONN
2009-02-19 17:15:43 E [tcp-client.c:190:tcp_connect] espai6: non-blocking connect() returned: 111 (Connection refused)
2009-02-19 17:15:43 W [client-protocol.c:332:client_protocol_xfer] espai6: not connected at the moment to submit frame type(1) op(15)
2009-02-19 17:15:43 E [client-protocol.c:3891:client_statfs_cbk] espai6: no proper reply from server, returning ENOTCONN

************

So the reason it appears so often is that I've got munin monitoring
this gluster environment, and it runs a "df" command to check the disk
space of all the servers, including, of course, the gluster mount
point. When this happens, the errors shown above are logged and,
eventually, the mount point on that server fails. No data is lost, but
I have to remount glusterfs because the mount becomes stale and the
data is not accessible.
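
For anyone who wants to reproduce this without munin: "df" gets its
numbers from a statfs-style call on each mount point, and the
client_statfs_cbk entries in the log suggest that this is the operation
that fails. Below is a minimal sketch in Python that issues the same
kind of call once and reports the error code instead of hanging on to
it. The function name probe_statfs and the path /mnt/glusterfs are
made up for illustration; they are not from my setup.

```python
import errno
import os


def probe_statfs(path):
    """Issue one statfs-style query on path, as df does for each mount."""
    try:
        st = os.statvfs(path)
    except OSError as exc:
        # A disconnected mount may report ENOTCONN here; a missing path
        # reports ENOENT. The symbolic name is returned for readability.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return {"ok": False, "errno": exc.errno, "error": name}
    return {
        "ok": True,
        "total_bytes": st.f_frsize * st.f_blocks,
        "free_bytes": st.f_frsize * st.f_bavail,
    }


if __name__ == "__main__":
    # Hypothetical mount point; replace it with the real one.
    print(probe_statfs("/mnt/glusterfs"))
```

Running it by hand while watching the client log should show whether a
single statfs call is enough to produce the errors above.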

Is this normal behaviour?

I could stop munin from running "df" every 5 minutes, but still: is
there a problem in my setup, or is this what gluster is supposed to do?
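
To check the two causes Avati mentioned (daemon not running, or a
packet filter resetting connections), a plain TCP connect to each
server port is probably the simplest test. The sketch below is only
that, a sketch: check_port is a made-up helper, the host names are
passed on the command line, and port 6996 is an assumption about the
listen port, so use whatever port the server volfile actually sets.

```python
import socket
import sys

# Assumed listen port; adjust it to match the server configuration.
ASSUMED_PORT = 6996


def check_port(host, port, timeout=3.0):
    """Try one TCP connect and classify the outcome as a short string."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # errno 111 on Linux: nothing is listening, or a filter sent a reset.
        return "refused"
    except socket.timeout:
        # No answer at all, which usually points at silently dropped packets.
        return "timeout"
    except OSError as exc:
        return "error: %s" % exc
    conn.close()
    return "open"


if __name__ == "__main__":
    # Host names come from the command line, e.g. the storage servers.
    for host in sys.argv[1:]:
        print(host, check_port(host, ASSUMED_PORT))
```

A "refused" result from the client while the same port is "open" from
another machine would point towards filtering rather than the daemon.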

Thanks.