[Gluster-devel] gluster working, but error appearing every two seconds in logs

Jordi Moles Blanco jordi at cdmon.com
Fri Jan 30 10:51:13 UTC 2009


Hello everyone,

I'm using gluster version 1.3.x, patch 800, from the tla repositories.

I have 6 nodes providing a total of 2TB of storage, with several clients accessing the data constantly and keeping gluster under heavy load.

For several days I didn't pay much attention to the gluster logs, as everything worked fine. Today, however, while moving a 500MB file, the mount point went stale and I couldn't access the data from that particular client. Gluster itself didn't seem to be affected: the nodes didn't report any problems at all in their log files, and the other clients kept their mount points working without any problem.
Then I had a look at the log files:


*************
2009-01-30 11:00:41 W [client-protocol.c:332:client_protocol_xfer] espai1: not connected at the moment to submit frame type(1) op(15)
2009-01-30 11:00:41 E [client-protocol.c:3891:client_statfs_cbk] espai1: no proper reply from server, returning ENOTCONN
2009-01-30 11:00:41 E [tcp-client.c:190:tcp_connect] espai5: non-blocking connect() returned: 111 (Connection refused)

2009-01-30 11:00:43 W [client-protocol.c:332:client_protocol_xfer] espai2: not connected at the moment to submit frame type(1) op(15)
2009-01-30 11:00:43 E [client-protocol.c:3891:client_statfs_cbk] espai2: no proper reply from server, returning ENOTCONN
2009-01-30 11:00:43 E [tcp-client.c:190:tcp_connect] espai6: non-blocking connect() returned: 111 (Connection refused)
*************

This has been going on for days, with an error message printed every 2-3 seconds.
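For what it's worth, the "Connection refused" lines suggest the client cannot even open a TCP connection to espai5 (10.0.0.7) and espai6 (10.0.0.8), so the brick processes there may be down rather than merely slow. A quick way to see which subvolumes are failing is to tally the client log; this is a rough sketch assuming the 1.3 log line format shown above ("DATE TIME LEVEL [file:line:func] volume: message"):

```python
import re
from collections import Counter

# Matches the glusterfs 1.3 client log format shown above:
# "2009-01-30 11:00:41 E [tcp-client.c:190:tcp_connect] espai5: ..."
LOG_LINE = re.compile(r"^\S+ \S+ ([EW]) \[[^\]]+\] (\S+): (.*)$")

def failing_volumes(log_text):
    """Count warning/error lines per client subvolume."""
    tally = Counter()
    for line in log_text.splitlines():
        m = LOG_LINE.match(line)
        if m:
            level, volume, _msg = m.groups()
            tally[volume] += 1
    return tally

# Two lines taken verbatim from the log excerpt above.
sample = """\
2009-01-30 11:00:41 W [client-protocol.c:332:client_protocol_xfer] espai1: not connected at the moment to submit frame type(1) op(15)
2009-01-30 11:00:41 E [tcp-client.c:190:tcp_connect] espai5: non-blocking connect() returned: 111 (Connection refused)
"""

for volume, n in failing_volumes(sample).items():
    print(volume, n)
```

Run against the full logfile, the volumes with the highest counts are the first bricks to check on the server side.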

Is there any major bug in the version I'm using? Is there any way to fix this?

Looking through the whole logfile, I can't find any message (apart from this one, repeating every 2-3 seconds) that indicates why the mount point went stale and the data became inaccessible from that client.

Does this error message have anything to do with today's issue? Could that message cause a failure in the system when moving, deleting or creating large files?

These are the config files:



NODE:

***********

volume esp
        type storage/posix
        option directory /glu0/data
end-volume

volume espai
        type performance/io-threads
        option thread-count 15
        option cache-size 512MB
        subvolumes esp
end-volume

volume nm
        type storage/posix
        option directory /glu0/ns
end-volume

volume ultim
        type protocol/server
        subvolumes espai nm
        option transport-type tcp/server
        option auth.ip.espai.allow *
        option auth.ip.nm.allow *
end-volume

***********

CLIENT:

********
volume espai1
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.0.0.3
        option remote-subvolume espai
end-volume

volume espai2
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.0.0.4
        option remote-subvolume espai
end-volume

volume espai3
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.0.0.5
        option remote-subvolume espai
end-volume

volume espai4
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.0.0.6
        option remote-subvolume espai
end-volume

volume espai5
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.0.0.7
        option remote-subvolume espai
end-volume

volume espai6
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.0.0.8
        option remote-subvolume espai
end-volume

volume namespace1
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.0.0.3
        option remote-subvolume nm
end-volume

volume namespace2
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.0.0.4
        option remote-subvolume nm
end-volume

volume namespace3
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.0.0.5
        option remote-subvolume nm
end-volume

volume namespace4
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.0.0.6
        option remote-subvolume nm
end-volume

volume namespace5
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.0.0.7
        option remote-subvolume nm
end-volume

volume namespace6
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.0.0.8
        option remote-subvolume nm
end-volume

volume grup1
        type cluster/afr
        subvolumes espai1 espai3 espai5
end-volume

volume grup2
        type cluster/afr
        subvolumes espai2 espai4 espai6
end-volume

volume nm
        type cluster/afr
        subvolumes namespace1 namespace2 namespace3 namespace4 namespace5 namespace6
end-volume

volume g01
        type cluster/unify
        subvolumes grup1 grup2
        option scheduler rr
        option namespace nm
end-volume

volume io-cache
        type performance/io-cache
        option cache-size 512MB
        option page-size 1MB
        option force-revalidate-timeout 2
        subvolumes g01
end-volume

************
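Given the config above, a first check from the affected client would be whether each node's glusterfsd is accepting TCP connections at all. A minimal sketch follows; the host list is taken from the client volfile, and the port is an assumption (glusterfs 1.3 listens on 6996 by default, so adjust if your servers set "option listen-port" explicitly):

```python
import socket

# Hosts from the client volfile above; 6996 is assumed to be the
# glusterfs 1.3 default listen port -- adjust if "option listen-port"
# is set in the server volfiles.
NODES = ["10.0.0.3", "10.0.0.4", "10.0.0.5",
         "10.0.0.6", "10.0.0.7", "10.0.0.8"]
PORT = 6996

def port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    for host in NODES:
        state = "listening" if port_open(host, PORT) else "UNREACHABLE"
        print(host, state)
```

If espai5 and espai6 show up as unreachable here, that matches the "Connection refused" log lines and points at the brick processes on 10.0.0.7 and 10.0.0.8 rather than at the client.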

Thanks.





