[Gluster-devel] AFR fails to provide High Availability

Ricardo Garcia Mayoral ricardo at torroja.dmt.upm.es
Mon Jun 9 14:25:42 UTC 2008


Hello everyone,

we are experiencing the following problem in our HPC cluster, which uses a
gluster filesystem built with unify over AFR on several pairs of nodes.
Every once in a while one of our nodes freezes: it still replies to
'ping', but it accepts neither ssh connections nor direct terminal
access. Under those circumstances the gluster filesystem crashes,
whereas if the frozen node is shut down, the filesystem keeps working (in
degraded mode). Since most of our running jobs write their output to
glusterfs, this is becoming quite a serious problem.
Our server spec files look like this:

-------------------------------------------------------------------
### GlusterFS Server Volume Specification

### Export volume brick.
volume brick
       type storage/posix
       option directory /state/partition1/glfsdir/
end-volume

### Add network serving capability to brick.
volume server
       type protocol/server
       option transport-type tcp/server
       option listen-port 6996
       subvolumes brick
       option auth.ip.brick.allow 10.*.*.*
end-volume
-------------------------------------------------------------------
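For reference, here is how we read the `auth.ip.brick.allow` line. Assuming the server matches the client address against `10.*.*.*` with shell-style wildcards (fnmatch semantics; this is our assumption, not something the spec states), every peer that appears in the logs below, the compute nodes on 10.255.255.x and the frontend on 10.1.1.1, is admitted. A quick Python check:

```python
from fnmatch import fnmatchcase

# Pattern copied from "option auth.ip.brick.allow 10.*.*.*" in the server spec.
ALLOW_PATTERN = "10.*.*.*"


def is_allowed(client_ip: str, pattern: str = ALLOW_PATTERN) -> bool:
    """Return True if client_ip matches the wildcard pattern.

    Assumption: the server's auth.ip check behaves like fnmatch(3).
    Note that '*' also matches dots, so "10.*.*.*" admits any address
    that starts with "10." and has at least two more dots.
    """
    return fnmatchcase(client_ip, pattern)


if __name__ == "__main__":
    for ip in ("10.1.1.1", "10.255.255.205", "192.168.0.7"):
        print(ip, "allowed" if is_allowed(ip) else "denied")
```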

and our client spec files look like this:

-------------------------------------------------------------------
### GlusterFS Client Volume Specification

### Add client feature and attach to remote subvolume of server
volume brick0-0
  type protocol/client
  option transport-type tcp/client
  option remote-host compute-0-0
  option remote-subvolume brick
end-volume

[... several of those, up to compute-7-5 ...]

volume brick7-5
  type protocol/client
  option transport-type tcp/client
  option remote-host compute-7-5
  option remote-subvolume brick
end-volume

###  Namespace brick
volume local-ns
    type protocol/client
    option transport-type tcp/client
    option remote-host vulcano
    option remote-subvolume brick-ns
end-volume

###  Automatic File Replication
volume afr1
    type cluster/afr
    subvolumes brick0-0 brick4-0
end-volume

[... several of those, up to afr24 ...]

volume afr24
    type cluster/afr
    subvolumes brick3-5 brick7-5
end-volume

###  Unify
volume unify
    type cluster/unify
    subvolumes  afr1 afr2 afr3 afr4 afr5 afr6 afr7 afr8 afr9 afr10 afr11 afr12 afr13 afr14 afr15 afr16 afr17 afr18 afr19 afr20 afr21 afr22 afr23 afr24
    option namespace local-ns
# ALU scheduler
    option scheduler alu            # use the ALU scheduler
    option alu.limits.min-free-disk  5%    # Don't create files on a volume with less than 5% free disk space
##   When deciding where to place a file, first look at the write-usage, then at
##   read-usage, disk-usage, open files, and finally the disk-speed-usage.
    option alu.order write-usage:read-usage:disk-usage:open-files-usage:disk-speed-usage
    option alu.write-usage.entry-threshold 20%   # Kick in when the write-usage discrepancy is 20%
    option alu.write-usage.exit-threshold  15%   # Don't stop until the discrepancy has been reduced to 5%
    option alu.read-usage.entry-threshold  20%   # Kick in when the read-usage discrepancy is 20%
    option alu.read-usage.exit-threshold    4%   # Don't stop until the discrepancy has been reduced to 16% (20% - 4%)
    option alu.disk-usage.entry-threshold 10GB   # Kick in if the discrep. in disk-usage between volumes is more than 10GB
    option alu.disk-usage.exit-threshold   1GB   # Don't stop writing to the least-used volume until the discrep. is 9GB
    option alu.open-files-usage.entry-threshold 1024   # Kick in if the discrepancy in open files is 1024
    option alu.open-files-usage.exit-threshold    32   # Stop when 992 files have been written in the least-used vol.
#    option alu.disk-speed-usage.entry-threshold  # NEVER SET IT. SPEED IS CONSTANT!!!
#    option alu.disk-speed-usage.exit-threshold   # NEVER SET IT. SPEED IS CONSTANT!!!
    option alu.stat-refresh.interval 10sec   # Refresh the statistics used for decision-making every 10 seconds
#    option alu.stat-refresh.num-file-create 10   # Refresh the statistics used for decision-making after creating 10 files
## NUFA scheduler
#    option scheduler nufa
#    option nufa.local-volume-name afr24
end-volume
-------------------------------------------------------------------
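The comments on the ALU `entry-threshold` / `exit-threshold` pairs describe a hysteresis: a criterion starts steering new files once the discrepancy between volumes reaches the entry threshold, and keeps steering until the discrepancy has dropped by the exit threshold (10GB entry with 1GB exit stops at 9GB). A simplified sketch of that rule for one criterion, written from the comments only; it is not the ALU scheduler's code, and the class name is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class ThresholdCriterion:
    """One ALU-style criterion, modelled only on the comments in our spec.

    Hypothetical helper, not GlusterFS code: the criterion becomes active
    once the discrepancy between volumes reaches entry_threshold, and
    stays active until it has fallen to entry_threshold - exit_threshold.
    """
    entry_threshold: float
    exit_threshold: float
    active: bool = False

    def update(self, discrepancy: float) -> bool:
        """Feed the current discrepancy and return whether the criterion applies."""
        if not self.active and discrepancy >= self.entry_threshold:
            self.active = True
        elif self.active and discrepancy <= self.entry_threshold - self.exit_threshold:
            self.active = False
        return self.active


if __name__ == "__main__":
    # disk-usage: entry 10GB, exit 1GB -> active from 10GB until back down to 9GB
    disk = ThresholdCriterion(entry_threshold=10.0, exit_threshold=1.0)
    for gb in (8.0, 10.5, 9.5, 9.0, 9.5):
        print(f"{gb:4.1f} GB ->", "active" if disk.update(gb) else "inactive")
```

With the open-files settings (entry 1024, exit 32) the same rule switches off once the discrepancy falls to 992, which is what the comment in the spec says.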

The namespace is provided by the frontend, 'vulcano', which does not
otherwise contribute to the filesystem. The compute nodes use the NUFA
scheduler and the frontend uses ALU. We recently added
'option self-heal on' to the afr volumes and 'option transport-timeout 10'
to the basic node bricks ('brickX-X'), but that had no effect on our problem.
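Because the client spec is so regular, it may be safer to generate it than to hand-edit 48 `protocol/client` and 24 `cluster/afr` sections whenever an option such as `transport-timeout` or `self-heal` changes. A minimal Python sketch; it assumes, from afr1 = brick0-0 + brick4-0 and afr24 = brick3-5 + brick7-5, that each `afrN` mirrors compute-R-N onto compute-(R+4)-N, numbered rack by rack. The elided sections are not shown above, so the pairing should be checked against the real file, and the scheduler options still have to be appended by hand:

```python
RACKS = 4       # racks 0-3 mirrored onto racks 4-7 (assumed pairing)
NODES = 6       # compute-R-0 .. compute-R-5
TIMEOUT = 10    # the transport-timeout we added to the brickX-X volumes


def client_volume(rack: int, node: int) -> str:
    """protocol/client section for compute-<rack>-<node>."""
    return (
        f"volume brick{rack}-{node}\n"
        "  type protocol/client\n"
        "  option transport-type tcp/client\n"
        f"  option remote-host compute-{rack}-{node}\n"
        "  option remote-subvolume brick\n"
        f"  option transport-timeout {TIMEOUT}\n"
        "end-volume\n"
    )


def afr_volume(index: int, rack: int, node: int) -> str:
    """cluster/afr section mirroring compute-R-N onto compute-(R+4)-N."""
    return (
        f"volume afr{index}\n"
        "    type cluster/afr\n"
        f"    subvolumes brick{rack}-{node} brick{rack + RACKS}-{node}\n"
        "    option self-heal on\n"
        "end-volume\n"
    )


def client_spec() -> str:
    """Assemble the bricks, the namespace, the afr pairs and the unify volume."""
    parts = ["### GlusterFS Client Volume Specification (generated)\n"]
    for rack in range(2 * RACKS):
        for node in range(NODES):
            parts.append(client_volume(rack, node))
    parts.append(
        "volume local-ns\n"
        "    type protocol/client\n"
        "    option transport-type tcp/client\n"
        "    option remote-host vulcano\n"
        "    option remote-subvolume brick-ns\n"
        "end-volume\n"
    )
    afr_names = []
    for rack in range(RACKS):
        for node in range(NODES):
            index = rack * NODES + node + 1   # afr1 = (0, 0) ... afr24 = (3, 5)
            afr_names.append(f"afr{index}")
            parts.append(afr_volume(index, rack, node))
    parts.append(
        "volume unify\n"
        "    type cluster/unify\n"
        f"    subvolumes {' '.join(afr_names)}\n"
        "    option namespace local-ns\n"
        "# append the scheduler options (ALU or NUFA) from the spec above\n"
        "end-volume\n"
    )
    return "\n".join(parts)


if __name__ == "__main__":
    print(client_spec())
```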

What we find in /var/log/glusterfs/glusterfs.log is always something like
this (two node-freeze examples):

-------------------------------------------------------------------
2008-06-06 20:54:41 W [client-protocol.c:204:call_bail] brick6-0: activating bail-out. pending frames = 4. last sent = 2008-06-06 20:51:51. last received = 2008-06-06 20:43:08 transport-timeout = 108
2008-06-06 20:54:41 C [client-protocol.c:211:call_bail] brick6-0: bailing transport
2008-06-06 20:54:41 W [client-protocol.c:4759:client_protocol_cleanup] brick6-0: cleaning up state in transport object 0x554c10
2008-06-06 20:54:41 E [client-protocol.c:4809:client_protocol_cleanup] brick6-0: forced unwinding frame type(1) op(34) reply=@0x2a966a6880
2008-06-06 20:54:41 E [client-protocol.c:4405:client_lookup_cbk] brick6-0: no proper reply from server, returning ENOTCONN
2008-06-06 20:54:41 E [client-protocol.c:4809:client_protocol_cleanup] brick6-0: forced unwinding frame type(1) op(15) reply=@0x2a966a6880
2008-06-06 20:54:41 E [client-protocol.c:3866:client_statfs_cbk] brick6-0: no proper reply from server, returning ENOTCONN
2008-06-06 20:54:41 E [client-protocol.c:4809:client_protocol_cleanup] brick6-0: forced unwinding frame type(1) op(34) reply=@0x2a966a6880
2008-06-06 20:54:41 E [client-protocol.c:4405:client_lookup_cbk] brick6-0: no proper reply from server, returning ENOTCONN
2008-06-06 20:54:41 E [client-protocol.c:4809:client_protocol_cleanup] brick6-0: forced unwinding frame type(1) op(15) reply=@0x2a966a6880
2008-06-06 20:54:41 E [client-protocol.c:3866:client_statfs_cbk] brick6-0: no proper reply from server, returning ENOTCONN
2008-06-06 21:00:10 W [client-protocol.c:279:client_protocol_xfer] brick6-0: attempting to pipeline request type(1) op(35) with handshake
2008-06-06 21:00:10 W [client-protocol.c:4759:client_protocol_cleanup] brick6-0: cleaning up state in transport object 0x554c10
2008-06-06 21:00:10 E [client-protocol.c:4809:client_protocol_cleanup] brick6-0: forced unwinding frame type(1) op(35) reply=@0x2a9622d710
2008-06-06 21:00:10 E [tcp-client.c:190:tcp_connect] brick6-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:00:10 W [client-protocol.c:331:client_protocol_xfer] brick6-0: not connected at the moment to submit frame type(1) op(35)
2008-06-06 21:00:57 E [tcp-client.c:190:tcp_connect] brick6-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:02:32 E [tcp-client.c:190:tcp_connect] brick6-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:02:32 W [client-protocol.c:331:client_protocol_xfer] brick6-0: not connected at the moment to submit frame type(1) op(15)
2008-06-06 21:02:32 E [client-protocol.c:3866:client_statfs_cbk] brick6-0: no proper reply from server, returning ENOTCONN
2008-06-06 21:04:24 E [protocol.c:271:gf_block_unserialize_transport] local-ns: EOF from peer (10.1.1.1:6996)
2008-06-06 21:04:24 W [client-protocol.c:4759:client_protocol_cleanup] local-ns: cleaning up state in transport object 0x58b580
2008-06-06 21:10:24 E [protocol.c:271:gf_block_unserialize_transport] local-ns: EOF from peer (10.1.1.1:6996)
2008-06-06 21:10:24 W [client-protocol.c:4759:client_protocol_cleanup] local-ns: cleaning up state in transport object 0x58e980
2008-06-09 11:11:26 W [client-protocol.c:204:call_bail] brick6-5: activating bail-out. pending frames = 1. last sent = 2008-06-09 11:11:06. last received = 2008-06-09 04:02:03 transport-timeout = 10
2008-06-09 11:11:26 C [client-protocol.c:211:call_bail] brick6-5: bailing transport
2008-06-09 11:11:26 W [client-protocol.c:4759:client_protocol_cleanup] brick6-5: cleaning up state in transport object 0x56eef0
2008-06-09 11:11:26 E [client-protocol.c:4809:client_protocol_cleanup] brick6-5: forced unwinding frame type(1) op(15) reply=@0x2a96205fd0
2008-06-09 11:11:26 E [client-protocol.c:3866:client_statfs_cbk] brick6-5: no proper reply from server, returning ENOTCONN
2008-06-09 13:32:52 W [client-protocol.c:279:client_protocol_xfer] brick6-5: attempting to pipeline request type(1) op(15) with handshake
2008-06-09 13:33:09 W [client-protocol.c:204:call_bail] brick6-5: activating bail-out. pending frames = 1. last sent = 2008-06-09 13:32:52. last received = 1970-01-01 01:00:00 transport-timeout = 10
2008-06-09 13:33:09 C [client-protocol.c:211:call_bail] brick6-5: bailing transport
2008-06-09 13:33:09 W [client-protocol.c:4759:client_protocol_cleanup] brick6-5: cleaning up state in transport object 0x56eef0
2008-06-09 13:33:09 E [client-protocol.c:4809:client_protocol_cleanup] brick6-5: forced unwinding frame type(1) op(15) reply=@0x2a962009e0
2008-06-09 13:33:09 E [fuse-bridge.c:2487:fuse_thread_proc] glusterfs-fuse: fuse_chan_receive() returned -1 (25)
2008-06-09 13:33:09 E [client-protocol.c:3866:client_statfs_cbk] brick6-5: no proper reply from server, returning ENOTCONN
-------------------------------------------------------------------
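To compare the bail-outs, the `call_bail` warnings can be parsed for how long the client had been waiting when it gave up. A small sketch that relies only on the message format visible in our log (the regular expression and function name are ours, not part of GlusterFS). A `last received` of 1970-01-01 01:00:00 is the Unix epoch shown in our local time, which we read as "no reply received yet on that connection":

```python
import re
from datetime import datetime

# One call_bail warning as written to glusterfs.log (one entry per line).
BAIL_RE = re.compile(
    r"^(?P<now>\S+ \S+) W \[client-protocol\.c:\d+:call_bail\] (?P<brick>\S+): "
    r"activating bail-out\. pending frames = \d+\. "
    r"last sent = (?P<sent>\S+ \S+)\. last received = (?P<received>\S+ \S+) "
    r"transport-timeout = (?P<timeout>\d+)"
)
TIME_FMT = "%Y-%m-%d %H:%M:%S"


def bail_info(line: str):
    """Return (brick, s since last sent, s since last received, timeout), or None.

    'Seconds since last received' is None when the log shows the epoch
    (1970-01-01), which we take to mean no reply had been received yet.
    """
    m = BAIL_RE.match(line)
    if m is None:
        return None
    now = datetime.strptime(m["now"], TIME_FMT)
    sent = datetime.strptime(m["sent"], TIME_FMT)
    received = datetime.strptime(m["received"], TIME_FMT)
    since_sent = int((now - sent).total_seconds())
    since_received = None if received.year == 1970 else int((now - received).total_seconds())
    return m["brick"], since_sent, since_received, int(m["timeout"])


if __name__ == "__main__":
    import sys

    # e.g. feed glusterfs.log on standard input
    for line in sys.stdin:
        info = bail_info(line)
        if info:
            print(*info)
```

On our two examples this gives 170 s since the last request and 693 s since the last reply for brick6-0 (transport-timeout 108), and 20 s / 25763 s for brick6-5 (timeout 10). A long gap since the last reply does not by itself prove a hang, since the connection may simply have been idle.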

On the frozen node, after rebooting, /var/log/glusterfs/glusterfsd.log
shows (for the brick6-0 case):

-------------------------------------------------------------------
2008-06-06 21:10:02 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.205:901)
2008-06-06 21:10:24 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.1.1.1:995)
2008-06-06 21:10:25 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.252:997)
2008-06-06 21:10:25 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.251:997)
2008-06-06 21:10:25 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.250:997)
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.249:996)
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.248:997)
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.247:997)
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.246:997)
2008-06-06 21:10:27 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.245:997)
2008-06-06 21:10:27 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.244:997)
2008-06-06 21:10:27 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.243:997)
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.242:997)
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.241:996)
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.240:997)
2008-06-06 21:10:29 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.239:996)
2008-06-06 21:10:29 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.238:997)
2008-06-06 21:10:29 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.237:997)
2008-06-06 21:10:30 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.236:997)
2008-06-06 21:10:30 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.235:997)
2008-06-06 21:10:30 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.234:997)
2008-06-06 21:10:31 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.233:997)
2008-06-06 21:10:31 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.232:997)
2008-06-06 21:10:31 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.231:997)
2008-06-06 21:10:32 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.230:997)
2008-06-06 21:10:32 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.229:997)
2008-06-06 21:10:32 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.228:997)
2008-06-06 21:10:33 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.227:997)
2008-06-06 21:10:33 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.226:997)
2008-06-06 21:10:33 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.225:997)
2008-06-06 21:10:34 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.224:997)
2008-06-06 21:10:34 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.223:997)
2008-06-06 21:10:34 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.222:997)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.221:997)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.220:997)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.219:996)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.218:997)
2008-06-06 21:10:36 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.217:996)
2008-06-06 21:10:36 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (10.255.255.216:997)
-------------------------------------------------------------------
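The client log below pairs each brick with its server address (brick0-0 at 10.255.255.252, brick0-1 at .251, and so on down to brick5-4 at .218), so the last octet drops by one per node, and the `local-ns` lines place the frontend vulcano at 10.1.1.1. Assuming that pattern also holds for the nodes not visible in this excerpt, the peers in the server log above can be translated back to node names with a helper like this (hypothetical, derived only from the log):

```python
import ipaddress

NODES_PER_RACK = 6
RACKS = 8
TOP_OCTET = 252            # compute-0-0 is 10.255.255.252 in the client log
COMPUTE_NET = ipaddress.ip_network("10.255.255.0/24")
FRONTEND = "10.1.1.1"      # namespace server 'vulcano' (the local-ns peer)


def node_for_ip(ip: str) -> str:
    """Map a peer address to a node name, assuming last octet = 252 - (rack*6 + node)."""
    if ip == FRONTEND:
        return "vulcano"
    addr = ipaddress.ip_address(ip)
    if addr not in COMPUTE_NET:
        raise ValueError(f"{ip} is not on the compute network")
    offset = TOP_OCTET - addr.packed[-1]
    if not 0 <= offset < RACKS * NODES_PER_RACK:
        raise ValueError(f"{ip} is outside the assumed compute range")
    rack, node = divmod(offset, NODES_PER_RACK)
    return f"compute-{rack}-{node}"


if __name__ == "__main__":
    for ip in ("10.255.255.205", "10.1.1.1", "10.255.255.252", "10.255.255.216"):
        print(ip, "->", node_for_ip(ip))
```

If the pattern holds, the server log reads as compute-7-5, vulcano, then compute-0-0 through compute-6-0 in turn, with .216 being compute-6-0 itself.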

and the client log says:

-------------------------------------------------------------------
2008-06-06 21:10:24 E [protocol.c:271:gf_block_unserialize_transport] local-ns: EOF from peer (10.1.1.1:6996)
2008-06-06 21:10:24 W [client-protocol.c:4759:client_protocol_cleanup] local-ns: cleaning up state in transport object 0x58d7f0
2008-06-06 21:10:25 E [protocol.c:271:gf_block_unserialize_transport] brick0-0: EOF from peer (10.255.255.252:6996)
2008-06-06 21:10:25 W [client-protocol.c:4759:client_protocol_cleanup] brick0-0: cleaning up state in transport object 0x51dc80
2008-06-06 21:10:25 E [protocol.c:271:gf_block_unserialize_transport] brick0-1: EOF from peer (10.255.255.251:6996)
2008-06-06 21:10:25 W [client-protocol.c:4759:client_protocol_cleanup] brick0-1: cleaning up state in transport object 0x522b80
2008-06-06 21:10:25 E [protocol.c:271:gf_block_unserialize_transport] brick0-2: EOF from peer (10.255.255.250:6996)
2008-06-06 21:10:25 W [client-protocol.c:4759:client_protocol_cleanup] brick0-2: cleaning up state in transport object 0x5274e0
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport] brick0-3: EOF from peer (10.255.255.249:6996)
2008-06-06 21:10:26 W [client-protocol.c:4759:client_protocol_cleanup] brick0-3: cleaning up state in transport object 0x52be40
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport] brick0-4: EOF from peer (10.255.255.248:6996)
2008-06-06 21:10:26 W [client-protocol.c:4759:client_protocol_cleanup] brick0-4: cleaning up state in transport object 0x5307a0
2008-06-06 21:10:26 E [protocol.c:271:gf_block_unserialize_transport] brick0-5: EOF from peer (10.255.255.247:6996)
2008-06-06 21:10:26 W [client-protocol.c:4759:client_protocol_cleanup] brick0-5: cleaning up state in transport object 0x535100
2008-06-06 21:10:27 E [protocol.c:271:gf_block_unserialize_transport] brick1-0: EOF from peer (10.255.255.246:6996)
2008-06-06 21:10:27 W [client-protocol.c:4759:client_protocol_cleanup] brick1-0: cleaning up state in transport object 0x539a60
2008-06-06 21:10:27 E [protocol.c:271:gf_block_unserialize_transport] brick1-1: EOF from peer (10.255.255.245:6996)
2008-06-06 21:10:27 W [client-protocol.c:4759:client_protocol_cleanup] brick1-1: cleaning up state in transport object 0x53e3c0
2008-06-06 21:10:27 E [protocol.c:271:gf_block_unserialize_transport] brick1-2: EOF from peer (10.255.255.244:6996)
2008-06-06 21:10:27 W [client-protocol.c:4759:client_protocol_cleanup] brick1-2: cleaning up state in transport object 0x542d20
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport] brick1-3: EOF from peer (10.255.255.243:6996)
2008-06-06 21:10:28 W [client-protocol.c:4759:client_protocol_cleanup] brick1-3: cleaning up state in transport object 0x547680
2008-06-06 21:10:28 E [tcp-client.c:190:tcp_connect] local-ns: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport] brick1-4: EOF from peer (10.255.255.242:6996)
2008-06-06 21:10:28 W [client-protocol.c:4759:client_protocol_cleanup] brick1-4: cleaning up state in transport object 0x54bfe0
2008-06-06 21:10:28 E [tcp-client.c:190:tcp_connect] brick1-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:28 E [tcp-client.c:190:tcp_connect] brick0-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport] brick1-5: EOF from peer (10.255.255.241:6996)
2008-06-06 21:10:28 W [client-protocol.c:4759:client_protocol_cleanup] brick1-5: cleaning up state in transport object 0x550940
2008-06-06 21:10:28 E [tcp-client.c:190:tcp_connect] brick0-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:28 E [tcp-client.c:190:tcp_connect] brick1-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:28 E [protocol.c:271:gf_block_unserialize_transport] brick2-0: EOF from peer (10.255.255.240:6996)
2008-06-06 21:10:28 W [client-protocol.c:4759:client_protocol_cleanup] brick2-0: cleaning up state in transport object 0x5552a0
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick0-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick2-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [protocol.c:271:gf_block_unserialize_transport] brick2-1: EOF from peer (10.255.255.239:6996)
2008-06-06 21:10:29 W [client-protocol.c:4759:client_protocol_cleanup] brick2-1: cleaning up state in transport object 0x559c00
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick0-3: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick2-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick1-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [protocol.c:271:gf_block_unserialize_transport] brick2-2: EOF from peer (10.255.255.238:6996)
2008-06-06 21:10:29 W [client-protocol.c:4759:client_protocol_cleanup] brick2-2: cleaning up state in transport object 0x55e560
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick2-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick0-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [tcp-client.c:190:tcp_connect] brick1-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:29 E [protocol.c:271:gf_block_unserialize_transport] brick2-3: EOF from peer (10.255.255.237:6996)
2008-06-06 21:10:29 W [client-protocol.c:4759:client_protocol_cleanup] brick2-3: cleaning up state in transport object 0x562ec0
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick2-3: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick0-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick2-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [protocol.c:271:gf_block_unserialize_transport] brick2-4: EOF from peer (10.255.255.236:6996)
2008-06-06 21:10:30 W [client-protocol.c:4759:client_protocol_cleanup] brick2-4: cleaning up state in transport object 0x567820
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick2-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick1-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick2-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [protocol.c:271:gf_block_unserialize_transport] brick2-5: EOF from peer (10.255.255.235:6996)
2008-06-06 21:10:30 W [client-protocol.c:4759:client_protocol_cleanup] brick2-5: cleaning up state in transport object 0x56c180
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick2-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick1-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick2-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [protocol.c:271:gf_block_unserialize_transport] brick3-0: EOF from peer (10.255.255.234:6996)
2008-06-06 21:10:30 W [client-protocol.c:4759:client_protocol_cleanup] brick3-0: cleaning up state in transport object 0x570ae0
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick1-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:30 E [tcp-client.c:190:tcp_connect] brick3-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick2-3: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [protocol.c:271:gf_block_unserialize_transport] brick3-1: EOF from peer (10.255.255.233:6996)
2008-06-06 21:10:31 W [client-protocol.c:4759:client_protocol_cleanup] brick3-1: cleaning up state in transport object 0x575440
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] local-ns: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick1-3: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick3-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick2-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick0-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick1-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [protocol.c:271:gf_block_unserialize_transport] brick3-2: EOF from peer (10.255.255.232:6996)
2008-06-06 21:10:31 W [client-protocol.c:4759:client_protocol_cleanup] brick3-2: cleaning up state in transport object 0x579da0
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick3-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick2-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick0-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick1-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [protocol.c:271:gf_block_unserialize_transport] brick3-3: EOF from peer (10.255.255.231:6996)
2008-06-06 21:10:31 W [client-protocol.c:4759:client_protocol_cleanup] brick3-3: cleaning up state in transport object 0x57e700
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick3-3: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:31 E [tcp-client.c:190:tcp_connect] brick3-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick0-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick2-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [protocol.c:271:gf_block_unserialize_transport] brick3-4: EOF from peer (10.255.255.230:6996)
2008-06-06 21:10:32 W [client-protocol.c:4759:client_protocol_cleanup] brick3-4: cleaning up state in transport object 0x583060
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick3-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick3-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick0-3: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick2-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [protocol.c:271:gf_block_unserialize_transport] brick3-5: EOF from peer (10.255.255.229:6996)
2008-06-06 21:10:32 W [client-protocol.c:4759:client_protocol_cleanup] brick3-5: cleaning up state in transport object 0x5879c0
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick3-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick3-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick0-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick2-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:32 E [protocol.c:271:gf_block_unserialize_transport] brick4-0: EOF from peer (10.255.255.228:6996)
2008-06-06 21:10:32 W [client-protocol.c:4759:client_protocol_cleanup] brick4-0: cleaning up state in transport object 0x520680
2008-06-06 21:10:32 E [tcp-client.c:190:tcp_connect] brick4-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick3-3: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick0-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick2-3: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [protocol.c:271:gf_block_unserialize_transport] brick4-1: EOF from peer (10.255.255.227:6996)
2008-06-06 21:10:33 W [client-protocol.c:4759:client_protocol_cleanup] brick4-1: cleaning up state in transport object 0x525000
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick4-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick3-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick1-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick2-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [protocol.c:271:gf_block_unserialize_transport] brick4-2: EOF from peer (10.255.255.226:6996)
2008-06-06 21:10:33 W [client-protocol.c:4759:client_protocol_cleanup] brick4-2: cleaning up state in transport object 0x529960
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick4-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick3-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick1-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick2-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [protocol.c:271:gf_block_unserialize_transport] brick4-3: EOF from peer (10.255.255.225:6996)
2008-06-06 21:10:33 W [client-protocol.c:4759:client_protocol_cleanup] brick4-3: cleaning up state in transport object 0x52e2c0
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick4-3: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:33 E [tcp-client.c:190:tcp_connect] brick4-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick1-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick3-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [protocol.c:271:gf_block_unserialize_transport] brick4-4: EOF from peer (10.255.255.224:6996)
2008-06-06 21:10:34 W [client-protocol.c:4759:client_protocol_cleanup] brick4-4: cleaning up state in transport object 0x532c20
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick4-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick4-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick1-3: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick3-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [protocol.c:271:gf_block_unserialize_transport] brick4-5: EOF from peer (10.255.255.223:6996)
2008-06-06 21:10:34 W [client-protocol.c:4759:client_protocol_cleanup] brick4-5: cleaning up state in transport object 0x537580
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick1-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick4-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick4-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick3-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [protocol.c:271:gf_block_unserialize_transport] brick5-0: EOF from peer (10.255.255.222:6996)
2008-06-06 21:10:34 W [client-protocol.c:4759:client_protocol_cleanup] brick5-0: cleaning up state in transport object 0x53bee0
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick1-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick5-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:34 E [tcp-client.c:190:tcp_connect] brick4-3: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick3-3: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport] brick5-1: EOF from peer (10.255.255.221:6996)
2008-06-06 21:10:35 W [client-protocol.c:4759:client_protocol_cleanup] brick5-1: cleaning up state in transport object 0x540840
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick2-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick5-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick4-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick3-4: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport] brick5-2: EOF from peer (10.255.255.220:6996)
2008-06-06 21:10:35 W [client-protocol.c:4759:client_protocol_cleanup] brick5-2: cleaning up state in transport object 0x5451a0
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick2-1: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick5-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick4-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick3-5: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [protocol.c:271:gf_block_unserialize_transport] brick5-3: EOF from peer (10.255.255.219:6996)
2008-06-06 21:10:35 W [client-protocol.c:4759:client_protocol_cleanup] brick5-3: cleaning up state in transport object 0x549b00
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick5-3: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick2-2: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick5-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:35 E [tcp-client.c:190:tcp_connect] brick4-0: non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [protocol.c:271:gf_block_unserialize_transport] brick5-4: EOF from peer (10.255.255.218:6996)
2008-06-06 21:10:36 W [client-protocol.c:4759:client_protocol_cleanup] brick5-4: cleaning up state in transport object 0x54e460
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick5-4: 
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick2-3: 
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick5-1: 
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick4-1: 
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] local-ns: 
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [protocol.c:271:gf_block_unserialize_transport] 
brick5-5: EOF from peer (10.255.255.217:6996)
2008-06-06 21:10:36 W [client-protocol.c:4759:client_protocol_cleanup] 
brick5-5: cleaning up state in transport object 0x552dc0
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick5-5: 
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick2-4: 
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick0-0: 
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick5-2: 
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:10:36 E [tcp-client.c:190:tcp_connect] brick4-2: 
non-blocking connect() returned: 111 (Connection refused)
2008-06-06 21:11:00 W [nufa.c:47:nufa_init] nufa: No option for limit 
min-free-disk given, defaulting it to 15
2008-06-06 21:11:00 W [nufa.c:55:nufa_init] nufa: No option for 
nufa.refresh-interval given, defaulting it to 30
-------------------------------------------------------------------

It seems to us that gluster still considers the frozen node alive, at 
least to some extent, so it does not drop it from the filesystem. Any 
ideas on what is happening, and how we could overcome it? Thanks in 
advance,
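To check from a client what each brick's port is actually doing, as opposed to what ping reports, a TCP probe can help. Below is a minimal Python sketch. It is not part of GlusterFS, and the function name `probe_tcp` is hypothetical. It classifies an endpoint as "open", "refused" (errno 111 on Linux, the `Connection refused` seen in the log above), or "timeout". A successful connect only shows that the remote kernel completed the TCP handshake. It does not prove that glusterfsd on that node will answer requests.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP endpoint as 'open', 'refused', 'timeout' or 'error:<name>'.

    Hypothetical helper: a host that answers ICMP ping can still refuse
    TCP connections (ECONNREFUSED, 111 on Linux) or silently drop them.
    """
    try:
        # create_connection resolves the host and applies the timeout to connect()
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # remote kernel answered with RST: nothing is listening on that port
        return "refused"
    except socket.timeout:
        # no answer at all within the timeout (host hung or packets dropped)
        return "timeout"
    except OSError as exc:
        # other failures, e.g. unreachable network or name-resolution errors
        return f"error:{errno.errorcode.get(exc.errno, exc.errno)}"
```

For example, `probe_tcp("compute-4-5", 6996)` would test the `listen-port 6996` from our server spec on one of the nodes named in the client spec.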

Ricardo Garcia Mayoral
Computational Fluid Mechanics
ETSI Aeronauticos, Universidad Politecnica de Madrid
Pz Cardenal Cisneros 3, 28040 Madrid, Spain.
Phone: (+34) 913363291  Fax: (+34) 913363295
e-mail: ricardo at torroja.dmt.upm.es