[Gluster-devel] bug report: gluster blocks when server is down

Markus Fenske iblue at gmx.net
Tue Aug 26 01:33:18 UTC 2008


Hello,

It seems like I have found a bug.

I am running 1.3.11; my client configuration is:
---
volume server1
       type protocol/client
       option transport-type tcp/client
       option remote-host 10.13.37.101
       option remote-subvolume brick1
end-volume

volume server2
       type protocol/client
       option transport-type tcp/client
       option remote-host 10.13.37.102
       option remote-subvolume brick2
end-volume

volume mirror0
       type cluster/afr
       subvolumes server1 server2
end-volume
---

When one or two servers are running and I mount the volume, e.g. at
/cluster, and then all servers go down, I instantly get:
---
user at test2:~$ ls /cluster
ls: cannot access /cluster: Transport endpoint is not connected
---

The glusterfs.log says:
---
2008-08-26 02:07:44 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:07:44 W [client-protocol.c:332:client_protocol_xfer]
server1: not connected at the moment to submit frame type(1) op(34)
2008-08-26 02:07:44 E [client-protocol.c:4430:client_lookup_cbk]
server1: no proper reply from server, returning ENOTCONN
2008-08-26 02:07:44 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:07:44 W [client-protocol.c:332:client_protocol_xfer]
server2: not connected at the moment to submit frame type(1) op(34)
2008-08-26 02:07:44 E [client-protocol.c:4430:client_lookup_cbk]
server2: no proper reply from server, returning ENOTCONN
2008-08-26 02:07:44 E [fuse-bridge.c:468:fuse_entry_cbk]
glusterfs-fuse: 12: (34) / => -1 (107)
2008-08-26 02:07:44 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:07:44 W [client-protocol.c:332:client_protocol_xfer]
server1: not connected at the moment to submit frame type(1) op(34)
2008-08-26 02:07:44 E [client-protocol.c:4430:client_lookup_cbk]
server1: no proper reply from server, returning ENOTCONN
2008-08-26 02:07:44 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:07:44 W [client-protocol.c:332:client_protocol_xfer]
server2: not connected at the moment to submit frame type(1) op(34)
2008-08-26 02:07:44 E [client-protocol.c:4430:client_lookup_cbk]
server2: no proper reply from server, returning ENOTCONN
2008-08-26 02:07:44 E [fuse-bridge.c:468:fuse_entry_cbk]
glusterfs-fuse: 12: (34) / => -1 (107)
2008-08-26 02:07:51 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
---
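
For reference, the two numeric codes in this log can be decoded with a
few lines of Python. This is only a small sketch for reading the log; the
helper name describe() is made up for illustration, and the numeric
values are platform-specific (on Linux, 111 is ECONNREFUSED and 107 is
ENOTCONN, which is the error ls reports above).

```python
import errno
import os


def describe(code):
    """Return 'CODE NAME: message' for an errno value on this platform."""
    name = errno.errorcode.get(code, "unknown")
    return f"{code} {name}: {os.strerror(code)}"


# 111 appears in the tcp_connect lines, 107 in the fuse_entry_cbk lines.
for code in (111, 107):
    print(describe(code))
```

On a non-Linux system the same numbers may map to different names, so
the output should be read with that in mind.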

But when I start the client while the servers are already down, ls
never completes and I can wait forever:
---
user at test2:~$ ls /cluster

---

With the following glusterfs.log:
---
2008-08-26 02:08:11 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:08:11 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:08:12 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:08:12 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:08:14 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:08:14 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:08:17 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:08:17 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:08:22 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:08:22 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:08:30 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:08:30 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:08:43 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:08:43 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:09:04 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:09:04 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:09:38 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:09:38 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:10:33 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:10:33 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:12:02 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:12:02 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:14:26 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:14:27 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:18:20 E [tcp-client.c:190:tcp_connect] server2:
non-blocking connect() returned: 111 (Connection refused)
2008-08-26 02:18:20 E [tcp-client.c:190:tcp_connect] server1:
non-blocking connect() returned: 111 (Connection refused)
---
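
To make the retry pattern in this log easier to see, here is a minimal
Python sketch that computes the gaps between successive connect attempts
to server2. The timestamps are copied from the log above; the function
name gaps() is just an illustrative helper, not part of GlusterFS.

```python
from datetime import datetime

# Timestamps of the server2 tcp_connect errors, copied from the log above.
ATTEMPTS = """
2008-08-26 02:08:11
2008-08-26 02:08:12
2008-08-26 02:08:14
2008-08-26 02:08:17
2008-08-26 02:08:22
2008-08-26 02:08:30
2008-08-26 02:08:43
2008-08-26 02:09:04
2008-08-26 02:09:38
2008-08-26 02:10:33
2008-08-26 02:12:02
2008-08-26 02:14:26
2008-08-26 02:18:20
"""


def gaps(stamps):
    """Return the number of seconds between consecutive timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


stamps = [line.strip() for line in ATTEMPTS.strip().splitlines()]
print(gaps(stamps))
```

The gaps come out as 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144 and 234
seconds. So the client keeps retrying with an ever longer delay, but the
pending ls is never answered with an error. This is only an observation
from the log, not a statement about how the reconnect logic is written.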

Thanks,
Markus Fenske




