[Gluster-users] Failure: "Transport endpoint is not connected" (but it is!)
Benjamin Smith
lists at benjamindsmith.com
Tue Jan 20 00:27:16 UTC 2009
Late last week, I rolled out GlusterFS on our production cluster. The config
is very simple: two active servers that are also clients to each other. It is
used for fairly low-volume distribution of file settings for an application
cluster; these are updated perhaps a few times per day and read constantly
(on pretty much every web page hit). Here are the details:
OS: CentOS 4, Linux 2.6.9-78.0.13.ELsmp
HW: Multicore Opteron, X86/64, 4 GB ECC RAM, SCSI, software RAID 1
Transport: Gigabit Ethernet
Fuse: 2.7.4-1 el4
dkms-fuse: 2.7.4-1
GlusterFS 1.3.12 (built as RPM from tarball)
Config: (At the bottom of this email)
Got a complaint today, "Servers down!". When I did a "df" to see what's going
on, I got a "Transport endpoint is not connected" message (or similar) next
to the GlusterFS client partition. Yet in all cases, I could ping/connect to
the "other" system, and both DNS servers were working fine.
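
A successful ping only shows that the host answers ICMP; it does not show
that the glusterfsd TCP listener is accepting connections. A minimal sketch
of a more direct check follows, assuming Python is available on the nodes.
The function name tcp_port_open is hypothetical, and the port number is an
assumption (the server file below sets no listen port), so substitute
whatever port your glusterfsd actually listens on.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Any resolution failure, refusal, or timeout is reported as False.
    """
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failure, connection refused, unreachable host, timeout.
        return False
```

Calling, for example, tcp_port_open("glusterfs2.spfs", 6996) from each node
(6996 being the assumed port) would distinguish "host is up" from "server
process is accepting connections". It still cannot tell whether the GlusterFS
handshake itself succeeds.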
More interested in restoring service than forensics, I did the following:
1) Shut down the gluster client and started it back up. Result? The df
command worked as expected, but the files still could not be read.
2) Shut down the gluster client and gluster server, then restarted them in
reverse order. Everything was then back up instantly.
glusterfs.log has about 3.5 million (no kidding!) entries; a small sample is
below. The only entries in glusterfsd.log are those from my restarting the
daemons. =/ Any idea what causes this?
// GLUSTERFS.LOG
2009-01-19 14:12:17 E [client-protocol.c:4430:client_lookup_cbk] remote2: no
proper reply from server, returning ENOTCONN
2009-01-19 14:12:17 W [client-protocol.c:332:client_protocol_xfer] remote2:
not connected at the moment to submit frame type(1) op(34)
2009-01-19 14:12:17 E [client-protocol.c:4430:client_lookup_cbk] remote2: no
proper reply from server, returning ENOTCONN
2009-01-19 14:12:17 W [client-protocol.c:332:client_protocol_xfer] remote2:
not connected at the moment to submit frame type(1) op(34)
2009-01-19 14:12:17 E [client-protocol.c:4430:client_lookup_cbk] remote2: no
proper reply from server, returning ENOTCONN
2009-01-19 14:12:17 W [client-protocol.c:332:client_protocol_xfer] remote2:
not connected at the moment to submit frame type(1) op(34)
2009-01-19 14:12:17 E [client-protocol.c:4430:client_lookup_cbk] remote2: no
proper reply from server, returning ENOTCONN
2009-01-19 14:12:23 W [client-protocol.c:332:client_protocol_xfer] remote2:
not connected at the moment to submit frame type(1) op(34)
2009-01-19 14:12:23 E [client-protocol.c:4430:client_lookup_cbk] remote2: no
proper reply from server, returning ENOTCONN
2009-01-19 14:12:23 W [client-protocol.c:332:client_protocol_xfer] remote2:
not connected at the moment to submit frame type(1) op(34)
2009-01-19 14:12:23 E [client-protocol.c:4430:client_lookup_cbk] remote2: no
proper reply from server, returning ENOTCON
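
With millions of near-identical entries, a per-message tally is easier to
read than the raw log. The sketch below groups entries by level, function,
and volume name, and records when each group first appeared. It assumes each
entry occupies a single line in the real glusterfs.log (the sample above is
wrapped by the mail client) and that all entries follow the layout shown in
the sample. The names LINE_RE and summarize are hypothetical.

```python
import re
from collections import Counter

# Layout taken from the sample:
# "DATE TIME LEVEL [file.c:line:function] volume: message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<volume>[^:]+): (?P<msg>.*)$"
)


def summarize(lines):
    """Count entries per (level, function, volume) and note first timestamps."""
    counts = Counter()
    first_seen = {}
    for raw in lines:
        m = LINE_RE.match(raw.rstrip("\n"))
        if not m:
            continue  # skip lines that do not match the assumed layout
        key = (m.group("level"), m.group("func"), m.group("volume"))
        counts[key] += 1
        first_seen.setdefault(key, m.group("ts"))
    return counts, first_seen
```

Passing an open file object for glusterfs.log to summarize() streams the file
line by line, so the 3.5 million entries need not fit in memory. The earliest
timestamp per group gives a rough idea of when the connection to remote2 was
first reported as lost.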
-- SERVER FILE --
> volume brick
> type storage/posix
> option directory /home/uroot/home/cworks/.data
> end-volume
> volume server
> type protocol/server
> subvolumes brick
> option transport-type tcp/server
> option auth.ip.brick.allow 192.168.254.*
> end-volume
-- CLIENT FILE --
> volume remote1
> type protocol/client
> option transport-type tcp/client
> option remote-host glusterfs1.spfs
> option remote-subvolume brick
> end-volume
> volume remote2
> type protocol/client
> option transport-type tcp/client
> option remote-host glusterfs2.spfs
> option remote-subvolume brick
> end-volume
> volume mirror0
> type cluster/afr
> subvolumes remote1 remote2
> end-volume