[Gluster-devel] errors in my log

m.c.wilkins at massey.ac.nz
Tue Apr 21 08:36:59 UTC 2009




hi all,

i am having problems with rc7: i get "Transport endpoint is not
connected" from time to time.  the problem is causing us a major
headache, it basically makes our gluster setup unusable.

i posted this to the users list, but haven't got a response yet, so
hope you can help.

i'm wondering if i could just tweak some timeouts or something.  it is
really odd though, the machines are connected to the same gigabit
switch, and not heavily loaded (well i don't think so anyway - i
didn't check at the time).

---
hi,

i'm running 2.0.0rc7 (config below) in a nufa setup.  please help me
out, i'm getting some errors in my logs:

on tur-awc1 i have:

2009-04-21 01:24:43 E [client-protocol.c:533:client_ping_timer_expired] tur-awc3-0: ping timer expired! bailing transport
2009-04-21 01:24:43 E [saved-frames.c:169:saved_frames_unwind] tur-awc3-0: forced unwinding frame type(1) op(MKNOD)
2009-04-21 01:24:43 E [fuse-bridge.c:1274:fuse_rename_cbk] glusterfs-fuse: 102353757: /090417_HWI-EAS209_0011_FC30K/Data/IPAR_1.3/Bustard1.3.2_20-04-2009_mjscolli/GERALD_20-04-2009_mjscolli/s_5_0009_realign.txt.tmp -> /090417_HWI-EAS209_0011_FC30K/Data/IPAR_1.3/Bustard1.3.2_20-04-2009_mjscolli/GERALD_20-04-2009_mjscolli/s_5_0009_realign.txt => -1 (Transport endpoint is not connected)
2009-04-21 01:24:43 E [saved-frames.c:169:saved_frames_unwind] tur-awc3-0: forced unwinding frame type(1) op(STAT)
2009-04-21 01:24:43 E [dht-common.c:762:dht_attr_cbk] nufa: subvolume tur-awc3-0 returned -1 (Transport endpoint is not connected)
2009-04-21 01:24:43 E [saved-frames.c:169:saved_frames_unwind] tur-awc3-0: forced unwinding frame type(1) op(STAT)
2009-04-21 01:24:43 E [dht-common.c:762:dht_attr_cbk] nufa: subvolume tur-awc3-0 returned -1 (Transport endpoint is not connected)
2009-04-21 01:24:43 E [saved-frames.c:169:saved_frames_unwind] tur-awc3-0: forced unwinding frame type(1) op(STAT)
2009-04-21 01:24:43 E [dht-common.c:762:dht_attr_cbk] nufa: subvolume tur-awc3-0 returned -1 (Transport endpoint is not connected)
2009-04-21 01:24:43 E [saved-frames.c:169:saved_frames_unwind] tur-awc3-0: forced unwinding frame type(2) op((null))
2009-04-21 01:24:43 E [client-protocol.c:630:client_ping_cbk] tur-awc3-0: timer must have expired
2009-04-21 01:24:45 N [client-protocol.c:6159:client_setvolume_cbk] tur-awc3-0: connection and handshake succeeded

seems like it is having problems communicating with tur-awc3, and in
that machine's log i have:

2009-04-21 01:24:45 N [server-protocol.c:7513:mop_setvolume] server: accepted client from 130.123.129.121:1013
2009-04-21 01:24:45 E [socket.c:102:__socket_rwv] server: writev failed (Broken pipe)
2009-04-21 01:24:45 N [server-protocol.c:8268:notify] server: 130.123.129.121:1011 disconnected
2009-04-21 04:03:12 W [nufa.c:219:nufa_lookup] nufa: incomplete layout failure for path=/
2009-04-21 04:03:12 W [fuse-bridge.c:301:need_fresh_lookup] fuse-bridge: revalidate of / failed (Resource temporarily unavailable)

any idea what is happening?  i can confirm that both machines are up
and under almost no load, connected on the same gigabit switch.
130.123.129.121 is the IP address of tur-awc1.
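
for anyone wanting to check which subvolume the errors cluster around, here is a minimal sketch (not part of glusterfs) that tallies "E" lines per log source.  it assumes the line layout seen in the logs above, i.e. date, time, level, [file:line:function], source, message; the names parse_line and count_errors_by_source are made up for illustration, and the sample lines are copied from the tur-awc1 log.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the log lines quoted in this post:
# "<date> <time> <level> [<file>:<line>:<function>] <source>: <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<source>[^:]+): (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def count_errors_by_source(lines):
    """Count level-E lines per source (the name printed before the message)."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "E":
            counts[entry["source"]] += 1
    return counts


# Sample lines copied from the tur-awc1 log above.
sample = [
    "2009-04-21 01:24:43 E [client-protocol.c:533:client_ping_timer_expired] tur-awc3-0: ping timer expired! bailing transport",
    "2009-04-21 01:24:43 E [saved-frames.c:169:saved_frames_unwind] tur-awc3-0: forced unwinding frame type(1) op(STAT)",
    "2009-04-21 01:24:43 E [dht-common.c:762:dht_attr_cbk] nufa: subvolume tur-awc3-0 returned -1 (Transport endpoint is not connected)",
    "2009-04-21 01:24:45 N [client-protocol.c:6159:client_setvolume_cbk] tur-awc3-0: connection and handshake succeeded",
]

if __name__ == "__main__":
    for source, count in count_errors_by_source(sample).most_common():
        print(source, count)
```

to run it over a whole log, pass the open file object instead of the sample list; lines that do not match the assumed layout are skipped.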

any help much appreciated

Matt

---

oh and my config is:

volume posix0
  type storage/posix
  option directory /export/brick-newgluster
end-volume

volume locks0
  type features/locks
  subvolumes posix0
end-volume

volume brick0
  type performance/io-threads
  subvolumes locks0
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option listen-port 16996
  option auth.addr.brick0.allow *
  subvolumes brick0
end-volume

volume tur-awc1-0
  type protocol/client
  option transport-type tcp
  option remote-port 16996
  option remote-host tur-awc1
  option remote-subvolume brick0
end-volume

volume tur-awc2-0
  type protocol/client
  option transport-type tcp
  option remote-port 16996
  option remote-host tur-awc2
  option remote-subvolume brick0
end-volume

volume tur-awc3-0
  type protocol/client
  option transport-type tcp
  option remote-port 16996
  option remote-host tur-awc3
  option remote-subvolume brick0
end-volume

volume nufa
   type cluster/nufa
   option local-volume-name `hostname`-0
   subvolumes tur-awc1-0 tur-awc2-0 tur-awc3-0
end-volume

Matt





