[Gluster-devel] GlusterFS Volume Failure
Philippe Muller
philippe.muller at gmail.com
Tue Jun 1 08:39:09 UTC 2010
Hi,
Last night, we had some trouble with a GlusterFS mount. It's a replicate
volume, and the 10.1.1.2 host was already down. The files on the volume
weren't readable until I manually restarted the GlusterFS instance.
We'd like to understand what happened on this volume, especially the
"Server 10.1.1.1:6996 has not responded in the last 42 seconds,
disconnecting." message. I can't figure out why the GlusterFS instance
couldn't talk to itself.
Please help us.
This log is from 10.1.1.1 itself:
[2010-06-01 00:01:54] E [client-protocol.c:415:client_ping_timer_expired] brick-qmaster: Server 10.1.1.1:6996 has not responded in the last 42 seconds, disconnecting.
[2010-06-01 00:04:28] E [client-protocol.c:415:client_ping_timer_expired] brick-qmaster: Server 10.1.1.1:6996 has not responded in the last 42 seconds, disconnecting.
[2010-06-01 00:06:57] E [client-protocol.c:415:client_ping_timer_expired] brick-qmaster: Server 10.1.1.1:6996 has not responded in the last 42 seconds, disconnecting.
[2010-06-01 00:09:32] E [client-protocol.c:415:client_ping_timer_expired] brick-qmaster: Server 10.1.1.1:6996 has not responded in the last 42 seconds, disconnecting.
[2010-06-01 00:11:55] E [client-protocol.c:415:client_ping_timer_expired] brick-qmaster: Server 10.1.1.1:6996 has not responded in the last 42 seconds, disconnecting.
[2010-06-01 00:14:29] E [client-protocol.c:415:client_ping_timer_expired] brick-qmaster: Server 10.1.1.1:6996 has not responded in the last 42 seconds, disconnecting.
[2010-06-01 00:15:44] E [client-protocol.c:313:call_bail] brick-qmaster: bailing out frame STAT(0) frame sent = 2010-05-31 23:45:43. frame-timeout = 1800
[2010-06-01 00:15:44] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 7731899: STAT() /masterspool => -1 (Transport endpoint is not connected)
[2010-06-01 00:15:44] E [client-protocol.c:313:call_bail] brick-qmaster: bailing out frame LOOKUP(27) frame sent = 2010-05-31 23:45:39. frame-timeout = 1800
[2010-06-01 00:15:44] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 7731898: LOOKUP() / => -1 (Transport endpoint is not connected)
[2010-06-01 00:15:44] E [client-protocol.c:313:call_bail] brick-qmaster: bailing out frame STATFS(13) frame sent = 2010-05-31 23:45:39. frame-timeout = 1800
[2010-06-01 00:15:44] W [fuse-bridge.c:2352:fuse_statfs_cbk] glusterfs-fuse: 7731897: ERR => -1 (Transport endpoint is not connected)
[2010-06-01 00:15:44] E [client-protocol.c:313:call_bail] brick-qmaster: bailing out frame LOOKUP(27) frame sent = 2010-05-31 23:45:37. frame-timeout = 1800
[2010-06-01 00:15:44] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 7731896: LOOKUP() / => -1 (Transport endpoint is not connected)
[2010-06-01 00:15:44] E [client-protocol.c:313:call_bail] brick-qmaster: bailing out frame OPEN(10) frame sent = 2010-05-31 23:45:34. frame-timeout = 1800
[2010-06-01 00:15:44] W [fuse-bridge.c:858:fuse_fd_cbk] glusterfs-fuse: 7731894: OPEN() /cell/common/bootstrap => -1 (Transport endpoint is not connected)
[2010-06-01 00:15:44] E [client-protocol.c:313:call_bail] brick-qmaster: bailing out frame FSTAT(25) frame sent = 2010-05-31 23:45:35. frame-timeout = 1800
[2010-06-01 00:15:44] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 7731895: FSTAT() /masterspool/messages => -1 (File descriptor in bad state)
[2010-06-01 00:15:44] E [client-protocol.c:313:call_bail] brick-qmaster: bailing out frame FSTAT(25) frame sent = 2010-05-31 23:45:34. frame-timeout = 1800
[2010-06-01 00:15:44] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 7731893: FSTAT() /cell/common/bootstrap => -1 (File descriptor in bad state)
[2010-06-01 00:15:44] E [client-protocol.c:313:call_bail] brick-qmaster: bailing out frame PING(5) frame sent = 2010-05-31 23:45:35. frame-timeout = 1800
[2010-06-01 00:15:54] E [client-protocol.c:313:call_bail] brick-qmaster: bailing out frame PING(5) frame sent = 2010-05-31 23:45:51. frame-timeout = 1800
[2010-06-01 00:16:05] E [client-protocol.c:313:call_bail] brick-qmaster: bailing out frame LOOKUP(27) frame sent = 2010-05-31 23:45:56. frame-timeout = 1800
[2010-06-01 00:16:05] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 7731901: LOOKUP() / => -1 (Transport endpoint is not connected)
[2010-06-01 00:16:25] E [client-protocol.c:313:call_bail] brick-qmaster: bailing out frame STATFS(13) frame sent = 2010-05-31 23:46:19. frame-timeout = 1800
[2010-06-01 00:16:25] W [fuse-bridge.c:2352:fuse_statfs_cbk] glusterfs-fuse: 7731902: ERR => -1 (Transport endpoint is not connected)
[..]
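To relate the "bailing out frame" entries to the frame-timeout of 1800
seconds, here is a minimal Python sketch that computes how long each frame
had been waiting when it was bailed out. It is not part of GlusterFS: the
function name bailed_frames and the regular expression are assumptions based
only on the log format shown above, and the sample holds two entries copied
from this log.

```python
import re
from datetime import datetime

LOG_TS = "%Y-%m-%d %H:%M:%S"

# Assumed pattern, derived only from the call_bail entries in this log.
BAIL_RE = re.compile(
    r"\[(?P<logged>[\d-]+ [\d:]+)\] E \[client-protocol\.c:\d+:call_bail\]\s+"
    r"(?P<volume>[\w-]+):\s+bailing out frame (?P<op>\w+)\(\d+\)\s+"
    r"frame sent = (?P<sent>[\d-]+ [\d:]+)\.\s+"
    r"frame-timeout\s+=\s+(?P<timeout>\d+)"
)


def bailed_frames(log_text):
    """Yield (operation, seconds_waited, frame_timeout) per call_bail entry."""
    # Log lines may be wrapped by mail clients, so collapse whitespace first.
    flat = " ".join(log_text.split())
    for m in BAIL_RE.finditer(flat):
        logged = datetime.strptime(m["logged"], LOG_TS)
        sent = datetime.strptime(m["sent"], LOG_TS)
        waited = int((logged - sent).total_seconds())
        yield m["op"], waited, int(m["timeout"])


sample = """
[2010-06-01 00:15:44] E [client-protocol.c:313:call_bail] brick-qmaster:
bailing out frame STAT(0) frame sent = 2010-05-31 23:45:43. frame-timeout =
1800
[2010-06-01 00:15:54] E [client-protocol.c:313:call_bail] brick-qmaster:
bailing out frame PING(5) frame sent = 2010-05-31 23:45:51. frame-timeout =
1800
"""

for op, waited, timeout in bailed_frames(sample):
    print(op, waited, timeout)
```

For these two entries the waits come out at 1801 and 1803 seconds, just past
the frame-timeout of 1800, so the requests sent around 23:45 had gone
unanswered for the full 30 minutes before being bailed out.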
Here is our configuration:
volume posix
  type storage/posix
  option directory /data/sge
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow 10.*.*.*
  subvolumes brick
end-volume

volume brick-qmaster
  type protocol/client
  option transport-type tcp
  option remote-host 10.1.1.1
  option remote-subvolume brick
end-volume

volume brick-shadow
  type protocol/client
  option transport-type tcp
  option remote-host 10.1.1.2
  option remote-subvolume brick
end-volume

volume sge-replicate
  type cluster/replicate
  subvolumes brick-qmaster brick-shadow
end-volume
Philippe Muller