[Gluster-users] server side afr hangs then crashes when other server rebooted
Keith Freedman
freedman at FreeFormIT.com
Sun Dec 14 08:28:21 UTC 2008
I just installed 1.4 rc2 tonight. I had been running rc1 since it was
released; this problem probably existed then, but I didn't test for it.
I have 2 servers which AFR to each other.
I rebooted one, and while that server was down, the other server's
gluster mount hung.
Once the rebooted server came back up (it was pingable on the network),
the gluster process on the other server crashed.
I remounted the filesystem and it began auto-healing.
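For anyone trying to reproduce this, a hang like the one described above can be detected without getting stuck in it. The following is only a minimal sketch, not part of my setup: it runs stat() on the mount point in a child process and gives up after a timeout. The function name mount_responds and the 5 second default are assumptions made for illustration; /home is the mount point used here.

```python
import subprocess
import sys


def mount_responds(path, timeout_s=5.0):
    """Return True if stat() on `path` succeeds within `timeout_s` seconds.

    The stat runs in a child process so that a blocked filesystem call
    cannot hang the caller. False means either a timeout (possible hang)
    or a failed stat (for example, a missing path).
    """
    cmd = [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path]
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: do not wait again, the child may be stuck
        # inside the kernel until the mount recovers.
        proc.kill()
        return False


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/home"
    state = "responding" if mount_responds(target) else "not responding"
    print(f"{target}: {state}")
```

Running this in a loop while the peer is rebooted would give a rough timeline of when the mount stops and resumes answering.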
Here's the log from the server that was not rebooted (the other one
doesn't have anything in its log other than the gluster startup output):
Version : glusterfs 1.4.0rc2 built on Dec 13 2008 22:36:17
TLA Revision : glusterfs--mainline--3.0--patch-770
Starting Time: 2008-12-13 22:43:12
Command line : /usr/local/sbin/glusterfs --log-level=WARNING --volfile=/etc/glusterfs/glusterfs-home.vol /home
given volfile
+-----
1: ### file: server-volume.spec.sample
2:
3: ##############################################
4: ### GlusterFS Server Volume Specification ##
5: ##############################################
6:
7: #### CONFIG FILE RULES:
8: ### "#" is comment character.
9: ### - Config file is case sensitive
10: ### - Options within a volume block can be in any order.
11: ### - Spaces or tabs are used as delimitter within a line.
12: ### - Multiple values to options will be : delimitted.
13: ### - Each option should end within a line.
14: ### - Missing or commented fields will assume default values.
15: ### - Blank/commented lines are allowed.
16: ### - Sub-volumes should already be defined above before referring.
17:
18: ### Export volume "home1" with the contents of "/home/export" directory.
19: volume home1
20: type storage/posix # POSIX FS translator
21: option directory /gluster/home # Export this directory
22: end-volume
23:
24: volume posix-locks-home1
25: type features/posix-locks
26: option mandatory on
27: subvolumes home1
28: end-volume
29:
30: ## Reference volume "home2" from remote server
31: volume home2
32: type protocol/client # POSIX FS translator
33: option transport-type tcp/client
34: option remote-host 192.168.2.2 # IP address of remote host
35: option remote-subvolume posix-locks-home1 # use home1 on remote host
36: option transport-timeout 10 # value in seconds; it should be set relatively low
37: end-volume
38:
39: ### Add network serving capability to above home.
40: volume server
41: type protocol/server
42: option transport-type tcp/server # For TCP/IP transport
43: subvolumes posix-locks-home1
44: option auth.addr.posix-locks-home1.allow 192.168.2.2,127.0.0.1 # Allow access to "home1" volume
45: end-volume
46:
47: ### Create automatic file replication
48: volume home
49: type cluster/afr
50: option read-subvolume posix-locks-home1
51: subvolumes posix-locks-home1 home2
52: # subvolumes posix-locks-home1
53: end-volume
54:
55: #volume threads1
56: # type performance/io-threads
57: # option thread-count 2
58: # option cache-size 32MB
59: # subvolumes home
60: #end-volume
+-----
2008-12-13 22:47:53 W [afr-self-heal-common.c:985:afr_self_heal] home: performing self heal on /ac/mail (metadata=0 data=0 entry=1)
2008-12-13 22:47:53 W [afr-self-heal-entry.c:1620:afr_sh_entry_impunge_all] home: impunging entries of /ac/mail on posix-locks-home1 to other sinks
2008-12-13 22:47:53 W [afr-self-heal-entry.c:858:afr_sh_entry_expunge_all] home: expunging entries of /ac/mail on home2 to other sinks
2008-12-13 22:47:53 E [posix.c:1834:posix_release] home1: pfd->dir is 0x17e2ac0 (not NULL) for file fd=0x17e1850
2008-12-13 22:47:53 W [afr-self-heal-entry.c:70:afr_sh_entry_done] home: self heal of /ac/mail completed
2008-12-14 00:10:24 E [client-protocol.c:273:call_bail] home2: activating bail-out. pending frames = 45. last sent = 2008-12-14 00:10:08. last received = 2008-12-14 00:07:57. transport-timeout = 10
2008-12-14 00:10:24 C [client-protocol.c:308:call_bail] home2: bailing transport
2008-12-14 00:10:24 E [client-protocol.c:5728:protocol_client_cleanup] home2: forced unwinding frame type(3) op(RELEASE) reply=@0x17675d0
2008-12-14 00:10:24 E [client-protocol.c:5712:protocol_client_cleanup] home2: forced unwinding frame type(1) op(LOOKUP) reply=@0x17675d0
2008-12-14 00:10:24 E [socket.c:1189:socket_submit] home2: transport not connected to submit (priv->connected = 255)
2008-12-14 00:10:24 W [common-utils.c:156:gf_print_bytes] glusterfs: Total data (in bytes): transfered (55132079), received (40565752)
pending frames:
frame : type(1) op(12)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
Signal received: 11
configuration details:
argp 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
tv_nsec 1
package-string: glusterfs 1.4.0rc2
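The crash dump above lists 45 pending frames, which matches "pending frames = 45" in the call_bail message: one type(1) op(12) and 44 type(1) op(32). Counting by hand is error-prone, so here is a small sketch that tallies such a dump. The helper name tally_frames is made up for illustration, and the sample text is rebuilt from the dump rather than read from a real log file; no meaning is assumed for the op numbers.

```python
import re
from collections import Counter

# Matches lines such as "frame : type(1) op(32)" from the crash dump.
FRAME_RE = re.compile(r"frame\s*:\s*type\((\d+)\)\s*op\((\d+)\)")


def tally_frames(log_text):
    """Count pending frames in `log_text`, keyed by (type, op)."""
    counts = Counter()
    for match in FRAME_RE.finditer(log_text):
        counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


# Sample rebuilt from the dump above: one op(12) frame, then 44 op(32) frames.
sample = (
    "pending frames:\n"
    "frame : type(1) op(12)\n"
    + "frame : type(1) op(32)\n" * 44
)

counts = tally_frames(sample)
for (ftype, op), n in sorted(counts.items()):
    print(f"type({ftype}) op({op}): {n}")
print("total:", sum(counts.values()))
```

To run it against a real log, pass the file contents to tally_frames instead of the sample string.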
Version : glusterfs 1.4.0rc2 built on Dec 13 2008 22:36:17
TLA Revision : glusterfs--mainline--3.0--patch-770
Starting Time: 2008-12-14 00:10:38
Command line : /usr/local/sbin/glusterfs --log-level=WARNING --volfile=/etc/glusterfs/glusterfs-home.vol /home
given volfile
+-----
1: ### file: server-volume.spec.sample
2:
3: ##############################################
4: ### GlusterFS Server Volume Specification ##
5: ##############################################
6:
7: #### CONFIG FILE RULES:
8: ### "#" is comment character.
9: ### - Config file is case sensitive
10: ### - Options within a volume block can be in any order.
11: ### - Spaces or tabs are used as delimitter within a line.
12: ### - Multiple values to options will be : delimitted.
13: ### - Each option should end within a line.
14: ### - Missing or commented fields will assume default values.
15: ### - Blank/commented lines are allowed.
16: ### - Sub-volumes should already be defined above before referring.
17:
18: ### Export volume "home1" with the contents of "/home/export" directory.
19: volume home1
20: type storage/posix # POSIX FS translator
21: option directory /gluster/home # Export this directory
22: end-volume
23:
24: volume posix-locks-home1
25: type features/posix-locks
26: option mandatory on
27: subvolumes home1
28: end-volume
29:
30: ## Reference volume "home2" from remote server
31: volume home2
32: type protocol/client # POSIX FS translator
33: option transport-type tcp/client
34: option remote-host 192.168.2.2 # IP address of remote host
35: option remote-subvolume posix-locks-home1 # use home1 on remote host
36: option transport-timeout 10 # value in seconds; it should be set relatively low
37: end-volume
38:
39: ### Add network serving capability to above home.
40: volume server
41: type protocol/server
42: option transport-type tcp/server # For TCP/IP transport
43: subvolumes posix-locks-home1
44: option auth.addr.posix-locks-home1.allow 192.168.2.2,127.0.0.1 # Allow access to "home1" volume
45: end-volume
46:
47: ### Create automatic file replication
48: volume home
49: type cluster/afr
50: option read-subvolume posix-locks-home1
51: subvolumes posix-locks-home1 home2
52: # subvolumes posix-locks-home1
53: end-volume
54:
55: #volume threads1
56: # type performance/io-threads
57: # option thread-count 2
58: # option cache-size 32MB
59: # subvolumes home
60: #end-volume
+-----
2008-12-14 00:10:39 E [socket.c:710:socket_connect_finish] home2: connection failed (Connection refused)
2008-12-14 00:10:43 E [client-protocol.c:135:this_fd_get] home2: failed to get remote fd number for fd_t(0x1552360)
2008-12-14 00:10:43 E [client-protocol.c:2634:client_lk] home2: failed to get remote fd from fd_t(0x1552360). returning EBADFD
2008-12-14 00:10:43 E [client-protocol.c:135:this_fd_get] home2: failed to get remote fd number for fd_t(0x1552360)
2008-12-14 00:10:43 E [client-protocol.c:2634:client_lk] home2: failed to get remote fd from fd_t(0x1552360). returning EBADFD
2008-12-14 00:11:18 W [afr-self-heal-common.c:985:afr_self_heal] home: performing self heal on /sharedtmp (metadata=0 data=0 entry=1)
2008-12-14 00:11:18 W [afr-self-heal-entry.c:1620:afr_sh_entry_impunge_all] home: impunging entries of /sharedtmp on posix-locks-home1 to other sinks
2008-12-14 00:11:18 W [afr-self-heal-entry.c:858:afr_sh_entry_expunge_all] home: expunging entries of /sharedtmp on home2 to other sinks
2008-12-14 00:11:18 W [afr-self-heal-entry.c:70:afr_sh_entry_done] home: self heal of /sharedtmp completed
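The timestamps in the call_bail entry earlier in the log give a rough idea of how long the mount was stalled. As a sketch only, the snippet below parses that entry (with its wrapped lines joined) and works out the gaps. The regular expression and the variable names are assumptions fitted to this one message, not a documented log format.

```python
import re
from datetime import datetime

TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
BAIL_RE = re.compile(
    rf"^(?P<logged>{TS}) .*call_bail.*"
    rf"pending frames = (?P<frames>\d+)\. "
    rf"last sent = (?P<sent>{TS})\. "
    rf"last received = (?P<received>{TS})\. "
    rf"transport-timeout = (?P<timeout>\d+)"
)

# The call_bail entry from the log above, joined onto one line.
line = (
    "2008-12-14 00:10:24 E [client-protocol.c:273:call_bail] home2: "
    "activating bail-out. pending frames = 45. "
    "last sent = 2008-12-14 00:10:08. "
    "last received = 2008-12-14 00:07:57. transport-timeout = 10"
)


def parse_ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' timestamp as it appears in the log."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


match = BAIL_RE.match(line)
logged = parse_ts(match.group("logged"))
sent = parse_ts(match.group("sent"))
received = parse_ts(match.group("received"))

since_sent = int((logged - sent).total_seconds())
since_received = int((logged - received).total_seconds())
sent_after_received = int((sent - received).total_seconds())

print("seconds from last send to bail-out:   ", since_sent)
print("seconds from last receive to bail-out:", since_received)
print("seconds between last receive and send:", sent_after_received)
print("configured transport-timeout:         ", match.group("timeout"))
```

For this entry the bail-out was logged 16 seconds after the last send and 147 seconds after the last reply was received, with transport-timeout set to 10.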