[Gluster-users] server side afr hangs then crashes when other server rebooted

Keith Freedman freedman at FreeFormIT.com
Sun Dec 14 08:28:21 UTC 2008


I just installed 1.4 rc2 tonight.  I had been running rc1 since it was 
released; this problem probably existed then, but I didn't test for it.

I have 2 servers which AFR to each other.

I rebooted one, and while that server was down, the other server's 
gluster mount hung.
Once the rebooted server came back up (was pingable on the network), 
the gluster process on the other server crashed.

I remounted the filesystem and it began auto-healing.

Here's the log from the server that was not rebooted (the other one 
doesn't have anything in its log other than the gluster startup stuff).

Version      : glusterfs 1.4.0rc2 built on Dec 13 2008 22:36:17
TLA Revision : glusterfs--mainline--3.0--patch-770
Starting Time: 2008-12-13 22:43:12
Command line : /usr/local/sbin/glusterfs --log-level=WARNING --volfile=/etc/glusterfs/glusterfs-home.vol /home
given volfile
+-----
   1: ### file: server-volume.spec.sample
   2:
   3: ##############################################
   4: ###  GlusterFS Server Volume Specification  ##
   5: ##############################################
   6:
   7: #### CONFIG FILE RULES:
   8: ### "#" is comment character.
   9: ### - Config file is case sensitive
  10: ### - Options within a volume block can be in any order.
  11: ### - Spaces or tabs are used as delimitter within a line.
  12: ### - Multiple values to options will be : delimitted.
  13: ### - Each option should end within a line.
  14: ### - Missing or commented fields will assume default values.
  15: ### - Blank/commented lines are allowed.
  16: ### - Sub-volumes should already be defined above before referring.
  17:
  18: ### Export volume "home1" with the contents of "/home/export" directory.
  19: volume home1
  20:   type storage/posix                   # POSIX FS translator
  21:   option directory /gluster/home        # Export this directory
  22: end-volume
  23:
  24: volume posix-locks-home1
  25:   type features/posix-locks
  26:   option mandatory on
  27:   subvolumes home1
  28: end-volume
  29:
  30: ## Reference volume "home2" from remote server
  31: volume home2
  32:   type protocol/client                   # POSIX FS translator
  33:   option transport-type tcp/client
  34:   option remote-host 192.168.2.2  # IP address of remote host
  35:   option remote-subvolume posix-locks-home1        # use home1 on remote host
  36:   option transport-timeout 10           # value in seconds; it should be set relatively low
  37: end-volume
  38:
  39: ### Add network serving capability to above home.
  40: volume server
  41:   type protocol/server
  42:   option transport-type tcp/server     # For TCP/IP transport
  43:   subvolumes posix-locks-home1
  44:   option auth.addr.posix-locks-home1.allow 192.168.2.2,127.0.0.1 # Allow access to "home1" volume
  45: end-volume
  46:
  47: ### Create automatic file replication
  48: volume home
  49:   type cluster/afr
  50:   option read-subvolume posix-locks-home1
  51:   subvolumes posix-locks-home1 home2
  52: #  subvolumes posix-locks-home1
  53: end-volume
  54:
  55: #volume threads1
  56: #  type performance/io-threads
  57: #  option thread-count 2
  58: #  option cache-size 32MB
  59: #  subvolumes home
  60: #end-volume
+-----
2008-12-13 22:47:53 W [afr-self-heal-common.c:985:afr_self_heal] home: performing self heal on /ac/mail (metadata=0 data=0 entry=1)
2008-12-13 22:47:53 W [afr-self-heal-entry.c:1620:afr_sh_entry_impunge_all] home: impunging entries of /ac/mail on posix-locks-home1 to other sinks
2008-12-13 22:47:53 W [afr-self-heal-entry.c:858:afr_sh_entry_expunge_all] home: expunging entries of /ac/mail on home2 to other sinks
2008-12-13 22:47:53 E [posix.c:1834:posix_release] home1: pfd->dir is 0x17e2ac0 (not NULL) for file fd=0x17e1850
2008-12-13 22:47:53 W [afr-self-heal-entry.c:70:afr_sh_entry_done] home: self heal of /ac/mail completed
2008-12-14 00:10:24 E [client-protocol.c:273:call_bail] home2: activating bail-out. pending frames = 45. last sent = 2008-12-14 00:10:08. last received = 2008-12-14 00:07:57. transport-timeout = 10
2008-12-14 00:10:24 C [client-protocol.c:308:call_bail] home2: bailing transport
2008-12-14 00:10:24 E [client-protocol.c:5728:protocol_client_cleanup] home2: forced unwinding frame type(3) op(RELEASE) reply=@0x17675d0
2008-12-14 00:10:24 E [client-protocol.c:5712:protocol_client_cleanup] home2: forced unwinding frame type(1) op(LOOKUP) reply=@0x17675d0
2008-12-14 00:10:24 E [socket.c:1189:socket_submit] home2: transport not connected to submit (priv->connected = 255)
2008-12-14 00:10:24 W [common-utils.c:156:gf_print_bytes] glusterfs: Total data (in bytes): transfered (55132079), received (40565752)
pending frames:
frame : type(1) op(12)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)

Signal received: 11
configuration details:argp 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
tv_nsec 1
package-string: glusterfs 1.4.0rc2

Version      : glusterfs 1.4.0rc2 built on Dec 13 2008 22:36:17
TLA Revision : glusterfs--mainline--3.0--patch-770
Starting Time: 2008-12-14 00:10:38
Command line : /usr/local/sbin/glusterfs --log-level=WARNING --volfile=/etc/glusterfs/glusterfs-home.vol /home
given volfile
+-----
   1: ### file: server-volume.spec.sample
   2:
   3: ##############################################
   4: ###  GlusterFS Server Volume Specification  ##
   5: ##############################################
   6:
   7: #### CONFIG FILE RULES:
   8: ### "#" is comment character.
   9: ### - Config file is case sensitive
  10: ### - Options within a volume block can be in any order.
  11: ### - Spaces or tabs are used as delimitter within a line.
  12: ### - Multiple values to options will be : delimitted.
  13: ### - Each option should end within a line.
  14: ### - Missing or commented fields will assume default values.
  15: ### - Blank/commented lines are allowed.
  16: ### - Sub-volumes should already be defined above before referring.
  17:
  18: ### Export volume "home1" with the contents of "/home/export" directory.
  19: volume home1
  20:   type storage/posix                   # POSIX FS translator
  21:   option directory /gluster/home        # Export this directory
  22: end-volume
  23:
  24: volume posix-locks-home1
  25:   type features/posix-locks
  26:   option mandatory on
  27:   subvolumes home1
  28: end-volume
  29:
  30: ## Reference volume "home2" from remote server
  31: volume home2
  32:   type protocol/client                   # POSIX FS translator
  33:   option transport-type tcp/client
  34:   option remote-host 192.168.2.2  # IP address of remote host
  35:   option remote-subvolume posix-locks-home1        # use home1 on remote host
  36:   option transport-timeout 10           # value in seconds; it should be set relatively low
  37: end-volume
  38:
  39: ### Add network serving capability to above home.
  40: volume server
  41:   type protocol/server
  42:   option transport-type tcp/server     # For TCP/IP transport
  43:   subvolumes posix-locks-home1
  44:   option auth.addr.posix-locks-home1.allow 192.168.2.2,127.0.0.1 # Allow access to "home1" volume
  45: end-volume
  46:
  47: ### Create automatic file replication
  48: volume home
  49:   type cluster/afr
  50:   option read-subvolume posix-locks-home1
  51:   subvolumes posix-locks-home1 home2
  52: #  subvolumes posix-locks-home1
  53: end-volume
  54:
  55: #volume threads1
  56: #  type performance/io-threads
  57: #  option thread-count 2
  58: #  option cache-size 32MB
  59: #  subvolumes home
  60: #end-volume
+-----
2008-12-14 00:10:39 E [socket.c:710:socket_connect_finish] home2: connection failed (Connection refused)
2008-12-14 00:10:43 E [client-protocol.c:135:this_fd_get] home2: failed to get remote fd number for fd_t(0x1552360)
2008-12-14 00:10:43 E [client-protocol.c:2634:client_lk] home2: failed to get remote fd from fd_t(0x1552360). returning EBADFD
2008-12-14 00:10:43 E [client-protocol.c:135:this_fd_get] home2: failed to get remote fd number for fd_t(0x1552360)
2008-12-14 00:10:43 E [client-protocol.c:2634:client_lk] home2: failed to get remote fd from fd_t(0x1552360). returning EBADFD
2008-12-14 00:11:18 W [afr-self-heal-common.c:985:afr_self_heal] home: performing self heal on /sharedtmp (metadata=0 data=0 entry=1)
2008-12-14 00:11:18 W [afr-self-heal-entry.c:1620:afr_sh_entry_impunge_all] home: impunging entries of /sharedtmp on posix-locks-home1 to other sinks
2008-12-14 00:11:18 W [afr-self-heal-entry.c:858:afr_sh_entry_expunge_all] home: expunging entries of /sharedtmp on home2 to other sinks
2008-12-14 00:11:18 W [afr-self-heal-entry.c:70:afr_sh_entry_done] home: self heal of /sharedtmp completed
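
In case it helps anyone reading the crash section of the log, here is a rough Python sketch for tallying the "pending frames" dump and for working out how long home2 had been silent when the bail-out fired. This is not part of glusterfs. The function name `summarize`, the regular expressions, and the sample list are my own assumptions, modelled on the log lines above, and the sketch assumes each log entry is on one line, as in the original log file rather than a wrapped copy.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical patterns, written to match the log lines quoted in this mail.
FRAME_RE = re.compile(r"^frame : type\((\d+)\) op\((\d+)\)$")
BAIL_RE = re.compile(
    r"pending frames = (\d+)\. last sent = (\S+ \S+)\. "
    r"last received = (\S+ \S+)\. transport-timeout = (\d+)"
)
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def summarize(lines):
    """Count pending frames per (type, op) and parse the call_bail entry."""
    frames = Counter()
    bail = None
    for raw in lines:
        line = raw.strip()
        match = FRAME_RE.match(line)
        if match:
            frames[(int(match.group(1)), int(match.group(2)))] += 1
            continue
        match = BAIL_RE.search(line)
        if match:
            sent = datetime.strptime(match.group(2), TIME_FORMAT)
            received = datetime.strptime(match.group(3), TIME_FORMAT)
            bail = {
                "pending": int(match.group(1)),
                # Seconds between the last reply and the last request sent.
                "silent_seconds": int((sent - received).total_seconds()),
                "timeout": int(match.group(4)),
            }
    return frames, bail


# Sample rebuilt from the log above: one op(12) frame and 44 op(32) frames.
sample = [
    "2008-12-14 00:10:24 E [client-protocol.c:273:call_bail] home2: "
    "activating bail-out. pending frames = 45. "
    "last sent = 2008-12-14 00:10:08. "
    "last received = 2008-12-14 00:07:57. transport-timeout = 10",
    "frame : type(1) op(12)",
] + ["frame : type(1) op(32)"] * 44

frames, bail = summarize(sample)
for (frame_type, op), count in sorted(frames.items()):
    print(f"type({frame_type}) op({op}): {count}")
print(bail)
```

On the sample, the frame tally adds up to the 45 pending frames that call_bail reported, and the gap between "last received" and "last sent" works out to 131 seconds, well beyond the configured transport-timeout of 10.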





