[Gluster-users] Gluster crashes when cascading AFR

Rainer Schwemmer rainer.schwemmer at cern.ch
Wed Dec 17 10:29:44 UTC 2008


Hello Vikas,

I have now installed and tested my setup with 1.4.0rc3. The good news is
that gluster no longer crashes on the intermediate level of the
structure. The bad news is that AFR doesn't seem to work at all anymore
for me. Even with a reduced setup using only 2 hosts, with 2 sub-volumes
for AFR on a local disk on one of the two, I can't write anything to
the volume. When I try to create a file on the exported volume, it
hangs for a few minutes and then returns with "Transport endpoint is not
connected". Here are the two config and log files.

Cheers,
  Rainer

log host 1 (client):

2008-12-17 11:06:40 D [glusterfs.c:297:_get_specfp] glusterfs: loading
volume
file /home/rainer/sources/gluster/software.farmcontrol-client-debug.vol

Version      : glusterfs 1.4.0rc3 built on Dec 16 2008 10:23:44
TLA Revision : glusterfs--mainline--3.0--patch-777
Starting Time: 2008-12-17 11:06:40
Command line : glusterfs
-f /home/rainer/sources/gluster/software.farmcontrol-client-debug.vol
-l /var/log/gluster.log -L DEBUG /mnt/mnt1 
given volfile
+-----
  1: volume hlta01-client
  2:   type protocol/client
  3:   option transport-type tcp/client
  4:   option remote-host hlta01
  5:   option remote-subvolume head
  6: end-volume
  7: 
  8: volume afr-sw-farmcontrol
  9:   type cluster/afr
 10:   subvolumes hlta01-client 
 11: end-volume
 12: 
+-----
2008-12-17 11:06:40 D [spec.y:187:new_section] parser: New node for
'hlta01-client'
2008-12-17 11:06:40 D [xlator.c:394:xlator_set_type] xlator: attempt to
load file /usr/lib64/glusterfs/1.4.0rc3/xlator/protocol/client.so
2008-12-17 11:06:40 D [spec.y:213:section_type] parser:
Type:hlta01-client:protocol/client
2008-12-17 11:06:40 D [spec.y:288:section_option] parser:
Option:hlta01-client:transport-type:tcp/client
2008-12-17 11:06:40 D [spec.y:288:section_option] parser:
Option:hlta01-client:remote-host:hlta01
2008-12-17 11:06:40 D [spec.y:288:section_option] parser:
Option:hlta01-client:remote-subvolume:head
2008-12-17 11:06:40 D [spec.y:372:section_end] parser: end:hlta01-client
2008-12-17 11:06:40 D [spec.y:187:new_section] parser: New node for
'afr-sw-farmcontrol'
2008-12-17 11:06:40 D [xlator.c:394:xlator_set_type] xlator: attempt to
load file /usr/lib64/glusterfs/1.4.0rc3/xlator/cluster/afr.so
2008-12-17 11:06:40 D [spec.y:213:section_type] parser:
Type:afr-sw-farmcontrol:cluster/afr
2008-12-17 11:06:40 D [spec.y:357:section_sub] parser:
child:afr-sw-farmcontrol->hlta01-client
2008-12-17 11:06:40 D [spec.y:372:section_end] parser:
end:afr-sw-farmcontrol
2008-12-17 11:06:40 D [xlator.c:394:xlator_set_type] xlator: attempt to
load file /usr/lib64/glusterfs/1.4.0rc3/xlator/mount/fuse.so
2008-12-17 11:06:40 D [glusterfs.c:927:main] glusterfs: running in pid
8938
2008-12-17 11:06:40 D [client-protocol.c:5955:init] hlta01-client:
defaulting transport-timeout to 42
2008-12-17 11:06:40 D [transport.c:118:transport_load] transport:
attempt to load file /usr/lib64/glusterfs/1.4.0rc3/transport/socket.so
2008-12-17 11:06:40 D [client-protocol.c:6008:init] hlta01-client:
defaulting limits.transaction-size to 268435456
2008-12-17 11:06:40 D [xlator.c:519:xlator_init_rec] hlta01-client:
Initialization done
2008-12-17 11:06:40 D [client-protocol.c:6281:notify] hlta01-client: got
GF_EVENT_PARENT_UP, attempting connect on transport
2008-12-17 11:06:40 D [client-protocol.c:6281:notify] hlta01-client: got
GF_EVENT_PARENT_UP, attempting connect on transport
2008-12-17 11:06:40 D [inode.c:987:inode_table_new] fuse: creating new
inode table with lru_limit=0
2008-12-17 11:06:40 D [inode.c:455:__inode_create] fuse/inode: create
inode(0)
2008-12-17 11:06:41 D [client-protocol.c:5620:client_protocol_reconnect]
hlta01-client: attempting reconnect
2008-12-17 11:06:41 D [name.c:183:af_inet_client_get_remote_sockaddr]
hlta01-client: option remote-port missing in volume hlta01-client.
Defaulting to 6996
2008-12-17 11:06:41 D [common-utils.c:213:gf_resolve_ip6] resolver: DNS
cache not present, freshly probing hostname: hlta01
2008-12-17 11:06:41 D [common-utils.c:250:gf_resolve_ip6] resolver:
returning ip-10.130.101.100 (port-6996) for hostname: hlta01 and port:
6996
2008-12-17 11:06:41 D [client-protocol.c:6313:notify] hlta01-client: got
GF_EVENT_CHILD_UP
2008-12-17 11:06:41 D [socket.c:926:socket_connect] hlta01-client:
connect () called on transport already connected
2008-12-17 11:06:41 D [client-protocol.c:5561:client_setvolume_cbk]
hlta01-client: SETVOLUME on remote-host succeeded
2008-12-17 11:06:51 D [client-protocol.c:5629:client_protocol_reconnect]
hlta01-client: breaking reconnect chain
2008-12-17 11:07:48 D [inode.c:280:__inode_activate] fuse/inode:
activating inode(1), lru=0/0 active=1 purge=0
2008-12-17 11:07:48 D [fuse-bridge.c:455:fuse_lookup] glusterfs-fuse: 2:
LOOKUP /test
2008-12-17 11:07:48 D [inode.c:455:__inode_create] fuse/inode: create
inode(0)
2008-12-17 11:07:48 D [inode.c:280:__inode_activate] fuse/inode:
activating inode(0), lru=0/0 active=2 purge=0
2008-12-17 11:07:48 D [fuse-bridge.c:406:fuse_entry_cbk] glusterfs-fuse:
2: LOOKUP() /test => -1 (No such file or directory)
2008-12-17 11:07:48 D [inode.c:323:__inode_retire] fuse/inode: retiring
inode(0) lru=0/0 active=1 purge=1
2008-12-17 11:07:48 D [inode.c:455:__inode_create] fuse/inode: create
inode(0)
2008-12-17 11:07:48 D [inode.c:280:__inode_activate] fuse/inode:
activating inode(0), lru=0/0 active=2 purge=0
2008-12-17 11:07:48 D [fuse-bridge.c:1089:fuse_mknod] glusterfs-fuse: 3:
MKNOD /test
2008-12-17 11:08:32 E [client-protocol.c:273:call_bail] hlta01-client:
activating bail-out. pending frames = 1. last sent = 2008-12-17
11:07:49. last received = 2008-12-17 11:07:49. transport-timeout = 42
2008-12-17 11:08:32 C [client-protocol.c:308:call_bail] hlta01-client:
bailing transport
2008-12-17 11:08:32 D [socket.c:183:__socket_disconnect] hlta01-client:
shutdown() returned 0. setting connection state to -1
2008-12-17 11:08:32 D [socket.c:93:__socket_rwv] hlta01-client: EOF from
peer 10.130.101.100:6996
2008-12-17 11:08:32 D [socket.c:568:socket_proto_state_machine]
hlta01-client: socket read failed (Transport endpoint is not connected)
in state 1 (10.130.101.100:6996)
2008-12-17 11:08:32 D [client-protocol.c:5652:protocol_client_cleanup]
hlta01-client: cleaning up state in transport object 0x50fc00
2008-12-17 11:08:32 E [client-protocol.c:5712:protocol_client_cleanup]
hlta01-client: forced unwinding frame type(1) op(MKNOD) reply=@0x518a10
2008-12-17 11:08:32 E [fuse-bridge.c:406:fuse_entry_cbk] glusterfs-fuse:
3: MKNOD() /test => -1 (Transport endpoint is not connected)
2008-12-17 11:08:32 E [socket.c:1189:socket_submit] hlta01-client:
transport not connected to submit (priv->connected = 255)
2008-12-17 11:08:32 D [inode.c:323:__inode_retire] fuse/inode: retiring
inode(0) lru=0/0 active=1 purge=1
2008-12-17 11:08:32 D [fuse-bridge.c:455:fuse_lookup] glusterfs-fuse: 4:
LOOKUP /test
2008-12-17 11:08:32 D [inode.c:455:__inode_create] fuse/inode: create
inode(0)
2008-12-17 11:08:32 D [inode.c:280:__inode_activate] fuse/inode:
activating inode(0), lru=0/0 active=2 purge=0
2008-12-17 11:08:32 D [name.c:183:af_inet_client_get_remote_sockaddr]
hlta01-client: option remote-port missing in volume hlta01-client.
Defaulting to 6996
2008-12-17 11:08:32 D [common-utils.c:206:gf_resolve_ip6] resolver:
flushing DNS cache
2008-12-17 11:08:32 D [common-utils.c:213:gf_resolve_ip6] resolver: DNS
cache not present, freshly probing hostname: hlta01
2008-12-17 11:08:32 D [common-utils.c:250:gf_resolve_ip6] resolver:
returning ip-10.130.101.100 (port-6996) for hostname: hlta01 and port:
6996
2008-12-17 11:08:32 E [fuse-bridge.c:406:fuse_entry_cbk] glusterfs-fuse:
4: LOOKUP() /test => -1 (Transport endpoint is not connected)
2008-12-17 11:08:32 D [inode.c:323:__inode_retire] fuse/inode: retiring
inode(0) lru=0/0 active=1 purge=1
2008-12-17 11:08:32 D [client-protocol.c:6313:notify] hlta01-client: got
GF_EVENT_CHILD_UP
2008-12-17 11:08:32 D [socket.c:926:socket_connect] hlta01-client:
connect () called on transport already connected
2008-12-17 11:08:32 D [client-protocol.c:5561:client_setvolume_cbk]
hlta01-client: SETVOLUME on remote-host succeeded
2008-12-17 11:08:33 D [client-protocol.c:5629:client_protocol_reconnect]
hlta01-client: breaking reconnect chain
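
To skim a debug log like the one above, it helps to keep only the E
(error) and C (critical) records. The following is a minimal sketch, not
part of GlusterFS: the names parse_records and errors_only are
hypothetical, and it assumes every record starts with
"YYYY-MM-DD HH:MM:SS <level> [file:line:function]" and that any other
non-empty line is a continuation wrapped by the mail client.

```python
import re
from datetime import datetime

# Start of a log record: date, time, one-letter level, [file:line:function].
RECORD_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ([A-Z]) \[([^\]]+)\]\s*(.*)$"
)


def parse_records(text):
    """Return (timestamp, level, location, message) tuples.

    Assumption: any non-empty line that does not start a new record is a
    wrapped continuation of the previous record's message.
    """
    records = []
    for line in text.splitlines():
        match = RECORD_START.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            records.append([stamp, match.group(2), match.group(3), match.group(4)])
        elif records and line.strip():
            records[-1][3] = (records[-1][3] + " " + line.strip()).strip()
    return [tuple(record) for record in records]


def errors_only(records):
    """Keep only error (E) and critical (C) records."""
    return [record for record in records if record[1] in ("E", "C")]


# Three records copied from the client log above, still mail-wrapped.
SAMPLE = """\
2008-12-17 11:07:48 D [fuse-bridge.c:1089:fuse_mknod] glusterfs-fuse: 3:
MKNOD /test
2008-12-17 11:08:32 E [client-protocol.c:273:call_bail] hlta01-client:
activating bail-out. pending frames = 1. last sent = 2008-12-17
11:07:49. last received = 2008-12-17 11:07:49. transport-timeout = 42
2008-12-17 11:08:32 C [client-protocol.c:308:call_bail] hlta01-client:
bailing transport
"""

records = parse_records(SAMPLE)
for stamp, level, location, message in errors_only(records):
    print(stamp, level, location, message)

# Seconds between the MKNOD request and the bail-out record.
gap_seconds = (records[1][0] - records[0][0]).total_seconds()
print("seconds from MKNOD to bail-out:", gap_seconds)
```

On the sample, the gap between the MKNOD request (11:07:48) and the
bail-out (11:08:32) is 44 seconds, just over the transport-timeout of 42
reported in the log.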

log host 2 (server):

2008-12-17 11:05:25 D [glusterfs.c:297:_get_specfp] glusterfs: loading
volume file /home/rainer/sources/gluster/sw-farmctl.hlta01.vol

Version      : glusterfs 1.4.0rc3 built on Dec 16 2008 10:23:44
TLA Revision : glusterfs--mainline--3.0--patch-777
Starting Time: 2008-12-17 11:05:25
Command line : glusterfsd
-f /home/rainer/sources/gluster/sw-farmctl.hlta01.vol -L DEBUG
-l /var/log/glusterfs/glusterfsd.log 
given volfile
+-----
  1: volume local-brick
  2:   type storage/posix
  3:   option directory /localdisk/gluster/sw
  4: end-volume
  5: 
  6: volume lock-brick
  7:   type features/locks
  8:   subvolumes local-brick
  9: end-volume
 10: 
 11: volume local-brick2
 12:   type storage/posix
 13:   option directory /localdisk/gluster/sw2
 14: end-volume
 15: 
 16: volume lock-brick2
 17:   type features/locks
 18:   subvolumes local-brick2
 19: end-volume
 20: 
 21: #volume hlta0101-client
 22: #  type protocol/client
 23: #  option transport-type tcp/client
 24: #  option remote-host hlta0101
 25: #  option remote-subvolume sw-brick
 26: #end-volume
 27: 
 28: #volume hlta0102-client
 29: #  type protocol/client
 30: #  option trasport-type tcp/client
 31: #  option remote-host hlta0102
 32: #  option remote-subvolume sw-brick
 33: #end-volume
 34: 
 35: #volume hlta0103-client
 36: #  type protocol/client
 37: #  option trasport-type tcp/client
 38: #  option remote-host hlta0103
 39: #  option remote-subvolume sw-brick
 40: #end-volume
 41: 
 42: #volume hlta0104-client
 43: #  type protocol/client
 44: #  option trasport-type tcp/client
 45: #  option remote-host hlta0104
 46: #  option remote-subvolume sw-brick
 47: #end-volume
 48: 
 49: volume afr-distributor
 50:   type cluster/afr
 51:   subvolumes lock-brick lock-brick2 #hlta0101-client 
 52: #hlta0102-client hlta0103-client hlta0104-client
 53: end-volume
 54: 
 55: volume head
 56:   type debug/trace
 57:   subvolumes afr-distributor
 58: end-volume
 59: 
 60: #volume head
 61: #  type performance/io-threads
 62: #  option thread-count 4  # deault is 1
 63: #  option cache-size 128MB
 64: #  subvolumes afr-distributor
 65: #end-volume
 66: 
 67: volume server
 68:   type protocol/server
 69:   option transport-type tcp/server
 70:   option auth.addr.head.allow *
 71:   subvolumes head
 72: end-volume
+-----
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for
'local-brick'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to
load file /usr/lib64/glusterfs/1.4.0rc3/xlator/storage/posix.so
2008-12-17 11:05:25 D [spec.y:213:section_type] parser:
Type:local-brick:storage/posix
2008-12-17 11:05:25 D [spec.y:288:section_option] parser:
Option:local-brick:directory:/localdisk/gluster/sw
2008-12-17 11:05:25 D [spec.y:372:section_end] parser: end:local-brick
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for
'lock-brick'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to
load file /usr/lib64/glusterfs/1.4.0rc3/xlator/features/locks.so
2008-12-17 11:05:25 D [xlator.c:434:xlator_set_type] xlator:
dlsym(notify) on /usr/lib64/glusterfs/1.4.0rc3/xlator/features/locks.so:
undefined symbol: notify -- neglecting
2008-12-17 11:05:25 D [spec.y:213:section_type] parser:
Type:lock-brick:features/locks
2008-12-17 11:05:25 D [spec.y:357:section_sub] parser:
child:lock-brick->local-brick
2008-12-17 11:05:25 D [spec.y:372:section_end] parser: end:lock-brick
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for
'local-brick2'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to
load file /usr/lib64/glusterfs/1.4.0rc3/xlator/storage/posix.so
2008-12-17 11:05:25 D [spec.y:213:section_type] parser:
Type:local-brick2:storage/posix
2008-12-17 11:05:25 D [spec.y:288:section_option] parser:
Option:local-brick2:directory:/localdisk/gluster/sw2
2008-12-17 11:05:25 D [spec.y:372:section_end] parser: end:local-brick2
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for
'lock-brick2'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to
load file /usr/lib64/glusterfs/1.4.0rc3/xlator/features/locks.so
2008-12-17 11:05:25 D [xlator.c:434:xlator_set_type] xlator:
dlsym(notify) on /usr/lib64/glusterfs/1.4.0rc3/xlator/features/locks.so:
undefined symbol: notify -- neglecting
2008-12-17 11:05:25 D [spec.y:213:section_type] parser:
Type:lock-brick2:features/locks
2008-12-17 11:05:25 D [spec.y:357:section_sub] parser:
child:lock-brick2->local-brick2
2008-12-17 11:05:25 D [spec.y:372:section_end] parser: end:lock-brick2
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for
'afr-distributor'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to
load file /usr/lib64/glusterfs/1.4.0rc3/xlator/cluster/afr.so
2008-12-17 11:05:25 D [spec.y:213:section_type] parser:
Type:afr-distributor:cluster/afr
2008-12-17 11:05:25 D [spec.y:357:section_sub] parser:
child:afr-distributor->lock-brick
2008-12-17 11:05:25 D [spec.y:357:section_sub] parser:
child:afr-distributor->lock-brick2
2008-12-17 11:05:25 D [spec.y:372:section_end] parser:
end:afr-distributor
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for
'head'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to
load file /usr/lib64/glusterfs/1.4.0rc3/xlator/debug/trace.so
2008-12-17 11:05:25 D [xlator.c:434:xlator_set_type] xlator:
dlsym(notify) on /usr/lib64/glusterfs/1.4.0rc3/xlator/debug/trace.so:
undefined symbol: notify -- neglecting
2008-12-17 11:05:25 D [spec.y:213:section_type] parser:
Type:head:debug/trace
2008-12-17 11:05:25 D [spec.y:357:section_sub] parser:
child:head->afr-distributor
2008-12-17 11:05:25 D [spec.y:372:section_end] parser: end:head
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for
'server'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to
load file /usr/lib64/glusterfs/1.4.0rc3/xlator/protocol/server.so
2008-12-17 11:05:25 D [spec.y:213:section_type] parser:
Type:server:protocol/server
2008-12-17 11:05:25 D [spec.y:288:section_option] parser:
Option:server:transport-type:tcp/server
2008-12-17 11:05:25 D [spec.y:288:section_option] parser:
Option:server:auth.addr.head.allow:*
2008-12-17 11:05:25 D [spec.y:357:section_sub] parser:
child:server->head
2008-12-17 11:05:25 D [spec.y:372:section_end] parser: end:server
2008-12-17 11:05:25 D [glusterfs.c:927:main] glusterfs: running in pid
24506
2008-12-17 11:05:25 D [transport.c:118:transport_load] transport:
attempt to load file /usr/lib64/glusterfs/1.4.0rc3/transport/socket.so
2008-12-17 11:05:25 D [server-protocol.c:7596:init] server: defaulting
limits.transaction-size to 4194304
2008-12-17 11:05:25 D [xlator.c:519:xlator_init_rec] local-brick:
Initialization done
2008-12-17 11:05:25 D [xlator.c:519:xlator_init_rec] lock-brick:
Initialization done
2008-12-17 11:05:25 D [xlator.c:519:xlator_init_rec] local-brick2:
Initialization done
2008-12-17 11:05:25 D [xlator.c:519:xlator_init_rec] lock-brick2:
Initialization done
2008-12-17 11:05:25 D [xlator.c:519:xlator_init_rec] afr-distributor:
Initialization done
2008-12-17 11:05:25 C [dict.c:1067:data_to_str] dict: @data=(nil)
2008-12-17 11:05:25 C [dict.c:1067:data_to_str] dict: @data=(nil)
2008-12-17 11:07:48 N [trace.c:1237:trace_lookup] head: 3: (loc
{path=/test, ino=0} need_xattr=1)
2008-12-17 11:07:48 N [trace.c:513:trace_lookup_cbk] head: 3:
(op_ret=-1, op_errno=2)
2008-12-17 11:07:48 N [trace.c:1101:trace_entrylk] head: 4: (loc=
{path=/, ino=1} basename=test, cmd=ENTRYLK_LOCK, type=ENTRYLK_WRLCK)
2008-12-17 11:07:48 N [trace.c:1021:trace_entrylk_cbk] head: 4:
op_ret=0, op_errno=0
2008-12-17 11:07:48 N [trace.c:1189:trace_xattrop] head: 5: (path=/,
ino=1 flags=0)
2008-12-17 11:07:48 E [posix.c:2419:posix_xattrop] local-brick: /:
Numerical result out of range
2008-12-17 11:07:48 E [posix.c:2419:posix_xattrop] local-brick2: /:
Numerical result out of range
2008-12-17 11:07:48 N [trace.c:1042:trace_xattrop_cbk] head: 5:
(op_ret=0, op_errno=34)
2008-12-17 11:07:48 N [trace.c:1307:trace_mknod] head: 6: (loc
{path=/test, ino=0}, mode=33188, dev=0)
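
The numeric values in the trace output above can be decoded with
Python's standard library. This is only a sketch and assumes the server
runs Linux, where errno 2 is ENOENT and errno 34 is ERANGE; the helper
names describe_errno and describe_mode are made up for illustration.

```python
import errno
import stat


def describe_errno(code):
    """Map a numeric errno to its symbolic name on this platform."""
    return errno.errorcode.get(code, "unknown")


def describe_mode(mode):
    """Return the ls-style string and the octal permission bits for a mode."""
    return stat.filemode(mode), oct(stat.S_IMODE(mode))


# Values taken from the trace lines above.
lookup_errno = describe_errno(2)    # op_errno of the LOOKUP on /test
xattrop_errno = describe_errno(34)  # op_errno of the XATTROP on /
mknod_mode = describe_mode(33188)   # mode passed to MKNOD for /test

print("lookup :", lookup_errno)
print("xattrop:", xattrop_errno)
print("mknod  :", mknod_mode)
```

Read this way, mode=33188 is a regular file with permissions 0644, and
op_errno=34 is the same ERANGE ("Numerical result out of range") that
posix_xattrop logs for both local bricks just before the MKNOD.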

On Mon, 2008-12-15 at 21:24 +0530, Vikas Gorur wrote:
> Rainer,
> 
> Thank you for your interest in GlusterFS.
> 
> I do not know of any user who's had an AFR configuration with 40-50
> subvolumes, but there is no reason it shouldn't work. The write
> performance will obviously be quite low, but in your case since you
> will not be making heavy/daily use of it (the only writes will be when
> you make a new release, if I understand correctly), that shouldn't be
> an issue.
> 
> The version of GlusterFS you're using (1.3.12) is rather old now. We
> have a new release 1.4.0 in the final stages of testing. We haven't
> yet completely tested the AFR-over-AFR setup.
> 
> You could either wait a few days (less than a week) for us to make the
> RC1 release with AFR-over-AFR tested or grab the TLA repository
> version and give it a try.
> 
> Vikas
> --
> Engineer - Z Research
> http://gluster.com/




