[Gluster-devel] Troubles with AFR - core dump
Tom Myny
tom.myny at tigron.be
Tue Mar 25 14:00:38 UTC 2008
Hi Amar,
I’ve updated to the latest tla version and I’m still getting a core dump:
Scenario: a copy from a local xfs disk /backup/img_thumb to the mounted glusterfs /sas/img_thumb,
and at the same time another copy from a local xfs disk /backup/img_large to the mounted glusterfs /sata/img_large.
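The copies were basically two cp runs started in parallel on the client, something like this (exact flags may have differed, this is from memory):
  # both copies run concurrently against the two glusterfs mounts
  cp -a /backup/img_thumb /sas/ &
  cp -a /backup/img_large /sata/ &
  wait   # the crash shows up while these are still running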
Debug logs of the client:
2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345092: LOOKUP /img_thumb/20021028/autrefois/1035814770.jpg
2008-03-25 12:56:03 D [fuse-bridge.c:455:fuse_entry_cbk] glusterfs-fuse: 345092: (34) /img_thumb/20021028/autrefois/1035814770.jpg => -1 (2)
2008-03-25 12:56:03 D [fuse-bridge.c:1471:fuse_create] glusterfs-fuse: 345093: CREATE /img_thumb/20021028/autrefois/1035814770.jpg
2008-03-25 12:56:03 D [fuse-bridge.c:1365:fuse_create_cbk] glusterfs-fuse: 345093: (27) /img_thumb/20021028/autrefois/1035814770.jpg => 0x73a4b0
2008-03-25 12:56:03 D [inode.c:577:__create_inode] fuse/inode: create inode(15688)
2008-03-25 12:56:03 D [inode.c:367:__active_inode] fuse/inode: activating inode(15688), lru=8032/1024
2008-03-25 12:56:03 D [fuse-bridge.c:1640:fuse_write] glusterfs-fuse: 606275: WRITE (0x7e7090, size=4096, offset=0)
2008-03-25 12:56:03 D [fuse-bridge.c:1640:fuse_write] glusterfs-fuse: 345094: WRITE (0x73a4b0, size=4096, offset=0)
2008-03-25 12:56:03 D [fuse-bridge.c:1603:fuse_writev_cbk] glusterfs-fuse: 606275: WRITE => 4096/4096,0/4096
2008-03-25 12:56:03 D [fuse-bridge.c:1640:fuse_write] glusterfs-fuse: 606276: WRITE (0x7e7090, size=4096, offset=4096)
2008-03-25 12:56:03 D [fuse-bridge.c:1603:fuse_writev_cbk] glusterfs-fuse: 345094: WRITE => 4096/4096,0/4096
2008-03-25 12:56:03 D [fuse-bridge.c:1640:fuse_write] glusterfs-fuse: 345095: WRITE (0x73a4b0, size=117, offset=4096)
2008-03-25 12:56:03 D [fuse-bridge.c:1603:fuse_writev_cbk] glusterfs-fuse: 606276: WRITE => 4096/4096,4096/8192
2008-03-25 12:56:03 D [fuse-bridge.c:1640:fuse_write] glusterfs-fuse: 606277: WRITE (0x7e7090, size=4096, offset=8192)
2008-03-25 12:56:03 D [fuse-bridge.c:1603:fuse_writev_cbk] glusterfs-fuse: 345095: WRITE => 117/117,4096/4213
2008-03-25 12:56:03 D [fuse-bridge.c:1664:fuse_flush] glusterfs-fuse: 345096: FLUSH 0x73a4b0
2008-03-25 12:56:03 D [fuse-bridge.c:1603:fuse_writev_cbk] glusterfs-fuse: 606277: WRITE => 4096/4096,8192/12288
2008-03-25 12:56:03 D [fuse-bridge.c:1640:fuse_write] glusterfs-fuse: 606278: WRITE (0x7e7090, size=2242, offset=12288)
2008-03-25 12:56:03 D [fuse-bridge.c:916:fuse_err_cbk] glusterfs-fuse: 345096: (16) ERR => 0
2008-03-25 12:56:03 D [fuse-bridge.c:1691:fuse_release] glusterfs-fuse: 345097: CLOSE 0x73a4b0
2008-03-25 12:56:03 D [fuse-bridge.c:1603:fuse_writev_cbk] glusterfs-fuse: 606278: WRITE => 2242/2242,12288/14530
2008-03-25 12:56:03 D [fuse-bridge.c:1664:fuse_flush] glusterfs-fuse: 606279: FLUSH 0x7e7090
2008-03-25 12:56:03 D [fuse-bridge.c:916:fuse_err_cbk] glusterfs-fuse: 606279: (16) ERR => 0
2008-03-25 12:56:03 D [fuse-bridge.c:1691:fuse_release] glusterfs-fuse: 606280: CLOSE 0x7e7090
2008-03-25 12:56:03 D [fuse-bridge.c:520:fuse_lookup] glusterfs-fuse: 606281: LOOKUP /img_large/20020814(2147531943)
2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345098: LOOKUP /img_thumb/20021028/autrefois/1035814963.jpg
2008-03-25 12:56:03 E [protocol.c:271:gf_block_unserialize_transport] sas: EOF from peer (10.6.0.10:6996)
2008-03-25 12:56:03 W [client-protocol.c:4665:client_protocol_cleanup] sas: cleaning up state in transport object 0x50ce80
2008-03-25 12:56:03 E [client-protocol.c:4715:client_protocol_cleanup] sas: forced unwinding frame type(1) op(17) reply=@0x2aaaab728270
2008-03-25 12:56:03 E [client-protocol.c:3636:client_close_cbk] sas: no proper reply from server, returning ENOTCONN
2008-03-25 12:56:03 E [fuse-bridge.c:922:fuse_err_cbk] glusterfs-fuse: 345097: (17) ERR => -1 (107)
2008-03-25 12:56:03 E [client-protocol.c:4715:client_protocol_cleanup] sas: forced unwinding frame type(1) op(34) reply=@0x2aaaab728270
2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN
2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345098: (34) /img_thumb/20021028/autrefois/1035814963.jpg => -1 (107)
2008-03-25 12:56:03 C [tcp.c:87:tcp_disconnect] sas: connection disconnected
2008-03-25 12:56:03 E [protocol.c:271:gf_block_unserialize_transport] sas: EOF from peer (10.6.0.10:6996)
2008-03-25 12:56:03 W [client-protocol.c:4665:client_protocol_cleanup] sas: cleaning up state in transport object 0x50ff70
2008-03-25 12:56:03 C [tcp.c:87:tcp_disconnect] sas: connection disconnected
2008-03-25 12:56:03 W [client-protocol.c:4665:client_protocol_cleanup] sata: cleaning up state in transport object 0x50ce80
2008-03-25 12:56:03 E [client-protocol.c:4715:client_protocol_cleanup] sata: forced unwinding frame type(1) op(17) reply=@0x2aaaac2e77d0
2008-03-25 12:56:03 E [client-protocol.c:3636:client_close_cbk] sata: no proper reply from server, returning ENOTCONN
2008-03-25 12:56:03 E [fuse-bridge.c:922:fuse_err_cbk] glusterfs-fuse: 606280: (17) ERR => -1 (107)
2008-03-25 12:56:03 E [client-protocol.c:4715:client_protocol_cleanup] sata: forced unwinding frame type(1) op(34) reply=@0x2aaaac2e77d0
2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sata: no proper reply from server, returning ENOTCONN
2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 606281: (34) /img_large/20020814 => -1 (107)
2008-03-25 12:56:03 E [client-protocol.c:348:client_protocol_xfer] sata: transport_submit failed
2008-03-25 12:56:03 C [tcp.c:87:tcp_disconnect] sata: connection disconnected
2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345099: LOOKUP /img_thumb/20021028/babbuinoconte
2008-03-25 12:56:03 D [tcp-client.c:77:tcp_connect] sas: socket fd = 6
2008-03-25 12:56:03 D [tcp-client.c:107:tcp_connect] sas: finalized on port `1023'
2008-03-25 12:56:03 D [tcp-client.c:128:tcp_connect] sas: defaulting remote-port to 6996
2008-03-25 12:56:03 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 10.6.0.10
2008-03-25 12:56:03 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:10.6.0.10[0] for hostname: 10.6.0.10
2008-03-25 12:56:03 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache
2008-03-25 12:56:03 D [tcp-client.c:161:tcp_connect] sas: connect on 6 in progress (non-blocking)
2008-03-25 12:56:03 E [tcp-client.c:190:tcp_connect] sas: non-blocking connect() returned: 111 (Connection refused)
2008-03-25 12:56:03 W [client-protocol.c:357:client_protocol_xfer] sas: not connected at the moment to submit frame type(1) op(34)
2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN
2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345099: (34) /img_thumb/20021028/babbuinoconte => -1 (107)
2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345100: LOOKUP /img_thumb/20021028/bd
2008-03-25 12:56:03 D [tcp-client.c:77:tcp_connect] sas: socket fd = 6
2008-03-25 12:56:03 D [tcp-client.c:107:tcp_connect] sas: finalized on port `1023'
2008-03-25 12:56:03 D [tcp-client.c:128:tcp_connect] sas: defaulting remote-port to 6996
2008-03-25 12:56:03 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 10.6.0.10
2008-03-25 12:56:03 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:10.6.0.10[0] for hostname: 10.6.0.10
2008-03-25 12:56:03 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache
2008-03-25 12:56:03 D [tcp-client.c:161:tcp_connect] sas: connect on 6 in progress (non-blocking)
2008-03-25 12:56:03 E [tcp-client.c:190:tcp_connect] sas: non-blocking connect() returned: 111 (Connection refused)
2008-03-25 12:56:03 W [client-protocol.c:357:client_protocol_xfer] sas: not connected at the moment to submit frame type(1) op(34)
2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN
2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345100: (34) /img_thumb/20021028/bd => -1 (107)
2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345101: LOOKUP /img_thumb/20021028/bdelplace
2008-03-25 12:56:03 D [tcp-client.c:77:tcp_connect] sas: socket fd = 6
2008-03-25 12:56:03 D [tcp-client.c:107:tcp_connect] sas: finalized on port `1023'
2008-03-25 12:56:03 D [tcp-client.c:128:tcp_connect] sas: defaulting remote-port to 6996
2008-03-25 12:56:03 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 10.6.0.10
2008-03-25 12:56:03 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:10.6.0.10[0] for hostname: 10.6.0.10
2008-03-25 12:56:03 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache
2008-03-25 12:56:03 D [tcp-client.c:161:tcp_connect] sas: connect on 6 in progress (non-blocking)
2008-03-25 12:56:03 E [tcp-client.c:190:tcp_connect] sas: non-blocking connect() returned: 111 (Connection refused)
2008-03-25 12:56:03 W [client-protocol.c:357:client_protocol_xfer] sas: not connected at the moment to submit frame type(1) op(34)
2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN
2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345101: (34) /img_thumb/20021028/bdelplace => -1 (107)
2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345102: LOOKUP /img_thumb/20021028/belline
2008-03-25 12:56:03 D [tcp-client.c:77:tcp_connect] sas: socket fd = 6
2008-03-25 12:56:03 D [tcp-client.c:107:tcp_connect] sas: finalized on port `1023'
2008-03-25 12:56:03 D [tcp-client.c:128:tcp_connect] sas: defaulting remote-port to 6996
2008-03-25 12:56:03 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 10.6.0.10
2008-03-25 12:56:03 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:10.6.0.10[0] for hostname: 10.6.0.10
2008-03-25 12:56:03 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache
2008-03-25 12:56:03 D [tcp-client.c:161:tcp_connect] sas: connect on 6 in progress (non-blocking)
2008-03-25 12:56:03 E [tcp-client.c:190:tcp_connect] sas: non-blocking connect() returned: 111 (Connection refused)
2008-03-25 12:56:03 W [client-protocol.c:357:client_protocol_xfer] sas: not connected at the moment to submit frame type(1) op(34)
2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN
2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345102: (34) /img_thumb/20021028/belline => -1 (107)
2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345103: LOOKUP /img_thumb/20021028/biquet
2008-03-25 12:56:03 D [tcp-client.c:77:tcp_connect] sas: socket fd = 6
2008-03-25 12:56:03 D [tcp-client.c:107:tcp_connect] sas: finalized on port `1023'
2008-03-25 12:56:03 D [tcp-client.c:128:tcp_connect] sas: defaulting remote-port to 6996
2008-03-25 12:56:03 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 10.6.0.10
2008-03-25 12:56:03 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:10.6.0.10[0] for hostname: 10.6.0.10
2008-03-25 12:56:03 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache
2008-03-25 12:56:03 D [tcp-client.c:161:tcp_connect] sas: connect on 6 in progress (non-blocking)
2008-03-25 12:56:03 E [tcp-client.c:190:tcp_connect] sas: non-blocking connect() returned: 111 (Connection refused)
2008-03-25 12:56:03 W [client-protocol.c:357:client_protocol_xfer] sas: not connected at the moment to submit frame type(1) op(34)
2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN
2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345103: (34) /img_thumb/20021028/biquet => -1 (107)
2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345104: LOOKUP /img_thumb/20021028/botilhon
2008-03-25 12:56:03 D [tcp-client.c:77:tcp_connect] sas: socket fd = 6
2008-03-25 12:56:03 D [tcp-client.c:107:tcp_connect] sas: finalized on port `1023'
2008-03-25 12:56:03 D [tcp-client.c:128:tcp_connect] sas: defaulting remote-port to 6996
2008-03-25 12:56:03 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 10.6.0.10
2008-03-25 12:56:03 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:10.6.0.10[0] for hostname: 10.6.0.10
2008-03-25 12:56:03 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache
2008-03-25 12:56:03 D [tcp-client.c:161:tcp_connect] sas: connect on 6 in progress (non-blocking)
2008-03-25 12:56:03 E [tcp-client.c:190:tcp_connect] sas: non-blocking connect() returned: 111 (Connection refused)
2008-03-25 12:56:03 W [client-protocol.c:357:client_protocol_xfer] sas: not connected at the moment to submit frame type(1) op(34)
2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN
2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345104: (34) /img_thumb/20021028/botilhon => -1 (107)
2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345105: LOOKUP /img_thumb/20021028/broosguy
Etc …
------
Debug logs of the server:
2008-03-25 12:56:03 D [inode.c:367:__active_inode] sas/inode: activating inode(15686), lru=1024/1024
2008-03-25 12:56:03 D [inode.c:321:__destroy_inode] sas/inode: destroy inode(1610627503) [@0x2aaaaba1c5a0]
2008-03-25 12:56:03 D [inode.c:577:__create_inode] sas/inode: create inode(15687)
2008-03-25 12:56:03 D [inode.c:367:__active_inode] sas/inode: activating inode(15687), lru=1024/1024
2008-03-25 12:56:03 D [inode.c:321:__destroy_inode] sas/inode: destroy inode(1610627504) [@0x2aaaaba1be00]
2008-03-25 12:56:03 D [inode.c:321:__destroy_inode] sata/inode: destroy inode(47547) [@0x2aaaab91c9f0]
2008-03-25 12:56:03 D [inode.c:577:__create_inode] sata/inode: create inode(2147532094)
2008-03-25 12:56:03 D [inode.c:367:__active_inode] sata/inode: activating inode(2147532094), lru=1024/1024
2008-03-25 12:56:03 D [inode.c:321:__destroy_inode] sata/inode: destroy inode(47548) [@0x2aaaab607f20]
2008-03-25 12:56:03 D [inode.c:577:__create_inode] sata/inode: create inode(2147532095)
2008-03-25 12:56:03 D [inode.c:367:__active_inode] sata/inode: activating inode(2147532095), lru=1024/1024
2008-03-25 12:56:03 D [inode.c:577:__create_inode] sas/inode: create inode(15688)
2008-03-25 12:56:03 D [inode.c:367:__active_inode] sas/inode: activating inode(15688), lru=1024/1024
TLA Repo Revision: glusterfs--mainline--2.5--patch-717
Time : 2008-03-25 12:56:03
Signal Number : 11
glusterfsd -f /etc/glusterfs/server.vol -l /var/log/glusterfs/glusterfsd.log -L DEBUG
volume server
type protocol/server
option auth.ip.sata.allow *
option auth.ip.sas.allow *
option auth.ip.sata-ns.allow 10.6.0.8,127.0.0.1
option auth.ip.sata-ds.allow 10.6.0.8,127.0.0.1
option auth.ip.sas-ns.allow 10.6.0.8,127.0.0.1
option auth.ip.sas-ds.allow 10.6.0.8,127.0.0.1
option transport-type tcp/server
subvolumes sas sata
end-volume
volume sata
type performance/io-threads
option cache-size 64MB
option thread-count 8
subvolumes sata-unify
… (see below)…
volume sas-ns
type storage/posix
option directory /sas/ns
end-volume
volume sas-ds
type storage/posix
option directory /sas/data
end-volume
frame : type(0) op(0)
frame : type(0) op(0)
/lib/libc.so.6[0x2af484498110]
/lib/libc.so.6(strlen+0x10)[0x2af4844db580]
/usr/lib/libglusterfs.so.0[0x2af484143458]
/usr/lib/libglusterfs.so.0(mop_lock_impl+0x62)[0x2af4841438e2]
/usr/lib/glusterfs/1.3.8/xlator/cluster/afr.so(afr_close+0x45f)[0x2aaaaacd427f]
/usr/lib/glusterfs/1.3.8/xlator/cluster/unify.so(unify_close+0x113)[0x2aaaaade9c83]
/usr/lib/glusterfs/1.3.8/xlator/performance/io-threads.so[0x2aaaaaef6704]
/usr/lib/libglusterfs.so.0(call_resume+0x78)[0x2af4841451a8]
/usr/lib/glusterfs/1.3.8/xlator/performance/io-threads.so[0x2aaaaaef5c6b]
/lib/libpthread.so.0[0x2af484359f1a]
TLA Repo Revision: glusterfs--mainline--2.5--patch-717
Time : 2008-03-25 12:56:03
Signal Number : 11
glusterfsd -f /etc/glusterfs/server.vol -l /var/log/glusterfs/glusterfsd.log -L DEBUG
/lib/libc.so.6(__clone+0x72)[0x2af4845325d2]
---------
volume server
type protocol/server
option auth.ip.sata.allow *
option auth.ip.sas.allow *
option auth.ip.sata-ns.allow 10.6.0.8,127.0.0.1
option auth.ip.sata-ds.allow 10.6.0.8,127.0.0.1
option auth.ip.sas-ns.allow 10.6.0.8,127.0.0.1
option auth.ip.sas-ds.allow 10.6.0.8,127.0.0.1
option transport-type tcp/server
subvolumes sas sata
end-volume
volume sata
type performance/io-threads
option cache-size 64MB
option thread-count 8
subvolumes sata-unify
end-volume
volume sas
type performance/io-threads
option cache-size 64MB
option thread-count 8
subvolumes sas-unify
end-volume
-------
Doing a gdb traceback:
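(the core was loaded against the server binary roughly like this; the binary path and core file name are approximate)
  gdb /usr/sbin/glusterfsd core.<pid>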
(gdb) backtrace
#0 0x00002af4844db580 in strlen () from /lib/libc.so.6
#1 0x00002af484143458 in place_lock_after (granted=0x2af48424f9e0,
path=0x2aaaaba37370 "/sas-ds//img_thumb/20021028/autrefois/1035814770.jpg/") at ../../../libglusterfs/src/lock.c:65
#2 0x00002af4841438e2 in mop_lock_impl (frame=0x2aaaaba81c80,
this_xl=<value optimized out>,
path=0x2aaaaba64b50 "/sas-ds//img_thumb/20021028/autrefois/1035814770.jpg")
at ../../../libglusterfs/src/lock.c:102
#3 0x00002aaaaacd427f in afr_close (frame=0x2aaaaba8dab0,
this=<value optimized out>, fd=0x2aaaab6075f0)
at ../../../../../xlators/cluster/afr/src/afr.c:3366
#4 0x00002aaaaade9c83 in unify_close (frame=0x2aaaaba72130,
this=<value optimized out>, fd=0x2aaaab6075f0)
at ../../../../../xlators/cluster/unify/src/unify.c:2434
#5 0x00002aaaaaef6704 in iot_close_wrapper (frame=0x5696b0, this=0x50de10,
fd=0x2aaaab6075f0)
at ../../../../../xlators/performance/io-threads/src/io-threads.c:188
#6 0x00002af4841451a8 in call_resume (stub=0x3638312c33353865)
at ../../../libglusterfs/src/call-stub.c:2975
#7 0x00002aaaaaef5c6b in iot_worker (arg=<value optimized out>)
at ../../../../../xlators/performance/io-threads/src/io-threads.c:1024
#8 0x00002af484359f1a in start_thread () from /lib/libpthread.so.0
#9 0x00002af4845325d2 in clone () from /lib/libc.so.6
---Type <return> to continue, or q <return> to quit---
#10 0x0000000000000000 in ?? ()
Regards,
Tom
From: amarts at gmail.com [mailto:amarts at gmail.com] On Behalf Of Amar S. Tumballi
Sent: Monday, 24 March 2008 22:02
To: Tom Myny - Tigron BVBA
Cc: gluster-devel at nongnu.org
Subject: Re: [Gluster-devel] Troubles with AFR - core dump
Hi Tom,
Thanks for reporting the bug. I believe this bug is fixed in patch-698, and there are also some more enhancements to afr in 'patch-712'. I am in the process of preliminary testing for releasing 1.3.8pre4. You can try that out once it's ready, or you can choose to use the latest TLA, which fixes it.
We will look into it further and get back to you about the progress. It would be very helpful if you could tell us the steps which caused this scenario.
Regards,
Amar
On Mon, Mar 24, 2008 at 1:12 PM, Tom Myny - Tigron BVBA <tom.myny at tigron.be> wrote:
Hello all,
I was first using a fresh Debian (latest stable - etch) with the Debian
packages that are available here (deb
http://lmello.virt-br.org/debian ./ ...),
but when using them I noticed a crash on one of my servers.
I then noticed that the version I was using was glusterfs 1.3.8pre1.
After an upgrade to glusterfs 1.3.8pre3 I got another core dump, so
I'm out of ideas for the moment :)
My current (test) set-up:
Two servers and a client running the latest Debian stable, using
glusterfs 1.3.8pre1.
The kernels:
Linux tweety 2.6.18-6-amd64 #1 SMP Sun Feb 10 17:50:19 UTC 2008 x86_64
GNU/Linux
The filesystems are xfs.
My Config file on one of the servers:
volume sas-ds
type storage/posix
option directory /sas/data
end-volume
volume sas-ns
type storage/posix
option directory /sas/ns
end-volume
volume sata-ds
type storage/posix
option directory /sata/data
end-volume
volume sata-ns
type storage/posix
option directory /sata/ns
end-volume
volume sas-backup-ds
type protocol/client
option transport-type tcp/client
option remote-host 10.6.0.8
option remote-subvolume sas-ds
end-volume
volume sas-backup-ns
type protocol/client
option transport-type tcp/client
option remote-host 10.6.0.8
option remote-subvolume sas-ns
end-volume
volume sata-backup-ds
type protocol/client
option transport-type tcp/client
option remote-host 10.6.0.8
option remote-subvolume sata-ds
end-volume
volume sata-backup-ns
type protocol/client
option transport-type tcp/client
option remote-host 10.6.0.8
option remote-subvolume sata-ns
end-volume
volume sas-ds-afr
type cluster/afr
subvolumes sas-ds sas-backup-ds
end-volume
volume sas-ns-afr
type cluster/afr
subvolumes sas-ns sas-backup-ns
end-volume
volume sata-ds-afr
type cluster/afr
subvolumes sata-ds sata-backup-ds
end-volume
volume sata-ns-afr
type cluster/afr
subvolumes sata-ns sata-backup-ns
end-volume
volume sas-unify
type cluster/unify
subvolumes sas-ds-afr
option namespace sas-ns-afr
option scheduler rr
end-volume
volume sata-unify
type cluster/unify
subvolumes sata-ds-afr
option namespace sata-ns-afr
option scheduler rr
end-volume
volume sas
type performance/io-threads
option thread-count 8
option cache-size 64MB
subvolumes sas-unify
end-volume
volume sata
type performance/io-threads
option thread-count 8
option cache-size 64MB
subvolumes sata-unify
end-volume
volume server
type protocol/server
option transport-type tcp/server
subvolumes sas sata
option auth.ip.sas-ds.allow 10.6.0.8,127.0.0.1
option auth.ip.sas-ns.allow 10.6.0.8,127.0.0.1
option auth.ip.sata-ds.allow 10.6.0.8,127.0.0.1
option auth.ip.sata-ns.allow 10.6.0.8,127.0.0.1
option auth.ip.sas.allow *
option auth.ip.sata.allow *
end-volume
The client is running with the following config file:
volume sas
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host 10.6.0.10
option remote-subvolume sas # name of the remote volume
end-volume
volume sata
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host 10.6.0.10
option remote-subvolume sata # name of the remote volume
end-volume
After a while, when the client is copying data to my server 10.6.0.10
(and this server is replicating nicely with afr to 10.6.0.8), I get the
following message:
TLA Repo Revision: glusterfs--mainline--2.5--patch-704
Time : 2008-03-24 18:36:09
Signal Number : 11
glusterfsd -l /var/log/glusterfs/glusterfsd.log -L WARNING
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
TLA Repo Revision: glusterfs--mainline--2.5--patch-704
Time : 2008-03-24 18:36:09
Signal Number : 11
glusterfsd -l /var/log/glusterfs/glusterfsd.log -L WARNING
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
/lib/libc.so.6[0x2af9d5c48110]
/lib/libc.so.6[0x2af9d5c48110]
/lib/libc.so.6(strlen+0x30)[0x2af9d5c8b5a0]
/lib/libc.so.6(strlen+0x30)[0x2af9d5c8b5a0]
/usr/lib/libglusterfs.so.0[0x2af9d58f478e]
/usr/lib/libglusterfs.so.0[0x2af9d58f478e]
/usr/lib/libglusterfs.so.0(mop_lock_impl+0x4e)[0x2af9d58f4c1e]
/usr/lib/libglusterfs.so.0(mop_lock_impl+0x4e)[0x2af9d58f4c1e]
/usr/lib/glusterfs/1.3.8pre3/xlator/cluster/afr.so(afr_close+0x45f)[0x2aaaaacd1a2f]
/usr/lib/glusterfs/1.3.8pre3/xlator/cluster/afr.so(afr_close+0x45f)[0x2aaaaacd1a2f]
/usr/lib/glusterfs/1.3.8pre3/xlator/cluster/unify.so(unify_close+0x113)[0x2aaaaade5663]
/usr/lib/glusterfs/1.3.8pre3/xlator/cluster/unify.so(unify_close+0x113)[0x2aaaaade5663]
/usr/lib/glusterfs/1.3.8pre3/xlator/performance/io-threads.so[0x2aaaaaef0704]
/usr/lib/glusterfs/1.3.8pre3/xlator/performance/io-threads.so[0x2aaaaaef0704]
/usr/lib/libglusterfs.so.0(call_resume+0x6a)[0x2af9d58f63ca]
/usr/lib/libglusterfs.so.0(call_resume+0x6a)[0x2af9d58f63ca]
/usr/lib/glusterfs/1.3.8pre3/xlator/performance/io-threads.so[0x2aaaaaeefc6b]
/usr/lib/glusterfs/1.3.8pre3/xlator/performance/io-threads.so[0x2aaaaaeefc6b]
/lib/libpthread.so.0[0x2af9d5b09f1a]
/lib/libpthread.so.0[0x2af9d5b09f1a]
/lib/libc.so.6(__clone+0x72)[0x2af9d5ce25d2]
---------
/lib/libc.so.6(__clone+0x72)[0x2af9d5ce25d2]
---------
Doing a gdb on the coredump:
Core was generated by `[glusterfs]
'.
Program terminated with signal 11, Segmentation fault.
#0 0x00002af9d5c8b5a0 in strlen () from /lib/libc.so.6
(gdb) backtrace
#0 0x00002af9d5c8b5a0 in strlen () from /lib/libc.so.6
#1 0x00002af9d58f478e in place_lock_after (granted=0x2af9d59ff9c0,
path=0x2aaaabdbf1e0
"/sata-ds//img_large/20030106/fred33go/1041867234.jpg/") at lock.c:84
#2 0x00002af9d58f4c1e in mop_lock_impl (frame=0x2aaaabdbf180,
this_xl=<value optimized out>, path=0x2aaaabdbf330
"/sata-ds//img_large/20030106/fred33go/1041867234.jpg") at lock.c:118
#3 0x00002aaaaacd1a2f in afr_close (frame=0x2aaaabdb1bb0, this=<value
optimized out>, fd=0x2aaaabb916c0) at afr.c:3676
#4 0x00002aaaaade5663 in unify_close (frame=0x2aaaabdb1d20,
this=<value optimized out>, fd=0x2aaaabb916c0) at unify.c:2384
#5 0x00002aaaaaef0704 in iot_close_wrapper (frame=0x2aaaabb71260,
this=0x50dfb0, fd=0x2aaaabb916c0) at io-threads.c:190
#6 0x00002af9d58f63ca in call_resume (stub=0x0) at call-stub.c:2740
#7 0x00002aaaaaeefc6b in iot_worker (arg=<value optimized out>) at
io-threads.c:1061
#8 0x00002af9d5b09f1a in start_thread () from /lib/libpthread.so.0
#9 0x00002af9d5ce25d2 in clone () from /lib/libc.so.6
#10 0x0000000000000000 in ?? ()
If someone has any ideas, they are very welcome.
Regards,
Tom
_______________________________________________
Gluster-devel mailing list
Gluster-devel at nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel
--
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Supercomputing and Superstorage!