[Gluster-devel] Troubles with AFR - core dump

Tom Myny tom.myny at tigron.be
Tue Mar 25 14:00:38 UTC 2008


Hi Amar,

 

I’ve updated to the latest TLA version and I’m still getting a core dump:

 

Scenario: Doing a copy from a local xfs disk /backup/img_thumb to mounted glusterfs /sas/img_thumb

And at the same time, also a copy from a local xfs disk /backup/img_large to mounted glusterfs /sata/img_large

 

Debug logs of the client:

 

2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345092: LOOKUP /img_thumb/20021028/autrefois/1035814770.jpg

2008-03-25 12:56:03 D [fuse-bridge.c:455:fuse_entry_cbk] glusterfs-fuse: 345092: (34) /img_thumb/20021028/autrefois/1035814770.jpg => -1 (2)

2008-03-25 12:56:03 D [fuse-bridge.c:1471:fuse_create] glusterfs-fuse: 345093: CREATE /img_thumb/20021028/autrefois/1035814770.jpg

2008-03-25 12:56:03 D [fuse-bridge.c:1365:fuse_create_cbk] glusterfs-fuse: 345093: (27) /img_thumb/20021028/autrefois/1035814770.jpg => 0x73a4b0

2008-03-25 12:56:03 D [inode.c:577:__create_inode] fuse/inode: create inode(15688)

2008-03-25 12:56:03 D [inode.c:367:__active_inode] fuse/inode: activating inode(15688), lru=8032/1024

2008-03-25 12:56:03 D [fuse-bridge.c:1640:fuse_write] glusterfs-fuse: 606275: WRITE (0x7e7090, size=4096, offset=0)

2008-03-25 12:56:03 D [fuse-bridge.c:1640:fuse_write] glusterfs-fuse: 345094: WRITE (0x73a4b0, size=4096, offset=0)

2008-03-25 12:56:03 D [fuse-bridge.c:1603:fuse_writev_cbk] glusterfs-fuse: 606275: WRITE => 4096/4096,0/4096

2008-03-25 12:56:03 D [fuse-bridge.c:1640:fuse_write] glusterfs-fuse: 606276: WRITE (0x7e7090, size=4096, offset=4096)

2008-03-25 12:56:03 D [fuse-bridge.c:1603:fuse_writev_cbk] glusterfs-fuse: 345094: WRITE => 4096/4096,0/4096

2008-03-25 12:56:03 D [fuse-bridge.c:1640:fuse_write] glusterfs-fuse: 345095: WRITE (0x73a4b0, size=117, offset=4096)

2008-03-25 12:56:03 D [fuse-bridge.c:1603:fuse_writev_cbk] glusterfs-fuse: 606276: WRITE => 4096/4096,4096/8192

2008-03-25 12:56:03 D [fuse-bridge.c:1640:fuse_write] glusterfs-fuse: 606277: WRITE (0x7e7090, size=4096, offset=8192)

2008-03-25 12:56:03 D [fuse-bridge.c:1603:fuse_writev_cbk] glusterfs-fuse: 345095: WRITE => 117/117,4096/4213

2008-03-25 12:56:03 D [fuse-bridge.c:1664:fuse_flush] glusterfs-fuse: 345096: FLUSH 0x73a4b0

2008-03-25 12:56:03 D [fuse-bridge.c:1603:fuse_writev_cbk] glusterfs-fuse: 606277: WRITE => 4096/4096,8192/12288

2008-03-25 12:56:03 D [fuse-bridge.c:1640:fuse_write] glusterfs-fuse: 606278: WRITE (0x7e7090, size=2242, offset=12288)

2008-03-25 12:56:03 D [fuse-bridge.c:916:fuse_err_cbk] glusterfs-fuse: 345096: (16) ERR => 0

2008-03-25 12:56:03 D [fuse-bridge.c:1691:fuse_release] glusterfs-fuse: 345097: CLOSE 0x73a4b0

2008-03-25 12:56:03 D [fuse-bridge.c:1603:fuse_writev_cbk] glusterfs-fuse: 606278: WRITE => 2242/2242,12288/14530

2008-03-25 12:56:03 D [fuse-bridge.c:1664:fuse_flush] glusterfs-fuse: 606279: FLUSH 0x7e7090

2008-03-25 12:56:03 D [fuse-bridge.c:916:fuse_err_cbk] glusterfs-fuse: 606279: (16) ERR => 0

2008-03-25 12:56:03 D [fuse-bridge.c:1691:fuse_release] glusterfs-fuse: 606280: CLOSE 0x7e7090

2008-03-25 12:56:03 D [fuse-bridge.c:520:fuse_lookup] glusterfs-fuse: 606281: LOOKUP /img_large/20020814(2147531943)

2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345098: LOOKUP /img_thumb/20021028/autrefois/1035814963.jpg

2008-03-25 12:56:03 E [protocol.c:271:gf_block_unserialize_transport] sas: EOF from peer (10.6.0.10:6996)

2008-03-25 12:56:03 W [client-protocol.c:4665:client_protocol_cleanup] sas: cleaning up state in transport object 0x50ce80

2008-03-25 12:56:03 E [client-protocol.c:4715:client_protocol_cleanup] sas: forced unwinding frame type(1) op(17) reply=@0x2aaaab728270

2008-03-25 12:56:03 E [client-protocol.c:3636:client_close_cbk] sas: no proper reply from server, returning ENOTCONN

2008-03-25 12:56:03 E [fuse-bridge.c:922:fuse_err_cbk] glusterfs-fuse: 345097: (17) ERR => -1 (107)

2008-03-25 12:56:03 E [client-protocol.c:4715:client_protocol_cleanup] sas: forced unwinding frame type(1) op(34) reply=@0x2aaaab728270

2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN

2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345098: (34) /img_thumb/20021028/autrefois/1035814963.jpg => -1 (107)

2008-03-25 12:56:03 C [tcp.c:87:tcp_disconnect] sas: connection disconnected

2008-03-25 12:56:03 E [protocol.c:271:gf_block_unserialize_transport] sas: EOF from peer (10.6.0.10:6996)

2008-03-25 12:56:03 W [client-protocol.c:4665:client_protocol_cleanup] sas: cleaning up state in transport object 0x50ff70

2008-03-25 12:56:03 C [tcp.c:87:tcp_disconnect] sas: connection disconnected

2008-03-25 12:56:03 W [client-protocol.c:4665:client_protocol_cleanup] sata: cleaning up state in transport object 0x50ce80

2008-03-25 12:56:03 E [client-protocol.c:4715:client_protocol_cleanup] sata: forced unwinding frame type(1) op(17) reply=@0x2aaaac2e77d0

2008-03-25 12:56:03 E [client-protocol.c:3636:client_close_cbk] sata: no proper reply from server, returning ENOTCONN

2008-03-25 12:56:03 E [fuse-bridge.c:922:fuse_err_cbk] glusterfs-fuse: 606280: (17) ERR => -1 (107)

2008-03-25 12:56:03 E [client-protocol.c:4715:client_protocol_cleanup] sata: forced unwinding frame type(1) op(34) reply=@0x2aaaac2e77d0

2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sata: no proper reply from server, returning ENOTCONN

2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 606281: (34) /img_large/20020814 => -1 (107)

2008-03-25 12:56:03 E [client-protocol.c:348:client_protocol_xfer] sata: transport_submit failed

2008-03-25 12:56:03 C [tcp.c:87:tcp_disconnect] sata: connection disconnected

2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345099: LOOKUP /img_thumb/20021028/babbuinoconte

2008-03-25 12:56:03 D [tcp-client.c:77:tcp_connect] sas: socket fd = 6

2008-03-25 12:56:03 D [tcp-client.c:107:tcp_connect] sas: finalized on port `1023'

2008-03-25 12:56:03 D [tcp-client.c:128:tcp_connect] sas: defaulting remote-port to 6996

2008-03-25 12:56:03 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 10.6.0.10

2008-03-25 12:56:03 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:10.6.0.10[0] for hostname: 10.6.0.10

2008-03-25 12:56:03 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache

2008-03-25 12:56:03 D [tcp-client.c:161:tcp_connect] sas: connect on 6 in progress (non-blocking)

2008-03-25 12:56:03 E [tcp-client.c:190:tcp_connect] sas: non-blocking connect() returned: 111 (Connection refused)

2008-03-25 12:56:03 W [client-protocol.c:357:client_protocol_xfer] sas: not connected at the moment to submit frame type(1) op(34)

2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN

2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345099: (34) /img_thumb/20021028/babbuinoconte => -1 (107)

2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345100: LOOKUP /img_thumb/20021028/bd

2008-03-25 12:56:03 D [tcp-client.c:77:tcp_connect] sas: socket fd = 6

2008-03-25 12:56:03 D [tcp-client.c:107:tcp_connect] sas: finalized on port `1023'

2008-03-25 12:56:03 D [tcp-client.c:128:tcp_connect] sas: defaulting remote-port to 6996

2008-03-25 12:56:03 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 10.6.0.10

2008-03-25 12:56:03 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:10.6.0.10[0] for hostname: 10.6.0.10

2008-03-25 12:56:03 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache

2008-03-25 12:56:03 D [tcp-client.c:161:tcp_connect] sas: connect on 6 in progress (non-blocking)

2008-03-25 12:56:03 E [tcp-client.c:190:tcp_connect] sas: non-blocking connect() returned: 111 (Connection refused)

2008-03-25 12:56:03 W [client-protocol.c:357:client_protocol_xfer] sas: not connected at the moment to submit frame type(1) op(34)

2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN

2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345100: (34) /img_thumb/20021028/bd => -1 (107)

2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345101: LOOKUP /img_thumb/20021028/bdelplace

2008-03-25 12:56:03 D [tcp-client.c:77:tcp_connect] sas: socket fd = 6

2008-03-25 12:56:03 D [tcp-client.c:107:tcp_connect] sas: finalized on port `1023'

2008-03-25 12:56:03 D [tcp-client.c:128:tcp_connect] sas: defaulting remote-port to 6996

2008-03-25 12:56:03 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 10.6.0.10

2008-03-25 12:56:03 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:10.6.0.10[0] for hostname: 10.6.0.10

2008-03-25 12:56:03 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache

2008-03-25 12:56:03 D [tcp-client.c:161:tcp_connect] sas: connect on 6 in progress (non-blocking)

2008-03-25 12:56:03 E [tcp-client.c:190:tcp_connect] sas: non-blocking connect() returned: 111 (Connection refused)

2008-03-25 12:56:03 W [client-protocol.c:357:client_protocol_xfer] sas: not connected at the moment to submit frame type(1) op(34)

2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN

2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345101: (34) /img_thumb/20021028/bdelplace => -1 (107)

2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345102: LOOKUP /img_thumb/20021028/belline

2008-03-25 12:56:03 D [tcp-client.c:77:tcp_connect] sas: socket fd = 6

2008-03-25 12:56:03 D [tcp-client.c:107:tcp_connect] sas: finalized on port `1023'

2008-03-25 12:56:03 D [tcp-client.c:128:tcp_connect] sas: defaulting remote-port to 6996

2008-03-25 12:56:03 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 10.6.0.10

2008-03-25 12:56:03 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:10.6.0.10[0] for hostname: 10.6.0.10

2008-03-25 12:56:03 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache

2008-03-25 12:56:03 D [tcp-client.c:161:tcp_connect] sas: connect on 6 in progress (non-blocking)

2008-03-25 12:56:03 E [tcp-client.c:190:tcp_connect] sas: non-blocking connect() returned: 111 (Connection refused)

2008-03-25 12:56:03 W [client-protocol.c:357:client_protocol_xfer] sas: not connected at the moment to submit frame type(1) op(34)

2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN

2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345102: (34) /img_thumb/20021028/belline => -1 (107)

2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345103: LOOKUP /img_thumb/20021028/biquet

2008-03-25 12:56:03 D [tcp-client.c:77:tcp_connect] sas: socket fd = 6

2008-03-25 12:56:03 D [tcp-client.c:107:tcp_connect] sas: finalized on port `1023'

2008-03-25 12:56:03 D [tcp-client.c:128:tcp_connect] sas: defaulting remote-port to 6996

2008-03-25 12:56:03 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 10.6.0.10

2008-03-25 12:56:03 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:10.6.0.10[0] for hostname: 10.6.0.10

2008-03-25 12:56:03 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache

2008-03-25 12:56:03 D [tcp-client.c:161:tcp_connect] sas: connect on 6 in progress (non-blocking)

2008-03-25 12:56:03 E [tcp-client.c:190:tcp_connect] sas: non-blocking connect() returned: 111 (Connection refused)

2008-03-25 12:56:03 W [client-protocol.c:357:client_protocol_xfer] sas: not connected at the moment to submit frame type(1) op(34)

2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN

2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345103: (34) /img_thumb/20021028/biquet => -1 (107)

2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345104: LOOKUP /img_thumb/20021028/botilhon

2008-03-25 12:56:03 D [tcp-client.c:77:tcp_connect] sas: socket fd = 6

2008-03-25 12:56:03 D [tcp-client.c:107:tcp_connect] sas: finalized on port `1023'

2008-03-25 12:56:03 D [tcp-client.c:128:tcp_connect] sas: defaulting remote-port to 6996

2008-03-25 12:56:03 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 10.6.0.10

2008-03-25 12:56:03 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:10.6.0.10[0] for hostname: 10.6.0.10

2008-03-25 12:56:03 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache

2008-03-25 12:56:03 D [tcp-client.c:161:tcp_connect] sas: connect on 6 in progress (non-blocking)

2008-03-25 12:56:03 E [tcp-client.c:190:tcp_connect] sas: non-blocking connect() returned: 111 (Connection refused)

2008-03-25 12:56:03 W [client-protocol.c:357:client_protocol_xfer] sas: not connected at the moment to submit frame type(1) op(34)

2008-03-25 12:56:03 E [client-protocol.c:4320:client_lookup_cbk] sas: no proper reply from server, returning ENOTCONN

2008-03-25 12:56:03 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 345104: (34) /img_thumb/20021028/botilhon => -1 (107)

2008-03-25 12:56:03 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 345105: LOOKUP /img_thumb/20021028/broosguy

Etc …
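
The failed calls above all end in "=> -1 (N)", where N is an errno value: 2 on the first LOOKUP, then 107 everywhere after the disconnect, which the client-protocol lines themselves name as ENOTCONN. A minimal sketch for pulling those numbers out of a client log and naming them follows. The regular expression and the function name decode_failure are illustrative assumptions based only on the lines shown here, not part of glusterfs, and the symbolic names come from the errno table of the machine running the script.

```python
import errno
import re

# Assumed shape of a failed fuse-bridge line, taken from the log excerpt above:
#   "... glusterfs-fuse: <request id>: (<op>) <target> => -1 (<errno>)"
FAILED_CALL = re.compile(
    r"glusterfs-fuse: (\d+): \((\d+)\) (.*) => -1 \((\d+)\)$"
)


def decode_failure(line):
    """Return (request_id, op, target, errno_number, errno_name), or None
    if the line is not a failed fuse-bridge call."""
    match = FAILED_CALL.search(line.strip())
    if match is None:
        return None
    request_id, op, target, err = match.groups()
    err = int(err)
    # errno.errorcode maps numbers to names for the local platform only.
    name = errno.errorcode.get(err, "UNKNOWN")
    return int(request_id), int(op), target, err, name


if __name__ == "__main__":
    import sys

    # Usage: python decode_errno.py < glusterfs-client.log
    for log_line in sys.stdin:
        decoded = decode_failure(log_line)
        if decoded is not None:
            print(decoded)
```

Grouping the decoded tuples by errno number makes it easy to see the point in the log where ordinary "not found" lookups turn into connection failures.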

 

------

 

Debug logs of the server:

 

2008-03-25 12:56:03 D [inode.c:367:__active_inode] sas/inode: activating inode(15686), lru=1024/1024

2008-03-25 12:56:03 D [inode.c:321:__destroy_inode] sas/inode: destroy inode(1610627503) [@0x2aaaaba1c5a0]

2008-03-25 12:56:03 D [inode.c:577:__create_inode] sas/inode: create inode(15687)

2008-03-25 12:56:03 D [inode.c:367:__active_inode] sas/inode: activating inode(15687), lru=1024/1024

2008-03-25 12:56:03 D [inode.c:321:__destroy_inode] sas/inode: destroy inode(1610627504) [@0x2aaaaba1be00]

2008-03-25 12:56:03 D [inode.c:321:__destroy_inode] sata/inode: destroy inode(47547) [@0x2aaaab91c9f0]

2008-03-25 12:56:03 D [inode.c:577:__create_inode] sata/inode: create inode(2147532094)

2008-03-25 12:56:03 D [inode.c:367:__active_inode] sata/inode: activating inode(2147532094), lru=1024/1024

2008-03-25 12:56:03 D [inode.c:321:__destroy_inode] sata/inode: destroy inode(47548) [@0x2aaaab607f20]

2008-03-25 12:56:03 D [inode.c:577:__create_inode] sata/inode: create inode(2147532095)

2008-03-25 12:56:03 D [inode.c:367:__active_inode] sata/inode: activating inode(2147532095), lru=1024/1024

2008-03-25 12:56:03 D [inode.c:577:__create_inode] sas/inode: create inode(15688)

2008-03-25 12:56:03 D [inode.c:367:__active_inode] sas/inode: activating inode(15688), lru=1024/1024

 

TLA Repo Revision: glusterfs--mainline--2.5--patch-717

Time : 2008-03-25 12:56:03

Signal Number : 11

 

glusterfsd -f /etc/glusterfs/server.vol -l /var/log/glusterfs/glusterfsd.log -L DEBUG

volume server

  type protocol/server

  option auth.ip.sata.allow *

  option auth.ip.sas.allow *

  option auth.ip.sata-ns.allow 10.6.0.8,127.0.0.1

  option auth.ip.sata-ds.allow 10.6.0.8,127.0.0.1

  option auth.ip.sas-ns.allow 10.6.0.8,127.0.0.1

  option auth.ip.sas-ds.allow 10.6.0.8,127.0.0.1

  option transport-type tcp/server

  subvolumes sas sata

end-volume

 

volume sata

  type performance/io-threads

  option cache-size 64MB

  option thread-count 8

  subvolumes sata-unify

… (see below)…

 

volume sas-ns

  type storage/posix

  option directory /sas/ns

end-volume

 

volume sas-ds

  type storage/posix

  option directory /sas/data

end-volume

 

frame : type(0) op(0)

frame : type(0) op(0)

 

/lib/libc.so.6[0x2af484498110]

/lib/libc.so.6(strlen+0x10)[0x2af4844db580]

/usr/lib/libglusterfs.so.0[0x2af484143458]

/usr/lib/libglusterfs.so.0(mop_lock_impl+0x62)[0x2af4841438e2]

/usr/lib/glusterfs/1.3.8/xlator/cluster/afr.so(afr_close+0x45f)[0x2aaaaacd427f]

/usr/lib/glusterfs/1.3.8/xlator/cluster/unify.so(unify_close+0x113)[0x2aaaaade9c83]

/usr/lib/glusterfs/1.3.8/xlator/performance/io-threads.so[0x2aaaaaef6704]

/usr/lib/libglusterfs.so.0(call_resume+0x78)[0x2af4841451a8]

/usr/lib/glusterfs/1.3.8/xlator/performance/io-threads.so[0x2aaaaaef5c6b]

/lib/libpthread.so.0[0x2af484359f1a]

 

TLA Repo Revision: glusterfs--mainline--2.5--patch-717

Time : 2008-03-25 12:56:03

Signal Number : 11

 

glusterfsd -f /etc/glusterfs/server.vol -l /var/log/glusterfs/glusterfsd.log -L DEBUG

/lib/libc.so.6(__clone+0x72)[0x2af4845325d2]

---------

volume server

  type protocol/server

  option auth.ip.sata.allow *

  option auth.ip.sas.allow *

  option auth.ip.sata-ns.allow 10.6.0.8,127.0.0.1

  option auth.ip.sata-ds.allow 10.6.0.8,127.0.0.1

  option auth.ip.sas-ns.allow 10.6.0.8,127.0.0.1

  option auth.ip.sas-ds.allow 10.6.0.8,127.0.0.1

  option transport-type tcp/server

  subvolumes sas sata

end-volume

 

volume sata

  type performance/io-threads

  option cache-size 64MB

  option thread-count 8

  subvolumes sata-unify

end-volume

 

volume sas

  type performance/io-threads

  option cache-size 64MB

  option thread-count 8

  subvolumes sas-unify

end-volume

 

 

-------

 

Doing a gdb backtrace:

 

(gdb) backtrace

#0  0x00002af4844db580 in strlen () from /lib/libc.so.6

#1  0x00002af484143458 in place_lock_after (granted=0x2af48424f9e0,

    path=0x2aaaaba37370 "/sas-ds//img_thumb/20021028/autrefois/1035814770.jpg/") at ../../../libglusterfs/src/lock.c:65

#2  0x00002af4841438e2 in mop_lock_impl (frame=0x2aaaaba81c80,

    this_xl=<value optimized out>,

    path=0x2aaaaba64b50 "/sas-ds//img_thumb/20021028/autrefois/1035814770.jpg")

    at ../../../libglusterfs/src/lock.c:102

#3  0x00002aaaaacd427f in afr_close (frame=0x2aaaaba8dab0,

    this=<value optimized out>, fd=0x2aaaab6075f0)

    at ../../../../../xlators/cluster/afr/src/afr.c:3366

#4  0x00002aaaaade9c83 in unify_close (frame=0x2aaaaba72130,

    this=<value optimized out>, fd=0x2aaaab6075f0)

    at ../../../../../xlators/cluster/unify/src/unify.c:2434

#5  0x00002aaaaaef6704 in iot_close_wrapper (frame=0x5696b0, this=0x50de10,

    fd=0x2aaaab6075f0)

    at ../../../../../xlators/performance/io-threads/src/io-threads.c:188

#6  0x00002af4841451a8 in call_resume (stub=0x3638312c33353865)

    at ../../../libglusterfs/src/call-stub.c:2975

#7  0x00002aaaaaef5c6b in iot_worker (arg=<value optimized out>)

    at ../../../../../xlators/performance/io-threads/src/io-threads.c:1024

#8  0x00002af484359f1a in start_thread () from /lib/libpthread.so.0

#9  0x00002af4845325d2 in clone () from /lib/libc.so.6

#10 0x0000000000000000 in ?? ()

 

Regards,

Tom

 

 

From: amarts at gmail.com [mailto:amarts at gmail.com] On Behalf Of Amar S. Tumballi
Sent: Monday, 24 March 2008 22:02
To: Tom Myny - Tigron BVBA
Cc: gluster-devel at nongnu.org
Subject: Re: [Gluster-devel] Troubles with AFR - core dump

 

Hi Tom,
 Thanks for reporting the bug. I believe this bug is fixed in patch-698, and there are some further enhancements to afr in 'patch-712'. I am in the process of preliminary testing for releasing 1.3.8pre4. You can try that out once it's ready, or you can use the latest TLA, which fixes it.

 We will look into it further and get back to you about the progress. It would be very helpful if you could describe the steps that led to this scenario.

Regards,
Amar

On Mon, Mar 24, 2008 at 1:12 PM, Tom Myny - Tigron BVBA <tom.myny at tigron.be> wrote:

Hello all,

I was first using a fresh Debian (latest stable - etch) with the
Debian packages that are available here ( deb
http://lmello.virt-br.org/debian ./ ...)

But when using them I noticed a crash on one of my servers.

I then noticed that the version I was using was glusterfs 1.3.8pre1.

After an upgrade to glusterfs 1.3.8pre3 I got another core dump, so
I'm out of ideas for the moment :)

My current (test) set-up

Two servers and a client running the latest Debian stable using
glusterfs 1.3.8pre1.

The kernels:

Linux tweety 2.6.18-6-amd64 #1 SMP Sun Feb 10 17:50:19 UTC 2008 x86_64
GNU/Linux

The filesystems are xfs.

My Config file on one of the servers:

volume sas-ds
        type storage/posix
        option directory /sas/data
end-volume

volume sas-ns
        type storage/posix
        option directory /sas/ns
end-volume

volume sata-ds
        type storage/posix
        option directory /sata/data
end-volume

volume sata-ns
        type storage/posix
        option directory /sata/ns
end-volume

volume sas-backup-ds
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.6.0.8
        option remote-subvolume sas-ds
end-volume

volume sas-backup-ns
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.6.0.8
        option remote-subvolume sas-ns
end-volume

volume sata-backup-ds
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.6.0.8
        option remote-subvolume sata-ds
end-volume

volume sata-backup-ns
        type protocol/client
        option transport-type tcp/client
        option remote-host 10.6.0.8
        option remote-subvolume sata-ns
end-volume

volume sas-ds-afr
        type cluster/afr
        subvolumes sas-ds sas-backup-ds
end-volume

volume sas-ns-afr
        type cluster/afr
        subvolumes sas-ns sas-backup-ns
end-volume

volume sata-ds-afr
        type cluster/afr
        subvolumes sata-ds sata-backup-ds
end-volume

volume sata-ns-afr
        type cluster/afr
        subvolumes sata-ns sata-backup-ns
end-volume

volume sas-unify
        type cluster/unify
        subvolumes sas-ds-afr
        option namespace sas-ns-afr
        option scheduler rr
end-volume

volume sata-unify
        type cluster/unify
        subvolumes sata-ds-afr
        option namespace sata-ns-afr
        option scheduler rr
end-volume

volume sas
        type performance/io-threads
        option thread-count 8
        option cache-size 64MB
        subvolumes sas-unify
end-volume

volume sata
        type performance/io-threads
        option thread-count 8
        option cache-size 64MB
        subvolumes sata-unify
end-volume

volume server
     type protocol/server
     option transport-type tcp/server
     subvolumes sas sata
     option auth.ip.sas-ds.allow 10.6.0.8,127.0.0.1
     option auth.ip.sas-ns.allow 10.6.0.8,127.0.0.1
     option auth.ip.sata-ds.allow 10.6.0.8,127.0.0.1
     option auth.ip.sata-ns.allow 10.6.0.8,127.0.0.1
     option auth.ip.sas.allow *
     option auth.ip.sata.allow *
end-volume
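
The volumes above stack as server -> sas / sata (io-threads) -> unify -> afr -> posix and protocol/client, which matches the order of the frames in the backtraces (iot_close_wrapper, unify_close, afr_close). As a reading aid, here is a rough sketch that parses a spec file of this shape and prints the stack below a given volume. It is not the glusterfs parser: the function names parse_volfile, children and walk are made up for illustration, and it assumes only the keywords that appear in this mail (volume, type, option, subvolumes, end-volume) plus '#' comments. It also assumes, from the unify volumes above, that the value of "option namespace" names another volume.

```python
def parse_volfile(text):
    """Parse 'volume ... end-volume' blocks into a dict keyed by volume name."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing '#' comments
        if not line:
            continue
        words = line.split()
        key = words[0]
        if key == "volume":
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[words[1]] = current
        elif key == "end-volume":
            current = None
        elif current is None:
            raise ValueError("statement outside a volume block: " + line)
        elif key == "type":
            current["type"] = words[1]
        elif key == "option":
            current["options"][words[1]] = " ".join(words[2:])
        elif key == "subvolumes":
            current["subvolumes"] = words[1:]
    return volumes


def children(volumes, name):
    """Volumes referenced by `name`: its subvolumes plus, as an assumption
    drawn from the unify blocks in this mail, its 'namespace' option."""
    volume = volumes[name]
    kids = list(volume["subvolumes"])
    namespace = volume["options"].get("namespace")
    if namespace:
        kids.append(namespace)
    return kids


def walk(volumes, name, depth=0):
    """Return indented lines describing the stack below `name`.
    Referenced volumes that are not defined in the file are skipped."""
    lines = ["  " * depth + name + " (" + str(volumes[name]["type"]) + ")"]
    for child in children(volumes, name):
        if child in volumes:
            lines.extend(walk(volumes, child, depth + 1))
    return lines


if __name__ == "__main__":
    import sys

    # Usage: python volfile_tree.py /etc/glusterfs/server.vol server
    with open(sys.argv[1]) as handle:
        print("\n".join(walk(parse_volfile(handle.read()), sys.argv[2])))
```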

The client is running with the following config file:

volume sas
     type protocol/client
     option transport-type tcp/client     # for TCP/IP transport
     option remote-host 10.6.0.10
     option remote-subvolume sas        # name of the remote volume
end-volume

volume sata
     type protocol/client
     option transport-type tcp/client     # for TCP/IP transport
     option remote-host 10.6.0.10
     option remote-subvolume sata        # name of the remote volume
end-volume

After a while, when the client is copying data to my server 10.6.0.10
(and this server is replicating nicely with afr to 10.6.0.8), I get the
following message:

TLA Repo Revision: glusterfs--mainline--2.5--patch-704
Time : 2008-03-24 18:36:09
Signal Number : 11

glusterfsd -l /var/log/glusterfs/glusterfsd.log -L WARNING
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)


TLA Repo Revision: glusterfs--mainline--2.5--patch-704
Time : 2008-03-24 18:36:09
Signal Number : 11

glusterfsd -l /var/log/glusterfs/glusterfsd.log -L WARNING
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

/lib/libc.so.6[0x2af9d5c48110]
/lib/libc.so.6[0x2af9d5c48110]
/lib/libc.so.6(strlen+0x30)[0x2af9d5c8b5a0]
/lib/libc.so.6(strlen+0x30)[0x2af9d5c8b5a0]
/usr/lib/libglusterfs.so.0[0x2af9d58f478e]
/usr/lib/libglusterfs.so.0[0x2af9d58f478e]
/usr/lib/libglusterfs.so.0(mop_lock_impl+0x4e)[0x2af9d58f4c1e]
/usr/lib/libglusterfs.so.0(mop_lock_impl+0x4e)[0x2af9d58f4c1e]
/usr/lib/glusterfs/1.3.8pre3/xlator/cluster/afr.so(afr_close+0x45f)[0x2aaaaacd1a2f]
/usr/lib/glusterfs/1.3.8pre3/xlator/cluster/afr.so(afr_close+0x45f)[0x2aaaaacd1a2f]
/usr/lib/glusterfs/1.3.8pre3/xlator/cluster/unify.so(unify_close+0x113)[0x2aaaaade5663]
/usr/lib/glusterfs/1.3.8pre3/xlator/cluster/unify.so(unify_close+0x113)[0x2aaaaade5663]
/usr/lib/glusterfs/1.3.8pre3/xlator/performance/io-threads.so[0x2aaaaaef0704]
/usr/lib/glusterfs/1.3.8pre3/xlator/performance/io-threads.so[0x2aaaaaef0704]
/usr/lib/libglusterfs.so.0(call_resume+0x6a)[0x2af9d58f63ca]
/usr/lib/libglusterfs.so.0(call_resume+0x6a)[0x2af9d58f63ca]
/usr/lib/glusterfs/1.3.8pre3/xlator/performance/io-threads.so[0x2aaaaaeefc6b]
/usr/lib/glusterfs/1.3.8pre3/xlator/performance/io-threads.so[0x2aaaaaeefc6b]
/lib/libpthread.so.0[0x2af9d5b09f1a]
/lib/libpthread.so.0[0x2af9d5b09f1a]
/lib/libc.so.6(__clone+0x72)[0x2af9d5ce25d2]
---------
/lib/libc.so.6(__clone+0x72)[0x2af9d5ce25d2]
---------

Running gdb on the core dump:

Core was generated by `[glusterfs]
                              '.
Program terminated with signal 11, Segmentation fault.
#0  0x00002af9d5c8b5a0 in strlen () from /lib/libc.so.6
(gdb) backtrace
#0  0x00002af9d5c8b5a0 in strlen () from /lib/libc.so.6
#1  0x00002af9d58f478e in place_lock_after (granted=0x2af9d59ff9c0,
    path=0x2aaaabdbf1e0 "/sata-ds//img_large/20030106/fred33go/1041867234.jpg/") at lock.c:84
#2  0x00002af9d58f4c1e in mop_lock_impl (frame=0x2aaaabdbf180,
    this_xl=<value optimized out>,
    path=0x2aaaabdbf330 "/sata-ds//img_large/20030106/fred33go/1041867234.jpg") at lock.c:118
#3  0x00002aaaaacd1a2f in afr_close (frame=0x2aaaabdb1bb0,
    this=<value optimized out>, fd=0x2aaaabb916c0) at afr.c:3676
#4  0x00002aaaaade5663 in unify_close (frame=0x2aaaabdb1d20,
    this=<value optimized out>, fd=0x2aaaabb916c0) at unify.c:2384
#5  0x00002aaaaaef0704 in iot_close_wrapper (frame=0x2aaaabb71260,
    this=0x50dfb0, fd=0x2aaaabb916c0) at io-threads.c:190
#6  0x00002af9d58f63ca in call_resume (stub=0x0) at call-stub.c:2740
#7  0x00002aaaaaeefc6b in iot_worker (arg=<value optimized out>) at io-threads.c:1061
#8  0x00002af9d5b09f1a in start_thread () from /lib/libpthread.so.0
#9  0x00002af9d5ce25d2 in clone () from /lib/libc.so.6
#10 0x0000000000000000 in ?? ()

If someone has an idea, it's very welcome.

Regards,
Tom








-- 
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Supercomputing and Superstorage! 



