[Bugs] [Bug 1353561] Multiple bricks could crash after TCP port probing

bugzilla at redhat.com bugzilla at redhat.com
Tue Jul 19 07:40:46 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1353561



--- Comment #3 from Oleksandr Natalenko <oleksandr at natalenko.name> ---
Here is another batch of cores I got during additional tests.

=== core.27675 ===
#0  0x00007fb97754abd0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x00007fb9786e36fd in gf_log_set_log_buf_size (buf_size=buf_size@entry=0) at logging.c:256
#2  0x00007fb9786e3897 in gf_log_disable_suppression_before_exit (ctx=0x7fb97a3f1010) at logging.c:428
#3  0x00007fb978705915 in gf_print_trace (signum=11, ctx=0x7fb97a3f1010) at common-utils.c:579
#4  <signal handler called>
#5  0x00007fb978709449 in __inode_ctx_free (inode=inode@entry=0x7fb933ed16d8) at inode.c:336
#6  0x00007fb97870a4c7 in __inode_destroy (inode=0x7fb933ed16d8) at inode.c:358
#7  inode_table_prune (table=table@entry=0x7fb9640ce160) at inode.c:1540
#8  0x00007fb97870a754 in inode_unref (inode=0x7fb933ed16d8) at inode.c:529
#9  0x00007fb9786de152 in loc_wipe (loc=loc@entry=0x7fb975c91e14) at xlator.c:695
#10 0x00007fb97871c794 in call_stub_wipe_args (stub=0x7fb975c91dd4) at call-stub.c:2511
#11 call_stub_destroy (stub=0x7fb975c91dd4) at call-stub.c:2550
#12 0x00007fb968eb4363 in iot_worker (data=0x7fb964055bf0) at io-threads.c:215
#13 0x00007fb977548dc5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fb976e8dced in clone () from /lib64/libc.so.6
===

=== core.8362 ===
#0  0x00007f8613718bdd in __gf_free (free_ptr=0x7f86000ebbd0) at mem-pool.c:313
#1  0x00007f85fef6ffa5 in io_stats_release (this=0x7f860001dc70, fd=0x7f85f8003d14) at io-stats.c:2540
#2  0x00007f86137166c7 in fd_destroy (bound=_gf_true, fd=0x7f85f8003d14) at fd.c:520
#3  fd_unref (fd=0x7f85f8003d14) at fd.c:573
#4  0x00007f85fed3f227 in server3_3_release (req=0x7f85fdfba06c) at server-rpc-fops.c:4030
#5  0x00007f86134a16ab in rpcsvc_handle_rpc_call (svc=0x7f860002f8b0, trans=trans@entry=0x7f86000d9d80, msg=0x7f85f8038120) at rpcsvc.c:705
#6  0x00007f86134a187b in rpcsvc_notify (trans=0x7f86000d9d80, mydata=<optimized out>, event=<optimized out>, data=<optimized out>) at rpcsvc.c:799
#7  0x00007f86134a37c3 in rpc_transport_notify (this=this@entry=0x7f86000d9d80, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f85f8038120) at rpc-transport.c:546
#8  0x00007f860836a654 in socket_event_poll_in (this=this@entry=0x7f86000d9d80) at socket.c:2245
#9  0x00007f860836d294 in socket_event_handler (fd=fd@entry=20, idx=idx@entry=9, data=0x7f86000d9d80, poll_in=1, poll_out=0, poll_err=0) at socket.c:2358
#10 0x00007f86137481ca in event_dispatch_epoll_handler (event=0x7f85fed21e80, event_pool=0x7f8614d02290) at event-epoll.c:575
#11 event_dispatch_epoll_worker (data=0x7f86000205d0) at event-epoll.c:678
#12 0x00007f8612541dc5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f8611e86ced in clone () from /lib64/libc.so.6
===

=== core.8420 ===
#0  0x00007f1404043fa8 in ?? ()
#1  0x00007f141f83c7c3 in rpc_transport_notify (this=this@entry=0x7f14040434f0, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7f14040434f0) at rpc-transport.c:546
#2  0x00007f1414706242 in socket_event_poll_err (this=0x7f14040434f0) at socket.c:1158
#3  socket_event_handler (fd=fd@entry=25, idx=idx@entry=10, data=0x7f14040434f0, poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2365
#4  0x00007f141fae11ca in event_dispatch_epoll_handler (event=0x7f140b150e80, event_pool=0x7f1421eb6290) at event-epoll.c:575
#5  event_dispatch_epoll_worker (data=0x7f140c0205d0) at event-epoll.c:678
#6  0x00007f141e8dadc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f141e21fced in clone () from /lib64/libc.so.6
===

=== core.8439 ===
#0  0x00007f3a7b79bc10 in list_del_init (old=0x7f3a74027e18) at ../../../../libglusterfs/src/list.h:88
#1  server_rpc_notify (rpc=<optimized out>, xl=0x7f3a7c01f1e0, event=<optimized out>, data=0x7f3a74027360) at server.c:542
#2  0x00007f3a8ffec4bf in rpcsvc_handle_disconnect (svc=0x7f3a7c02f8b0, trans=trans@entry=0x7f3a74027360) at rpcsvc.c:756
#3  0x00007f3a8ffee830 in rpcsvc_notify (trans=0x7f3a74027360, mydata=<optimized out>, event=<optimized out>, data=<optimized out>) at rpcsvc.c:794
#4  0x00007f3a8fff07c3 in rpc_transport_notify (this=this@entry=0x7f3a74027360, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7f3a74027360) at rpc-transport.c:546
#5  0x00007f3a84eba242 in socket_event_poll_err (this=0x7f3a74027360) at socket.c:1158
#6  socket_event_handler (fd=fd@entry=27, idx=idx@entry=10, data=0x7f3a74027360, poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2365
#7  0x00007f3a902951ca in event_dispatch_epoll_handler (event=0x7f3a831dce80, event_pool=0x7f3a90eb6290) at event-epoll.c:575
#8  event_dispatch_epoll_worker (data=0x7f3a90f02d70) at event-epoll.c:678
#9  0x00007f3a8f08edc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f3a8e9d3ced in clone () from /lib64/libc.so.6
===

Attaching xzipped core files to this ticket.
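In case it helps, the attached cores can be inspected roughly as follows; a minimal sketch, assuming the matching glusterfs-debuginfo package is installed and the brick binary lives at /usr/sbin/glusterfsd (that path is an assumption and may differ on other setups):

===
# decompress one of the attached cores
xz -d core.27675.xz
# open it against the brick binary and dump all thread backtraces
gdb /usr/sbin/glusterfsd core.27675
(gdb) set pagination off
(gdb) thread apply all bt
===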

Please note that these cores were obtained while running v3.7.13 with the following patches applied:

===
Kaleb S KEITHLEY (1):
      build: RHEL7 unpackaged files .../hooks/S57glusterfind-delete-post.{pyc,pyo}

Kotresh HR (1):
      changelog/rpc: Fix rpc_clnt_t mem leaks

Mohit Agrawal (1):
      rpc/socket.c: Modify approach to cleanup threads of socket_poller in socket_spawn.

N Balachandran (1):
      rpc/socket: pthread resources are not cleaned up

Niels de Vos (1):
      rpc: invalid argument when function setsockopt sets option TCP_USER_TIMEOUT

Pranith Kumar K (1):
      features/index: Exclude gfid-type for '.', '..'

Raghavendra G (2):
      libglusterfs/client_t: Dump the 0th client too
      storage/posix: fix inode leaks
===
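The list above is in git shortlog format; a sketch for regenerating such a list, assuming the backports were cherry-picked on top of a v3.7.13 checkout of the glusterfs tree:

===
# group the patches applied since the v3.7.13 tag by author
git shortlog v3.7.13..HEAD
===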

Also, please note that core.27675 is the core dump of a brick of the volume with the following options:

===
nfs.disable: off
network.ping-timeout: 10
storage.linux-aio: on
server.event-threads: 4
performance.write-behind: on
performance.write-behind-window-size: 4194304
performance.stat-prefetch: on
performance.read-ahead: on
performance.quick-read: on
performance.io-thread-count: 2
performance.flush-behind: on
performance.client-io-threads: off
performance.cache-size: 33554432
performance.cache-max-file-size: 1048576
network.inode-lru-limit: 4096
features.cache-invalidation: on
cluster.entry-self-heal: off
cluster.metadata-self-heal: off
cluster.data-self-heal: off
cluster.readdir-optimize: on
cluster.lookup-optimize: on
client.event-threads: 2
performance.readdir-ahead: on
===

The rest of the cores were obtained from bricks of a volume with default options.
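For completeness, an option profile like the one above is applied per volume with the gluster CLI; a minimal sketch, using a placeholder volume name "testvol" (the real volume names are not part of this report, and only two of the options are shown):

===
# 'testvol' is a placeholder volume name
gluster volume set testvol network.ping-timeout 10
gluster volume set testvol performance.io-thread-count 2
# show the volume, including its reconfigured options
gluster volume info testvol
===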
