[Bugs] [Bug 1790670] glusterd crashed when trying to add node

bugzilla at redhat.com
Tue Jan 14 09:57:17 UTC 2020


https://bugzilla.redhat.com/show_bug.cgi?id=1790670



--- Comment #7 from Rick Pizzi <pizzi at leopardus.com> ---
Situation:

kubectl describe pod glusterfs-vsfbr   -n kube-system | grep memory
      memory:   900Mi


[2020-01-14 09:50:45.536179] I [MSGID: 100030] [glusterfsd.c:2867:main]
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 7.1 (args:
/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO) 
[2020-01-14 09:50:45.536653] I [glusterfsd.c:2594:daemonize] 0-glusterfs: Pid
of current running process is 79
[2020-01-14 09:50:45.538261] I [MSGID: 106478] [glusterd.c:1426:init]
0-management: Maximum allowed open file descriptors set to 65536 
[2020-01-14 09:50:45.538300] I [MSGID: 106479] [glusterd.c:1482:init]
0-management: Using /var/lib/glusterd as working directory 
[2020-01-14 09:50:45.538403] I [MSGID: 106479] [glusterd.c:1488:init]
0-management: Using /var/run/gluster as pid file working directory 
[2020-01-14 09:50:45.541895] I [socket.c:1014:__socket_server_bind]
0-socket.management: process started listening on port (24007)
[2020-01-14 09:50:45.543039] W [MSGID: 103071]
[rdma.c:4472:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel
creation failed [No such device]
[2020-01-14 09:50:45.543076] W [MSGID: 103055] [rdma.c:4782:init]
0-rdma.management: Failed to initialize IB Device 
[2020-01-14 09:50:45.543091] W [rpc-transport.c:366:rpc_transport_load]
0-rpc-transport: 'rdma' initialization failed
[2020-01-14 09:50:45.543181] W [rpcsvc.c:1981:rpcsvc_create_listener]
0-rpc-service: cannot create listener, initing the transport failed
[2020-01-14 09:50:45.543197] E [MSGID: 106244] [glusterd.c:1781:init]
0-management: creation of 1 listeners failed, continuing with succeeded
transport 
[2020-01-14 09:50:45.544310] I [socket.c:957:__socket_server_bind]
0-socket.management: closing (AF_UNIX) reuse check socket 12
[2020-01-14 09:50:46.430992] I [MSGID: 106513]
[glusterd-store.c:2257:glusterd_restore_op_version] 0-glusterd: retrieved
op-version: 40100 
[2020-01-14 09:50:46.432660] I [MSGID: 106498]
[glusterd-handler.c:3519:glusterd_friend_add_from_peerinfo] 0-management:
connect returned 0 
The message "I [MSGID: 106498]
[glusterd-handler.c:3519:glusterd_friend_add_from_peerinfo] 0-management:
connect returned 0" repeated 2 times between [2020-01-14 09:50:46.432660] and
[2020-01-14 09:50:46.432760]
[2020-01-14 09:50:46.432800] W [MSGID: 106061]
[glusterd-handler.c:3315:glusterd_transport_inet_options_build] 0-glusterd:
Failed to get tcp-user-timeout 
[2020-01-14 09:50:46.432823] I [rpc-clnt.c:1014:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2020-01-14 09:50:46.437859] I [rpc-clnt.c:1014:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2020-01-14 09:50:46.442293] I [rpc-clnt.c:1014:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 1024
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:  
+------------------------------------------------------------------------------+
The message "W [MSGID: 106061]
[glusterd-handler.c:3315:glusterd_transport_inet_options_build] 0-glusterd:
Failed to get tcp-user-timeout" repeated 2 times between [2020-01-14
09:50:46.432800] and [2020-01-14 09:50:46.442281]
[2020-01-14 09:50:46.450965] I [MSGID: 101190]
[event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 0 
[2020-01-14 09:50:46.451744] I [MSGID: 106544]
[glusterd.c:152:glusterd_uuid_init] 0-management: retrieved UUID:
bc728fa4-f496-43f0-8974-9def63e989c3 
[2020-01-14 09:50:46.453163] I [MSGID: 106163]
[glusterd-handshake.c:1433:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 40100 
[2020-01-14 09:50:46.530141] I [MSGID: 106490]
[glusterd-handler.c:2434:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: d120bc11-e32d-42f5-bb3d-d9feefab396d 
[2020-01-14 09:50:46.535728] I [MSGID: 106493]
[glusterd-handler.c:3715:glusterd_xfer_friend_add_resp] 0-glusterd: Responded
to nl2.leopardus.com (0), ret: 0, op_ret: 0 
[2020-01-14 09:50:46.543698] I [MSGID: 106493]
[glusterd-rpc-ops.c:468:__glusterd_friend_add_cbk] 0-glusterd: Received ACC
from uuid: d120bc11-e32d-42f5-bb3d-d9feefab396d, host: nl2.leopardus.com, port:
0 
[2020-01-14 09:50:46.547771] I [MSGID: 106492]
[glusterd-handler.c:2619:__glusterd_handle_friend_update] 0-glusterd: Received
friend update from uuid: d120bc11-e32d-42f5-bb3d-d9feefab396d 
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2020-01-14 09:50:46
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 7.1


Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level
INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f9de00d60b8 in ?? () from /lib64/libgcc_s.so.1
Missing separate debuginfos, use: debuginfo-install
glusterfs-server-7.1-1.el7.x86_64
(gdb) bt
#0  0x00007f9de00d60b8 in ?? () from /lib64/libgcc_s.so.1
#1  0x00007f9de00d6fb9 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#2  0x00007f9de82a9a56 in backtrace () from /lib64/libc.so.6
#3  0x00007f9de9b8a7ff in _gf_msg_backtrace_nomem () from
/lib64/libglusterfs.so.0
#4  0x00007f9de9b95234 in gf_print_trace () from /lib64/libglusterfs.so.0
#5  <signal handler called>
#6  0x00007f9de00d60b8 in ?? () from /lib64/libgcc_s.so.1
#7  0x00007f9de00d6fb9 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#8  0x00007f9de82a9a56 in backtrace () from /lib64/libc.so.6
#9  0x00007f9de820dea4 in __libc_message () from /lib64/libc.so.6
#10 0x00007f9de82ad527 in __fortify_fail () from /lib64/libc.so.6
#11 0x00007f9de82ad4e2 in __stack_chk_fail () from /lib64/libc.so.6
#12 0x00007f9de3ef7ab5 in glusterd_store_perform_peer_store () from
/usr/lib64/glusterfs/7.1/xlator/mgmt/glusterd.so
#13 0x73756c672e30312d in ?? ()
#14 0x79642d7366726574 in ?? ()
#15 0x38392d63696d616e in ?? ()
#16 0x302d323633393662 in ?? ()
#17 0x396531312d613637 in ?? ()
#18 0x30302d653631622d in ?? ()
#19 0x6638316463333631 in ?? ()
#20 0x75616665642e6561 in ?? ()
#21 0x632e6376732e746c in ?? ()
#22 0x6c2e72657473756c in ?? ()
#23 0x0000000a6c61636f in ?? ()
#24 0x00007f9de3fa6c08 in ?? () from
/usr/lib64/glusterfs/7.1/xlator/mgmt/glusterd.so
#25 0x00007f9de0cde290 in ?? ()
#26 0x0000000000000000 in ?? ()
(gdb) 



So, I do see the "nomem" function you mentioned in frame #3, but that is just
glusterd's own crash handler (gf_print_trace) logging the trace; the crash
actually originates much further up the chain, in frame #11: a stack-smashing
check (__stack_chk_fail) is being triggered out of
glusterd_store_perform_peer_store() in frame #12. Anyway, the pod has about 1G
of memory and it still happens.
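Incidentally, the garbage frames #13-#23 in the backtrace are not code
addresses at all: read as little-endian bytes they decode to printable ASCII,
which fits the stack-smash theory, since the saved return addresses appear to
have been overwritten by a long string. A minimal decoding sketch (assuming
x86_64, 8-byte little-endian stack slots; the Python below is illustrative,
not part of the original report):

import struct

# Frame "addresses" #13-#23 copied from the backtrace above. On x86_64
# each stack slot is 8 little-endian bytes, so printable values mean
# the saved return addresses were overwritten by string data.
frames = [
    0x73756c672e30312d, 0x79642d7366726574, 0x38392d63696d616e,
    0x302d323633393662, 0x396531312d613637, 0x30302d653631622d,
    0x6638316463333631, 0x75616665642e6561, 0x632e6376732e746c,
    0x6c2e72657473756c, 0x0000000a6c61636f,
]

# Repack each value little-endian, drop trailing NUL padding, decode.
smashed = b"".join(struct.pack("<Q", v) for v in frames).rstrip(b"\x00")
print(smashed.decode("ascii"))

This prints:

-10.glusterfs-dynamic-98b69362-076a-11e9-b16e-00163cd18fae.default.svc.cluster.local

i.e. the tail of a Kubernetes service DNS name, suggesting an on-stack
hostname buffer is being overrun somewhere in the peer-store path.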
Please help.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

