[Bugs] [Bug 1790670] New: glusterd crashed when trying to add node
bugzilla at redhat.com
Mon Jan 13 21:14:23 UTC 2020
https://bugzilla.redhat.com/show_bug.cgi?id=1790670
Bug ID: 1790670
Summary: glusterd crashed when trying to add node
Product: GlusterFS
Version: 7
Hardware: x86_64
OS: Linux
Status: NEW
Component: glusterd
Severity: urgent
Assignee: bugs at gluster.org
Reporter: pizzi at leopardus.com
CC: bugs at gluster.org
Target Milestone: ---
Classification: Community
Description of problem:
Trying to add a node to an existing cluster causes glusterd to crash when peer
probe is issued from any node of the existing cluster.
I tried several times, from all nodes, using both the public and the private
interface address for the probe, to no avail.
The join actually seems to succeed (peer status on the other node says it
joined), but because glusterd crashes, something is left corrupted and
glusterd will not start on the next run.
Log of crashing glusterd:
[2020-01-13 20:45:25.057344] I [glusterd.c:1998:init] 0-management:
Regenerating volfiles due to a max op-version mismatch or glusterd.upgrade file
not being present, op_version retrieved:0, max op_version: 70000
Final graph:
+------------------------------------------------------------------------------+
1: volume management
2: type mgmt/glusterd
3: option rpc-auth.auth-glusterfs on
4: option rpc-auth.auth-unix on
5: option rpc-auth.auth-null on
6: option rpc-auth-allow-insecure on
7: option transport.listen-backlog 1024
8: option max-port 60999
9: option event-threads 1
10: option ping-timeout 0
11: option transport.rdma.listen-port 24008
12: option transport.socket.listen-port 24007
13: option transport.socket.read-fail-log off
14: option transport.socket.keepalive-interval 2
15: option transport.socket.keepalive-time 10
16: option transport-type rdma
17: option working-directory /var/lib/glusterd
18: end-volume
19:
+------------------------------------------------------------------------------+
[2020-01-13 20:45:25.069364] I [MSGID: 101190]
[event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 0
[2020-01-13 20:45:27.338940] I [MSGID: 106487]
[glusterd-handler.c:1339:__glusterd_handle_cli_list_friends] 0-glusterd:
Received cli list req
[2020-01-13 20:45:53.609592] I [MSGID: 106163]
[glusterd-handshake.c:1433:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 40100
[2020-01-13 20:45:53.609631] E [MSGID: 101032]
[store.c:493:gf_store_handle_retrieve] 0-: Path corresponding to
/var/lib/glusterd/glusterd.info. [No such file or directory]
[2020-01-13 20:45:53.609666] I [MSGID: 106477]
[glusterd.c:182:glusterd_uuid_generate_save] 0-management: generated UUID:
a4bc89ed-5100-4c82-942c-2e23126d8fef
[2020-01-13 20:45:53.668763] I [MSGID: 106490]
[glusterd-handler.c:2789:__glusterd_handle_probe_query] 0-glusterd: Received
probe from uuid: 092a6cb9-b90d-4f21-a51d-c74a543e9dd8
[2020-01-13 20:45:53.670396] I [MSGID: 106128]
[glusterd-handler.c:2824:__glusterd_handle_probe_query] 0-glusterd: Unable to
find peerinfo for host:
185-52-0-8.glusterfs-dynamic-98b69362-076a-11e9-b16e-00163cd18fae.default.svc.cluster.local
(24007)
[2020-01-13 20:45:53.677593] W [MSGID: 106061]
[glusterd-handler.c:3315:glusterd_transport_inet_options_build] 0-glusterd:
Failed to get tcp-user-timeout
[2020-01-13 20:45:53.677635] I [rpc-clnt.c:1014:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2020-01-13 20:45:53.680479] I [MSGID: 106498]
[glusterd-handler.c:3470:glusterd_friend_add] 0-management: connect returned 0
[2020-01-13 20:45:53.680536] I [MSGID: 106493]
[glusterd-handler.c:2850:__glusterd_handle_probe_query] 0-glusterd: Responded
to
185-52-0-8.glusterfs-dynamic-98b69362-076a-11e9-b16e-00163cd18fae.default.svc.cluster.local,
op_ret: 0, op_errno: 0, ret: 0
[2020-01-13 20:45:53.681379] I [MSGID: 106490]
[glusterd-handler.c:2434:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: 092a6cb9-b90d-4f21-a51d-c74a543e9dd8
[2020-01-13 20:45:53.689395] I [MSGID: 106493]
[glusterd-handler.c:3715:glusterd_xfer_friend_add_resp] 0-glusterd: Responded
to
185-52-0-8.glusterfs-dynamic-98b69362-076a-11e9-b16e-00163cd18fae.default.svc.cluster.local
(0), ret: 0, op_ret: 0
[2020-01-13 20:45:53.723341] I [MSGID: 106511]
[glusterd-rpc-ops.c:250:__glusterd_probe_cbk] 0-management: Received probe resp
from uuid: 092a6cb9-b90d-4f21-a51d-c74a543e9dd8, host:
185-52-0-8.glusterfs-dynamic-98b69362-076a-11e9-b16e-00163cd18fae.default.svc.cluster.local
[2020-01-13 20:45:53.723376] I [MSGID: 106511]
[glusterd-rpc-ops.c:403:__glusterd_probe_cbk] 0-glusterd: Received resp to
probe req
[2020-01-13 20:45:53.740791] I [MSGID: 106493]
[glusterd-rpc-ops.c:468:__glusterd_friend_add_cbk] 0-glusterd: Received ACC
from uuid: 092a6cb9-b90d-4f21-a51d-c74a543e9dd8, host:
185-52-0-8.glusterfs-dynamic-98b69362-076a-11e9-b16e-00163cd18fae.default.svc.cluster.local,
port: 0
[2020-01-13 20:45:53.746122] I [MSGID: 106492]
[glusterd-handler.c:2619:__glusterd_handle_friend_update] 0-glusterd: Received
friend update from uuid: 092a6cb9-b90d-4f21-a51d-c74a543e9dd8
[2020-01-13 20:45:53.751125] W [MSGID: 106061]
[glusterd-handler.c:3315:glusterd_transport_inet_options_build] 0-glusterd:
Failed to get tcp-user-timeout
[2020-01-13 20:45:53.751174] I [rpc-clnt.c:1014:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2020-01-13 20:45:53.753764] I [MSGID: 106498]
[glusterd-handler.c:3519:glusterd_friend_add_from_peerinfo] 0-management:
connect returned 0
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2020-01-13 20:45:53
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 7.1
Sometimes a stack trace is printed:
/lib64/libglusterfs.so.0(+0x277ff)[0x7f19e70437ff]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f19e704e234]
/lib64/libc.so.6(+0x363b0)[0x7f19e56843b0]
/usr/lib64/glusterfs/7.1/xlator/mgmt/glusterd.so(+0x26c42)[0x7f19e134cc42]
/usr/lib64/glusterfs/7.1/xlator/mgmt/glusterd.so(+0x279ae)[0x7f19e134d9ae]
/usr/lib64/glusterfs/7.1/xlator/mgmt/glusterd.so(+0x2840f)[0x7f19e134e40f]
/usr/lib64/glusterfs/7.1/xlator/mgmt/glusterd.so(+0x2362e)[0x7f19e134962e]
/lib64/libgfrpc.so.0(+0x9695)[0x7f19e6de7695]
/lib64/libgfrpc.so.0(+0x9a0b)[0x7f19e6de7a0b]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f19e6de9a93]
/usr/lib64/glusterfs/7.1/rpc-transport/socket.so(+0x4468)[0x7f19e0564468]
/usr/lib64/glusterfs/7.1/rpc-transport/socket.so(+0xb861)[0x7f19e056b861]
/lib64/libglusterfs.so.0(+0x8e246)[0x7f19e70aa246]
/lib64/libpthread.so.0(+0x7e65)[0x7f19e5e86e65]
/lib64/libc.so.6(clone+0x6d)[0x7f19e574c88d]
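Most glusterd.so frames in the trace above carry only a module offset and no symbol name. As a minimal sketch (not part of the original report), the following Python helper extracts those offsets so they can be looked up later. The function names `parse_frame` and `unresolved_offsets` are hypothetical, and the frame layout `module(symbol+offset)[address]` is assumed from the lines shown above.

```python
import re

# Assumed frame layout, taken from the trace lines in this report:
#   /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f19e704e234]
#   /lib64/libglusterfs.so.0(+0x277ff)[0x7f19e70437ff]
FRAME_RE = re.compile(
    r"^(?P<module>\S+?)\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Split one backtrace line into its parts, or return None if it does not match."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        "module": match.group("module"),
        # An empty symbol means the frame was not resolved to a function name.
        "symbol": match.group("symbol") or None,
        "offset": int(match.group("offset"), 16),
        "address": int(match.group("address"), 16),
    }


def unresolved_offsets(lines, module_suffix="glusterd.so"):
    """Return hex offsets of symbol-less frames whose module ends with module_suffix."""
    offsets = []
    for line in lines:
        frame = parse_frame(line)
        if frame and frame["symbol"] is None and frame["module"].endswith(module_suffix):
            offsets.append(hex(frame["offset"]))
    return offsets
```

Assuming matching debug symbols are installed, each returned offset could then be passed to a tool such as `addr2line -f -e <path to glusterd.so> <offset>` to obtain a function name. Whether that resolves cleanly depends on the build.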
If I restart glusterd, it will abort due to some corruption:
[2020-01-13 21:02:53.619924] I [MSGID: 100030] [glusterfsd.c:2867:main]
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 7.1 (args:
/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2020-01-13 21:02:53.620277] I [glusterfsd.c:2594:daemonize] 0-glusterfs: Pid
of current running process is 80
[2020-01-13 21:02:53.622166] I [MSGID: 106478] [glusterd.c:1426:init]
0-management: Maximum allowed open file descriptors set to 65536
[2020-01-13 21:02:53.622193] I [MSGID: 106479] [glusterd.c:1482:init]
0-management: Using /var/lib/glusterd as working directory
[2020-01-13 21:02:53.622201] I [MSGID: 106479] [glusterd.c:1488:init]
0-management: Using /var/run/gluster as pid file working directory
[2020-01-13 21:02:53.625927] I [socket.c:1014:__socket_server_bind]
0-socket.management: process started listening on port (24007)
[2020-01-13 21:02:53.627074] W [MSGID: 103071]
[rdma.c:4472:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel
creation failed [No such device]
[2020-01-13 21:02:53.627092] W [MSGID: 103055] [rdma.c:4782:init]
0-rdma.management: Failed to initialize IB Device
[2020-01-13 21:02:53.627101] W [rpc-transport.c:366:rpc_transport_load]
0-rpc-transport: 'rdma' initialization failed
[2020-01-13 21:02:53.627171] W [rpcsvc.c:1981:rpcsvc_create_listener]
0-rpc-service: cannot create listener, initing the transport failed
[2020-01-13 21:02:53.627179] E [MSGID: 106244] [glusterd.c:1781:init]
0-management: creation of 1 listeners failed, continuing with succeeded
transport
[2020-01-13 21:02:53.628336] I [socket.c:957:__socket_server_bind]
0-socket.management: closing (AF_UNIX) reuse check socket 12
[2020-01-13 21:02:54.544119] I [MSGID: 106513]
[glusterd-store.c:2257:glusterd_restore_op_version] 0-glusterd: retrieved
op-version: 40100
[2020-01-13 21:02:54.544330] I [MSGID: 106498]
[glusterd-handler.c:3519:glusterd_friend_add_from_peerinfo] 0-management:
connect returned 0
[2020-01-13 21:02:54.544595] E
[glusterd-handler.c:3275:glusterd_transport_inet_options_build]
(-->/usr/lib64/glusterfs/7.1/xlator/mgmt/glusterd.so(+0x8aebe) [0x7fd0ac39eebe]
-->/usr/lib64/glusterfs/7.1/xlator/mgmt/glusterd.so(+0x26c7a) [0x7fd0ac33ac7a]
-->/usr/lib64/glusterfs/7.1/xlator/mgmt/glusterd.so(+0x26b56) [0x7fd0ac33ab56]
) 0-: Assertion failed: hostname
[2020-01-13 21:02:54.544620] E
[rpc-transport.c:655:rpc_transport_inet_options_build]
(-->/usr/lib64/glusterfs/7.1/xlator/mgmt/glusterd.so(+0x26c7a) [0x7fd0ac33ac7a]
-->/usr/lib64/glusterfs/7.1/xlator/mgmt/glusterd.so(+0x267fc) [0x7fd0ac33a7fc]
-->/lib64/libgfrpc.so.0(rpc_transport_inet_options_build+0x2b6)
[0x7fd0b1dd8156] ) 0-: Assertion failed: hostname
The message "I [MSGID: 106498]
[glusterd-handler.c:3519:glusterd_friend_add_from_peerinfo] 0-management:
connect returned 0" repeated 2 times between [2020-01-13 21:02:54.544330] and
[2020-01-13 21:02:54.544420]
[2020-01-13 21:02:54.544637] E [MSGID: 101019] [xlator.c:629:xlator_init]
0-management: Initialization of volume 'management' failed, review your volfile
again
[2020-01-13 21:02:54.544649] E [MSGID: 101066]
[graph.c:425:glusterfs_graph_init] 0-management: initializing translator failed
[2020-01-13 21:02:54.544654] E [MSGID: 101176]
[graph.c:779:glusterfs_graph_activate] 0-graph: init failed
[2020-01-13 21:02:54.544729] W [glusterfsd.c:1596:cleanup_and_exit]
(-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x55b5b2bbc19d]
-->/usr/sbin/glusterd(glusterfs_process_volfp+0x21d) [0x55b5b2bbc08d]
-->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x55b5b2bbb48b] ) 0-: received
signum (-1), shutting down
[2020-01-13 21:02:54.544764] W [mgmt-pmap.c:132:rpc_clnt_mgmt_pmap_signout]
0-glusterfs: failed to create XDR payload
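The "Assertion failed: hostname" messages above suggest, though the log does not confirm it, that a stored peer entry was written without a hostname before the crash. The following is a minimal sketch for checking that, not a verified diagnosis. It assumes that peer entries are plain key=value files in a `peers` subdirectory of the working directory /var/lib/glusterd and that hostnames are stored under keys beginning with `hostname`. The helper names are hypothetical.

```python
import os


def parse_store(text):
    """Parse key=value lines into a dict, skipping lines without '='."""
    entries = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def has_hostname(text):
    """True if any key starting with 'hostname' has a non-empty value (assumed key naming)."""
    entries = parse_store(text)
    return any(key.startswith("hostname") and value for key, value in entries.items())


def peers_missing_hostname(peers_dir="/var/lib/glusterd/peers"):
    """List peer files under peers_dir (assumed location) that contain no hostname entry."""
    missing = []
    for name in sorted(os.listdir(peers_dir)):
        path = os.path.join(peers_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as handle:
            if not has_hostname(handle.read()):
                missing.append(path)
    return missing
```

Calling `peers_missing_hostname()` on the affected node would list any peer files without a hostname entry. An empty result would mean this assumption does not explain the failed assertion.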
The firewall is open (all 3 nodes have an ACCEPT rule for any port and
protocol in iptables).
Version-Release number of selected component (if applicable):
glusterfs-4.1.6-1.el7.x86_64
glusterfs-fuse-4.1.6-1.el7.x86_64
glusterfs-libs-4.1.6-1.el7.x86_64
glusterfs-client-xlators-4.1.6-1.el7.x86_64
Also tried with the latest (4.1.7); the issue is the same.
How reproducible:
It happens on a new node that I am trying to add to the cluster.
It did not happen on an identical previous node!
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.