[Bugs] [Bug 1354250] New: Gluster fuse client crashed generating core dump
bugzilla at redhat.com
Mon Jul 11 04:21:47 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1354250
Bug ID: 1354250
Summary: Gluster fuse client crashed generating core dump
Product: GlusterFS
Version: 3.8.0
Component: transport
Severity: medium
Priority: medium
Assignee: bugs at gluster.org
Reporter: nbalacha at redhat.com
CC: bkunal at redhat.com, bugs at gluster.org, csaba at redhat.com,
rhs-bugs at redhat.com, storage-qa-internal at redhat.com
Depends On: 1343320, 1343374
+++ This bug was initially created as a clone of Bug #1343374 +++
+++ This bug was initially created as a clone of Bug #1343320 +++
Description of problem:
The client crashed with a core dump due to excessive memory consumption.
Version-Release number of selected component (if applicable):
3.7.5-19.el7rhgs.x86_64
RHEL 5
Additional info:
Many DNS resolution errors were found in the client logs.
The logs show continuous error messages:
[2016-04-27 10:33:29.833969] E [name.c:242:af_inet_client_get_remote_sockaddr]
0-vol01-client-1: DNS resolution failed on host server1
[2016-04-27 10:33:32.843124] E [name.c:242:af_inet_client_get_remote_sockaddr]
0-vol01-client-1: DNS resolution failed on host server1
[2016-04-27 10:33:35.850581] E [name.c:242:af_inet_client_get_remote_sockaddr]
0-vol01-client-1: DNS resolution failed on host server1
[2016-04-27 10:33:38.858181] E [name.c:242:af_inet_client_get_remote_sockaddr]
0-vol01-client-1: DNS resolution failed on host server1
[2016-04-27 10:33:41.865251] E [name.c:242:af_inet_client_get_remote_sockaddr]
0-vol01-client-1: DNS resolution failed on host server1
The message "E [MSGID: 101075] [common-utils.c:306:gf_resolve_ip6] 0-resolver:
getaddrinfo failed (Name or service not known)" repeated 39 times between
[2016-04-27 10:31:44.561995] and [2016-04-27 10:33:41.865245]
[2016-04-27 10:33:44.873510] E [MSGID: 101075]
[common-utils.c:306:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or
service not known)
[2016-04-27 10:33:44.873599] E [name.c:242:af_inet_client_get_remote_sockaddr]
0-vol01-client-1: DNS resolution failed on host server1
[2016-04-27 10:33:47.881687] E [name.c:242:af_inet_client_get_remote_sockaddr]
0-vol01-client-1: DNS resolution failed on host server1
[2016-04-27 10:33:50.890768] E [name.c:242:af_inet_client_get_remote_sockaddr]
0-vol01-client-1: DNS resolution failed on host server1
.
.
.
.
.
After some time (almost 27 hours):
[2016-04-28 13:47:23.002272] E [socket.c:3124:socket_connect] 0-vol01-client-1:
pthread_createfailed: Cannot allocate memory
[2016-04-28 13:47:23.002528] E [socket.c:3126:socket_connect]
(-->/usr/lib64/libglusterfs.so.0(gf_timer_proc+0xf5) [0x3bb8046e65]
-->/usr/lib64/libgfrpc.so.0(rpc_clnt_reconnect+0xea) [0x3bb7c0e67a]
-->/usr/lib64/glusterfs/3.7.5/rpc-transport/socket.so [0x2b2a7319f0ca] ) 0-:
Assertion failed: 0
[2016-04-28 13:47:26.008762] E [name.c:242:af_inet_client_get_remote_sockaddr]
0-vol01-client-1: DNS resolution failed on host server1
[2016-04-28 13:47:26.008933] E [socket.c:3124:socket_connect] 0-vol01-client-1:
pthread_createfailed: Cannot allocate memory
[2016-04-28 13:47:26.009134] E [socket.c:3126:socket_connect]
(-->/usr/lib64/libglusterfs.so.0(gf_timer_proc+0xf5) [0x3bb8046e65]
-->/usr/lib64/libgfrpc.so.0(rpc_clnt_reconnect+0xea) [0x3bb7c0e67a]
-->/usr/lib64/glusterfs/3.7.5/rpc-transport/socket.so [0x2b2a7319f0ca] ) 0-:
Assertion failed: 0
[2016-04-28 13:47:29.015862] E [name.c:242:af_inet_client_get_remote_sockaddr]
0-vol01-client-1: DNS resolution failed on host server1
.
.
This continued for almost a week, followed by:
[2016-05-15 04:12:17.272132] A [MSGID: 0] [mem-pool.c:120:__gf_calloc] : no
memory available for size (2097224) [call stack follows]
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb5)[0x3bb8025395]
/usr/lib64/libglusterfs.so.0(_gf_msg_nomem+0x42e)[0x3bb802984e]
/usr/lib64/libglusterfs.so.0(__gf_calloc+0x100)[0x3bb805bda0]
/usr/lib64/libglusterfs.so.0(synctask_create+0x3a1)[0x3bb806cf21]
/usr/lib64/libglusterfs.so.0(synctask_new1+0x9)[0x3bb806d4f9]
[2016-05-15 04:12:18.863904] A [MSGID: 0] [mem-pool.c:120:__gf_calloc] : no
memory available for size (2097224) [call stack follows]
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb5)[0x3bb8025395]
/usr/lib64/libglusterfs.so.0(_gf_msg_nomem+0x42e)[0x3bb802984e]
/usr/lib64/libglusterfs.so.0(__gf_calloc+0x100)[0x3bb805bda0]
/usr/lib64/libglusterfs.so.0(synctask_create+0x3a1)[0x3bb806cf21]
/usr/lib64/libglusterfs.so.0(synctask_new1+0x9)[0x3bb806d4f9]
.
.
.
.
.
.
[2016-05-15 04:12:31.572526] A [MSGID: 0] [mem-pool.c:120:__gf_calloc] : no
memory available for size (124) [call stack follows]
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb5)[0x3bb8025395]
/usr/lib64/libglusterfs.so.0(_gf_msg_nomem+0x42e)[0x3bb802984e]
/usr/lib64/libglusterfs.so.0(__gf_calloc+0x100)[0x3bb805bda0]
/usr/lib64/libglusterfs.so.0(mem_get+0xb8)[0x3bb805be98]
/usr/lib64/libglusterfs.so.0(mem_get0+0x1b)[0x3bb805bf0b]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
.
.
.
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-05-15 04:12:31
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.5
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb5)[0x3bb8025395]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x338)[0x3bb8042378]
/lib64/libc.so.6[0x34f2030030]
/usr/lib64/libglusterfs.so.0(mem_get+0x6e)[0x3bb805be4e]
/usr/lib64/libglusterfs.so.0(mem_get0+0x1b)[0x3bb805bf0b]
/usr/lib64/libglusterfs.so.0(get_new_data+0x20)[0x3bb801f260]
/usr/lib64/libglusterfs.so.0(dict_unserialize+0xf4)[0x3bb801f374]
/usr/lib64/glusterfs/3.7.5/xlator/protocol/client.so(client3_3_lookup_cbk+0x7bc)[0x2b2a741f5acc]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa0)[0x3bb7c0fa70]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1b4)[0x3bb7c0fd34]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x27)[0x3bb7c0b517]
/usr/lib64/glusterfs/3.7.5/rpc-transport/socket.so[0x2b2a731a3f68]
/usr/lib64/glusterfs/3.7.5/rpc-transport/socket.so[0x2b2a731a4994]
/usr/lib64/libglusterfs.so.0[0x3bb808b363]
/lib64/libpthread.so.0[0x34f280683d]
/lib64/libc.so.6(clone+0x6d)[0x34f20d4fcd]
--- Additional comment from Nithya Balachandran on 2016-06-07 04:50:10 EDT ---
RCA:
There is a memory leak in the socket_connect code in case of failure.
In socket_connect ():
        /* if sock != -1, then cleanup is done from the event handler */
        if (ret == -1 && sock == -1) {
                /* Cleaup requires to send notification to upper layer which
                   intern holds the big_lock. There can be dead-lock situation
                   if big_lock is already held by the current thread.
                   So transfer the ownership to seperate thread for cleanup.
                */
                arg = GF_CALLOC (1, sizeof (*arg),
                                 gf_sock_connect_error_state_t);
                arg->this = THIS;
                arg->trans = this;
                arg->refd = refd;
                th_ret = pthread_create (&th_id, NULL,
                                         socket_connect_error_cbk,
                                         arg);
                if (th_ret) {
                        gf_log (this->name, GF_LOG_ERROR, "pthread_create"
                                "failed: %s", strerror(errno));
                        GF_FREE (arg);
                        GF_ASSERT (0);
                }
        }
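pthread_create is called with NULL attributes, so the new thread is joinable
rather than detached. Since pthread_join is never called on it, the thread's
resources are not released when it exits. socket_connect is called at 3-second
intervals while the connection keeps failing, so this quickly adds up and the
process runs out of memory.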
--- Additional comment from Vijay Bellur on 2016-06-07 04:56:31 EDT ---
REVIEW: http://review.gluster.org/14661 (rpc/socket: pthread resources are not
cleanup up) posted (#1) for review on master by N Balachandran
(nbalacha at redhat.com)
--- Additional comment from Nithya Balachandran on 2016-06-07 05:01:17 EDT ---
Fix:
Create a detached thread so all thread resources are cleaned up automatically.
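As an illustration of the fix described above, here is a minimal sketch of
starting a cleanup thread in the detached state using standard POSIX calls.
It is not the committed patch: cleanup_arg_t, cleanup_cbk and
spawn_detached_cleanup are hypothetical stand-ins for
gf_sock_connect_error_state_t, socket_connect_error_cbk and the error branch
of socket_connect.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for gf_sock_connect_error_state_t. */
typedef struct {
    int refd;
} cleanup_arg_t;

/* Hypothetical stand-in for socket_connect_error_cbk.
 * The thread owns its argument and frees it when done. */
static void *cleanup_cbk(void *data)
{
    free(data);
    return NULL;
}

/* Run cleanup_cbk in a detached thread, so its resources are released
 * automatically when it exits and no pthread_join is needed.
 * Returns 0 on success, or an error number on failure. */
int spawn_detached_cleanup(int refd)
{
    pthread_attr_t attr;
    pthread_t tid;
    cleanup_arg_t *arg;
    int ret;

    arg = calloc(1, sizeof(*arg));
    if (arg == NULL)
        return ENOMEM;
    arg->refd = refd;

    ret = pthread_attr_init(&attr);
    if (ret != 0) {
        free(arg);
        return ret;
    }

    ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (ret == 0)
        ret = pthread_create(&tid, &attr, cleanup_cbk, arg);

    pthread_attr_destroy(&attr);

    /* The thread never started, so the argument is still ours to free. */
    if (ret != 0)
        free(arg);
    return ret;
}
```

Calling pthread_detach on the thread after creation would be another way to
get the same effect; setting the attribute up front avoids any window in
which the thread exists in the joinable state.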
--- Additional comment from Vijay Bellur on 2016-06-07 05:09:38 EDT ---
REVIEW: http://review.gluster.org/14661 (rpc/socket: pthread resources are not
cleaned up) posted (#2) for review on master by N Balachandran
(nbalacha at redhat.com)
--- Additional comment from Vijay Bellur on 2016-07-08 01:18:43 EDT ---
REVIEW: http://review.gluster.org/14875 (rpc/socket: pthread resources are not
cleaned up) posted (#1) for review on master by N Balachandran
(nbalacha at redhat.com)
--- Additional comment from Vijay Bellur on 2016-07-08 01:22:40 EDT ---
REVIEW: http://review.gluster.org/14875 (rpc/socket: pthread resources are not
cleaned up) posted (#2) for review on master by N Balachandran
(nbalacha at redhat.com)
--- Additional comment from Vijay Bellur on 2016-07-08 01:54:26 EDT ---
REVIEW: http://review.gluster.org/14875 (rpc/socket: pthread resources are not
cleaned up) posted (#3) for review on master by N Balachandran
(nbalacha at redhat.com)
--- Additional comment from Vijay Bellur on 2016-07-08 16:17:16 EDT ---
COMMIT: http://review.gluster.org/14875 committed in master by Jeff Darcy
(jdarcy at redhat.com)
------
commit 9886d568a7a8839bf3acc81cb1111fa372ac5270
Author: N Balachandran <nbalacha at redhat.com>
Date: Fri Jul 8 10:46:46 2016 +0530
rpc/socket: pthread resources are not cleaned up
A socket_connect failure creates a new pthread which
is not a detached thread. As no pthread_join is called,
the thread resources are not cleaned up causing a memory leak.
Now, socket_connect creates a detached thread to handle failure.
Change-Id: Idbf25d312f91464ae20c97d501b628bfdec7cf0c
BUG: 1343374
Signed-off-by: N Balachandran <nbalacha at redhat.com>
Reviewed-on: http://review.gluster.org/14875
Smoke: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj at redhat.com>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Jeff Darcy <jdarcy at redhat.com>
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1343320
[Bug 1343320] [GSS] Gluster fuse client crashed generating core dump
https://bugzilla.redhat.com/show_bug.cgi?id=1343374
[Bug 1343374] Gluster fuse client crashed generating core dump