[Bugs] [Bug 1471870] New: cthon04 can cause segfault in gNFS/NLM
bugzilla at redhat.com
Mon Jul 17 14:50:10 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1471870
Bug ID: 1471870
Summary: cthon04 can cause segfault in gNFS/NLM
Product: GlusterFS
Version: 3.10
Component: nfs
Keywords: Triaged
Severity: urgent
Priority: medium
Assignee: ndevos at redhat.com
Reporter: ndevos at redhat.com
CC: bugs at gluster.org
Depends On: 1467313
+++ This bug was initially created as a clone of Bug #1467313 +++
Description of problem:
While running cthon04 tests against Gluster/NFS, the following crash was
observed (RHGS backports gnfs/nlm fixes to 3.8.4):
ify?! [Invalid argument]
[2017-06-19 13:08:46.117375] W [socket.c:595:__socket_rwv] 0-NLM-client: readv
on 10.70.37.142:34033 failed (No data available)
[2017-06-19 13:08:46.117529] W [socket.c:595:__socket_rwv] 0-NLM-client: readv
on 10.70.37.142:34033 failed (No data available)
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2017-06-19 13:08:48
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f6ec83b54b2]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f6ec83befe4]
/lib64/libc.so.6(+0x35270)[0x7f6ec6a1e270]
/lib64/libc.so.6(+0x165921)[0x7f6ec6b4e921]
/usr/lib64/glusterfs/3.8.4/xlator/nfs/server.so(+0x3f9aa)[0x7f6eba13a9aa]
/usr/lib64/glusterfs/3.8.4/xlator/nfs/server.so(+0x42349)[0x7f6eba13d349]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x214)[0x7f6ec817eb54]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f6ec817a9e3]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x51d7)[0x7f6ebcfa71d7]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x9918)[0x7f6ebcfab918]
/lib64/libglusterfs.so.0(+0x849d6)[0x7f6ec840f9d6]
/lib64/libpthread.so.0(+0x7e25)[0x7f6ec7214e25]
/lib64/libc.so.6(clone+0x6d)[0x7f6ec6ae134d]
Version-Release number of selected component (if applicable):
mainline (reported against RHGS with glusterfs-3.8.4 w/ backports)
How reproducible:
Run cthon04 tests against Gluster/NFS. The problem is hit most often when
using EC volumes.
Steps to Reproduce:
1. configure a Gluster volume
2. on an NFS client (not part of the TSP):
   1. git clone git://git.linux-nfs.org/projects/steved/cthon04.git
   2. compile the tests and make sure dependencies are installed
   3. run like
      # mount -t nfs -o vers=3 vm015.example.com:/one-brick /mnt/nfsv3
      # ./server -a -p /one-brick -m /mnt/nfsv3 vm015.example.com
Actual results:
Occasional, but regular, segfaults of Gluster/NFS.
Expected results:
No segfaults (duh!), and the cthon04 tests pass.
Additional info:
--- Additional comment from Worker Ant on 2017-07-04 22:04:34 CEST ---
REVIEW: https://review.gluster.org/17696 (nfs: make nfs3_call_state_t
refcounted) posted (#1) for review on master by Niels de Vos
(ndevos at redhat.com)
--- Additional comment from Worker Ant on 2017-07-04 22:04:38 CEST ---
REVIEW: https://review.gluster.org/17697 (nfs/nlm: unref fds in
nlm_client_free()) posted (#1) for review on master by Niels de Vos
(ndevos at redhat.com)
--- Additional comment from Worker Ant on 2017-07-04 22:04:49 CEST ---
REVIEW: https://review.gluster.org/17698 (nfs/nlm: handle reconnect for
non-NLM4_LOCK requests) posted (#1) for review on master by Niels de Vos
(ndevos at redhat.com)
--- Additional comment from Worker Ant on 2017-07-04 22:04:57 CEST ---
REVIEW: https://review.gluster.org/17699 (nfs/nlm: use refcounting for
nfs3_call_state_t) posted (#1) for review on master by Niels de Vos
(ndevos at redhat.com)
--- Additional comment from Worker Ant on 2017-07-04 22:05:03 CEST ---
REVIEW: https://review.gluster.org/17700 (nfs/nlm: keep track of the call-state
and frame for notifications) posted (#1) for review on master by Niels de Vos
(ndevos at redhat.com)
--- Additional comment from Worker Ant on 2017-07-06 14:22:24 CEST ---
COMMIT: https://review.gluster.org/17696 committed in master by Niels de Vos
(ndevos at redhat.com)
------
commit daed52b8ebcac7ef36f11e944f83826f46593867
Author: Niels de Vos <ndevos at redhat.com>
Date: Fri Jun 23 10:01:27 2017 +0200
nfs: make nfs3_call_state_t refcounted
The nfs3_call_state_t structure is not refcounted, which
seems to result in use-after-free problems in the NLM part of
Gluster/NFS. The structure is initialized by two different functions;
it is easier to have a single place to do this.
The Gluster/NFS part will not use the refcounting, for now. This is
being added to make the NLM code more stable. nfs3_call_state_wipe()
will behave as before for Gluster/NFS, but cleanup is triggered through
the refcounting now. This prevents major changes to the stable part of
the NFS-server, and makes it possible to improve the NLM component
separately.
Change-Id: I2e15bcf12af74e8a46c2727e4a160e9444d29ece
BUG: 1467313
Signed-off-by: Niels de Vos <ndevos at redhat.com>
Reviewed-on: https://review.gluster.org/17696
Smoke: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Amar Tumballi <amarts at redhat.com>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Kaleb KEITHLEY <kkeithle at redhat.com>
Reviewed-by: jiffin tony Thottan <jthottan at redhat.com>
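A minimal sketch, in C, of what "refcount the call-state and initialize it in a single place" can look like. `call_state_t`, `call_state_init()`, `call_state_ref()`, `call_state_unref()`, `call_state_wipe()` and the `call_state_releases` counter are hypothetical names for illustration; this is not the GlusterFS code and does not use GlusterFS's own refcount helpers. It assumes C11 atomics:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for nfs3_call_state_t. */
typedef struct call_state {
    atomic_int refcount;
    int procnum; /* example payload */
} call_state_t;

/* Counts releases so the behaviour can be observed (illustration only). */
static int call_state_releases = 0;

/* Single constructor: every call-state starts with one reference owned
 * by the caller, so the counter is set up in exactly one place. */
static call_state_t *call_state_init(int procnum)
{
    call_state_t *cs = calloc(1, sizeof(*cs));
    if (cs == NULL)
        return NULL;
    atomic_init(&cs->refcount, 1);
    cs->procnum = procnum;
    return cs;
}

static call_state_t *call_state_ref(call_state_t *cs)
{
    int old = atomic_fetch_add(&cs->refcount, 1);
    assert(old > 0); /* taking a ref on an already released object is a bug */
    (void)old;
    return cs;
}

/* Drops one reference; memory is freed only when the last holder lets go.
 * Returns 1 if this call released the object, 0 otherwise. */
static int call_state_unref(call_state_t *cs)
{
    int old = atomic_fetch_sub(&cs->refcount, 1);
    assert(old > 0);
    if (old == 1) {
        free(cs);
        call_state_releases++;
        return 1;
    }
    return 0;
}

/* Existing cleanup entry point kept for old callers: instead of freeing
 * directly, it only drops the caller's reference. */
static void call_state_wipe(call_state_t *cs)
{
    call_state_unref(cs);
}
```

With this shape, existing callers can keep calling the wipe function, while any other component that still needs the structure holds its own reference; the memory goes away only after the last `call_state_unref()`.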
--- Additional comment from Worker Ant on 2017-07-06 14:22:38 CEST ---
COMMIT: https://review.gluster.org/17697 committed in master by Niels de Vos
(ndevos at redhat.com)
------
commit e9a482f94e748ea12e73ddd2e275bad9aa314b4c
Author: Niels de Vos <ndevos at redhat.com>
Date: Fri Jun 30 17:54:34 2017 +0200
nfs/nlm: unref fds in nlm_client_free()
When an nlm_clnt is freed, the FDs associated with this client
should be unref'd as well.
Change-Id: Ifa4ea4b7ed45a454413cfc0c820f2516c534a9aa
BUG: 1467313
Signed-off-by: Niels de Vos <ndevos at redhat.com>
Reviewed-on: https://review.gluster.org/17697
Smoke: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Amar Tumballi <amarts at redhat.com>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: jiffin tony Thottan <jthottan at redhat.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle at redhat.com>
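The fix above amounts to "whoever frees the client must drop the fd references the client holds". A minimal single-threaded sketch of that ownership rule, assuming a simple linked list of fds per client; `fake_fd_t`, `nlm_client_sketch_t` and the helper functions are hypothetical and are not the GlusterFS `fd_t` or `nlm_client_t` definitions:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for a refcounted fd. */
typedef struct fake_fd {
    int refcount;
} fake_fd_t;

static void fake_fd_unref(fake_fd_t *fd)
{
    assert(fd->refcount > 0);
    if (--fd->refcount == 0)
        free(fd);
}

/* Hypothetical per-client list entry, one per file the client uses. */
typedef struct client_fd_entry {
    fake_fd_t *fd;
    struct client_fd_entry *next;
} client_fd_entry_t;

typedef struct nlm_client_sketch {
    client_fd_entry_t *fds;
} nlm_client_sketch_t;

/* Adding an fd to the client gives the client its own reference. */
static int nlm_client_sketch_add_fd(nlm_client_sketch_t *client, fake_fd_t *fd)
{
    client_fd_entry_t *entry = malloc(sizeof(*entry));
    if (entry == NULL)
        return -1;
    fd->refcount++;
    entry->fd = fd;
    entry->next = client->fds;
    client->fds = entry;
    return 0;
}

/* Freeing the client drops every fd reference it owns; without the
 * fake_fd_unref() call those fds would leak. */
static void nlm_client_sketch_free(nlm_client_sketch_t *client)
{
    client_fd_entry_t *entry = client->fds;
    while (entry != NULL) {
        client_fd_entry_t *next = entry->next;
        fake_fd_unref(entry->fd);
        free(entry);
        entry = next;
    }
    free(client);
}
```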
--- Additional comment from Worker Ant on 2017-07-07 10:56:34 CEST ---
REVIEW: https://review.gluster.org/17699 (nfs/nlm: use refcounting for
nfs3_call_state_t) posted (#2) for review on master by Niels de Vos
(ndevos at redhat.com)
--- Additional comment from Worker Ant on 2017-07-07 12:17:06 CEST ---
REVIEW: https://review.gluster.org/17698 (nfs/nlm: handle reconnect for
non-NLM4_LOCK requests) posted (#2) for review on master by Niels de Vos
(ndevos at redhat.com)
--- Additional comment from Worker Ant on 2017-07-09 11:13:02 CEST ---
COMMIT: https://review.gluster.org/17698 committed in master by Niels de Vos
(ndevos at redhat.com)
------
commit fafe1491ead527ba1024c521013aa90d2ee2b355
Author: Niels de Vos <ndevos at redhat.com>
Date: Wed Jun 21 16:25:33 2017 +0200
nfs/nlm: handle reconnect for non-NLM4_LOCK requests
When a reply on an NLM-procedure gets stuck, the NFS-client will resend
the request. This can happen through a re-connect in case the connection
was terminated (long delay in the reply on the initial request). Once
that happens, not all NLM-procedures are handled correctly.
Testing this is difficult and time-consuming. There still may be
problems with certain operations, but this definitely makes it behave
much better than before.
The problem was triggered by a bug in EC; change-id I18a782903ba
addressed the root cause.
Change-Id: I23b385568e27232951fa3fbd7198a0e5d775a8c2
BUG: 1467313
Signed-off-by: Niels de Vos <ndevos at redhat.com>
Reviewed-on: https://review.gluster.org/17698
Smoke: Gluster Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
--- Additional comment from Worker Ant on 2017-07-09 11:13:47 CEST ---
COMMIT: https://review.gluster.org/17699 committed in master by Niels de Vos
(ndevos at redhat.com)
------
commit 01bfdd4d1759423681d311da33f4ac2346ace445
Author: Niels de Vos <ndevos at redhat.com>
Date: Mon Jul 3 16:24:53 2017 +0200
nfs/nlm: use refcounting for nfs3_call_state_t
In order to track down a potential use-after-free of the
nfs3_call_state_t structure in the NLM component, add reference counting
where the structure is used. This should prevent premature free'ing of
the structure.
Change-Id: Ib1f13b0463ab1e012b7b49a623c91f0f3e73e1fb
BUG: 1467313
Signed-off-by: Niels de Vos <ndevos at redhat.com>
Reviewed-on: https://review.gluster.org/17699
Reviewed-by: jiffin tony Thottan <jthottan at redhat.com>
Smoke: Gluster Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
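The pattern behind "add reference counting where the structure is used" is usually: take a reference before handing the call-state to code that runs later, and drop it in that code's completion path. A minimal single-threaded sketch, assuming a plain `int` counter for brevity (shared use across threads would need atomics or a lock); `call_state_t`, `pending_op_t`, `submit()` and `complete()` are hypothetical names, not GlusterFS functions:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced call-state with a plain (non-atomic) refcount. */
typedef struct call_state {
    int refcount;
    int replied;
} call_state_t;

static int releases = 0; /* observable count of frees (illustration only) */

static call_state_t *cs_new(void)
{
    call_state_t *cs = calloc(1, sizeof(*cs));
    if (cs != NULL)
        cs->refcount = 1;
    return cs;
}

static call_state_t *cs_ref(call_state_t *cs)
{
    cs->refcount++;
    return cs;
}

static void cs_unref(call_state_t *cs)
{
    assert(cs->refcount > 0);
    if (--cs->refcount == 0) {
        free(cs);
        releases++;
    }
}

/* A pending asynchronous operation, e.g. a lock request still waiting
 * for the bricks to answer (hypothetical). */
typedef struct pending_op {
    call_state_t *cs;
} pending_op_t;

/* Before deferring work, take a reference owned by the deferred code. */
static pending_op_t submit(call_state_t *cs)
{
    pending_op_t op = { cs_ref(cs) };
    return op;
}

/* Completion callback: may run after the request path has already
 * dropped its own reference, so it relies on the one taken in submit(). */
static void complete(pending_op_t *op)
{
    op->cs->replied = 1;
    cs_unref(op->cs);
    op->cs = NULL;
}
```

In the test below, the request path finishes first (for example after a client reconnect) and the callback still finds a valid call-state; the structure is freed only by the callback's unref.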
--- Additional comment from Worker Ant on 2017-07-09 11:14:07 CEST ---
COMMIT: https://review.gluster.org/17700 committed in master by Niels de Vos
(ndevos at redhat.com)
------
commit b81997264f079983fa02bd5fa2b3715224942b00
Author: Niels de Vos <ndevos at redhat.com>
Date: Tue Jul 4 20:11:11 2017 +0200
nfs/nlm: keep track of the call-state and frame for notifications
When blocking locks are used, a new frame is allocated that is used to
send the notification to the client once the lock becomes
available. In all other cases, the frame that contains the request from
the client will be used for the reply.
Because there was no way to track the different clients with their
requests (captured in the call-state), the call-state could be free'd
before the notification was sent to the client. This caused a
use-after-free of the call-state and could trigger segfaults of the
Gluster/NFS server or incorrect replies on (un)lock requests.
By introducing a nlm4_notify_args structure, the call-state and frame
can be tracked better. This prevents the possibility of segfaulting when
the call-state is used after being free'd.
BUG: 1467313
Change-Id: I285d2bc552f509e5145653b7a50afcff827cd612
Signed-off-by: Niels de Vos <ndevos at redhat.com>
Reviewed-on: https://review.gluster.org/17700
Smoke: Gluster Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Kaleb KEITHLEY <kkeithle at redhat.com>
Reviewed-by: jiffin tony Thottan <jthottan at redhat.com>
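The commit message describes bundling the call-state and the notification frame in an nlm4_notify_args structure so that both stay valid until the notification for a blocking lock has been sent. A minimal sketch of that idea, assuming the args hold a reference on the call-state and own the notification frame; `notify_args_sketch_t`, its fields, `frame_t` and the helper functions are hypothetical and do not reproduce the real nlm4_notify_args definition:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reduced stand-ins; not the GlusterFS definitions. */
typedef struct call_state {
    int refcount;
} call_state_t;

typedef struct frame {
    int id;
} frame_t;

static int cs_releases = 0; /* observable count of frees (illustration only) */

static call_state_t *cs_ref(call_state_t *cs)
{
    cs->refcount++;
    return cs;
}

static void cs_unref(call_state_t *cs)
{
    assert(cs->refcount > 0);
    if (--cs->refcount == 0) {
        free(cs);
        cs_releases++;
    }
}

/* Bundles what the GRANTED notification needs, so the call-state cannot
 * be freed while the notification is still in flight. */
typedef struct notify_args_sketch {
    call_state_t *cs; /* holds its own reference */
    frame_t *frame;   /* frame allocated for the notification, owned here */
} notify_args_sketch_t;

static notify_args_sketch_t *notify_args_new(call_state_t *cs, frame_t *frame)
{
    notify_args_sketch_t *args = malloc(sizeof(*args));
    if (args == NULL)
        return NULL;
    args->cs = cs_ref(cs);
    args->frame = frame;
    return args;
}

/* Called once the notification has been answered (or has failed):
 * releases everything the args held. */
static void notify_args_done(notify_args_sketch_t *args)
{
    cs_unref(args->cs);
    free(args->frame);
    free(args);
}
```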
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1467313
[Bug 1467313] cthon04 can cause segfault in gNFS/NLM