[Bugs] [Bug 1657163] New: Stack overflow in readdirp with parallel-readdir enabled

bugzilla at redhat.com bugzilla at redhat.com
Fri Dec 7 10:08:26 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1657163

            Bug ID: 1657163
           Summary: Stack overflow in readdirp with parallel-readdir
                    enabled
           Product: Red Hat Gluster Storage
           Version: 3.4
         Component: distribute
          Assignee: nbalacha at redhat.com
          Reporter: nbalacha at redhat.com
        QA Contact: tdesala at redhat.com
                CC: bugs at gluster.org, rhs-bugs at redhat.com,
                    sankarshan at redhat.com, storage-qa-internal at redhat.com
        Depends On: 1593199, 1593548



+++ This bug was initially created as a clone of Bug #1593548 +++

+++ This bug was initially created as a clone of Bug #1593199 +++

Description of problem:

Wind/unwind in readdirp causes the stack to grow if parallel-readdir is
enabled.

Commit b9406e210717621bc672a63c1cbd1b0183834056 changed DHT to continue to
wind readdirp to its child xlator as long as there is space in the buffer.
DHT also strips out certain entries returned in the readdirp response from the
child xlator (linkto files, directories whose hashed subvol is not the child
that the call was wound to, etc.).

If the buffer has only enough space left to hold one or two entries, and the
rda cache holds many entries that DHT would strip out, the stack can grow at
an alarming rate and eventually overflow, killing the client process.
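
For orientation, here is a hedged sketch of the stripping criteria just
described. The struct and field names are hypothetical simplifications;
the real checks in dht_readdirp_cbk operate on gf_dirent_t entries and
xlator state:

#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of a directory entry. */
struct entry {
        bool        is_linkto;      /* internal DHT linkto file           */
        bool        is_dir;
        const char *hashed_subvol;  /* subvol that "owns" this directory  */
};

/* Returns true if DHT would drop this entry from the response that came
 * back from 'subvol' (a sketch, not the actual GlusterFS logic). */
static bool dht_would_strip(const struct entry *e, const char *subvol)
{
        if (e->is_linkto)
                return true;    /* linkto files are never shown to users */
        if (e->is_dir && strcmp(e->hashed_subvol, subvol) != 0)
                return true;    /* only the hashed subvol reports a dir  */
        return false;
}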


Assume that the buffer is almost full in dht_readdirp_cbk (for example,
local->size is 4096 and local->filled is 3800):

1. DHT winds the readdirp call to its rda child xlator.
2. rda sees that there is only enough space to return one entry, so it returns
one entry from its cache and unwinds to dht_readdirp_cbk.
3. dht_readdirp_cbk processes the single entry returned, which in this case is
a linkto file, and skips it (count == 0). As the buffer is still not full, it
winds to the same rda xlator again.
4. rda, in its turn, returns one more entry (also a linkto file) from its
cache to DHT.

This process (steps 3 and 4) continues, with rda returning one linkto-file
entry each time and DHT winding to rda again. Eventually the stack overflows
and the process crashes.
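
To make the growth pattern concrete, here is a minimal, self-contained C
simulation of the loop described above. This is not GlusterFS code: the
sizes are taken from the example, and plain function calls stand in for
wind/unwind, which behave like direct calls when rda serves entries from
its cache:

#include <stdio.h>

#define BUF_SIZE   4096     /* local->size in the example above        */
#define ENTRY_SIZE 295      /* per-entry size seen in the backtrace    */

static int cache_remaining = 2000;  /* rda cache full of linkto files  */
static int filled          = 3800;  /* buffer already almost full      */
static int depth           = 0;     /* stack frames we accumulate      */

static void dht_cbk(void);

/* "rda": only ~296 bytes are free, so at most one entry fits. */
static void rda_readdirp(void)
{
        depth++;
        if (cache_remaining-- > 0)
                dht_cbk();      /* unwind one cached (linkto) entry    */
}

/* "dht": strips the linkto entry (count stays 0) and, because the
 * buffer is still not full, winds to rda again instead of unwinding. */
static void dht_cbk(void)
{
        depth++;
        if (filled + ENTRY_SIZE <= BUF_SIZE)
                rda_readdirp(); /* re-wind: 2 more frames per entry    */
}

int main(void)
{
        rda_readdirp();
        printf("%d stack frames accumulated, 0 usable entries returned\n",
               depth);
        return 0;
}

With a real thread stack and the much larger frames of the actual xlators,
a few hundred such iterations are enough to overflow; compare the ~670
alternating dht_readdirp_cbk/rda_readdirp frames in the backtrace below.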




Version-Release number of selected component (if applicable):


How reproducible:

Tried once

Steps to Reproduce:
I was able to reproduce the crash with a 2 brick distribute volume with
thousands of entries and thousands of linkto files on one of the bricks.
FUSE-mount the volume and run
   ls -l <mountpoint>


Actual results:

Client mount process crashes

Expected results:

Client should not crash

Additional info:

--- Additional comment from Nithya Balachandran on 2018-06-20 05:35:09 EDT ---

(gdb) bt
#0  0x00007f0f440b4029 in _gf_msg (domain=0x0, file=0x0, function=0x0, line=0,
level=GF_LOG_NONE, errnum=0, trace=0, msgid=0, 
    fmt=0x7f0f363a68c8 "stack-address: %p, winding from %s to %s") at
logging.c:2039
#1  0x00007f0f36359692 in dht_readdirp_cbk (frame=0x7f0f0c0016e0,
cookie=0x7f0f249dbaf0, this=0x7f0f24aa55c0, op_ret=1, op_errno=0, 
    orig_entries=0x7f0f34275220, xdata=0x7f0f0c001960) at dht-common.c:5388
#2  0x00007f0f35cca8de in rda_readdirp (frame=0x7f0f0c2915a0,
this=0x7f0f249dbaf0, fd=0x7f0f247d3790, size=295, off=20426,
xdata=0x7f0f0c001960)
    at readdir-ahead.c:266
#3  0x00007f0f36359718 in dht_readdirp_cbk (frame=0x7f0f0c0016e0,
cookie=0x7f0f249dbaf0, this=0x7f0f24aa55c0, op_ret=1, op_errno=0, 
    orig_entries=0x7f0f34275510, xdata=0x7f0f0c001960) at dht-common.c:5388
#4  0x00007f0f35cca8de in rda_readdirp (frame=0x7f0f0c291490,
this=0x7f0f249dbaf0, fd=0x7f0f247d3790, size=295, off=20420,
xdata=0x7f0f0c001960)
    at readdir-ahead.c:266
#5  0x00007f0f36359718 in dht_readdirp_cbk (frame=0x7f0f0c0016e0,
cookie=0x7f0f249dbaf0, this=0x7f0f24aa55c0, op_ret=1, op_errno=0, 
    orig_entries=0x7f0f34275800, xdata=0x7f0f0c001960) at dht-common.c:5388
#6  0x00007f0f35cca8de in rda_readdirp (frame=0x7f0f0c291380,
this=0x7f0f249dbaf0, fd=0x7f0f247d3790, size=295, off=20414,
xdata=0x7f0f0c001960)
    at readdir-ahead.c:266
#7  0x00007f0f36359718 in dht_readdirp_cbk (frame=0x7f0f0c0016e0,
cookie=0x7f0f249dbaf0, this=0x7f0f24aa55c0, op_ret=1, op_errno=0, 
    orig_entries=0x7f0f34275af0, xdata=0x7f0f0c001960) at dht-common.c:5388



....




#665 0x00007f0f36359718 in dht_readdirp_cbk (frame=0x7f0f0c0016e0,
cookie=0x7f0f249dbaf0, this=0x7f0f24aa55c0, op_ret=1, op_errno=0, 
    orig_entries=0x7f0f342b2160, xdata=0x7f0f0c001960) at dht-common.c:5388
#666 0x00007f0f35cca8de in rda_readdirp (frame=0x7f0f0c27a0f0,
this=0x7f0f249dbaf0, fd=0x7f0f247d3790, size=295, off=18426,
xdata=0x7f0f0c001960)
    at readdir-ahead.c:266
#667 0x00007f0f36359718 in dht_readdirp_cbk (frame=0x7f0f0c0016e0,
cookie=0x7f0f249dbaf0, this=0x7f0f24aa55c0, op_ret=1, op_errno=0, 
    orig_entries=0x7f0f342b2450, xdata=0x7f0f0c001960) at dht-common.c:5388
#668 0x00007f0f35cca8de in rda_readdirp (frame=0x7f0f0c000a70,
this=0x7f0f249dbaf0, fd=0x7f0f247d3790, size=295, off=18420,
xdata=0x7f0f0c001960)
    at readdir-ahead.c:266
#669 0x00007f0f36359718 in dht_readdirp_cbk (frame=0x7f0f0c0016e0,
cookie=0x7f0f249dbaf0, this=0x7f0f24aa55c0, op_ret=22, op_errno=0, 
    orig_entries=0x7f0f342b2740, xdata=0x7f0f0c001960) at dht-common.c:5388
#670 0x00007f0f35cca8de in rda_readdirp (frame=0x7f0f0c0012d0,
this=0x7f0f249dbaf0, fd=0x7f0f247d3790, size=4096, off=18288,
xdata=0x7f0f0c001960)
    at readdir-ahead.c:266
#671 0x00007f0f3635af0f in dht_do_readdir (frame=0x7f0f0c0016e0,
this=0x7f0f24aa55c0, fd=0x7f0f247d3790, size=4096, yoff=18288, whichop=40, 
    dict=0x7f0f0c001960) at dht-common.c:5607
#672 0x00007f0f3635b639 in dht_readdirp (frame=0x7f0f0c0016e0,
this=0x7f0f24aa55c0, fd=0x7f0f247d3790, size=4096, yoff=18288,
dict=0x7f0f0c001960)
    at dht-common.c:5657
#673 0x00007f0f360f07a4 in wb_readdirp (frame=0x7f0f0c005b40,
this=0x7f0f241f0690, fd=0x7f0f247d3790, size=4096, off=18288,
xdata=0x7f0f0c001960)
    at write-behind.c:2514
#674 0x00007f0f4416dd38 in default_readdirp (frame=0x7f0f0c005b40,
this=0x7f0f249eb5b0, fd=0x7f0f247d3790, size=4096, off=18288,
xdata=0x7f0f0c001960)
    at defaults.c:2755
#675 0x00007f0f35abb497 in ioc_readdirp (frame=0x7f0f0c0042c0,
this=0x7f0f240566b0, fd=0x7f0f247d3790, size=4096, offset=18288,
dict=0x7f0f0c001960)
    at io-cache.c:1449
#676 0x00007f0f358a9de8 in qr_readdirp (frame=0x7f0f0c003690,
this=0x7f0f24057260, fd=0x7f0f247d3790, size=4096, offset=18288,
xdata=0x7f0f0c001960)
    at quick-read.c:532
#677 0x00007f0f4416dd38 in default_readdirp (frame=0x7f0f0c003690,
this=0x7f0f24a9cfd0, fd=0x7f0f247d3790, size=4096, off=18288,
xdata=0x7f0f0c001960)
    at defaults.c:2755
#678 0x00007f0f35492cfe in mdc_readdirp (frame=0x7f0f0c0041b0,
this=0x7f0f24a9db80, fd=0x7f0f247d3790, size=4096, offset=18288,
xdata=0x7f0f0c001960)
    at md-cache.c:2409
#679 0x00007f0f44168931 in default_readdirp_resume (frame=0x7f0f1c707670,
this=0x7f0f249e0dd0, fd=0x7f0f247d3790, size=4096, off=18288, xdata=0x0)
    at defaults.c:2019
#680 0x00007f0f440d153b in call_resume_wind (stub=0x7f0f1c03b370) at
call-stub.c:2163
#681 0x00007f0f440e0c49 in call_resume (stub=0x7f0f1c03b370) at
call-stub.c:2512
#682 0x00007f0f3527a039 in iot_worker (data=0x7f0f249dc6a0) at io-threads.c:224
#683 0x00007f0f42f00dc5 in start_thread () from /lib64/libpthread.so.0
#684 0x00007f0f4284573d in clone () from /lib64/libc.so.6
(gdb) 






Notice how rda returns only a single entry on each call: op_ret=1 in each of
the repeated dht_readdirp_cbk frames above.

--- Additional comment from Worker Ant on 2018-06-22 01:50:21 EDT ---

REVIEW: https://review.gluster.org/20359 (cluster/dht: Do not try to use up the
readdirp buffer) posted (#1) for review on master by N Balachandran

--- Additional comment from Nithya Balachandran on 2018-06-22 02:07:28 EDT ---

The patch only mitigates the problem by making it less likely to happen in most
customer setups. We can still hit this if a subvol contains thousands of linkto
files.

--- Additional comment from Worker Ant on 2018-06-29 07:11:19 EDT ---

COMMIT: https://review.gluster.org/20359 committed in master by "Raghavendra G"
<rgowdapp at redhat.com> with a commit message- cluster/dht: Do not try to use up
the readdirp buffer

DHT attempts to use up the entire buffer in readdirp before
unwinding in an attempt to reduce the number of calls.
However, this has 2 disadvantages:
1. This can cause a stack overflow when parallel readdir
is enabled. If the buffer only has a little space, rda can send back
only one or two entries. If those entries are stripped out by
dht_readdirp_cbk (linkto files, for example), it will once again
wind down to rda in an attempt to fill the buffer before unwinding to FUSE.
This process can continue for several iterations, causing the stack
to grow and eventually overflow, crashing the process.
2. If parallel readdir is disabled, dht could send readdirp
calls with small buffers to the bricks, thus increasing the
number of network calls.

We are therefore reverting to the earlier behaviour.
Please note that this only mitigates the stack overflow; it does
not prevent it from happening. An overflow is still possible if,
for instance, a subvol has thousands of linkto files.

Change-Id: I291bc181c5249762d0c4fe27fa4fc2631166adf5
fixes: bz#1593548
Signed-off-by: N Balachandran <nbalacha at redhat.com>
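
To illustrate the behavioural change, the two wind-again policies can be
contrasted with a hedged, self-contained model (not the actual patch; see
the review link above for the real change), where recursion depth stands
in for accumulated stack frames:

#include <stdio.h>

/* Old policy: wind to the same subvol again whenever the buffer is not
 * yet full. With a cache full of strippable linkto files, 'filled'
 * never grows and the recursion runs the cache dry. */
static int old_policy(int filled, int size, int cache)
{
        if (cache > 0 && filled < size)
                return 1 + old_policy(filled, size, cache - 1);
        return 0;
}

/* New policy: wind again only while nothing usable has been gathered
 * (count == 0); one usable entry ends the recursion. A long run of
 * linkto files keeps count at 0, which is why this mitigates the
 * overflow but cannot eliminate it. */
static int new_policy(int count, int cache)
{
        if (cache > 0 && count == 0)
                return 1 + new_policy(count, cache - 1);
        return 0;
}

int main(void)
{
        printf("old policy: %d extra frames\n",
               old_policy(3800, 4096, 5000));
        printf("new policy (one usable entry): %d extra frames\n",
               new_policy(1, 5000));
        return 0;
}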

--- Additional comment from Shyamsundar on 2018-10-23 11:12:07 EDT ---

This bug is getting closed because a release has been made available that
should address the reported issue. In case the problem is still not fixed with
glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]; packages
for several distributions should become available in the near future. Keep an
eye on the Gluster Users mailing list [2] and the update infrastructure for
your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1593199
[Bug 1593199] Stack overflow in readdirp with parallel-readdir enabled
https://bugzilla.redhat.com/show_bug.cgi?id=1593548
[Bug 1593548] Stack overflow in readdirp with parallel-readdir enabled