[Gluster-devel] glusterfsd crash when using quota without io-threads
Sanoj Unnikrishnan
sunnikri at redhat.com
Thu Jun 29 11:49:01 UTC 2017
@Raghavendra: the io-threads stack wind/unwind may be the reason.
This is what the stack looked like when it hit the mprotect-guarded page. I
am not sure why there are so few frames, but the register values did match
the guarded address, so we know the stack was being overrun.
(gdb) where
#0  0x00007f8a92c51204 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#1  0x00007f8a92c561e3 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#2  0x00007f8a92c5dd33 in _dl_runtime_resolve_avx () from /lib64/ld-linux-x86-64.so.2
#3  0x0000000000000000 in ?? ()
If we try without the guard page, the syncop stack (which was allocated with
calloc) overflows into some other calloc'd memory, and the result is a
segfault somewhere else entirely.
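
For anyone who wants to see the guard-page trick in isolation, here is a
minimal standalone C sketch of the same idea (not glusterfs code; the 16K
size, function names and recursion are purely illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <ucontext.h>

#define STACK_SIZE (16 * 1024)

static ucontext_t main_ctx, task_ctx;

/* Recurse until the coroutine stack is exhausted; with the guard page in
 * place this faults as soon as the stack reaches the protected page,
 * instead of silently corrupting neighbouring heap memory. */
static void burn_stack (int depth)
{
        char pad[512];
        memset (pad, depth & 0xff, sizeof (pad));
        burn_stack (depth + 1);
}

static void task_fn (void)
{
        burn_stack (0);
}

int main (void)
{
        char *stack = calloc (1, STACK_SIZE);

        /* Round up to the first page boundary inside the allocation and
         * revoke all access to that page, mirroring the debug patch below. */
        char *guard = (char *) (((unsigned long) stack + 4095) & ~4095UL);
        if (mprotect (guard, 4096, PROT_NONE) != 0)
                perror ("mprotect");
        printf ("stack %p, guard page %p\n", (void *) stack, (void *) guard);

        getcontext (&task_ctx);
        task_ctx.uc_stack.ss_sp = stack;
        task_ctx.uc_stack.ss_size = STACK_SIZE;
        task_ctx.uc_link = &main_ctx;
        makecontext (&task_ctx, task_fn, 0);
        /* Stacks grow downward, so the coroutine starts near the top of the
         * 16K block and faults when it grows down into the guard page. */
        swapcontext (&main_ctx, &task_ctx);
        return 0;
}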
@Kinglong,
Yes, increasing the stack size to 256K was enough to alleviate the issue in
my tests.
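
For reference, the change I tried was roughly along these lines (just a
sketch of the experiment, not a proposed fix; a real fix would probably drop
the hard-coded size altogether):

--- a/xlators/features/marker/src/marker-quota.c
+++ b/xlators/features/marker/src/marker-quota.c
@@ -1075,7 +1075,7 @@
         }

         if (spawn) {
-                ret = synctask_new1 (this->ctx->env, 1024 * 16, task,
+                ret = synctask_new1 (this->ctx->env, 1024 * 256, task,
                                      mq_synctask_cleanup, NULL, args);
                 if (ret) {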
Regards,
Sanoj
On Wed, Jun 28, 2017 at 6:10 PM, Kinglong Mee <kinglongmee at gmail.com> wrote:
> Hi Sanoj,
>
> Did you test with changing the 16K to 2M (the default stack size)?
>
> If it still overflows, there must be a more serious underlying issue;
> if it does not, the only issue is that 16K is too small for marker without
> io-threads?
>
> thanks,
> Kinglong Mee
>
> On 6/28/2017 19:48, Sanoj Unnikrishnan wrote:
> > Hi Kinglong,
> >
> > I opened https://bugzilla.redhat.com/show_bug.cgi?id=1465861 to track
> this. It seems to be an important issue that went undetected.
> > Thanks for catching this!!
> >
> > In my attempts to repro this, I found that on each run some random
> structures were getting corrupted, leading to segfaults.
> > In order to assert that the stack was indeed growing into all the
> allocated space and beyond, I set a guard page at the end of the allocated
> stack space (so that we hit a segfault before overrunning the space).
> > Below are the code changes.
> >
> > @@ -443,6 +443,8 @@ synctask_create (struct syncenv *env, size_t stacksize, synctask_fn_t fn,
> >         struct synctask *newtask = NULL;
> >         xlator_t        *this = THIS;
> >         int              destroymode = 0;
> > +       int              r = 0;
> > +       char            *v;
> >
> >         VALIDATE_OR_GOTO (env, err);
> >         VALIDATE_OR_GOTO (fn, err);
> > @@ -498,9 +500,15 @@ synctask_create (struct syncenv *env, size_t stacksize, synctask_fn_t fn,
> >                                             gf_common_mt_syncstack);
> >                 newtask->ctx.uc_stack.ss_size = env->stacksize;
> >         } else {
> > -               newtask->stack = GF_CALLOC (1, stacksize,
> > +               newtask->stack = GF_CALLOC (1, stacksize,
> >                                             gf_common_mt_syncstack);
> >                 newtask->ctx.uc_stack.ss_size = stacksize;
> > +               if (stacksize == 16*1024) {
> > +                       v = (char *)((unsigned long)((char *)newtask->stack + 4095) & (~4095));
> > +                       r = mprotect (v, 4096, PROT_NONE);
> > +                       gf_msg ("syncop", GF_LOG_ERROR, errno, LG_MSG_GETCONTEXT_FAILED,
> > +                               "SKU: using 16k stack starting at %p, mprotect returned %d, "
> > +                               "guard page: %p", newtask->stack, r, v);
> > +               }
> >         }
> >
> > (gdb) where
> > #0  0x00007f8a92c51204 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
> > #1  0x00007f8a92c561e3 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
> > #2  0x00007f8a92c5dd33 in _dl_runtime_resolve_avx () from /lib64/ld-linux-x86-64.so.2
> > #3  0x0000000000000000 in ?? ()
> >
> >
> > (gdb) info reg
> >
> > rdi 0x7f8a92946188 140233141412232
> > rbp 0x7f8a800b4000 0x7f8a800b4000
> > rsp 0x7f8a800b4000 0x7f8a800b4000
> > r8 0x7f8a92e4ba60 140233146677856
> >
> > (gdb) layout asm
> >
> > >│0x7f8a92c51204 <_dl_lookup_symbol_x+4>  push %r15    <== the push onto the guarded page caused the segfault
> >
> > From the brick log we have,
> >
> > [syncop.c:515:synctask_create] 0-syncop: SKU: using 16k stack starting at 0x7f8a800b28f0, mprotect returned 0, guard page: 0x7f8a800b3000 [No data available]
> >
> > The stack grows downward from 0x7f8a800b68f0 (the allocation base 0x7f8a800b28f0 plus 16K) toward 0x7f8a800b28f0, and the page 0x7f8a800b3000 - 0x7f8a800b4000 is guarded, which is where the segfault hit as seen in gdb.
> >
> > This confirms that the stack space is not sufficient and is overflowing.
> > I am not sure why we don't hit this in the presence of io-threads,
> though. It may just be that with io-threads in the graph there is some
> allocated but unused memory which our stack freely grows into; in that case
> it is a silent, undetected reuse of memory.
> >
> > Regards,
> > Sanoj
> >
> >
> >
> > On Tue, Jun 13, 2017 at 10:49 AM, Sanoj Unnikrishnan <sunnikri at redhat.com> wrote:
> >
> > Hi Kinglong,
> > I was reading about makecontext/swapcontext as well, and I found an
> article that suggests using mprotect to force a segfault in order to check
> whether we have a stack space issue. Here is the link:
> http://www.evanjones.ca/software/threading.html
> >
> > I don't think I can try this until tomorrow.
> > thanks and regards,
> > Sanoj
> >
> > On Tue, Jun 13, 2017 at 5:58 AM, Kinglong Mee <kinglongmee at gmail.com> wrote:
> >
> > Hi Sanoj,
> >
> > What's your opinion about this problem?
> >
> > thanks,
> > Kinglong Mee
> >
> > On 6/9/2017 17:20, Kinglong Mee wrote:
> > > Hi Sanoj,
> > >
> > > On 6/9/2017 15:48, Sanoj Unnikrishnan wrote:
> > >> I have not used valgrind before, so I may be wrong here.
> > >>
> > >> I think the valgrind_stack_deregister should have been after
> GF_FREE_STACK.
> > >> That may explain the instance of invalid_write during
> stack_destroy calls in after.log.
> > >
> > > No. I moved it, but the invalid_write instances still exist.
> > >
> > >>
> > >> There seem to be numerous issues reported in before.log (I
> am assuming you did not have the valgrind_stack_register call in it).
> > >
> > > Yes, before.log is from a test without any code change (but
> without io-threads).
> > >
> > >> From http://valgrind.org/docs/manual/manual-core.html, it looks like
> valgrind detects the client switching stacks only if the stack pointer
> register changes by more than 2MB.
> > >
> > > I tested with a larger max-stackframe:
> > > valgrind --leak-check=full --max-stackframe=242293216
> > >
> > >> Is it possible that, since marker is only using 16k, the new stack
> pointer was less than 2MB away from the current stack pointer?
> > >
> > > Maybe.
> > > But with io-threads (and with valgrind_stack_deregister added), valgrind only shows some
> > > "Invalid read/write" about __gf_mem_invalidate.
> > > So the only cause I can think of is the 16K stack size that marker uses.
> > >
> > > I have not used makecontext/swapcontext before; is the following right?
> > > 1. Without swapcontext, the stack might be (just an example):
> > > --> io_stats -> quota -> marker -> io-threads -> ...
> > >
> > > 2. With swapcontext:
> > > --> io_stats -> quota -> marker
> > >     switch to new stack -> io-threads
> > >
> > > After switching to the new stack, the new stack size is 16K.
> > > Is that enough without io-threads?
> > >
> > > I don't know how io-threads behaves; does it call all sub-xlators using the 16K stack, or something else?
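> > >
> > > For my own understanding, this is roughly how I picture the switch in
> > > plain ucontext code (just an illustrative sketch; task_fn and the 16K
> > > size are made up here and are not the real synctask code):
> > >
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > > #include <ucontext.h>
> > >
> > > /* Sketch: the caller's frames (io_stats -> quota -> marker) stay on
> > >  * the original thread stack; everything called from task_fn() runs on
> > >  * the small calloc'd stack handed to makecontext(), 16K here. */
> > > #define TASK_STACK_SIZE (16 * 1024)
> > >
> > > static ucontext_t caller_ctx, task_ctx;
> > >
> > > static void task_fn (void)
> > > {
> > >         /* any xlator code called from here also consumes this 16K */
> > >         printf ("running on the 16K task stack\n");
> > > }
> > >
> > > int main (void)
> > > {
> > >         getcontext (&task_ctx);
> > >         task_ctx.uc_stack.ss_sp = calloc (1, TASK_STACK_SIZE);
> > >         task_ctx.uc_stack.ss_size = TASK_STACK_SIZE;
> > >         task_ctx.uc_link = &caller_ctx;   /* resume caller when task_fn returns */
> > >         makecontext (&task_ctx, task_fn, 0);
> > >         swapcontext (&caller_ctx, &task_ctx);
> > >         return 0;
> > > }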
> > >
> > >> It seems unlikely to me, since we are allocating the stack
> from the heap.
> > >>
> > >> Did you try a run with the valgrind instrumentation, without
> changing the stack size?
> > >
> > > OK.
> > > The following valgrind-without-stack-change.log is from a test with
> valgrind_stack_deregister added
> > > (and without io-threads).
> > >
> > > thanks,
> > > Kinglong Mee
> > >
> > >> None of this explains the crash though. We had seen a memory
> overrun crash in the same code path on NetBSD earlier but did not follow up
> on it then.
> > >> Will look further into it.
> > >>
> > >>
> > >>
> > >> On Thu, Jun 8, 2017 at 4:51 PM, Kinglong Mee <kinglongmee at gmail.com> wrote:
> > >>
> > >> Maybe it's my fault; I found that valgrind can't follow a context switch (makecontext/swapcontext) by default.
> > >> So I tested with the following patch (which tells valgrind about the new stack via VALGRIND_STACK_REGISTER/DEREGISTER).
> > >> With it, there are only some "Invalid read/write" reports from __gf_mem_invalidate. Is that right?
> > >> So there is only one problem: without io-threads, the stack size is too small for marker.
> > >> Am I right?
> > >>
> > >> Ps:
> > >> valgrind-before.log is the log without the following patch; valgrind-after.log is with the patch.
> > >>
> > >> ==35656== Invalid write of size 8
> > >> ==35656==    at 0x4E8FFD4: __gf_mem_invalidate (mem-pool.c:278)
> > >> ==35656==    by 0x4E90313: __gf_free (mem-pool.c:334)
> > >> ==35656==    by 0x4EA4E5B: synctask_destroy (syncop.c:394)
> > >> ==35656==    by 0x4EA4EDF: synctask_done (syncop.c:412)
> > >> ==35656==    by 0x4EA58B3: synctask_switchto (syncop.c:673)
> > >> ==35656==    by 0x4EA596B: syncenv_processor (syncop.c:704)
> > >> ==35656==    by 0x60B2DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
> > >> ==35656==    by 0x67A873C: clone (in /usr/lib64/libc-2.17.so)
> > >> ==35656==  Address 0x1b104931 is 2,068,017 bytes inside a block of size 2,097,224 alloc'd
> > >> ==35656==    at 0x4C29975: calloc (vg_replace_malloc.c:711)
> > >> ==35656==    by 0x4E8FA5E: __gf_calloc (mem-pool.c:117)
> > >> ==35656==    by 0x4EA52F5: synctask_create (syncop.c:500)
> > >> ==35656==    by 0x4EA55AE: synctask_new1 (syncop.c:576)
> > >> ==35656==    by 0x143AE0D7: mq_synctask1 (marker-quota.c:1078)
> > >> ==35656==    by 0x143AE199: mq_synctask (marker-quota.c:1097)
> > >> ==35656==    by 0x143AE6F6: _mq_create_xattrs_txn (marker-quota.c:1236)
> > >> ==35656==    by 0x143AE82D: mq_create_xattrs_txn (marker-quota.c:1253)
> > >> ==35656==    by 0x143B0DCB: mq_inspect_directory_xattr (marker-quota.c:2027)
> > >> ==35656==    by 0x143B13A8: mq_xattr_state (marker-quota.c:2117)
> > >> ==35656==    by 0x143A6E80: marker_lookup_cbk (marker.c:2961)
> > >> ==35656==    by 0x141811E0: up_lookup_cbk (upcall.c:753)
> > >>
> > >> ----------------------- valgrind ------------------------------------------
> > >>
> > >> Don't forget to install valgrind-devel.
> > >>
> > >> diff --git a/libglusterfs/src/syncop.c b/libglusterfs/src/syncop.c
> > >> index 00a9b57..97b1de1 100644
> > >> --- a/libglusterfs/src/syncop.c
> > >> +++ b/libglusterfs/src/syncop.c
> > >> @@ -10,6 +10,7 @@
> > >>
> > >>  #include "syncop.h"
> > >>  #include "libglusterfs-messages.h"
> > >> +#include <valgrind/valgrind.h>
> > >>
> > >>  int
> > >>  syncopctx_setfsuid (void *uid)
> > >> @@ -388,6 +389,8 @@ synctask_destroy (struct synctask *task)
> > >>         if (!task)
> > >>                 return;
> > >>
> > >> +       VALGRIND_STACK_DEREGISTER(task->valgrind_ret);
> > >> +
> > >>         GF_FREE (task->stack);
> > >>
> > >>         if (task->opframe)
> > >> @@ -509,6 +512,8 @@ synctask_create (struct syncenv *env, size_t stacksize, synctask_fn_t fn,
> > >>
> > >>         newtask->ctx.uc_stack.ss_sp = newtask->stack;
> > >>
> > >> +       newtask->valgrind_ret = VALGRIND_STACK_REGISTER(newtask->stack, newtask-
> > >> +
> > >>         makecontext (&newtask->ctx, (void (*)(void)) synctask_wrap, 2, newtask);
> > >>
> > >>         newtask->state = SYNCTASK_INIT;
> > >> diff --git a/libglusterfs/src/syncop.h b/libglusterfs/src/syncop.h
> > >> index c2387e6..247325b 100644
> > >> --- a/libglusterfs/src/syncop.h
> > >> +++ b/libglusterfs/src/syncop.h
> > >> @@ -63,6 +63,7 @@ struct synctask {
> > >>         int                 woken;
> > >>         int                 slept;
> > >>         int                 ret;
> > >> +       int                 valgrind_ret;
> > >>
> > >>         uid_t               uid;
> > >>         gid_t               gid;
> > >> diff --git a/xlators/features/marker/src/marker-quota.c b/xlators/features/marker/src/marker-quota.c
> > >> index 902b8e5..f3d2507 100644
> > >> --- a/xlators/features/marker/src/marker-quota.c
> > >> +++ b/xlators/features/marker/src/marker-quota.c
> > >> @@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, synctask_fn_t task, gf_boole
> > >>         }
> > >>
> > >>         if (spawn) {
> > >> -               ret = synctask_new1 (this->ctx->env, 1024 * 16, task,
> > >> +               ret = synctask_new1 (this->ctx->env, 0, task,
> > >>                                      mq_synctask_cleanup, NULL, args);
> > >>                 if (ret) {
> > >>                         gf_log (this->name, GF_LOG_ERROR, "Failed to spawn "
> > >>
> > >> On 6/8/2017 19:02, Sanoj Unnikrishnan wrote:
> > >> > I would still be worried about the Invalid read/write.
> IMO whether an illegal access causes a crash depends on whether the page is
> currently mapped.
> > >> > So, it could so happen that there is a use after free /
> use outside of bounds happening in the code and it turns out that this
> location gets mapped in a different (unmapped) page when IO threads is not
> loaded.
> > >> >
> > >> > Could you please share the valgrind logs as well?
> > >> >
> > >> > On Wed, Jun 7, 2017 at 8:22 PM, Kinglong Mee <kinglongmee at gmail.com> wrote:
> > >> >
> > >> > After deleting io-threads from the vols, quota operations (list/set/modify) make glusterfsd crash.
> > >> > I am using CentOS 7 (CentOS Linux release 7.3.1611) with glusterfs 3.8.12.
> > >> > It seems the stack gets corrupted; when testing with the following diff, glusterfsd runs correctly.
> > >> >
> > >> > There are two questions:
> > >> > 1. When using valgrind, it shows many "Invalid read/write" errors with io-threads.
> > >> >    Why does glusterfsd run correctly with io-threads but crash without io-threads?
> > >> >
> > >> > 2. With the following diff, valgrind also shows many "Invalid read/write" errors without io-threads,
> > >> >    but there is no crash.
> > >> >
> > >> > Any comments are welcome.
> > >> >
> > >> > Reverting http://review.gluster.org/11499 seems better than the diff.
> > >> >
> > >> > diff --git a/xlators/features/marker/src/marker-quota.c b/xlators/features/marker/src/marker-quota.c
> > >> > index 902b8e5..f3d2507 100644
> > >> > --- a/xlators/features/marker/src/marker-quota.c
> > >> > +++ b/xlators/features/marker/src/marker-quota.c
> > >> > @@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, synctask_fn_t task, gf_boole
> > >> >         }
> > >> >
> > >> >         if (spawn) {
> > >> > -               ret = synctask_new1 (this->ctx->env, 1024 * 16, task,
> > >> > +               ret = synctask_new1 (this->ctx->env, 0, task,
> > >> >                                      mq_synctask_cleanup, NULL, args);
> > >> >                 if (ret) {
> > >> >                         gf_log (this->name, GF_LOG_ERROR, "Failed to spawn "
> > >> >
> > >> > ----------------------------------- test steps ----------------------------------
> > >> > 1. gluster volume create gvtest node1:/test/ node2:/test/
> > >> > 2. gluster volume start gvtest
> > >> > 3. gluster volume quota enable gvtest
> > >> >
> > >> > 4. delete io-threads from all vols
> > >> > 5. reboot node1 and node2
> > >> > 6. sh quota-set.sh
> > >> >
> > >> > # cat quota-set.sh
> > >> > gluster volume quota gvtest list
> > >> > gluster volume quota gvtest limit-usage / 10GB
> > >> > gluster volume quota gvtest limit-usage /1234 1GB
> > >> > gluster volume quota gvtest limit-usage /hello 1GB
> > >> > gluster volume quota gvtest limit-usage /test 1GB
> > >> > gluster volume quota gvtest limit-usage /xyz 1GB
> > >> > gluster volume quota gvtest list
> > >> > gluster volume quota gvtest remove /hello
> > >> > gluster volume quota gvtest remove /test
> > >> > gluster volume quota gvtest list
> > >> > gluster volume quota gvtest limit-usage /test 1GB
> > >> > gluster volume quota gvtest remove /xyz
> > >> > gluster volume quota gvtest list
> > >> >
> > >> > ----------------------- glusterfsd crash without the diff --------------------------------
> > >> >
> > >> > /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xf5)[0x7f6e1e950af1]
> > >> > /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x21f)[0x7f6e1e956943]
> > >> > /usr/local/sbin/glusterfsd(glusterfsd_print_trace+0x1f)[0x409c83]
> > >> > /lib64/libc.so.6(+0x35250)[0x7f6e1d025250]
> > >> > /lib64/libc.so.6(gsignal+0x37)[0x7f6e1d0251d7]
> > >> > /lib64/libc.so.6(abort+0x148)[0x7f6e1d0268c8]
> > >> > /lib64/libc.so.6(+0x74f07)[0x7f6e1d064f07]
> > >> > /lib64/libc.so.6(+0x7baf5)[0x7f6e1d06baf5]
> > >> > /lib64/libc.so.6(+0x7c3e6)[0x7f6e1d06c3e6]
> > >> > /usr/local/lib/libglusterfs.so.0(__gf_free+0x311)[0x7f6e1e981327]
> > >> > /usr/local/lib/libglusterfs.so.0(synctask_destroy+0x82)[0x7f6e1e995c20]
> > >> > /usr/local/lib/libglusterfs.so.0(synctask_done+0x25)[0x7f6e1e995c47]
> > >> > /usr/local/lib/libglusterfs.so.0(synctask_switchto+0xcf)[0x7f6e1e996585]
> > >> > /usr/local/lib/libglusterfs.so.0(syncenv_processor+0x60)[0x7f6e1e99663d]
> > >> > /lib64/libpthread.so.0(+0x7dc5)[0x7f6e1d7a2dc5]
> > >> > /lib64/libc.so.6(clone+0x6d)[0x7f6e1d0e773d]
> > >> >
> > >> > or
> > >> >
> > >> > package-string: glusterfs 3.8.12
> > >> > /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xf5)[0x7fa15e623af1]
> > >> > /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x21f)[0x7fa15e629943]
> > >> > /usr/local/sbin/glusterfsd(glusterfsd_print_trace+0x1f)[0x409c83]
> > >> > /lib64/libc.so.6(+0x35250)[0x7fa15ccf8250]
> > >> > /lib64/libc.so.6(gsignal+0x37)[0x7fa15ccf81d7]
> > >> > /lib64/libc.so.6(abort+0x148)[0x7fa15ccf98c8]
> > >> > /lib64/libc.so.6(+0x74f07)[0x7fa15cd37f07]
> > >> > /lib64/libc.so.6(+0x7dd4d)[0x7fa15cd40d4d]
> > >> > /lib64/libc.so.6(__libc_calloc+0xb4)[0x7fa15cd43a14]
> > >> > /usr/local/lib/libglusterfs.so.0(__gf_calloc+0xa7)[0x7fa15e653a5f]
> > >> > /usr/local/lib/libglusterfs.so.0(iobref_new+0x2b)[0x7fa15e65875a]
> > >> > /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xa98c)[0x7fa153a8398c]
> > >> > /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xacbc)[0x7fa153a83cbc]
> > >> > /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xad10)[0x7fa153a83d10]
> > >> > /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xb2a7)[0x7fa153a842a7]
> > >> > /usr/local/lib/libglusterfs.so.0(+0x97ea9)[0x7fa15e68eea9]
> > >> > /usr/local/lib/libglusterfs.so.0(+0x982c6)[0x7fa15e68f2c6]
> > >> > /lib64/libpthread.so.0(+0x7dc5)[0x7fa15d475dc5]
> > >> > /lib64/libc.so.6(clone+0x6d)[0x7fa15cdba73d]
> > >> >
> > >> > _______________________________________________
> > >> > Gluster-devel mailing list
> > >> > Gluster-devel at gluster.org
> > >> > http://lists.gluster.org/mailman/listinfo/gluster-devel
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> > >>
> > >
> >
> >
> >
>