[Gluster-devel] glusterfsd crash when using quota without io-threads

Sanoj Unnikrishnan sunnikri at redhat.com
Fri Jun 9 07:48:09 UTC 2017


I have not used valgrind before, so I may be wrong here.

I think the VALGRIND_STACK_DEREGISTER call should have come after the
GF_FREE (task->stack) call. That may explain the invalid-write reports
during the synctask_destroy calls in after.log.
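
In synctask_destroy that would give roughly the ordering below (just an
untested sketch of what I mean; task->valgrind_ret is the field your
patch adds):

        /* let __gf_mem_invalidate() poison and free the stack first ... */
        GF_FREE (task->stack);

        /* ... and only then drop the Valgrind registration for it */
        VALGRIND_STACK_DEREGISTER (task->valgrind_ret);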

There seem to be numerous issues reported in before.log (I am assuming
you did not have the VALGRIND_STACK_REGISTER call in that run).
From http://valgrind.org/docs/manual/manual-core.html, it looks like
Valgrind only detects the client switching stacks when the stack pointer
register changes by more than 2 MB. Is it possible that, since marker is
only using a 16 KB stack, the new stack pointer ended up less than 2 MB
away from the current stack pointer?
It seems unlikely to me, since we are allocating the stack from the heap.

Did you try a run with the Valgrind instrumentation but without changing
the stack size?
None of this explains the crash, though. We had seen a memory-overrun
crash in the same code path on NetBSD earlier, but did not follow up at
the time.
I will look further into it.
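
For reference, here is a minimal standalone program (my own sketch, not
code from the gluster tree; the file name and 16k size are just for the
example) showing how I understand VALGRIND_STACK_REGISTER /
VALGRIND_STACK_DEREGISTER are meant to be used with a heap-allocated
ucontext stack, with the deregister done only after the free:

/*
 * stack-demo.c -- register a heap-allocated ucontext stack with Valgrind
 * so Memcheck can follow the swapcontext() switch, then free the stack
 * and deregister it afterwards.
 *
 * Build: gcc -g stack-demo.c -o stack-demo   (needs valgrind-devel)
 * Run:   valgrind ./stack-demo
 */
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>
#include <valgrind/valgrind.h>

#define DEMO_STACK_SIZE (16 * 1024)   /* same 16k marker passes today */

static ucontext_t main_ctx;
static ucontext_t task_ctx;

static void
task_fn (void)
{
        /* this runs on the heap-allocated, Valgrind-registered stack */
        fputs ("running on the heap-allocated stack\n", stdout);
        swapcontext (&task_ctx, &main_ctx);
}

int
main (void)
{
        void     *stack = calloc (1, DEMO_STACK_SIZE);
        unsigned  id;

        /* tell Valgrind about the new stack *before* switching onto it */
        id = VALGRIND_STACK_REGISTER (stack, (char *) stack + DEMO_STACK_SIZE);

        getcontext (&task_ctx);
        task_ctx.uc_stack.ss_sp   = stack;
        task_ctx.uc_stack.ss_size = DEMO_STACK_SIZE;
        task_ctx.uc_link          = &main_ctx;
        makecontext (&task_ctx, task_fn, 0);

        swapcontext (&main_ctx, &task_ctx);

        /* free first, deregister afterwards -- the ordering suggested above */
        free (stack);
        VALGRIND_STACK_DEREGISTER (id);

        return 0;
}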



On Thu, Jun 8, 2017 at 4:51 PM, Kinglong Mee <kinglongmee at gmail.com> wrote:

> Maybe it's my fault; I found that Valgrind can't follow context
> switches (makecontext/swapcontext) by default.
> So I tested with the following patch (which tells Valgrind about the new
> stack via VALGRIND_STACK_REGISTER).
> With it, only some "Invalid read/write" errors from __gf_mem_invalidate
> remain. Is that right?
> So there is only one problem: without io-threads, the stack size is too
> small for marker.
> Am I right?
>
> PS: valgrind-before.log is the log without the following patch;
> valgrind-after.log is the log with the patch.
>
> ==35656== Invalid write of size 8
> ==35656==    at 0x4E8FFD4: __gf_mem_invalidate (mem-pool.c:278)
> ==35656==    by 0x4E90313: __gf_free (mem-pool.c:334)
> ==35656==    by 0x4EA4E5B: synctask_destroy (syncop.c:394)
> ==35656==    by 0x4EA4EDF: synctask_done (syncop.c:412)
> ==35656==    by 0x4EA58B3: synctask_switchto (syncop.c:673)
> ==35656==    by 0x4EA596B: syncenv_processor (syncop.c:704)
> ==35656==    by 0x60B2DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
> ==35656==    by 0x67A873C: clone (in /usr/lib64/libc-2.17.so)
> ==35656==  Address 0x1b104931 is 2,068,017 bytes inside a block of size 2,097,224 alloc'd
> ==35656==    at 0x4C29975: calloc (vg_replace_malloc.c:711)
> ==35656==    by 0x4E8FA5E: __gf_calloc (mem-pool.c:117)
> ==35656==    by 0x4EA52F5: synctask_create (syncop.c:500)
> ==35656==    by 0x4EA55AE: synctask_new1 (syncop.c:576)
> ==35656==    by 0x143AE0D7: mq_synctask1 (marker-quota.c:1078)
> ==35656==    by 0x143AE199: mq_synctask (marker-quota.c:1097)
> ==35656==    by 0x143AE6F6: _mq_create_xattrs_txn (marker-quota.c:1236)
> ==35656==    by 0x143AE82D: mq_create_xattrs_txn (marker-quota.c:1253)
> ==35656==    by 0x143B0DCB: mq_inspect_directory_xattr (marker-quota.c:2027)
> ==35656==    by 0x143B13A8: mq_xattr_state (marker-quota.c:2117)
> ==35656==    by 0x143A6E80: marker_lookup_cbk (marker.c:2961)
> ==35656==    by 0x141811E0: up_lookup_cbk (upcall.c:753)
>
> ----------------------- valgrind -------------------------------------------
>
> Don't forget to install valgrind-devel.
>
> diff --git a/libglusterfs/src/syncop.c b/libglusterfs/src/syncop.c
> index 00a9b57..97b1de1 100644
> --- a/libglusterfs/src/syncop.c
> +++ b/libglusterfs/src/syncop.c
> @@ -10,6 +10,7 @@
>
>  #include "syncop.h"
>  #include "libglusterfs-messages.h"
> +#include <valgrind/valgrind.h>
>
>  int
>  syncopctx_setfsuid (void *uid)
> @@ -388,6 +389,8 @@ synctask_destroy (struct synctask *task)
>          if (!task)
>                  return;
>
> +VALGRIND_STACK_DEREGISTER(task->valgrind_ret);
> +
>          GF_FREE (task->stack);
>
>          if (task->opframe)
> @@ -509,6 +512,8 @@ synctask_create (struct syncenv *env, size_t stacksize, sync
>
>          newtask->ctx.uc_stack.ss_sp   = newtask->stack;
>
> +       newtask->valgrind_ret = VALGRIND_STACK_REGISTER(newtask->stack, newtask->stack + newtask->ctx.uc_stack.ss_size);
> +
>          makecontext (&newtask->ctx, (void (*)(void)) synctask_wrap, 2, newtask);
>
>          newtask->state = SYNCTASK_INIT;
> diff --git a/libglusterfs/src/syncop.h b/libglusterfs/src/syncop.h
> index c2387e6..247325b 100644
> --- a/libglusterfs/src/syncop.h
> +++ b/libglusterfs/src/syncop.h
> @@ -63,6 +63,7 @@ struct synctask {
>          int                 woken;
>          int                 slept;
>          int                 ret;
> +       int                 valgrind_ret;
>
>          uid_t               uid;
>          gid_t               gid;
> diff --git a/xlators/features/marker/src/marker-quota.c b/xlators/features/marker/src/marker-quota.c
> index 902b8e5..f3d2507 100644
> --- a/xlators/features/marker/src/marker-quota.c
> +++ b/xlators/features/marker/src/marker-quota.c
> @@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, synctask_fn_t task, gf_boole
>          }
>
>          if (spawn) {
> -                ret = synctask_new1 (this->ctx->env, 1024 * 16, task,
> +                ret = synctask_new1 (this->ctx->env, 0, task,
>                                        mq_synctask_cleanup, NULL, args);
>                  if (ret) {
>                          gf_log (this->name, GF_LOG_ERROR, "Failed to spawn "
>
>
> On 6/8/2017 19:02, Sanoj Unnikrishnan wrote:
> > I would still be worried about the Invalid read/write. IMO whether an
> > illegal access causes a crash depends on whether the page is currently
> > mapped.
> > So it could be that there is a use-after-free or out-of-bounds access
> > happening in the code, and that this location happens to land in an
> > unmapped page when io-threads is not loaded.
> >
> > Could you please share the valgrind logs as well.
> >
> > On Wed, Jun 7, 2017 at 8:22 PM, Kinglong Mee <kinglongmee at gmail.com> wrote:
> >
> >     After deleting io-threads from the vols, quota operations
> >     (list/set/modify) make glusterfsd crash.
> >     I am using CentOS 7 (CentOS Linux release 7.3.1611) with glusterfs
> >     3.8.12.
> >     It seems the stack is corrupted; when testing with the following diff,
> >     glusterfsd runs correctly.
> >
> >     There are two questions:
> >     1. When using Valgrind, it shows many "Invalid read/write" errors
> >        with io-threads loaded.
> >        Why does glusterfsd run correctly with io-threads but crash
> >        without io-threads?
> >
> >     2. With the following diff, Valgrind still shows many "Invalid
> >        read/write" errors without io-threads,
> >        but there is no crash.
> >
> >     Any comments are welcome.
> >
> >     Reverting http://review.gluster.org/11499 seems better than the diff.
> >
> >     diff --git a/xlators/features/marker/src/marker-quota.c b/xlators/features/marker/src/marker-quota.c
> >     index 902b8e5..f3d2507 100644
> >     --- a/xlators/features/marker/src/marker-quota.c
> >     +++ b/xlators/features/marker/src/marker-quota.c
> >     @@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, synctask_fn_t task, gf_boole
> >              }
> >
> >              if (spawn) {
> >     -                ret = synctask_new1 (this->ctx->env, 1024 * 16, task,
> >     +                ret = synctask_new1 (this->ctx->env, 0, task,
> >                                            mq_synctask_cleanup, NULL, args);
> >                      if (ret) {
> >                              gf_log (this->name, GF_LOG_ERROR, "Failed to spawn "
> >
> >     ----------------------------------- test steps ----------------------------------
> >     1. gluster volume create gvtest node1:/test/ node2:/test/
> >     2. gluster volume start gvtest
> >     3. gluster volume quota enable gvtest
> >
> >     4. "deletes io-threads from all vols"
> >     5. reboot node1 and node2.
> >     6. sh quota-set.sh
> >
> >     # cat quota-set.sh
> >     gluster volume quota gvtest list
> >     gluster volume quota gvtest limit-usage / 10GB
> >     gluster volume quota gvtest limit-usage /1234 1GB
> >     gluster volume quota gvtest limit-usage /hello 1GB
> >     gluster volume quota gvtest limit-usage /test 1GB
> >     gluster volume quota gvtest limit-usage /xyz 1GB
> >     gluster volume quota gvtest list
> >     gluster volume quota gvtest remove /hello
> >     gluster volume quota gvtest remove /test
> >     gluster volume quota gvtest list
> >     gluster volume quota gvtest limit-usage /test 1GB
> >     gluster volume quota gvtest remove /xyz
> >     gluster volume quota gvtest list
> >
> >     ----------------------- glusterfsd crash without the diff --------------------------------
> >
> >     /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xf5)[0x7f6e1e950af1]
> >     /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x21f)[0x7f6e1e956943]
> >     /usr/local/sbin/glusterfsd(glusterfsd_print_trace+0x1f)[0x409c83]
> >     /lib64/libc.so.6(+0x35250)[0x7f6e1d025250]
> >     /lib64/libc.so.6(gsignal+0x37)[0x7f6e1d0251d7]
> >     /lib64/libc.so.6(abort+0x148)[0x7f6e1d0268c8]
> >     /lib64/libc.so.6(+0x74f07)[0x7f6e1d064f07]
> >     /lib64/libc.so.6(+0x7baf5)[0x7f6e1d06baf5]
> >     /lib64/libc.so.6(+0x7c3e6)[0x7f6e1d06c3e6]
> >     /usr/local/lib/libglusterfs.so.0(__gf_free+0x311)[0x7f6e1e981327]
> >     /usr/local/lib/libglusterfs.so.0(synctask_destroy+0x82)[0x7f6e1e995c20]
> >     /usr/local/lib/libglusterfs.so.0(synctask_done+0x25)[0x7f6e1e995c47]
> >     /usr/local/lib/libglusterfs.so.0(synctask_switchto+0xcf)[0x7f6e1e996585]
> >     /usr/local/lib/libglusterfs.so.0(syncenv_processor+0x60)[0x7f6e1e99663d]
> >     /lib64/libpthread.so.0(+0x7dc5)[0x7f6e1d7a2dc5]
> >     /lib64/libc.so.6(clone+0x6d)[0x7f6e1d0e773d]
> >
> >     or
> >
> >     package-string: glusterfs 3.8.12
> >     /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xf5)[0x7fa15e623af1]
> >     /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x21f)[0x7fa15e629943]
> >     /usr/local/sbin/glusterfsd(glusterfsd_print_trace+0x1f)[0x409c83]
> >     /lib64/libc.so.6(+0x35250)[0x7fa15ccf8250]
> >     /lib64/libc.so.6(gsignal+0x37)[0x7fa15ccf81d7]
> >     /lib64/libc.so.6(abort+0x148)[0x7fa15ccf98c8]
> >     /lib64/libc.so.6(+0x74f07)[0x7fa15cd37f07]
> >     /lib64/libc.so.6(+0x7dd4d)[0x7fa15cd40d4d]
> >     /lib64/libc.so.6(__libc_calloc+0xb4)[0x7fa15cd43a14]
> >     /usr/local/lib/libglusterfs.so.0(__gf_calloc+0xa7)[0x7fa15e653a5f]
> >     /usr/local/lib/libglusterfs.so.0(iobref_new+0x2b)[0x7fa15e65875a]
> >     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xa98c)[0x7fa153a8398c]
> >     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xacbc)[0x7fa153a83cbc]
> >     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xad10)[0x7fa153a83d10]
> >     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0xb2a7)[0x7fa153a842a7]
> >     /usr/local/lib/libglusterfs.so.0(+0x97ea9)[0x7fa15e68eea9]
> >     /usr/local/lib/libglusterfs.so.0(+0x982c6)[0x7fa15e68f2c6]
> >     /usr/local/lib/libglusterfs.so.0(+0x97ea9)[0x7fa15e68eea9]
> >     /usr/local/lib/libglusterfs.so.0(+0x982c6)[0x7fa15e68f2c6]
> >     /lib64/libpthread.so.0(+0x7dc5)[0x7fa15d475dc5]
> >     /lib64/libc.so.6(clone+0x6d)[0x7fa15cdba73d]
> >
> >     _______________________________________________
> >     Gluster-devel mailing list
> >     Gluster-devel at gluster.org
> >     http://lists.gluster.org/mailman/listinfo/gluster-devel
> >
> >
>

