[Gluster-devel] glusterfsd crash when using quota without io-threads

Sanoj Unnikrishnan sunnikri at redhat.com
Wed Jun 28 11:48:56 UTC 2017


Hi Kinglong,

I opened https://bugzilla.redhat.com/show_bug.cgi?id=1465861 to track this.
It seems to be an important issue that went undetected.
Thanks for catching this!!

In my attempts to reproduce this, I found that on each run some random
structures were getting corrupted, leading to a segfault.
To confirm that the stack was indeed growing through all the allocated
space and beyond, I placed a guard page near the low end of the allocated
stack space (so that we hit a segfault as soon as the stack overruns it,
instead of silently corrupting whatever lies beyond).
Below are the code changes.

@@ -443,6 +443,8 @@ synctask_create (struct syncenv *env, size_t stacksize, synctask_fn_t fn,
         struct synctask *newtask = NULL;
         xlator_t        *this    = THIS;
         int             destroymode = 0;
+        int              r = 0;
+        char            *v = NULL;

         VALIDATE_OR_GOTO (env, err);
         VALIDATE_OR_GOTO (fn, err);
@@ -498,9 +500,15 @@ synctask_create (struct syncenv *env, size_t stacksize, synctask_fn_t fn,
                                             gf_common_mt_syncstack);
                 newtask->ctx.uc_stack.ss_size = env->stacksize;
         } else {
-                newtask->stack = GF_CALLOC (1, stacksize,
+                newtask->stack = GF_CALLOC (1, stacksize,
                                             gf_common_mt_syncstack);
                 newtask->ctx.uc_stack.ss_size = stacksize;
+                if (stacksize == 16 * 1024) {
+                        /* first page boundary above the allocation start;
+                         * the stack grows down towards it */
+                        v = (char *)(((unsigned long) newtask->stack + 4095) & ~4095UL);
+                        r = mprotect (v, 4096, PROT_NONE);
+                        gf_msg ("syncop", GF_LOG_ERROR, errno,
+                                LG_MSG_GETCONTEXT_FAILED,
+                                "SKU: using 16k stack starting at %p, "
+                                "mprotect returned %d, guard page: %p",
+                                newtask->stack, r, v);
+                }
         }
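
For anyone who wants to play with the same trick outside GlusterFS, here is a
minimal standalone sketch of the guard-page idea (illustrative only, not the
GlusterFS code; note that POSIX only guarantees mprotect() on mmap'd memory,
although Linux accepts any page-aligned address, which is what the debug patch
above relies on):

/* guard_page_demo.c -- minimal sketch, not GlusterFS code.
 * Allocate a "coroutine stack" on the heap and protect one page inside it,
 * so that a stack overflow faults immediately instead of silently
 * corrupting neighbouring heap memory. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main (void)
{
        size_t stacksize = 16 * 1024;
        long   pagesize  = sysconf (_SC_PAGESIZE);
        /* extra page of slack so the guard does not eat into the 16k */
        char  *stack     = calloc (1, stacksize + pagesize);

        if (!stack)
                return 1;

        /* round up to the first page boundary inside the allocation */
        char *guard = (char *) (((uintptr_t) stack + pagesize - 1) &
                                ~((uintptr_t) pagesize - 1));

        if (mprotect (guard, pagesize, PROT_NONE) != 0) {
                perror ("mprotect");
                return 1;
        }

        printf ("stack %p..%p, guard page %p..%p\n",
                (void *) stack, (void *) (stack + stacksize + pagesize),
                (void *) guard, (void *) (guard + pagesize));

        /* any read/write in [guard, guard + pagesize) now raises SIGSEGV */

        /* restore protection before handing the memory back to the allocator */
        mprotect (guard, pagesize, PROT_READ | PROT_WRITE);
        free (stack);
        return 0;
}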

(gdb) where
#0  0x00007f8a92c51204 in _dl_lookup_symbol_x () from
/lib64/ld-linux-x86-64.so.2
#1  0x00007f8a92c561e3 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#2  0x00007f8a92c5dd33 in _dl_runtime_resolve_avx () from
/lib64/ld-linux-x86-64.so.2
#3  0x0000000000000000 in ?? ()


(gdb) info reg

rdi            0x7f8a92946188    140233141412232
rbp            0x7f8a800b4000    0x7f8a800b4000
rsp            0x7f8a800b4000    0x7f8a800b4000
r8             0x7f8a92e4ba60    140233146677856

(gdb) layout asm

  >│0x7f8a92c51204 <_dl_lookup_symbol_x+4>  push  %r15   <== the push onto the guarded page caused the segfault

From the brick log we have:

[syncop.c:515:synctask_create] 0-syncop: SKU: using 16k stack starting at 0x7f8a800b28f0, mprotect returned 0, guard page: 0x7f8a800b3000 [No data available]

The stack grows downward from 0x7f8a800b68f0 to 0x7f8a800b28f0, and the page
0x7f8a800b3000 - 0x7f8a800b4000 is guarded; that is exactly where the segfault
hit, as seen in gdb (rsp was 0x7f8a800b4000 when the push faulted).

This confirms that the stack space is not sufficient and is overflowing.
I am not sure why we don't hit this in the presence of io-threads, though.
It may simply be that with io-threads in the graph there is some allocated
but unused memory that the stack freely grows into, i.e. a silent, undetected
reuse of some memory.
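
For readers not familiar with how a synctask ends up on its own small stack,
here is a minimal standalone sketch of the makecontext/swapcontext pattern
(illustrative only, not the GlusterFS code; every frame pushed by the function,
and by anything it calls, has to fit in the heap buffer passed as the stack):

/* small_stack_demo.c -- minimal sketch, not GlusterFS code.
 * Run a function on a separate 16K heap-allocated stack, the way a
 * synctask does; deep call chains inside task_fn() would overflow it. */
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx;
static ucontext_t task_ctx;

static void task_fn (void)
{
        /* everything called from here runs on the 16K stack */
        printf ("running on the small stack\n");
        /* returning resumes main_ctx via uc_link */
}

int main (void)
{
        size_t  stacksize = 16 * 1024;
        char   *stack     = calloc (1, stacksize);

        if (!stack)
                return 1;

        getcontext (&task_ctx);
        task_ctx.uc_stack.ss_sp   = stack;
        task_ctx.uc_stack.ss_size = stacksize;
        task_ctx.uc_link          = &main_ctx;   /* where to return when task_fn ends */
        makecontext (&task_ctx, task_fn, 0);

        swapcontext (&main_ctx, &task_ctx);      /* jump onto the 16K stack */

        free (stack);
        return 0;
}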

Regards,
Sanoj



On Tue, Jun 13, 2017 at 10:49 AM, Sanoj Unnikrishnan <sunnikri at redhat.com>
wrote:

> Hi Kinglong
> I was reading about makecontext/swapcontext as well.
> I found an article that suggests using mprotect to force a segfault
> to check whether we have a stack space issue here:
> http://www.evanjones.ca/software/threading.html
>
> I don't think I can try this until tomorrow.
> thanks and regards,
> Sanoj
>
> On Tue, Jun 13, 2017 at 5:58 AM, Kinglong Mee <kinglongmee at gmail.com>
> wrote:
>
>> Hi Sanoj,
>>
>> What's your opinion about this problem?
>>
>> thanks,
>> Kinglong Mee
>>
>> On 6/9/2017 17:20, Kinglong Mee wrote:
>> > Hi Sanoj,
>> >
>> > On 6/9/2017 15:48, Sanoj Unnikrishnan wrote:
>> >> I have not used valgrind before, so I may be wrong here.
>> >>
>> >> I think the valgrind_stack_deregister should have been after
>> >> GF_FREE_STACK. That may explain the instance of invalid_write during
>> >> stack_destroy calls in after.log.
>> >
>> > No. I moved it, but the invalid_write instances still exist.
>> >
>> >>
>> >> There seem to be numerous issues reported in before.log (I am
>> >> assuming you did not have the valgrind_stack_register call in it).
>> >
>> > Yes, before.log is from a test without any code change (but without
>> > io-threads).
>> >
>> >> From http://valgrind.org/docs/manual/manual-core.html, it looks like
>> >> valgrind detects the client switching stacks only if there is a change
>> >> of more than 2MB in the stack pointer register.
>> >
>> > I tested with a larger max-stackframe:
>> > valgrind  --leak-check=full --max-stackframe=242293216
>> >
>> >> Is it possible that, since marker is only using 16k, the new stack
>> >> pointer was at an offset of less than 2MB from the current stack pointer?
>> >
>> > Maybe.
>> > But with io-threads (and with valgrind_stack_deregister added), valgrind
>> > only shows some "Invalid read/write" around __gf_mem_invalidate.
>> > The only reason I can think of here is the 16K stack size that marker uses.
>> >
>> > I have not used makecontext/swapcontext before; is the following right?
>> > 1. without swapcontext, the stack may look like (just an example)
>> >    --> io_stats -> quota -> marker -> io-threads -> .....
>> >
>> > 2. with swapcontext,
>> >    --> io_stats -> quota -> marker
>> >                switch to new stack -> io-threads
>> >
>> > After switching to the new stack, the stack size is only 16K.
>> > Is that enough without io-threads?
>> >
>> > I don't know how io-threads behaves; does it call all the sub-xlators
>> > on that 16K stack, or something else?
>> >
>> >> It seems unlikely to me, since we are allocating the stack from the heap.
>> >>
>> >> Did you try a run with the valgrind instrumentation, without changing
>> >> the stack size?
>> >
>> > OK.
>> > The following valgrind-without-stack-change.log is from a test with
>> > valgrind_stack_deregister added (and without io-threads).
>> >
>> > thanks,
>> > Kinglong Mee
>> >
>> >> None of this explains the crash though. We had seen a memory overrun
>> >> crash in the same code path on NetBSD earlier but did not follow up then.
>> >> Will look further into it.
>> >>
>> >>
>> >>
>> >> On Thu, Jun 8, 2017 at 4:51 PM, Kinglong Mee <kinglongmee at gmail.com> wrote:
>> >>
>> >>     Maybe it's my fault; I found that valgrind can't follow context
>> >>     switches (makecontext/swapcontext) by default.
>> >>     So I tested with the following patch (it tells valgrind about the new
>> >>     stack via VALGRIND_STACK_REGISTER/DEREGISTER).
>> >>     With it, there are only some "Invalid read/write" reports from
>> >>     __gf_mem_invalidate. Is that right?
>> >>     So there is only one problem: without io-threads, the stack size is
>> >>     too small for marker.
>> >>     Am I right?
>> >>
>> >>     Ps:
>> >>     valgrind-before.log is the log without the following patch;
>> >>     valgrind-after.log is with the patch.
>> >>
>> >>     ==35656== Invalid write of size 8
>> >>     ==35656==    at 0x4E8FFD4: __gf_mem_invalidate (mem-pool.c:278)
>> >>     ==35656==    by 0x4E90313: __gf_free (mem-pool.c:334)
>> >>     ==35656==    by 0x4EA4E5B: synctask_destroy (syncop.c:394)
>> >>     ==35656==    by 0x4EA4EDF: synctask_done (syncop.c:412)
>> >>     ==35656==    by 0x4EA58B3: synctask_switchto (syncop.c:673)
>> >>     ==35656==    by 0x4EA596B: syncenv_processor (syncop.c:704)
>> >>     ==35656==    by 0x60B2DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
>> >>     ==35656==    by 0x67A873C: clone (in /usr/lib64/libc-2.17.so)
>> >>     ==35656==  Address 0x1b104931 is 2,068,017 bytes inside a block of
>> size 2,097,224 alloc'd
>> >>     ==35656==    at 0x4C29975: calloc (vg_replace_malloc.c:711)
>> >>     ==35656==    by 0x4E8FA5E: __gf_calloc (mem-pool.c:117)
>> >>     ==35656==    by 0x4EA52F5: synctask_create (syncop.c:500)
>> >>     ==35656==    by 0x4EA55AE: synctask_new1 (syncop.c:576)
>> >>     ==35656==    by 0x143AE0D7: mq_synctask1 (marker-quota.c:1078)
>> >>     ==35656==    by 0x143AE199: mq_synctask (marker-quota.c:1097)
>> >>     ==35656==    by 0x143AE6F6: _mq_create_xattrs_txn
>> (marker-quota.c:1236)
>> >>     ==35656==    by 0x143AE82D: mq_create_xattrs_txn
>> (marker-quota.c:1253)
>> >>     ==35656==    by 0x143B0DCB: mq_inspect_directory_xattr
>> (marker-quota.c:2027)
>> >>     ==35656==    by 0x143B13A8: mq_xattr_state (marker-quota.c:2117)
>> >>     ==35656==    by 0x143A6E80: marker_lookup_cbk (marker.c:2961)
>> >>     ==35656==    by 0x141811E0: up_lookup_cbk (upcall.c:753)
>> >>
>> >>     ----------------------- valgrind ------------------------------------------
>> >>
>> >>     Don't forget install valgrind-devel.
>> >>
>> >>     diff --git a/libglusterfs/src/syncop.c b/libglusterfs/src/syncop.c
>> >>     index 00a9b57..97b1de1 100644
>> >>     --- a/libglusterfs/src/syncop.c
>> >>     +++ b/libglusterfs/src/syncop.c
>> >>     @@ -10,6 +10,7 @@
>> >>
>> >>      #include "syncop.h"
>> >>      #include "libglusterfs-messages.h"
>> >>     +#include <valgrind/valgrind.h>
>> >>
>> >>      int
>> >>      syncopctx_setfsuid (void *uid)
>> >>     @@ -388,6 +389,8 @@ synctask_destroy (struct synctask *task)
>> >>              if (!task)
>> >>                      return;
>> >>
>> >>     +VALGRIND_STACK_DEREGISTER(task->valgrind_ret);
>> >>     +
>> >>              GF_FREE (task->stack);
>> >>
>> >>              if (task->opframe)
>> >>     @@ -509,6 +512,8 @@ synctask_create (struct syncenv *env, size_t stacksize, sync
>> >>
>> >>              newtask->ctx.uc_stack.ss_sp   = newtask->stack;
>> >>
>> >>     +       newtask->valgrind_ret = VALGRIND_STACK_REGISTER(newtask->stack, newtask->stack + newtask->ctx.uc_stack.ss_size);
>> >>     +
>> >>              makecontext (&newtask->ctx, (void (*)(void))
>> synctask_wrap, 2, newtask)
>> >>
>> >>              newtask->state = SYNCTASK_INIT;
>> >>     diff --git a/libglusterfs/src/syncop.h b/libglusterfs/src/syncop.h
>> >>     index c2387e6..247325b 100644
>> >>     --- a/libglusterfs/src/syncop.h
>> >>     +++ b/libglusterfs/src/syncop.h
>> >>     @@ -63,6 +63,7 @@ struct synctask {
>> >>              int                 woken;
>> >>              int                 slept;
>> >>              int                 ret;
>> >>     +       int                 valgrind_ret;
>> >>
>> >>              uid_t               uid;
>> >>              gid_t               gid;
>> >>     diff --git a/xlators/features/marker/src/marker-quota.c b/xlators/features/marker/src/marker-quota.c
>> >>     index 902b8e5..f3d2507 100644
>> >>     --- a/xlators/features/marker/src/marker-quota.c
>> >>     +++ b/xlators/features/marker/src/marker-quota.c
>> >>     @@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, synctask_fn_t task, gf_boole
>> >>              }
>> >>
>> >>              if (spawn) {
>> >>     -                ret = synctask_new1 (this->ctx->env, 1024 * 16, task,
>> >>     +                ret = synctask_new1 (this->ctx->env, 0, task,
>> >>                                           mq_synctask_cleanup, NULL, args);
>> >>                      if (ret) {
>> >>                              gf_log (this->name, GF_LOG_ERROR, "Failed to spawn "
>> >>
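
For reference, here is a minimal standalone example of the same valgrind
client requests (VALGRIND_STACK_REGISTER / VALGRIND_STACK_DEREGISTER) outside
GlusterFS. It is illustrative only, not the GlusterFS code, and the macros are
effectively no-ops when the binary is not run under valgrind:

/* valgrind_stack_demo.c -- minimal sketch, not GlusterFS code.
 * Tell valgrind about a manually managed coroutine stack so it can follow
 * swapcontext()-style stack switches instead of reporting bogus errors. */
#include <stdlib.h>
#include <valgrind/valgrind.h>

int main (void)
{
        size_t    stacksize = 16 * 1024;
        char     *stack     = calloc (1, stacksize);
        unsigned  stack_id;

        if (!stack)
                return 1;

        /* register [stack, stack + stacksize) as a stack; keep the id */
        stack_id = VALGRIND_STACK_REGISTER (stack, stack + stacksize);

        /* ... makecontext()/swapcontext() work would run on this stack ... */

        /* deregister before freeing, as synctask_destroy does in the patch */
        VALGRIND_STACK_DEREGISTER (stack_id);
        free (stack);
        return 0;
}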
>> >>
>> >>     On 6/8/2017 19:02, Sanoj Unnikrishnan wrote:
>> >>     > I would still be worried about the Invalid read/write. IMO whether
>> >>     > an illegal access causes a crash depends on whether the page is
>> >>     > currently mapped.
>> >>     > So it could happen that there is a use-after-free / out-of-bounds
>> >>     > access in the code, and that this location falls in a different
>> >>     > (unmapped) page when io-threads is not loaded.
>> >>     >
>> >>     > Could you please share the valgrind logs as well.
>> >>     >
>> >>     > On Wed, Jun 7, 2017 at 8:22 PM, Kinglong Mee <kinglongmee at gmail.com> wrote:
>> >>     >
>> >>     >     After deleting io-threads from the vols, quota operations
>> >>     >     (list/set/modify) make glusterfsd crash.
>> >>     >     I am using CentOS 7 (CentOS Linux release 7.3.1611) with
>> >>     >     glusterfs 3.8.12.
>> >>     >     It seems the stack is corrupted; when testing with the following
>> >>     >     diff, glusterfsd runs correctly.
>> >>     >
>> >>     >     There are two questions:
>> >>     >     1. When using valgrind, it shows many "Invalid read/write" errors
>> >>     >        with io-threads.
>> >>     >        Why does glusterfsd run correctly with io-threads but crash
>> >>     >        without io-threads?
>> >>     >
>> >>     >     2. With the following diff, valgrind still shows many
>> >>     >        "Invalid read/write" errors without io-threads,
>> >>     >        but there is no crash at all.
>> >>     >
>> >>     >     Any comments are welcome.
>> >>     >
>> >>     >     Reverting http://review.gluster.org/11499 seems better than the diff.
>> >>     >
>> >>     >     diff --git a/xlators/features/marker/src/marker-quota.c b/xlators/features/marker/src/marker-quota.c
>> >>     >     index 902b8e5..f3d2507 100644
>> >>     >     --- a/xlators/features/marker/src/marker-quota.c
>> >>     >     +++ b/xlators/features/marker/src/marker-quota.c
>> >>     >     @@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, synctask_fn_t task, gf_boole
>> >>     >              }
>> >>     >
>> >>     >              if (spawn) {
>> >>     >     -                ret = synctask_new1 (this->ctx->env, 1024 * 16, task,
>> >>     >     +                ret = synctask_new1 (this->ctx->env, 0, task,
>> >>     >                                            mq_synctask_cleanup, NULL, args);
>> >>     >                      if (ret) {
>> >>     >                              gf_log (this->name, GF_LOG_ERROR, "Failed to spawn "
>> >>     >
>> >>     >     ----------------------------------- test steps -----------------------------------
>> >>     >     1. gluster volume create gvtest node1:/test/ node2:/test/
>> >>     >     2. gluster volume start gvtest
>> >>     >     3. gluster volume quota enable gvtest
>> >>     >
>> >>     >     4. "deletes io-threads from all vols"
>> >>     >     5. reboot node1 and node2.
>> >>     >     6. sh quota-set.sh
>> >>     >
>> >>     >     # cat quota-set.sh
>> >>     >     gluster volume quota gvtest list
>> >>     >     gluster volume quota gvtest limit-usage / 10GB
>> >>     >     gluster volume quota gvtest limit-usage /1234 1GB
>> >>     >     gluster volume quota gvtest limit-usage /hello 1GB
>> >>     >     gluster volume quota gvtest limit-usage /test 1GB
>> >>     >     gluster volume quota gvtest limit-usage /xyz 1GB
>> >>     >     gluster volume quota gvtest list
>> >>     >     gluster volume quota gvtest remove /hello
>> >>     >     gluster volume quota gvtest remove /test
>> >>     >     gluster volume quota gvtest list
>> >>     >     gluster volume quota gvtest limit-usage /test 1GB
>> >>     >     gluster volume quota gvtest remove /xyz
>> >>     >     gluster volume quota gvtest list
>> >>     >
>> >>     >     ----------------------- glusterfsd crash without the diff --------------------------------
>> >>     >
>> >>     >     /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x
>> f5)[0x7f6e1e950af1]
>> >>     >     /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x21f)[0x7f
>> 6e1e956943]
>> >>     >     /usr/local/sbin/glusterfsd(glusterfsd_print_trace+0x1f)[0x4
>> 09c83]
>> >>     >     /lib64/libc.so.6(+0x35250)[0x7f6e1d025250]
>> >>     >     /lib64/libc.so.6(gsignal+0x37)[0x7f6e1d0251d7]
>> >>     >     /lib64/libc.so.6(abort+0x148)[0x7f6e1d0268c8]
>> >>     >     /lib64/libc.so.6(+0x74f07)[0x7f6e1d064f07]
>> >>     >     /lib64/libc.so.6(+0x7baf5)[0x7f6e1d06baf5]
>> >>     >     /lib64/libc.so.6(+0x7c3e6)[0x7f6e1d06c3e6]
>> >>     >     /usr/local/lib/libglusterfs.so.0(__gf_free+0x311)[0x7f6e1e9
>> 81327]
>> >>     >     /usr/local/lib/libglusterfs.so.0(synctask_destroy+0x82)[0x7
>> f6e1e995c20]
>> >>     >     /usr/local/lib/libglusterfs.so.0(synctask_done+0x25)[0x7f6e
>> 1e995c47]
>> >>     >     /usr/local/lib/libglusterfs.so.0(synctask_switchto+0xcf)[0x
>> 7f6e1e996585]
>> >>     >     /usr/local/lib/libglusterfs.so.0(syncenv_processor+0x60)[0x
>> 7f6e1e99663d]
>> >>     >     /lib64/libpthread.so.0(+0x7dc5)[0x7f6e1d7a2dc5]
>> >>     >     /lib64/libc.so.6(clone+0x6d)[0x7f6e1d0e773d]
>> >>     >
>> >>     >     or
>> >>     >
>> >>     >     package-string: glusterfs 3.8.12
>> >>     >     /usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x
>> f5)[0x7fa15e623af1]
>> >>     >     /usr/local/lib/libglusterfs.so.0(gf_print_trace+0x21f)[0x7f
>> a15e629943]
>> >>     >     /usr/local/sbin/glusterfsd(glusterfsd_print_trace+0x1f)[0x4
>> 09c83]
>> >>     >     /lib64/libc.so.6(+0x35250)[0x7fa15ccf8250]
>> >>     >     /lib64/libc.so.6(gsignal+0x37)[0x7fa15ccf81d7]
>> >>     >     /lib64/libc.so.6(abort+0x148)[0x7fa15ccf98c8]
>> >>     >     /lib64/libc.so.6(+0x74f07)[0x7fa15cd37f07]
>> >>     >     /lib64/libc.so.6(+0x7dd4d)[0x7fa15cd40d4d]
>> >>     >     /lib64/libc.so.6(__libc_calloc+0xb4)[0x7fa15cd43a14]
>> >>     >     /usr/local/lib/libglusterfs.so.0(__gf_calloc+0xa7)[0x7fa15e
>> 653a5f]
>> >>     >     /usr/local/lib/libglusterfs.so.0(iobref_new+0x2b)[0x7fa15e6
>> 5875a]
>> >>     >     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0x
>> a98c)[0x7fa153a8398c]
>> >>     >     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0x
>> acbc)[0x7fa153a83cbc]
>> >>     >     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0x
>> ad10)[0x7fa153a83d10]
>> >>     >     /usr/local/lib/glusterfs/3.8.12/rpc-transport/socket.so(+0x
>> b2a7)[0x7fa153a842a7]
>> >>     >     /usr/local/lib/libglusterfs.so.0(+0x97ea9)[0x7fa15e68eea9]
>> >>     >     /usr/local/lib/libglusterfs.so.0(+0x982c6)[0x7fa15e68f2c6]
>> >>     >     /lib64/libpthread.so.0(+0x7dc5)[0x7fa15d475dc5]
>> >>     >     /lib64/libc.so.6(clone+0x6d)[0x7fa15cdba73d]
>> >>     >
>> >>     >     _______________________________________________
>> >>     >     Gluster-devel mailing list
>> >>     >     Gluster-devel at gluster.org
>> >>     >     http://lists.gluster.org/mailman/listinfo/gluster-devel
>> >>     >
>> >>     >
>> >>
>> >>
>> >
>>
>
>