[Gluster-devel] NetBSD regressions, memory corruption

Niels de Vos ndevos at redhat.com
Wed Mar 25 10:58:40 UTC 2015


Ai, top posting. This makes the email really difficult to follow if you
have not read the earlier parts :-/ Please remember to reply inline or
bottom-post.

On Wed, Mar 25, 2015 at 03:21:28PM +0530, Venky Shankar wrote:
> looks like the iobref (and the iobuf) was allocated in protocol/server..
> 
> (gdb) x/16x (ie->ie_iobref->iobrefs - 8)
> 0xbb11a438:     0xbb18ba80      0x00000001      0x00000068      0x00000040
> 0xbb11a448:     0xbb1e2018      0xcafebabe      0x00000000      0x00000000
> 0xbb11a458:     0x00000003      0x00000003      0x00000008      0x00000003
> 0xbb11a468:     0x0000000c      0x00000003      0x0000000e      0x00000003
> 
> The word just before the magic header (0xcafebabe), 4 bytes on this
> 32-bit build, holds the xlator ("this") that invoked GF_MALLOC. Here it's:
> 
> (gdb) p *(xlator_t *)0xbb1e2018
> $9 = {name = 0xbb1dbb08 "patchy-server", type = 0xbb1dbb38
> "protocol/server", next = 0xbb1e1018, prev = 0x0, parents = 0x0,
>   children = 0xbb1dbbc8, options = 0xbb18a028, dlhandle = 0xb9b7d000,
> fops = 0xb9adf0e0 <fops>, cbks = 0xb9adc8cc <cbks>,
>   dumpops = 0xb9ade460 <dumpops>, volume_options = {next = 0xbb1dbb68,
> prev = 0xbb1dbbf8}, fini = 0xb9ab539d <fini>,
>   init = 0xb9ab48a5 <init>, reconfigure = 0xb9ab418c <reconfigure>,
> mem_acct_init = 0xb9ab3cb1 <mem_acct_init>,
>   notify = 0xb9ab53a3 <notify>, loglevel = GF_LOG_NONE, latencies =
> {{min = 0, max = 0, total = 0, std = 0, mean = 0,
>       count = 0} <repeats 50 times>}, history = 0x0, ctx = 0xbb109000,
> graph = 0xbb1c30f8, itable = 0x0,
>   init_succeeded = 1 '\001', private = 0xbb1e3018, mem_acct =
> {num_types = 144, rec = 0xbb1c6000}, winds = 0,
>   switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false}
> 
> looking into it more. if the above rings a bell with someone, let us know.

Going by the gdb output above and the layout quoted below:

    $ printf 'type=%d\nsize=%d\n' 0x00000068 0x00000040
    type=104
    size=64

This means that the protocol/server did a GF_?ALLOC(64, 104). The 104 is
an enum for the mem-type and libglusterfs/src/mem-types.h points to
gf_common_mt_iobrefs. There is only one function that uses
gf_common_mt_iobrefs, which is iobref_new().

protocol/server calls iobref_new() only once directly (there could be
some other indirect calls too) in server_submit_reply().

I do not quickly see how the issue can happen with the analyzed data in
this email. Possibly an allocation before (memory address wise) this
went awry and caused the wreckage. We may need to follow these
diagnostic steps back upwards and try to find the first occurrence where
0xcafebabe is followed by 0xcafebabe instead of 0xbaadf00d.

That's the only idea I have for now, but I'll keep thinking of something
that could make this easier.

Note: the iobref structure is used really a lot. That makes it a likely
structure to get involved when something else frees some memory but
still wants to use it afterwards. I think a use-after-free could be one
cause for this.

Niels

> 
> -venky
> 
> On Tue, Mar 24, 2015 at 11:28 PM, Niels de Vos <ndevos at redhat.com> wrote:
> > On Tue, Mar 24, 2015 at 05:18:44PM +0000, Emmanuel Dreyfus wrote:
> >> Hi
> >>
> >> The merge of http://review.gluster.org/9953/ removed a few crashes from
> >> NetBSD regression tests, but the thing remains utterly broken since the
> >> merge of http://review.gluster.org/9708/ though I cannot tell if I have
> >> bugs left over from this commit or if I face new problems.
> >>
> >> Here are the known problems so far:
> >
> > ...snip! I'll only give some info to your 2nd point.
> >
> >> 2) I still experience memory corruption, which usually crashes glusterfsd
> >> because some pointer was replaced by the value 0x3. This strikes on iobref
> >> most of the time, but it can happen elsewhere.
> >>
> >> I would be glad if someone could help here. On nbslave70:/autobuild I
> >> added code to check for iobref/iobuf sanity at random places (by calling
> >> iobref_sanity()). I do this in synctask_wrap and in STACK_WIND/UNWIND,
> >> but I have not been able to spot the source of the problem yet.
> >>
> >> The weird thing is that memory seems to always be overwritten by the
> >> same values, and the magic number 0xcafebabe before the buffer is preserved.
> >> Here is an example, where iobref->iobrefs = 0xbb11a458:
> >> 0xbb11a44c:     0xcafebabe      0x00000000      0x00000000      0x00000003
> >> 0xbb11a45c:     0x00000003      0x00000008      0x00000003      0x0000000c
> >> 0xbb11a46c:     0x00000003      0x0000000e      0x00000003      0x00000010
> >> 0xbb11a47c:     0x00000003      0x00000009      0x00000003      0x0000000d
> >> 0xbb11a48c:     0x00000003      0x00000015      0x00000003      0x00000016
> >> 0xbb11a49c:     0x00000003      0x00000032      0x00000034      0xbb1e2018
> >> 0xbb11a4ac:     0xcafebabe      0x00000000      0x00000000      0xbb11a5d8
> >
> > Recently I was looking into something that involved some more
> > understanding of GF_MALLOC(). I did not really continue with it because
> > other things got a higher priority. But, maybe this layout helps you a
> > little:
> >
> >      :                      :
> >      :                      :
> >      +----------------------+
> >      | GF_MEM_TRAILER_MAGIC |
> >      +----------------------+
> >      |                      |
> >      |         ...          |
> >      |                      |
> >      +----------------------+
> >      |       8 bytes        |
> >      +----------------------+
> >      | GF_MEM_HEADER_MAGIC  |
> >      +----------------------+
> >      |      *xlator_t       |
> >      +----------------------+
> >      |        size          |
> >      +----------------------+
> >      |        type          |
> >      +----------------------+
> >      :                      :
> >      :                      :
> >
> >      #define GF_MEM_HEADER_MAGIC  0xCAFEBABE
> >      #define GF_MEM_TRAILER_MAGIC 0xBAADF00D
> >
> >
> > Because there is no 0xbaadf00d in your memory dump, I would assume that
> > the memory has just been allocated, and the 0xcafebabe at 0xbb11a4ac is
> > a left over from a previous allocation.
> >
> > You could try to run a test with stricter memory checking. All the
> > GF_ASSERT() calls will actually call abort() in that case, and it may
> > make things a little easier to debug. You would pass --enable-debug on
> > the configure command line:
> >
> >     $ ./configure --enable-debug
> >
> > I hope that we will be able to set up scheduled automated regression
> > tests with binaries built with --enable-debug. It may be helpful to
> > catch unintended NULL usage a little earlier.
> >
> > HTH,
> > Niels
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel

