[Gluster-devel] NetBSD regressions, memory corruption

Niels de Vos ndevos at redhat.com
Tue Mar 24 17:58:12 UTC 2015


On Tue, Mar 24, 2015 at 05:18:44PM +0000, Emmanuel Dreyfus wrote:
> Hi
> 
> The merge of http://review.gluster.org/9953/ removed a few crashes from
> NetBSD regression tests, but the thing remains utterly broken since the
> merge of http://review.gluster.org/9708/, though I cannot tell whether I
> have bugs left over from this commit or whether I face new problems.
> 
> Here are the known problems so far:

...snip! I'll only give some info on your 2nd point.

> 2) I still experience memory corruption, which usually crashes glusterfsd
> because some pointer was replaced by the value 0x3. This strikes on iobref
> most of the time, but it can happen elsewhere.
> 
> I would be glad if someone could help here. On nbslave70:/autobuild I
> added code to check for iobref/iobuf sanity at random places (by calling
> iobref_sanity()). I do this in synctask_wrap and in STACK_WIND/UNWIND,
> but I have not been able to spot the source of the problem yet.
> 
> The weird thing is that memory seems to always be overwritten by the
> same values, and the magic number 0xcafebabe before the buffer is
> preserved. Here is an example, where iobref->iobrefs = 0xbb11a458:
> 0xbb11a44c:     0xcafebabe      0x00000000      0x00000000      0x00000003
> 0xbb11a45c:     0x00000003      0x00000008      0x00000003      0x0000000c
> 0xbb11a46c:     0x00000003      0x0000000e      0x00000003      0x00000010
> 0xbb11a47c:     0x00000003      0x00000009      0x00000003      0x0000000d
> 0xbb11a48c:     0x00000003      0x00000015      0x00000003      0x00000016
> 0xbb11a49c:     0x00000003      0x00000032      0x00000034      0xbb1e2018
> 0xbb11a4ac:     0xcafebabe      0x00000000      0x00000000      0xbb11a5d8

Recently I was looking into something that required a better
understanding of GF_MALLOC(). I did not really continue with it because
other things got a higher priority. But maybe this layout helps you a
little:

     :                      :
     :                      :
     +----------------------+
     | GF_MEM_TRAILER_MAGIC |
     +----------------------+
     |                      |
     |         ...          |
     |                      |
     +----------------------+
     |       8 bytes        |
     +----------------------+
     | GF_MEM_HEADER_MAGIC  |
     +----------------------+
     |      *xlator_t       |
     +----------------------+
     |        size          |
     +----------------------+
     |        type          |
     +----------------------+
     :                      :
     :                      :
     
     #define GF_MEM_HEADER_MAGIC  0xCAFEBABE
     #define GF_MEM_TRAILER_MAGIC 0xBAADF00D


Because there is no 0xbaadf00d in your memory dump, I would assume that
the memory has just been allocated, and the 0xcafebabe at 0xbb11a4ac is
a leftover from a previous allocation.
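
To make the layout a bit more concrete, here is a rough sketch of how
such a header/trailer check could look. This is not the actual
GF_MALLOC() code: all demo_* names are made up for illustration, and the
field order simply follows the diagram above (type, size, pointer,
header magic, 8 bytes, data, trailer magic).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_HEADER_MAGIC  0xCAFEBABEu
#define DEMO_TRAILER_MAGIC 0xBAADF00Du

/* type + size + owner pointer + header magic + 8 bytes of padding */
#define DEMO_HEADER_SIZE  (4 + sizeof(size_t) + sizeof(void *) + 4 + 8)
#define DEMO_TRAILER_SIZE 4

enum { DEMO_OK = 0, DEMO_BAD_HEADER = 1, DEMO_BAD_TRAILER = 2 };

/* Allocate 'size' usable bytes wrapped in a header and a trailer.
 * 'owner' stands in for the xlator_t pointer of the diagram. */
static void *demo_malloc(size_t size, uint32_t type, void *owner)
{
    uint32_t header_magic = DEMO_HEADER_MAGIC;
    uint32_t trailer_magic = DEMO_TRAILER_MAGIC;
    unsigned char *p = malloc(DEMO_HEADER_SIZE + size + DEMO_TRAILER_SIZE);

    if (p == NULL)
        return NULL;

    memcpy(p, &type, 4);                      p += 4;
    memcpy(p, &size, sizeof(size));           p += sizeof(size);
    memcpy(p, &owner, sizeof(owner));         p += sizeof(owner);
    memcpy(p, &header_magic, 4);              p += 4;
    memset(p, 0, 8);                          p += 8;
    memcpy(p + size, &trailer_magic, 4);      /* trailer follows the data */

    return p;                                 /* pointer handed to the caller */
}

/* Check the magic values around a pointer returned by demo_malloc(). */
static int demo_check(const void *user_ptr)
{
    const unsigned char *p = user_ptr;
    uint32_t header_magic, trailer_magic;
    size_t size;

    memcpy(&header_magic, p - 8 - 4, 4);
    if (header_magic != DEMO_HEADER_MAGIC)
        return DEMO_BAD_HEADER;               /* size cannot be trusted */

    memcpy(&size, p - DEMO_HEADER_SIZE + 4, sizeof(size));
    memcpy(&trailer_magic, p + size, 4);
    if (trailer_magic != DEMO_TRAILER_MAGIC)
        return DEMO_BAD_TRAILER;              /* write past the end */

    return DEMO_OK;
}

/* Release memory obtained from demo_malloc(). */
static void demo_free(void *user_ptr)
{
    if (user_ptr != NULL)
        free((unsigned char *)user_ptr - DEMO_HEADER_SIZE);
}
```

With something like this, writing one byte past a 16-byte buffer makes
demo_check() return DEMO_BAD_TRAILER. Note that a check like this only
catches writes that touch the magic values; a write that lands in the
middle of the data (as in your dump, where 0xcafebabe is intact) would
go unnoticed.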

You could try to run a test with stricter checking. All the GF_ASSERT()
calls will actually call abort() in that case, and it may make things a
little easier to debug. You would pass --enable-debug on the configure
command line:

    $ ./configure --enable-debug

I hope that we will be able to set up scheduled automated regression
tests with binaries built with --enable-debug. It may be helpful to catch
unintended NULL usage a little earlier.
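
As an illustration of why a debug build catches things earlier, here is
a small sketch of an assertion macro that only logs in a normal build
but aborts in a debug build. It is a simplified stand-in, not the real
GF_ASSERT() definition: DEMO_ASSERT, DEMO_DEBUG, demo_assert_failures
and demo_len() are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts failed assertions in the non-debug build (for illustration). */
static int demo_assert_failures = 0;

#ifdef DEMO_DEBUG
/* Debug build: stop right at the first violated assumption. */
#define DEMO_ASSERT(x)                                              \
    do {                                                            \
        if (!(x)) {                                                 \
            fprintf(stderr, "assertion failed: %s\n", #x);          \
            abort();                                                \
        }                                                           \
    } while (0)
#else
/* Normal build: report the problem and keep running. */
#define DEMO_ASSERT(x)                                              \
    do {                                                            \
        if (!(x)) {                                                 \
            fprintf(stderr, "assertion failed: %s\n", #x);          \
            demo_assert_failures++;                                 \
        }                                                           \
    } while (0)
#endif

/* Example caller: a NULL argument is unintended, but tolerated
 * in the non-debug build. */
static size_t demo_len(const char *s)
{
    DEMO_ASSERT(s != NULL);
    if (s == NULL)
        return 0;
    return strlen(s);
}
```

Compiled without -DDEMO_DEBUG, demo_len(NULL) logs a message and returns
0, so the bad caller stays hidden. Compiled with -DDEMO_DEBUG, the same
call aborts and leaves a core dump that points at the caller.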

HTH,
Niels

