[Gluster-devel] NetBSD regressions, memory corruption
Niels de Vos
ndevos at redhat.com
Tue Mar 24 17:58:12 UTC 2015
On Tue, Mar 24, 2015 at 05:18:44PM +0000, Emmanuel Dreyfus wrote:
> Hi
>
> The merge of http://review.gluster.org/9953/ removed a few crashes from
> NetBSD regression tests, but the thing remains utterly broken since the
> merge of http://review.gluster.org/9708/, though I cannot tell if I have
> bugs left over from this commit or if I face new problems.
>
> Here are the known problems so far:
...snip! I'll only give some info on your 2nd point.
> 2) I still experience memory corruption, which usually crashes glusterfsd
> because some pointer was replaced by the value 0x3. This strikes on iobref
> most of the time, but it can happen elsewhere.
>
> I would be glad if someone could help here. On nbslave70:/autobuild I
> added code to check for iobref/iobuf sanity at random places (by calling
> iobref_sanity()). I do this in synctask_wrap and in STACK_WIND/UNWIND,
> but I have not been able to spot the source of the problem yet.
>
> The weird thing is that memory seems to always be overwritten by the
> same values, and the magic number 0xcafebabe before the buffer is preserved.
> Here is an example, where iobref->iobrefs = 0xbb11a458:
> 0xbb11a44c: 0xcafebabe 0x00000000 0x00000000 0x00000003
> 0xbb11a45c: 0x00000003 0x00000008 0x00000003 0x0000000c
> 0xbb11a46c: 0x00000003 0x0000000e 0x00000003 0x00000010
> 0xbb11a47c: 0x00000003 0x00000009 0x00000003 0x0000000d
> 0xbb11a48c: 0x00000003 0x00000015 0x00000003 0x00000016
> 0xbb11a49c: 0x00000003 0x00000032 0x00000034 0xbb1e2018
> 0xbb11a4ac: 0xcafebabe 0x00000000 0x00000000 0xbb11a5d8
Recently I was looking into something that needed a better
understanding of GF_MALLOC(). I did not really continue with it because
other things got a higher priority. But maybe this layout helps you a
little:
    :                      :
    :                      :
    +----------------------+
    | GF_MEM_TRAILER_MAGIC |
    +----------------------+
    |                      |
    |         ...          |
    |                      |
    +----------------------+
    | 8 bytes              |
    +----------------------+
    | GF_MEM_HEADER_MAGIC  |
    +----------------------+
    | *xlator_t            |
    +----------------------+
    | size                 |
    +----------------------+
    | type                 |
    +----------------------+
    :                      :
    :                      :
#define GF_MEM_HEADER_MAGIC 0xCAFEBABE
#define GF_MEM_TRAILER_MAGIC 0xBAADF00D
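Read from the bottom of the diagram (lowest address) upwards, the bookkeeping could be modelled roughly like this. It is only a sketch of the idea, not the libglusterfs code: sketch_header, sketch_alloc(), sketch_check() and sketch_free() are made-up names, the field widths are assumptions, the 8 bytes are treated as opaque padding, and compiler padding means the real layout may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GF_MEM_HEADER_MAGIC  0xCAFEBABE
#define GF_MEM_TRAILER_MAGIC 0xBAADF00D

/* Hypothetical header, lowest address first (bottom of the diagram). */
struct sketch_header {
    uint32_t type;    /* allocation type */
    size_t   size;    /* bytes requested by the caller */
    void    *xl;      /* owning xlator, opaque in this sketch */
    uint32_t magic;   /* GF_MEM_HEADER_MAGIC */
    char     pad[8];  /* the "8 bytes" box */
};

/* Allocate size bytes with a header in front and a trailer magic behind. */
static void *sketch_alloc(size_t size, uint32_t type, void *xl)
{
    uint32_t trailer = GF_MEM_TRAILER_MAGIC;
    struct sketch_header hdr;
    char *raw = malloc(sizeof(hdr) + size + sizeof(trailer));

    if (raw == NULL)
        return NULL;

    memset(&hdr, 0, sizeof(hdr));
    hdr.type = type;
    hdr.size = size;
    hdr.xl = xl;
    hdr.magic = GF_MEM_HEADER_MAGIC;

    /* memcpy is used because the trailer may not be naturally aligned. */
    memcpy(raw, &hdr, sizeof(hdr));
    memcpy(raw + sizeof(hdr) + size, &trailer, sizeof(trailer));
    return raw + sizeof(hdr);
}

/* Return 0 if both magic words are intact, -1 if the header magic is
 * damaged, -2 if the trailer magic is damaged. */
static int sketch_check(const void *ptr)
{
    const char *user = ptr;
    struct sketch_header hdr;
    uint32_t trailer;

    memcpy(&hdr, user - sizeof(hdr), sizeof(hdr));
    if (hdr.magic != GF_MEM_HEADER_MAGIC)
        return -1;

    memcpy(&trailer, user + hdr.size, sizeof(trailer));
    if (trailer != GF_MEM_TRAILER_MAGIC)
        return -2;
    return 0;
}

/* Release a block obtained from sketch_alloc(). */
static void sketch_free(void *ptr)
{
    if (ptr != NULL)
        free((char *)ptr - sizeof(struct sketch_header));
}
```

With this model, a write that runs one byte past the end of the buffer damages the trailer magic, so sketch_check() would return -2 for that block.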
Because there is no 0xbaadf00d in your memory dump, I would assume that
the memory has just been allocated, and the 0xcafebabe at 0xbb11a4ac is
a leftover from a previous allocation.
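That check can be replayed on the quoted dump. A small sketch, assuming the dump is read as consecutive 32-bit words starting at 0xbb11a44c; report_magic() is a hypothetical helper, not something from GlusterFS:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define GF_MEM_HEADER_MAGIC  0xCAFEBABE
#define GF_MEM_TRAILER_MAGIC 0xBAADF00D

/* Address of the first word in the quoted dump. */
static const uint32_t dump_base = 0xbb11a44c;

/* The words of the quoted dump, in address order. */
static const uint32_t dump[] = {
    0xcafebabe, 0x00000000, 0x00000000, 0x00000003,
    0x00000003, 0x00000008, 0x00000003, 0x0000000c,
    0x00000003, 0x0000000e, 0x00000003, 0x00000010,
    0x00000003, 0x00000009, 0x00000003, 0x0000000d,
    0x00000003, 0x00000015, 0x00000003, 0x00000016,
    0x00000003, 0x00000032, 0x00000034, 0xbb1e2018,
    0xcafebabe, 0x00000000, 0x00000000, 0xbb11a5d8,
};

/* Print the address of every word equal to magic; return the hit count. */
static size_t report_magic(const char *name, uint32_t magic)
{
    size_t hits = 0;
    size_t i;

    for (i = 0; i < sizeof(dump) / sizeof(dump[0]); i++) {
        if (dump[i] == magic) {
            printf("%s at 0x%08lx\n", name,
                   (unsigned long)(dump_base + 4 * i));
            hits++;
        }
    }
    return hits;
}
```

Called with GF_MEM_HEADER_MAGIC it reports two hits, at 0xbb11a44c and 0xbb11a4ac; called with GF_MEM_TRAILER_MAGIC it reports none.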
You could try to run a test with stricter memory checking. All the
GF_ASSERT() calls will actually call abort() in that case, and it may
make things a little easier to debug. You would pass --enable-debug on
the configure command line:
$ ./configure --enable-debug
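The difference between the two kinds of build can be pictured with a macro along these lines. This is an illustration only, not the GlusterFS definition: SKETCH_ASSERT, SKETCH_DEBUG, sketch_failures and first_byte() are invented names, and the non-debug branch (log and carry on) is an assumption about what a relaxed build might do.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Counts failed checks in the relaxed (non-debug) build of this sketch. */
static int sketch_failures = 0;

#ifdef SKETCH_DEBUG
/* Strict build: a failed check stops the process where it happens. */
#define SKETCH_ASSERT(x)                                          \
    do {                                                          \
        if (!(x)) {                                               \
            fprintf(stderr, "%s:%d: assertion failed: %s\n",      \
                    __FILE__, __LINE__, #x);                      \
            abort();                                              \
        }                                                         \
    } while (0)
#else
/* Relaxed build: a failed check is logged and execution continues. */
#define SKETCH_ASSERT(x)                                          \
    do {                                                          \
        if (!(x)) {                                               \
            fprintf(stderr, "%s:%d: assertion failed: %s\n",      \
                    __FILE__, __LINE__, #x);                      \
            sketch_failures++;                                    \
        }                                                         \
    } while (0)
#endif

/* Return the first byte of buf, or -1 if buf is NULL. */
static int first_byte(const unsigned char *buf)
{
    SKETCH_ASSERT(buf != NULL);
    if (buf == NULL)
        return -1;
    return buf[0];
}
```

Compiled with -DSKETCH_DEBUG, first_byte(NULL) aborts right at the bad call, which is the behaviour that makes a debug build easier to trace; without it the failure is only logged and the caller gets -1.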
I hope that we will be able to set up scheduled automated regression
tests with binaries built with --enable-debug. It may be helpful to catch
unintended NULL usage a little earlier.
HTH,
Niels