[Gluster-devel] NetBSD regressions, memory corruption

Venky Shankar yknev.shankar at gmail.com
Wed Mar 25 17:02:08 UTC 2015


On Wed, Mar 25, 2015 at 4:28 PM, Niels de Vos <ndevos at redhat.com> wrote:
> Ai, top posting, this makes it really difficult to follow the email if
> you have not read the first parts :-/ Please remember to inline or
> bottom post when replying.
>
> On Wed, Mar 25, 2015 at 03:21:28PM +0530, Venky Shankar wrote:
>> Looks like the iobref (and the iobuf) was allocated in protocol/server.
>>
>> (gdb) x/16x (ie->ie_iobref->iobrefs - 8)
>> 0xbb11a438:     0xbb18ba80      0x00000001      0x00000068      0x00000040
>> 0xbb11a448:     0xbb1e2018      0xcafebabe      0x00000000      0x00000000
>> 0xbb11a458:     0x00000003      0x00000003      0x00000008      0x00000003
>> 0xbb11a468:     0x0000000c      0x00000003      0x0000000e      0x00000003
>>
>> The 4-byte word just before the magic header (0xcafebabe) holds the
>> xlator ("this") that invoked GF_MALLOC. Here it's:
>>
>> (gdb) p *(xlator_t *)0xbb1e2018
>> $9 = {name = 0xbb1dbb08 "patchy-server", type = 0xbb1dbb38
>> "protocol/server", next = 0xbb1e1018, prev = 0x0, parents = 0x0,
>>   children = 0xbb1dbbc8, options = 0xbb18a028, dlhandle = 0xb9b7d000,
>> fops = 0xb9adf0e0 <fops>, cbks = 0xb9adc8cc <cbks>,
>>   dumpops = 0xb9ade460 <dumpops>, volume_options = {next = 0xbb1dbb68,
>> prev = 0xbb1dbbf8}, fini = 0xb9ab539d <fini>,
>>   init = 0xb9ab48a5 <init>, reconfigure = 0xb9ab418c <reconfigure>,
>> mem_acct_init = 0xb9ab3cb1 <mem_acct_init>,
>>   notify = 0xb9ab53a3 <notify>, loglevel = GF_LOG_NONE, latencies =
>> {{min = 0, max = 0, total = 0, std = 0, mean = 0,
>>       count = 0} <repeats 50 times>}, history = 0x0, ctx = 0xbb109000,
>> graph = 0xbb1c30f8, itable = 0x0,
>>   init_succeeded = 1 '\001', private = 0xbb1e3018, mem_acct =
>> {num_types = 144, rec = 0xbb1c6000}, winds = 0,
>>   switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false}
>>
>> Looking into it more. If the above rings a bell for anyone, let us know.
>
> Going by the output from gdb above and the below layout:
>
>     $ printf 'type=%d\nsize=%d\n' 0x00000068 0x00000040
>     type=104
>     size=64
>
> This means that the protocol/server did a GF_?ALLOC(64, 104). The 104 is
> an enum for the mem-type and libglusterfs/src/mem-types.h points to
> gf_common_mt_iobrefs. There is only one function that uses
> gf_common_mt_iobrefs, which is iobref_new().
>
> protocol/server calls iobref_new() only once directly (there could be
> some other indirect calls too) in server_submit_reply().

Yes, that's the only place in protocol/server that calls iobref_new().

>
> I do not quickly see how the issue can happen with the analyzed data in
> this email. Possibly an allocation before (memory address wise) this
> went awry and caused the wreckage. We may need to follow these
> diagnostic steps back upwards and try to find the first occurrence where
> 0xcafebabe is followed by 0xcafebabe instead of 0xbaadf00d.

What's interesting is that the number of used iobufs is zero, yet the
->iobrefs array holds non-NULL entries (iobref_unref() iterates ->alloced
times and frees anything that isn't NULL). Someone put them there.

(gdb) p *ie->ie_iobref
$1 = {lock = {pts_magic = 2004287495, pts_spin = 0 '\000', pts_flags =
0}, ref = 1, iobrefs = 0xbb11a458, alloced = 16, used = 0}

Emmanuel,

Could I run some tests on nbslave70? (I plan to disable some
translators.) Just running the AFR test cases should trigger the segfault,
correct?

>
> That's the only idea I have for now, but I'll keep thinking of something
> that could make this easier.
>
> Note: the iobref structure is used a lot; this makes it a likely
> structure to blow away other structures when something else frees some
> memory, but wants to use it afterwards. I think a use-after-free could
> be one cause for this.
>
> Niels
>
>>
>> -venky
>>
>> On Tue, Mar 24, 2015 at 11:28 PM, Niels de Vos <ndevos at redhat.com> wrote:
>> > On Tue, Mar 24, 2015 at 05:18:44PM +0000, Emmanuel Dreyfus wrote:
>> >> Hi
>> >>
>> >> The merge of http://review.gluster.org/9953/ removed a few crashes from
>> >> NetBSD regression tests, but the thing remains utterly broken since the
>> >> merge of http://review.gluster.org/9708/ though I cannot tell if I have
>> >> bugs left over from this commit or if I face new problems.
>> >>
>> >> Here are the known problems so far:
>> >
>> > ...snip! I'll only give some info to your 2nd point.
>> >
>> >> 2) I still experience memory corruption, which usually crashes glusterfsd
>> >> because some pointer was replaced by the value 0x3. This strikes on iobref
>> >> most of the time, but it can happen elsewhere.
>> >>
>> >> I would be glad if someone could help here. On nbslave70:/autobuild I
>> >> added code to check for iobref/iobuf sanity at random places (by calling
>> >> iobref_sanity()). I do this in synctask_wrap and in STACK_WIND/UNWIND,
>> >> but I have not been able to spot the source of the problem yet.
>> >>
>> >> The weird thing is that memory seems to always be overwritten by the
>> >> same values, and the magic 0xcafebabe number before the buffer is
>> >> preserved. Here is an example, where iobref->iobrefs = 0xbb11a458:
>> >> 0xbb11a44c:     0xcafebabe      0x00000000      0x00000000      0x00000003
>> >> 0xbb11a45c:     0x00000003      0x00000008      0x00000003      0x0000000c
>> >> 0xbb11a46c:     0x00000003      0x0000000e      0x00000003      0x00000010
>> >> 0xbb11a47c:     0x00000003      0x00000009      0x00000003      0x0000000d
>> >> 0xbb11a48c:     0x00000003      0x00000015      0x00000003      0x00000016
>> >> 0xbb11a49c:     0x00000003      0x00000032      0x00000034      0xbb1e2018
>> >> 0xbb11a4ac:     0xcafebabe      0x00000000      0x00000000      0xbb11a5d8
>> >
>> > Recently I was looking into something that involved some more
>> > understanding of GF_MALLOC(). I did not really continue with it because
>> > other things got a higher priority. But, maybe this layout helps you a
>> > little:
>> >
>> >      :                      :
>> >      :                      :
>> >      +----------------------+
>> >      | GF_MEM_TRAILER_MAGIC |
>> >      +----------------------+
>> >      |                      |
>> >      |         ...          |
>> >      |                      |
>> >      +----------------------+
>> >      |       8 bytes        |
>> >      +----------------------+
>> >      | GF_MEM_HEADER_MAGIC  |
>> >      +----------------------+
>> >      |      *xlator_t       |
>> >      +----------------------+
>> >      |        size          |
>> >      +----------------------+
>> >      |        type          |
>> >      +----------------------+
>> >      :                      :
>> >      :                      :
>> >
>> >      #define GF_MEM_HEADER_MAGIC  0xCAFEBABE
>> >      #define GF_MEM_TRAILER_MAGIC 0xBAADF00D
>> >
>> >
>> > Because there is no 0xbaadf00d in your memory dump, I would assume that
>> > the memory has just been allocated, and the 0xcafebabe at 0xbb11a4ac is
>> > a leftover from a previous allocation.
>> >
>> > You could try to run a test with stricter memory checking. All the
>> > GF_ASSERT() calls will actually call abort() in that case, and it may
>> > make things a little easier to debug. You would pass --enable-debug to
>> > the configure command line:
>> >
>> >     $ ./configure --enable-debug
>> >
>> > I hope that we will be able to setup scheduled automated regression
>> > tests with --enable-debug build binaries. It may be helpful to catch
>> > unintended NULL usage a little earlier.
>> >
>> > HTH,
>> > Niels

