[Gluster-devel] NetBSD regressions, memory corruption

Venky Shankar yknev.shankar at gmail.com
Tue Mar 24 17:35:39 UTC 2015


On Mar 24, 2015 10:48 PM, "Emmanuel Dreyfus" <manu at netbsd.org> wrote:
>
> Hi
>
> The merge of http://review.gluster.org/9953/ removed a few crashes from
> the NetBSD regression tests, but they have remained utterly broken since
> the merge of http://review.gluster.org/9708/, and I cannot tell whether I
> have bugs left over from that commit or whether I am facing new problems.
>
> Here are the known problems so far:
>
> 1) These need to be merged:
> http://review.gluster.org/9831
> http://review.gluster.org/9944
>
> 2) I still experience memory corruption, which usually crashes glusterfsd
> because some pointer was replaced by the value 0x3. This strikes iobref
> most of the time, but it can happen elsewhere.
>
> I would be glad if someone could help here. On nbslave70:/autobuild I
> added code to check iobref/iobuf sanity at random places (by calling
> iobref_sanity()). I do this in synctask_wrap and in STACK_WIND/UNWIND,
> but I have not been able to spot the source of the problem yet.

I'll take a look at this tomorrow.

>
> The weird thing is that memory always seems to be overwritten by the
> same values, and the magic number 0xcafebabe before the buffer is
> preserved. Here is an example, where iobref->iobrefs = 0xbb11a458:
> 0xbb11a44c:     0xcafebabe      0x00000000      0x00000000      0x00000003
> 0xbb11a45c:     0x00000003      0x00000008      0x00000003      0x0000000c
> 0xbb11a46c:     0x00000003      0x0000000e      0x00000003      0x00000010
> 0xbb11a47c:     0x00000003      0x00000009      0x00000003      0x0000000d
> 0xbb11a48c:     0x00000003      0x00000015      0x00000003      0x00000016
> 0xbb11a49c:     0x00000003      0x00000032      0x00000034      0xbb1e2018
> 0xbb11a4ac:     0xcafebabe      0x00000000      0x00000000      0xbb11a5d8
>
>
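A check in the spirit of the one described above could be sketched as
follows. This is only an illustration, not the iobref_sanity() code on
nbslave70: struct ref_table, first_bad_slot(), header_intact() and the
0x1000 threshold are all hypothetical names and assumptions. It flags
slots holding small integers such as 0x3 and confirms that the header
magic is still in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic word seen in front of the buffer in the dump above. */
#define HDR_MAGIC 0xCAFEBABEu
/* Assumption: no valid buffer lives in the first page of the address space. */
#define MIN_PLAUSIBLE_PTR ((uintptr_t)0x1000)

/* Hypothetical stand-in for an iobref: a table of buffer pointers. */
struct ref_table {
    void **slots;   /* pointer array, like iobref->iobrefs */
    int    count;   /* number of slots in the array */
};

/* Return the index of the first non-NULL slot whose value is too small
 * to be a real pointer (such as 0x3), or -1 if every slot looks sane. */
static int first_bad_slot(const struct ref_table *t)
{
    assert(t != NULL && t->slots != NULL);
    for (int i = 0; i < t->count; i++) {
        uintptr_t v = (uintptr_t)t->slots[i];
        if (v != 0 && v < MIN_PLAUSIBLE_PTR)
            return i;
    }
    return -1;
}

/* Return 1 if the word at hdr still holds the magic value, else 0. */
static int header_intact(const uint32_t *hdr)
{
    return hdr != NULL && hdr[0] == HDR_MAGIC;
}
```

Fed the values from the dump, first_bad_slot() would report index 0
(the 0x3 at 0xbb11a458) while header_intact() still succeeds, which
matches the observation that the header survives and only the array
contents are overwritten.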
> Additionally, there are two workarounds I had to make for crashes
> that happen sometimes:
> 3) I had to make this change (not yet posted on gerrit) to avoid crashing
> because op = GD_OP_NONE. Things seem to go fine without the test.
> I cannot tell whether this is a cause or a symptom:
>
> diff --git a/xlators/mgmt/glusterd/src/glusterd-utils.c b/xlators/mgmt/glusterd/src/glusterd-utils.c
> index 02d2cfb..c06959c 100644
> --- a/xlators/mgmt/glusterd/src/glusterd-utils.c
> +++ b/xlators/mgmt/glusterd/src/glusterd-utils.c
> @@ -8301,15 +8301,12 @@ out:
>  int
>  glusterd_volume_heal_use_rsp_dict (dict_t *aggr, dict_t *rsp_dict)
>  {
>          int            ret      = 0;
>          dict_t        *ctx_dict = NULL;
> -        glusterd_op_t  op       = GD_OP_NONE;
> +        glusterd_op_t  op       = GD_OP_HEAL_VOLUME;
>
>          GF_ASSERT (rsp_dict);
>
> -        op = glusterd_op_get_op ();
> -        GF_ASSERT (GD_OP_HEAL_VOLUME == op);
> -
>          if (aggr) {
>                  ctx_dict = aggr;
>
>
> 4) Here I crash because this->private = NULL, and here is a
> workaround:
>
> diff --git a/xlators/storage/posix/src/posix.c b/xlators/storage/posix/src/posix.c
> index ae08adc..3918e07 100644
> --- a/xlators/storage/posix/src/posix.c
> +++ b/xlators/storage/posix/src/posix.c
> @@ -913,6 +913,7 @@ posix_opendir (call_frame_t *frame, xlator_t *this,
>
>          VALIDATE_OR_GOTO (frame, out);
>          VALIDATE_OR_GOTO (this, out);
> +        VALIDATE_OR_GOTO (this->private, out);
>          VALIDATE_OR_GOTO (loc, out);
>          VALIDATE_OR_GOTO (fd, out);
>
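The guard pattern used in this workaround can be sketched in isolation
as below. CHECK_OR_GOTO, struct fake_xlator, its priv field and
fake_opendir() are hypothetical names made up for the illustration;
this is not the real VALIDATE_OR_GOTO definition, only the general
"bail out to a cleanup label when an argument is NULL" idea.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical guard: on a NULL/zero argument, set errno and jump to label. */
#define CHECK_OR_GOTO(arg, label)       \
    do {                                \
        if (!(arg)) {                   \
            errno = EINVAL;             \
            goto label;                 \
        }                               \
    } while (0)

/* Hypothetical stand-in for a translator with private data. */
struct fake_xlator {
    void *priv;
};

/* Return 0 when both the translator and its private data are set,
 * -1 (with errno = EINVAL) otherwise, instead of dereferencing NULL. */
static int fake_opendir(struct fake_xlator *xl)
{
    int ret = -1;

    CHECK_OR_GOTO(xl, out);
    CHECK_OR_GOTO(xl->priv, out);

    ret = 0;
out:
    return ret;
}
```

Such a guard only turns the crash into an error return; it does not
explain why the private pointer was NULL in the first place.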
>
>
> 4)
>
>
> --
> Emmanuel Dreyfus
> manu at netbsd.org
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel