[Gluster-devel] NetBSD regressions, memory corruption

Emmanuel Dreyfus manu at netbsd.org
Tue Mar 24 17:18:44 UTC 2015


Hi

The merge of http://review.gluster.org/9953/ removed a few crashes from
NetBSD regression tests, but they have remained utterly broken since the
merge of http://review.gluster.org/9708/, and I cannot tell whether I have
bugs left over from that commit or whether I face new problems.

Here are the known problems so far:

1) These need to be merged:
http://review.gluster.org/9831
http://review.gluster.org/9944

2) I still experience memory corruption, which usually crashes glusterfsd
because some pointer was replaced by the value 0x3. This strikes iobref
most of the time, but it can happen elsewhere.

I would be glad if someone could help here. On nbslave70:/autobuild I
added code to check for iobref/iobuf sanity at random places (by calling
iobref_sanity()). I do this in synctask_wrap and in STACK_WIND/UNWIND,
but I have not been able to spot the source of the problem yet.
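
For readers who want to try a similar check, here is a minimal sketch of
the idea. It is not the iobref_sanity() code on nbslave70. The structure
toy_iobref, its field names and the MIN_PLAUSIBLE_ADDR threshold are all
assumptions made for illustration. It flags any non-NULL slot whose value
is too small to be a valid pointer, such as 0x3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real structure; field names are assumptions. */
struct toy_iobref {
        void   **slots;   /* array of buffer pointers */
        size_t   alloced; /* number of slots in the array */
};

/* Assumed threshold: small integers such as 0x3 are never valid pointers. */
#define MIN_PLAUSIBLE_ADDR ((uintptr_t)0x1000)

/* Return the index of the first implausible slot, or -1 if none is found. */
static int
toy_iobref_first_bad_slot (const struct toy_iobref *ref)
{
        size_t i;

        if (ref == NULL || ref->slots == NULL)
                return -1; /* nothing to inspect */

        for (i = 0; i < ref->alloced; i++) {
                uintptr_t v = (uintptr_t)ref->slots[i];

                /* NULL is a legitimate empty slot; tiny non-NULL values are not. */
                if (v != 0 && v < MIN_PLAUSIBLE_ADDR)
                        return (int)i;
        }
        return -1;
}
```

Calling such a helper at several points along a code path narrows down
where a slot first goes bad, at the cost of one pass over the array per
call.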

The weird thing is that memory always seems to be overwritten by the
same values, and the magic 0xcafebabe number before the buffer is preserved.
Here is an example, where iobref->iobrefs = 0xbb11a458:
0xbb11a44c:     0xcafebabe      0x00000000      0x00000000      0x00000003
0xbb11a45c:     0x00000003      0x00000008      0x00000003      0x0000000c
0xbb11a46c:     0x00000003      0x0000000e      0x00000003      0x00000010
0xbb11a47c:     0x00000003      0x00000009      0x00000003      0x0000000d
0xbb11a48c:     0x00000003      0x00000015      0x00000003      0x00000016
0xbb11a49c:     0x00000003      0x00000032      0x00000034      0xbb1e2018
0xbb11a4ac:     0xcafebabe      0x00000000      0x00000000      0xbb11a5d8
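
To read the dump above: assuming 32-bit words and four words per row, the
address 0xbb11a458 is the fourth word of the first row, so the first slot
of the array already holds 0x3 while the 0xcafebabe word 12 bytes before
it is intact. A small sketch of that arithmetic follows. The helper name
dump_word_index is hypothetical, and only the first two rows of the dump
are copied.

```c
#include <stddef.h>
#include <stdint.h>

#define DUMP_MAGIC 0xcafebabeU  /* guard value seen in the dump */

/* Index of the 32-bit word holding `addr` in a dump starting at `base`. */
static size_t
dump_word_index (uint32_t base, uint32_t addr)
{
        return (addr - base) / sizeof (uint32_t);
}

/* First two rows of the dump quoted above, starting at 0xbb11a44c. */
static const uint32_t dump_words[] = {
        0xcafebabe, 0x00000000, 0x00000000, 0x00000003,
        0x00000003, 0x00000008, 0x00000003, 0x0000000c,
};
```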


Additionally, there are two workarounds I had to make for crashes
that happen sometimes:
3) I had to make this change (not yet posted on gerrit) to avoid crashing
because op = GD_OP_NONE. Things seem to go fine without the test. I cannot
tell whether this is a cause or a symptom:

diff --git a/xlators/mgmt/glusterd/src/glusterd-utils.c b/xlators/mgmt/glusterd/src/glusterd-utils.c
index 02d2cfb..c06959c 100644
--- a/xlators/mgmt/glusterd/src/glusterd-utils.c
+++ b/xlators/mgmt/glusterd/src/glusterd-utils.c
@@ -8301,15 +8301,12 @@ out:
 int
 glusterd_volume_heal_use_rsp_dict (dict_t *aggr, dict_t *rsp_dict)
 {
         int            ret      = 0;
         dict_t        *ctx_dict = NULL;
-        glusterd_op_t  op       = GD_OP_NONE;
+        glusterd_op_t  op       = GD_OP_HEAL_VOLUME;
 
         GF_ASSERT (rsp_dict);
 
-        op = glusterd_op_get_op ();
-        GF_ASSERT (GD_OP_HEAL_VOLUME == op);
-
         if (aggr) {
                 ctx_dict = aggr;
 

4) Here I crash because this->private = NULL, and here is a
workaround:

diff --git a/xlators/storage/posix/src/posix.c b/xlators/storage/posix/src/posix.c
index ae08adc..3918e07 100644
--- a/xlators/storage/posix/src/posix.c
+++ b/xlators/storage/posix/src/posix.c
@@ -913,6 +913,7 @@ posix_opendir (call_frame_t *frame, xlator_t *this,
 
         VALIDATE_OR_GOTO (frame, out);
         VALIDATE_OR_GOTO (this, out);
+        VALIDATE_OR_GOTO (this->private, out);
         VALIDATE_OR_GOTO (loc, out);
         VALIDATE_OR_GOTO (fd, out);
 




-- 
Emmanuel Dreyfus
manu at netbsd.org

