[Gluster-devel] add-brick crashes client

Emmanuel Dreyfus manu at netbsd.org
Fri Aug 3 08:53:05 UTC 2012


On Fri, Aug 03, 2012 at 05:13:02AM +0000, Emmanuel Dreyfus wrote:
> It seems there is a race condition here. Can someone knowledgeable
> confirm?

I tried the patch below. The client no longer crashes, but the volume
ends up broken: lookups return EINVAL (log below). So this is probably
the wrong fix, but hints are welcome.

--- syncop.c.orig       2012-08-03 08:02:35.000000000 +0200
+++ syncop.c    2012-08-03 10:43:28.000000000 +0200
@@ -116,8 +116,10 @@
         /* Do not trust the pointer received. It may be
            wrong and can lead to crashes. */
 
         task = synctask_get ();
+       assert(task != NULL);
+
         task->ret = task->syncfn (task->opaque);
        if (task->synccbk)
                task->synccbk (task->ret, task->frame, task->opaque);
 
@@ -211,8 +213,14 @@
 
         newtask->ctx.uc_stack.ss_sp   = newtask->stack;
         newtask->ctx.uc_stack.ss_size = env->stacksize;
 
+       /* 
+        * synctask_wrap does not trust its argument, and 
+        * uses synctask_get()
+        */
+       synctask_set (newtask);
+
         makecontext (&newtask->ctx, (void *) synctask_wrap, 2, newtask);
 
        newtask->state = SYNCTASK_INIT;
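
A minimal standalone sketch of the mechanism the patch depends on, not
GlusterFS code: the context entry point ignores its arguments and fetches
the current task from per-thread storage. It assumes synctask_set() and
synctask_get() keep the pointer in thread-specific data, which the diff
does not show; every demo_* name is hypothetical. Under that assumption
the pointer has to be set by the thread that is about to switch into the
task, so setting it once at creation time would only help when the same
thread also runs the task.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical miniature task, loosely modelled on the fields in the diff. */
struct demo_task {
        ucontext_t ctx;            /* the task's own context */
        ucontext_t sched;          /* where to resume when the task returns */
        int      (*fn) (void *);   /* work function */
        void      *opaque;         /* argument for fn */
        int        ret;            /* result of fn */
        char       stack[64 * 1024];
};

static pthread_key_t  demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

static void demo_key_init (void) { pthread_key_create (&demo_key, NULL); }

/* Assumed equivalent of synctask_set(): remember the task for THIS thread. */
static void demo_set (struct demo_task *task)
{
        pthread_once (&demo_once, demo_key_init);
        pthread_setspecific (demo_key, task);
}

/* Assumed equivalent of synctask_get(): NULL if this thread never set one. */
static struct demo_task *demo_get (void)
{
        pthread_once (&demo_once, demo_key_init);
        return pthread_getspecific (demo_key);
}

/* Entry point: takes no argument and looks the task up instead. */
static void demo_wrap (void)
{
        struct demo_task *task = demo_get ();

        assert (task != NULL);
        task->ret = task->fn (task->opaque);
        /* Returning from here resumes task->ctx.uc_link. */
}

static int demo_double (void *opaque) { return 2 * *(int *) opaque; }

/* Create a task, run it to completion on the calling thread, return its result. */
int demo_run_task (int value)
{
        struct demo_task *task = malloc (sizeof (*task));
        int               ret;

        if (task == NULL)
                return -1;
        task->fn     = demo_double;
        task->opaque = &value;
        task->ret    = -1;

        getcontext (&task->ctx);
        task->ctx.uc_stack.ss_sp   = task->stack;
        task->ctx.uc_stack.ss_size = sizeof (task->stack);
        task->ctx.uc_link          = &task->sched;
        makecontext (&task->ctx, demo_wrap, 0);

        /* Set the pointer in the thread that switches, right before switching. */
        demo_set (task);
        swapcontext (&task->sched, &task->ctx);
        demo_set (NULL);

        ret = task->ret;
        free (task);
        return ret;
}
```

Built with cc -pthread, demo_run_task (21) returns 42. If demo_set () were
called from a different thread than the one doing swapcontext (), demo_get ()
would return NULL in demo_wrap () and the assert would fire.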


[2012-08-03 10:46:03.709177] E [afr-common.c:3664:afr_notify] 0-pfs-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-08-03 10:46:03.825505] W [dht-layout.c:186:dht_layout_search] 1-pfs-dht: no subvolume for hash (value) = 4177819066
[2012-08-03 10:46:03.825652] E [dht-common.c:1372:dht_lookup] 1-pfs-dht: Failed to get hashed subvol for /manu
[2012-08-03 10:46:03.826315] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 12944: LOOKUP() /manu => -1 (Invalid argument)
[2012-08-03 10:46:03.827107] W [dht-layout.c:186:dht_layout_search] 1-pfs-dht: no subvolume for hash (value) = 4177819066

-- 
Emmanuel Dreyfus
manu at netbsd.org



