[Gluster-devel] 3.4.0beta2 crash in conservative merge?

Emmanuel Dreyfus manu at netbsd.org
Sun Jun 2 04:53:56 UTC 2013


I am trying to figure out how the crash happens. We know local->fd is
valid at the beginning of dht_migration_complete_check_task() since it is
dereferenced there without a hitch. Then it becomes NULL before the
function exits, which leads to a crash.

That suggests a race condition. I checked local->fd locking and it seems
fine. I therefore come to the conclusion that
dht_migration_complete_check_task() fails to hold a reference on
local->fd. I am now running tests with the change below. Does it make
sense?

Is it possible that local->fd gets unreferenced and freed from some other
thread between the time dht_migration_complete_check_task() is entered
and the time fd_ref() is called?
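
To make the idea concrete, here is a minimal standalone sketch of the
"pin a reference for the duration of the task" pattern. It does not use
the real GlusterFS fd_t or fd_ref()/fd_unref(): toy_fd_t, toy_local_t
and every toy_* function are hypothetical names made up for this
illustration, and the "other thread" is simulated by a plain function
call in the middle of the task. The sketch keeps the pinned pointer in a
local variable, so the release does not depend on local->fd still being
set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted fd; NOT the GlusterFS fd_t. */
typedef struct toy_fd {
        int refcount;
        int freed;      /* set instead of calling free(), so it can be inspected */
} toy_fd_t;

/* Hypothetical stand-in for the per-call local state. */
typedef struct toy_local {
        toy_fd_t *fd;
} toy_local_t;

static toy_fd_t *
toy_fd_ref (toy_fd_t *fd)
{
        if (fd)
                fd->refcount++;
        return fd;
}

static void
toy_fd_unref (toy_fd_t *fd)
{
        if (!fd)
                return;
        assert (fd->refcount > 0);
        if (--fd->refcount == 0)
                fd->freed = 1;
}

/* Simulates another thread dropping local's reference mid-task. */
static void
toy_other_thread_cleanup (toy_local_t *local)
{
        toy_fd_unref (local->fd);
        local->fd = NULL;
}

/* Returns 0 if the pinned fd was still alive after the interference. */
static int
toy_check_task (toy_local_t *local)
{
        toy_fd_t *fd  = toy_fd_ref (local->fd); /* pin via a local pointer */
        int       ret = -1;

        if (!fd)
                return -1;

        toy_other_thread_cleanup (local);       /* local->fd is NULL from here on */

        if (!fd->freed && fd->refcount == 1)
                ret = 0;                        /* our reference kept it alive */

        toy_fd_unref (fd);      /* release through the local copy, not local->fd */
        return ret;
}
```

Called on a toy_local_t whose fd starts with a refcount of 1,
toy_check_task() returns 0 even though local->fd has been cleared by the
time it exits. The sketch says nothing about the window before the
reference is taken; in this toy the interference can only happen after
the pin.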

--- xlators/cluster/dht/src/dht-helper.c.orig
+++ xlators/cluster/dht/src/dht-helper.c
 
         src_node = local->cached_subvol;
 
         if (!local->loc.inode && !local->fd)
-                goto out;
+                return -1;
+
+       if (!local->loc.inode)
+               fd_ref(local->fd);
 
         /* getxattr on cached_subvol for 'linkto' value */
         if (!local->loc.inode)
                 ret = syncop_fgetxattr (src_node, local->fd, &dict,
@@ -836,8 +839,10 @@
         }
 
         ret = 0;
 out:
+        if (!local->loc.inode)
+               fd_unref(local->fd);
 
         return ret;
 }
 


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org



