[Gluster-devel] 3.4.0beta2 crash in conservative merge?
Emmanuel Dreyfus
manu at netbsd.org
Sun Jun 2 04:53:56 UTC 2013
I am trying to figure out how the crash happens. We know local->fd is
valid at the beginning of dht_migration_complete_check_task() since it is
dereferenced there without a hitch. Then it becomes NULL before the
function exits, which leads to a crash.
That suggests a race condition. I checked local->fd locking and it seems
fine. I therefore come to the conclusion that
dht_migration_complete_check_task() fails to hold a reference on
local->fd. I am now running tests with the change below. Does it make
sense?
Is it possible that local->fd gets unreferenced and freed by some other
thread between the time dht_migration_complete_check_task() is entered
and the time fd_ref() is called?
--- xlators/cluster/dht/src/dht-helper.c.orig
+++ xlators/cluster/dht/src/dht-helper.c
         src_node = local->cached_subvol;
         if (!local->loc.inode && !local->fd)
-                goto out;
+                return -1;
+
+        if (!local->loc.inode)
+                fd_ref(local->fd);
         /* getxattr on cached_subvol for 'linkto' value */
         if (!local->loc.inode)
                 ret = syncop_fgetxattr (src_node, local->fd, &dict,
@@ -836,8 +839,10 @@
         }
         ret = 0;
 out:
+        if (!local->loc.inode)
+                fd_unref(local->fd);
         return ret;
 }
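
To make the idea behind the patch concrete, here is a minimal single-threaded sketch of the "pin the fd for the duration of the task" pattern. It is not GlusterFS code: toy_fd, toy_local, toy_fd_ref(), toy_fd_unref(), toy_other_thread_release() and toy_task() are hypothetical names made up for illustration, the counter is a plain int rather than anything atomic or locked, and the concurrent release is simulated by an ordinary function call in the middle of the task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an fd object; NOT the GlusterFS fd_t. */
struct toy_fd {
    int refcount;
    int freed;      /* set instead of calling free(), so it can be inspected */
};

/* Hypothetical stand-in for the per-call "local" structure. */
struct toy_local {
    struct toy_fd *fd;
};

/* Take one more reference; the object must still be alive. */
static struct toy_fd *toy_fd_ref(struct toy_fd *fd)
{
    assert(fd != NULL && fd->refcount > 0);
    fd->refcount++;
    return fd;
}

/* Drop one reference; the last drop marks the object as freed. */
static void toy_fd_unref(struct toy_fd *fd)
{
    assert(fd != NULL && fd->refcount > 0);
    if (--fd->refcount == 0)
        fd->freed = 1;
}

/* Simulates another thread giving up local's reference and clearing it. */
static void toy_other_thread_release(struct toy_local *local)
{
    if (local->fd != NULL) {
        toy_fd_unref(local->fd);
        local->fd = NULL;
    }
}

/* Returns 0 if the pinned fd stayed alive for the whole task, -1 otherwise. */
static int toy_task(struct toy_local *local)
{
    struct toy_fd *fd = local->fd;      /* private copy of the pointer */
    int alive;

    if (fd == NULL)
        return -1;

    toy_fd_ref(fd);                     /* pin the object */
    toy_other_thread_release(local);    /* local->fd is NULL from here on */
    alive = !fd->freed;                 /* still valid thanks to our ref */
    toy_fd_unref(fd);                   /* unref the copy, not local->fd */

    return alive ? 0 : -1;
}
```

Two things stand out in this sketch. First, a reference keeps the object alive but does not stop someone else from storing NULL into local->fd, which is why the sketch unrefs its private copy of the pointer rather than re-reading the field on the way out. Second, the window between reading local->fd and taking the reference is not closed here; it is only safe if something else already guarantees the fd is alive when the task is entered.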
--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org