[Gluster-devel] Bug in locks deletion upon disconnect
Raghavendra Bhat
rabhat at redhat.com
Fri Dec 13 10:14:34 UTC 2013
Hi,
There seems to be a bug in the ltable cleanup upon disconnect in 3.5 and
master. It is easy to reproduce: create a replicate volume, start running
dbench on the mount point, and do graph changes. The brick processes crash
while doing the ltable cleanup. Pranith and I looked at the code and found
the issues below.
static void
ltable_delete_locks (struct _lock_table *ltable)
{
        struct _locker *locker = NULL;
        struct _locker *tmp = NULL;

        list_for_each_entry_safe (locker, tmp,
                                  &ltable->inodelk_lockers, lockers) {
                if (locker->fd)
                        pl_del_locker (ltable, locker->volume,
                                       &locker->loc,
                                       locker->fd, &locker->owner,
                                       GF_FOP_INODELK);
                GF_FREE (locker->volume);
                GF_FREE (locker);
        }

        list_for_each_entry_safe (locker, tmp,
                                  &ltable->entrylk_lockers, lockers) {
                if (locker->fd)
                        pl_del_locker (ltable, locker->volume,
                                       &locker->loc,
                                       locker->fd, &locker->owner,
                                       GF_FOP_ENTRYLK);
                GF_FREE (locker->volume);
                GF_FREE (locker);
        }

        GF_FREE (ltable);
}
In the above function, the lists of inodelks and entrylks are traversed,
and pl_del_locker is called for each lock that has an fd. But pl_del_locker
collects all the locks with the same volume and owner passed as arguments
and deletes them in one go (that too without unlocking them), so entries
other than the current one, including the next entry that
list_for_each_entry_safe has already saved, can be deleted behind the
traversal's back. For locks without an fd, we directly free the objects
without deleting them from the list (and without holding the ltable lock),
so the list is left holding pointers to freed memory. Either way the
cleanup loop ends up walking over freed or unlinked entries, which is what
crashes the bricks.
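Just to make the direction concrete, below is a rough, untested sketch (not
a proposed patch) of how the cleanup could be restructured: detach both
lists from the table under the ltable lock first, and only then unlink and
free each locker. It assumes list_splice_init and list_del_init from
libglusterfs' list.h and the usual gf_lock_t member on the table (the one
pl_del_locker takes); the actual release of the underlying locks and the
fd/loc references is only marked with TODO comments.

static void
ltable_delete_locks (struct _lock_table *ltable)
{
        struct _locker   *locker = NULL;
        struct _locker   *tmp    = NULL;
        struct list_head  inodelk;
        struct list_head  entrylk;

        INIT_LIST_HEAD (&inodelk);
        INIT_LIST_HEAD (&entrylk);

        /* Detach both lists while holding the table lock, so nothing
         * is added or removed behind our back. */
        LOCK (&ltable->lock);
        {
                list_splice_init (&ltable->inodelk_lockers, &inodelk);
                list_splice_init (&ltable->entrylk_lockers, &entrylk);
        }
        UNLOCK (&ltable->lock);

        /* Every locker is now reachable only through the local lists,
         * so unlinking and freeing them cannot corrupt the table. */
        list_for_each_entry_safe (locker, tmp, &inodelk, lockers) {
                list_del_init (&locker->lockers);
                /* TODO: release the underlying inodelk (fd based or
                 * not) and drop the fd/loc references before freeing
                 * the book-keeping object. */
                GF_FREE (locker->volume);
                GF_FREE (locker);
        }

        list_for_each_entry_safe (locker, tmp, &entrylk, lockers) {
                list_del_init (&locker->lockers);
                /* TODO: same for entrylks. */
                GF_FREE (locker->volume);
                GF_FREE (locker);
        }

        GF_FREE (ltable);
}

Whatever the final fix ends up looking like, the key points are the same:
hold the ltable lock while touching the lists, list_del every locker before
freeing it, and release the locks themselves rather than only dropping the
book-keeping.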
This is the bug logged for the issue.
https://bugzilla.redhat.com/show_bug.cgi?id=1042764
Regards,
Raghavendra Bhat