[Gluster-devel] Bug in locks deletion upon disconnect
Raghavendra Bhat
rabhat at redhat.com
Fri Dec 13 10:14:34 UTC 2013
Hi,
There seems to be a bug in the ltable cleanup upon disconnect in 3.5 and
master. It is easy to reproduce: create a replicate volume, start running
dbench on the mount point, and do graph changes. The brick processes crash
while doing the ltable cleanup. Pranith and I looked at the code and found
the issues below.
static void
ltable_delete_locks (struct _lock_table *ltable)
{
        struct _locker *locker = NULL;
        struct _locker *tmp = NULL;

        list_for_each_entry_safe (locker, tmp,
                                  &ltable->inodelk_lockers, lockers) {
                if (locker->fd)
                        pl_del_locker (ltable, locker->volume,
                                       &locker->loc,
                                       locker->fd, &locker->owner,
                                       GF_FOP_INODELK);
                GF_FREE (locker->volume);
                GF_FREE (locker);
        }

        list_for_each_entry_safe (locker, tmp,
                                  &ltable->entrylk_lockers, lockers) {
                if (locker->fd)
                        pl_del_locker (ltable, locker->volume,
                                       &locker->loc,
                                       locker->fd, &locker->owner,
                                       GF_FOP_ENTRYLK);
                GF_FREE (locker->volume);
                GF_FREE (locker);
        }

        GF_FREE (ltable);
}
In the above function, the lists of inodelks and entrylks are traversed,
and pl_del_locker is called for each lock that has an fd. But pl_del_locker
collects all the locks with the same volume and owner passed as arguments
and deletes them in one go (that too without unlocking them), so entries
other than the current one, including the next entry that
list_for_each_entry_safe has already saved, can be deleted behind the
traversal's back. For locks without an fd, we directly free the objects
without deleting them from the list (and without holding the ltable lock),
so the list is left holding pointers to freed memory. Either way the
cleanup loop ends up walking over freed or unlinked entries, which is what
crashes the bricks.
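Just to make the direction concrete, below is a rough, untested sketch (not
a proposed patch) of how the cleanup could be restructured: detach both
lists from the table under the ltable lock first, and only then unlink and
free each locker. It assumes list_splice_init and list_del_init from
libglusterfs' list.h and the usual gf_lock_t member on the table (the one
pl_del_locker takes); the actual release of the underlying locks and the
fd/loc references is only marked with TODO comments.

static void
ltable_delete_locks (struct _lock_table *ltable)
{
        struct _locker   *locker = NULL;
        struct _locker   *tmp    = NULL;
        struct list_head  inodelk;
        struct list_head  entrylk;

        INIT_LIST_HEAD (&inodelk);
        INIT_LIST_HEAD (&entrylk);

        /* Detach both lists while holding the table lock, so nothing
         * is added or removed behind our back. */
        LOCK (&ltable->lock);
        {
                list_splice_init (&ltable->inodelk_lockers, &inodelk);
                list_splice_init (&ltable->entrylk_lockers, &entrylk);
        }
        UNLOCK (&ltable->lock);

        /* Every locker is now reachable only through the local lists,
         * so unlinking and freeing them cannot corrupt the table. */
        list_for_each_entry_safe (locker, tmp, &inodelk, lockers) {
                list_del_init (&locker->lockers);
                /* TODO: release the underlying inodelk (fd based or
                 * not) and drop the fd/loc references before freeing
                 * the book-keeping object. */
                GF_FREE (locker->volume);
                GF_FREE (locker);
        }

        list_for_each_entry_safe (locker, tmp, &entrylk, lockers) {
                list_del_init (&locker->lockers);
                /* TODO: same for entrylks. */
                GF_FREE (locker->volume);
                GF_FREE (locker);
        }

        GF_FREE (ltable);
}

Whatever the final fix ends up looking like, the key points are the same:
hold the ltable lock while touching the lists, list_del every locker before
freeing it, and release the locks themselves rather than only dropping the
book-keeping.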
This is the bug logged for the issue.
https://bugzilla.redhat.com/show_bug.cgi?id=1042764
Regards,
Raghavendra Bhat