[Gluster-devel] Proposal to change locking in data-self-heal

Pranith Kumar Karampuri pkarampu at redhat.com
Tue May 21 13:10:18 UTC 2013


Hi,
    This idea was proposed by Brian Foster as a solution to several hangs we faced when a truncate races with a self-heal, or when two self-heals are triggered on the same file.

Problem:
Scenario-1:
At the moment, once data-self-heal is triggered on a file, any additional full-file locks are blocked until the self-heal completes. Because of this, truncate fops hang until the self-heal is complete.

Scenario-2:
If another self-heal is triggered on the same file while one is already in progress, its lock is put into the blocked queue. Because of the presence of this blocked lock, subsequent locks taken by writes on the file are moved to the blocked queue as well.

Both these scenarios lead to user-perceivable interim hangs.

Little bit of background:
At the moment data-self-heal acquires locks in the following pattern. It takes a full-file lock, then gets the xattrs on the file on both replicas and decides sources/sinks based on the xattrs. It then acquires a lock on the range 0-128k and unlocks the full-file lock, syncs the 0-128k range from source to sink, acquires a lock on 128k+1 till 256k and unlocks the 0-128k lock, syncs the 128k+1 till 256k block, and so on. Finally it takes the full-file lock again, unlocks the final small-range lock, decrements the pending counts and unlocks the full-file lock.
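
To make the handover pattern concrete, here is a minimal sketch in C, assuming hypothetical helpers lk_acquire()/lk_release()/sync_block() in place of the real AFR inodelk and read/write calls (they are illustrative stand-ins, not the actual API). Lengths are in bytes; len 0 is used the way inodelk uses it, i.e. "till end of file".

#include <stdio.h>
#include <sys/types.h>

#define BLK      (128 * 1024)   /* 128k self-heal block size */
#define FULL_LEN 0              /* len 0 == lock till end of file */

/* hypothetical stand-ins for inodelk calls and the sync loop */
static void lk_acquire (off_t start, off_t len)
{ printf ("lock   start=%lld len=%lld\n", (long long)start, (long long)len); }
static void lk_release (off_t start, off_t len)
{ printf ("unlock start=%lld len=%lld\n", (long long)start, (long long)len); }
static void sync_block (off_t start, off_t len)
{ printf ("sync   start=%lld len=%lld\n", (long long)start, (long long)len); }

int
main (void)
{
        off_t size = 4 * BLK;   /* pretend file size */
        off_t off  = 0;

        lk_acquire (0, FULL_LEN);       /* full lock: read xattrs, pick sources/sinks */
        lk_acquire (0, BLK);            /* take the first 128k lock ... */
        lk_release (0, FULL_LEN);       /* ... before releasing the full lock */

        while (off < size) {
                sync_block (off, BLK);  /* copy this block from source to sinks */
                if (off + BLK < size) {
                        lk_acquire (off + BLK, BLK);  /* take the next block first ... */
                        lk_release (off, BLK);        /* ... then drop the current one */
                } else {
                        lk_acquire (0, FULL_LEN);     /* last block: full lock again */
                        lk_release (off, BLK);
                        /* decrement pending counts here */
                        lk_release (0, FULL_LEN);
                }
                off += BLK;
        }
        return 0;
}

Note that at every point the self-heal holds at least one granted lock, which is why a full-file lock waiting in the blocked queue (e.g. from truncate) cannot be granted until the whole heal is done.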
     This pattern of locks was chosen to prevent more than one self-heal from being in progress at a time. But if another self-heal tries to take a full-file lock while a self-heal is already in progress, it is put in the blocked queue, and further inodelks from writes by the application are also put in the blocked queue because of the way the locks xlator grants inodelks. Here is the code:

xlators/features/locks/src/inodelk.c - line 225
        if (__blocked_lock_conflict (dom, lock) && !(__owner_has_lock (dom, lock))) {
                ret = -EAGAIN;
                if (can_block == 0)
                        goto out;

                gettimeofday (&lock->blkd_time, NULL);
                list_add_tail (&lock->blocked_locks, &dom->blocked_inodelks);

Solution:
Since we want to prevent two parallel self-heals, we let them compete in a separate "domain". Let's call the domain in which the locks were taken in the previous approach the "data-domain".

In the new approach, when a self-heal is triggered it
acquires a full-file lock in the new domain "self-heal-domain".
    After this it performs the data-self-heal using locks in the "data-domain" in the following manner (see the sketch after this list):
    Acquire the full-file lock, get the xattrs on the file, decide sources/sinks, then unlock the full-file lock.
    Acquire a lock on the range 0 - 128k, sync the data from source to sinks in the range 0 - 128k, then unlock the 0 - 128k lock.
    Acquire a lock on the range 128k+1 - 256k, sync the data from source to sinks in the range 128k+1 - 256k, then unlock the 128k+1 - 256k lock.
    .....
    Repeat until the end of the file is reached.
    Acquire the full-file lock, decrement the pending counts, then unlock the full-file lock.
Unlock the full-file lock in the "self-heal-domain".
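
Here is a sketch of the proposed sequence, again with hypothetical helpers lk()/unlk()/sync_block() that take a lock-domain name (the real implementation would issue inodelks with different domain strings; len 0 means "till end of file"):

#include <stdio.h>
#include <sys/types.h>

#define BLK  (128 * 1024)   /* 128k self-heal block size */
#define FULL 0              /* len 0 == lock till end of file */

static void lk (const char *dom, off_t start, off_t len)
{ printf ("lock   %-16s start=%lld len=%lld\n", dom, (long long)start, (long long)len); }
static void unlk (const char *dom, off_t start, off_t len)
{ printf ("unlock %-16s start=%lld len=%lld\n", dom, (long long)start, (long long)len); }
static void sync_block (off_t start, off_t len)
{ printf ("sync                    start=%lld len=%lld\n", (long long)start, (long long)len); }

int
main (void)
{
        off_t size = 4 * BLK;   /* pretend file size */
        off_t off;

        /* Serialize self-heals among themselves only. */
        lk ("self-heal-domain", 0, FULL);

        /* Decide sources/sinks under a short-lived full lock. */
        lk ("data-domain", 0, FULL);
        /* read xattrs, pick sources/sinks */
        unlk ("data-domain", 0, FULL);

        /* Sync block by block; between blocks no data-domain lock is
         * held, so truncate/write locks can be granted in the gaps. */
        for (off = 0; off < size; off += BLK) {
                lk ("data-domain", off, BLK);
                sync_block (off, BLK);
                unlk ("data-domain", off, BLK);
        }

        /* Clear the pending counts under a full lock. */
        lk ("data-domain", 0, FULL);
        /* decrement pending counts */
        unlk ("data-domain", 0, FULL);

        unlk ("self-heal-domain", 0, FULL);
        return 0;
}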

Scenario-1 won't happen because the truncate's full-file lock gets a chance to be granted after any 128k range sync, since the self-heal no longer holds a data-domain lock continuously across the whole heal.
Scenario-2 won't happen because extra self-heals launched on the same file are blocked in the "self-heal-domain", so the data-path's locks in the "data-domain" are not affected by them.

Let me know if you see any problems with this approach or have any suggestions.

Pranith.



