[Gluster-devel] Feature: Automagic lock-revocation for features/locks xlator (v3.7.x)

Pranith Kumar Karampuri pkarampu at redhat.com
Mon Jan 25 02:42:37 UTC 2016



On 01/25/2016 02:17 AM, Richard Wareing wrote:
> Hello all,
>
> Just gave a talk at SCaLE 14x today and I mentioned our new locks 
> revocation feature which has had a significant impact on our GFS 
> cluster reliability.  As such I wanted to share the patch with the 
> community, so here's the bugzilla report:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1301401
>
> =====
> Summary:
> Mis-behaving brick clients (gNFSd, FUSE, gfAPI) can cause cluster 
> instability and eventual complete unavailability due to failures in 
> releasing entry/inode locks in a timely manner.
>
> Classic symptoms of this are increased brick (and/or gNFSd) memory 
> usage due to the high number of (lock request) frames piling up in the 
> processes.  The failure mode results in bricks eventually slowing down 
> to a crawl due to swapping, or OOMing due to complete memory 
> exhaustion; during this period the entire cluster can begin to fail. 
>  End-users will experience this as hangs on the filesystem, first in a 
> specific region of the filesystem and ultimately across the entire 
> filesystem as the offending brick begins to turn into a zombie (i.e. 
> not quite dead, but not quite alive either).
>
> Currently, these situations must be handled by an administrator 
> detecting & intervening via the "clear-locks" CLI command. 
>  Unfortunately this doesn't scale for large numbers of clusters, and 
> it depends on the correct (external) detection of the locks piling up 
> (for which there is little signal other than state dumps).
>
> This patch introduces two features to remedy this situation:
>
> 1. Monkey-unlocking - This is a feature targeted at developers (only!) 
> to help track down crashes due to stale locks, and to prove the utility 
> of the lock revocation feature.  It does this by silently dropping 1% 
> of unlock requests, simulating bugs or mis-behaving clients.
>
> The feature is activated via:
> features.locks-monkey-unlocking <on/off>
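>
> For example (assuming a volume named "groot", as in the log line below, 
> and that the option is exposed like any other volume option), it can be 
> enabled with the standard volume-set CLI:
>
>   gluster volume set groot features.locks-monkey-unlocking on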
>
> You'll see the message
> "[<timestamp>] W [inodelk.c:653:pl_inode_setlk] 0-groot-locks: MONKEY 
> LOCKING (forcing stuck lock)!" ... in the logs indicating a request 
> has been dropped.
>
> 2. Lock revocation - Once enabled, this feature will revoke a 
> *contended* lock (i.e. if nobody else asks for the lock, we will not 
> revoke it), based either on the amount of time the lock has been held, 
> on how many other lock requests are waiting on the lock to be freed, 
> or on some combination of both.  Clients which lose their locks will 
> be notified by receiving EAGAIN (sent back to their callback function).
>
> The feature is activated via these options:
> features.locks-revocation-secs <integer; 0 to disable>
> features.locks-revocation-clear-all [on/off]
> features.locks-revocation-max-blocked <integer>
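>
> Conceptually (a rough sketch of the idea only; the field and function 
> names below are illustrative, not the identifiers used in the actual 
> patch), the check performed when a new request contends with a held 
> lock is something like:
>
>   #include <time.h>
>
>   struct held_lock {
>       time_t granted_at;     /* when the lock was granted         */
>       int    blocked_count;  /* requests currently waiting on it  */
>   };
>
>   /* Returns non-zero if the contended lock should be revoked. */
>   static int
>   should_revoke (struct held_lock *l, time_t now,
>                  int revocation_secs, int max_blocked)
>   {
>       if (revocation_secs > 0 && (now - l->granted_at) >= revocation_secs)
>           return 1;  /* held too long while others wait  */
>       if (max_blocked > 0 && l->blocked_count >= max_blocked)
>           return 1;  /* too many waiters have piled up   */
>       return 0;      /* within limits; leave it alone    */
>   }
>
> Note that this check only runs when another request arrives for the 
> lock; an idle, uncontended lock is never revoked, no matter how long 
> it is held.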
>
> Recommended settings are 1800 seconds for the time-based timeout 
> (giving clients the benefit of the doubt).  Choosing a max-blocked 
> value requires some experimentation depending on your workload, but 
> values in the hundreds to low thousands generally work well (it's 
> normal for many tens of locks to be taken out when files are being 
> written at high throughput).
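>
> In CLI terms (again assuming a volume named "groot", with 500 picked 
> purely as an illustrative starting point in that range), that works 
> out to something like:
>
>   gluster volume set groot features.locks-revocation-secs 1800
>   gluster volume set groot features.locks-revocation-max-blocked 500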

I really like this feature. One question though: self-heal and 
rebalance domain locks are active until the self-heal/rebalance is 
complete, which can take more than 30 minutes if the files are in TBs. 
I will try to see what we can do to handle these without increasing 
revocation-secs too much. Maybe we can come up with per-domain 
revocation timeouts. Comments are welcome.
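
As a strawman (nothing like this exists today; the names below are made 
up purely for illustration), it could be a simple per-domain lookup 
that falls back to the global timeout:

  #include <string.h>

  struct domain_timeout {
      const char *domain;           /* e.g. the self-heal or rebalance domain */
      int         revocation_secs;  /* override for locks in that domain      */
  };

  /* Pick the revocation timeout for a lock, preferring a per-domain
   * override and falling back to features.locks-revocation-secs. */
  static int
  domain_revocation_secs (const struct domain_timeout *table, size_t n,
                          const char *domain, int global_secs)
  {
      for (size_t i = 0; i < n; i++) {
          if (strcmp (table[i].domain, domain) == 0)
              return table[i].revocation_secs;
      }
      return global_secs;
  }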

Pranith
>
> =====
>
> The patch supplied will apply cleanly to the v3.7.6 release tag, and 
> probably to any 3.7.x release & master (the posix locks xlator is 
> rarely touched).
>
> Richard
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
