[Gluster-devel] Feature: Automagic lock-revocation for features/locks xlator (v3.7.x)

Mon Jan 25 05:36:09 UTC 2016

On Jan 25, 2016 08:12, "Pranith Kumar Karampuri" <pkarampu at redhat.com>
wrote:
>
>
>
> On 01/25/2016 02:17 AM, Richard Wareing wrote:
>>
>> Hello all,
>>
>> Just gave a talk at SCaLE 14x today and I mentioned our new locks
>> revocation feature which has had a significant impact on our GFS
>> cluster reliability.  As such I wanted to share the patch with the
>> community, so here's the bugzilla report:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1301401
>>
>> =====
>> Summary:
>> Mis-behaving brick clients (gNFSd, FUSE, gfAPI) can cause cluster
>> instability and eventual complete unavailability due to failures in
>> releasing entry/inode locks in a timely manner.
>>
>> Classic symptoms of this are increased brick (and/or gNFSd) memory
>> usage due to the high number of (lock request) frames piling up in
>> the processes.  The failure mode results in bricks eventually slowing
>> down to a crawl due to swapping, or OOMing due to complete memory
>> exhaustion; during this period the entire cluster can begin to fail.
>> End-users will experience this as hangs on the filesystem, first in a
>> specific region of the filesystem and ultimately the entire filesystem
>> as the offending brick begins to turn into a zombie (i.e. not quite
>> dead, but not quite alive either).
>>
>> Currently, these situations must be handled by an administrator
>> detecting & intervening via the "clear-locks" CLI command.
>> Unfortunately this doesn't scale for large numbers of clusters, and it
>> depends on the correct (external) detection of the locks piling up
>> (for which there is little signal other than state dumps).
>>
>> This patch introduces two features to remedy this situation:
>>
>> 1. Monkey-unlocking - This is a feature targeted at developers (only!)
>> to help track down crashes due to stale locks, and prove the utility
>> of the lock revocation feature.  It does this by silently dropping 1%
>> of unlock requests, simulating bugs or mis-behaving clients.
>>
>> The feature is activated via:
>> features.locks-monkey-unlocking <on/off>
>>
>> You'll see the message
>> "[<timestamp>] W [inodelk.c:653:pl_inode_setlk] 0-groot-locks: MONKEY LOCKING (forcing stuck lock)!"
>> ... in the logs indicating a request has been dropped.
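
A minimal sketch of the monkey-unlocking idea described above, written in Python rather than the xlator's C. The `MonkeyLockTable` class and its fields are hypothetical names for illustration and are not taken from the patch; only the 1% drop rate comes from the description.

```python
import random

DROP_PROBABILITY = 0.01  # "1% of unlock requests", per the description above


class MonkeyLockTable:
    """Toy lock table (hypothetical; not the features/locks data structures)."""

    def __init__(self, monkey_unlocking=False, rng=None):
        self.monkey_unlocking = monkey_unlocking
        self.rng = rng or random.Random()
        self.held = set()   # keys that are currently locked
        self.dropped = 0    # unlock requests that were silently ignored

    def lock(self, key):
        """Grant the lock if it is free; return whether it was granted."""
        if key in self.held:
            return False
        self.held.add(key)
        return True

    def unlock(self, key):
        """Release the lock, unless the monkey decides to drop the request."""
        if self.monkey_unlocking and self.rng.random() < DROP_PROBABILITY:
            self.dropped += 1
            return  # the caller believes the unlock succeeded; the lock is now stale
        self.held.discard(key)
```

Every dropped request leaves one stale entry in `held`, which is the kind of stuck lock the revocation feature is meant to clean up.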
>>
>> 2. Lock revocation - Once enabled, this feature will revoke a
>> *contended* lock (i.e. if nobody else asks for the lock, we will not
>> revoke it) based on how long the lock has been held, how many other
>> lock requests are waiting on the lock to be freed, or some combination
>> of both.  Clients which lose their locks will be notified by
>> receiving EAGAIN (sent back to their callback function).
>>
>> The feature is activated via these options:
>> features.locks-revocation-secs <integer; 0 to disable>
>> features.locks-revocation-clear-all [on/off]
>> features.locks-revocation-max-blocked <integer>
>>
>> Recommended settings are: 1800 seconds for a time-based timeout (give
>> clients the benefit of the doubt).  Choosing a max-blocked value
>> requires some experimentation depending on your workload, but values
>> of hundreds to low thousands generally work (it's normal for many tens
>> of locks to be taken out when files are being written at high
>> throughput).
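
The revocation rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch's logic: `HeldLock` and `should_revoke` are hypothetical names, treating a max-blocked of 0 as "disabled" is an assumption (the description only says so for revocation-secs), and the clear-all option is not modelled.

```python
from dataclasses import dataclass


@dataclass
class HeldLock:
    granted_at: float  # time the lock was granted, in seconds
    blocked: int       # number of lock requests waiting on this lock


def should_revoke(lock, now, revocation_secs=1800, max_blocked=0):
    """Return True if a held lock qualifies for revocation.

    Assumed semantics: a threshold of 0 disables that check, and a lock
    that nobody is waiting on is never revoked.
    """
    if lock.blocked == 0:
        return False  # uncontended: leave it alone
    too_old = revocation_secs > 0 and (now - lock.granted_at) >= revocation_secs
    too_contended = max_blocked > 0 and lock.blocked >= max_blocked
    return too_old or too_contended
```

With the recommended 1800 seconds, a lock held for 30 minutes is revoked only once at least one other request is blocked on it.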
>
>
> I really like this feature. One question though: self-heal and
> rebalance domain locks are active until the self-heal/rebalance is
> complete, which can take more than 30 minutes if the files are
> terabytes in size. I will try to see what we can do to handle these
> without increasing revocation-secs too much. Maybe we can come up with
> per-domain revocation timeouts. Comments are welcome.
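
One possible shape for the per-domain timeouts suggested above, as a sketch only. The domain names and the four-hour values are made up for illustration; nothing here reflects an existing option or the real lock domain strings.

```python
DEFAULT_REVOCATION_SECS = 1800  # the recommended value quoted above

# Hypothetical domain names and timeouts, for illustration only.
PER_DOMAIN_REVOCATION_SECS = {
    "self-heal": 4 * 3600,
    "rebalance": 4 * 3600,
}


def revocation_secs_for(domain):
    """Return the timeout for a lock domain, falling back to the default."""
    return PER_DOMAIN_REVOCATION_SECS.get(domain, DEFAULT_REVOCATION_SECS)
```

Long-running internal operations would get a generous timeout this way while ordinary client locks keep the shorter default.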

[
    I've not gone through the design or the patch,
    hence this might be a shot in the air.
]

Maybe give clients a second (or more) chance to "refresh" their locks.
That is, when a lock is about to be revoked, notify the client, which can
then call for a refresh to confirm that it still validly holds its locks.
This would require some maintenance work on the client to keep track of
locked regions.
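
To make the refresh idea concrete, here is a rough sketch of the server-side state it would need. Everything in it is an assumption: `RefreshableLock`, the `grace_secs` parameter and the returned state strings are hypothetical and do not exist in the patch.

```python
class RefreshableLock:
    """Hypothetical lock with a warning step and a grace period before revocation."""

    def __init__(self, granted_at, revocation_secs=1800, grace_secs=30):
        self.revocation_secs = revocation_secs
        self.grace_secs = grace_secs
        self.deadline = granted_at + revocation_secs
        self.warned_at = None  # when the client was told revocation is pending

    def tick(self, now):
        """Return 'ok', 'notify' (warn the client), 'waiting' or 'revoke'."""
        if now < self.deadline:
            return "ok"
        if self.warned_at is None:
            self.warned_at = now
            return "notify"
        if now - self.warned_at >= self.grace_secs:
            return "revoke"
        return "waiting"  # client has been warned; grace period still running

    def refresh(self, now):
        """Client confirms it still needs the lock, so restart the clock."""
        self.deadline = now + self.revocation_secs
        self.warned_at = None
```

A client that answers the warning with `refresh()` keeps its lock for another full period; one that stays silent for `grace_secs` loses it.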

>
> Pranith
>>
>>
>> =====
>>
>> The patch supplied will apply cleanly to the v3.7.6 release tag, and
>> probably to any 3.7.x release & master (the posix locks xlator is
>> rarely touched).
>>
>> Richard
>>
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel