[Gluster-devel] Feature: Automagic lock-revocation for features/locks xlator (v3.7.x)

Raghavendra G raghavendra at gluster.com
Wed Jan 27 05:49:06 UTC 2016


On Mon, Jan 25, 2016 at 10:39 AM, Raghavendra Gowdappa <rgowdapp at redhat.com>
wrote:

>
>
> ----- Original Message -----
> > From: "Richard Wareing" <rwareing at fb.com>
> > To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> > Cc: gluster-devel at gluster.org
> > Sent: Monday, January 25, 2016 8:17:11 AM
> > Subject: Re: [Gluster-devel] Feature: Automagic lock-revocation for
> > features/locks xlator (v3.7.x)
> >
> > Yup, per-domain would be useful; the patch itself currently honors
> > domains as well, so locks in a different domain will not be touched
> > during revocation.
> >
> > In our case we actually prefer to pull the plug on the SHD/DHT domains
> > to ensure clients do not hang. This is important for DHT self-heals,
> > which cannot be disabled via any option. We've found that in most
> > cases, once we reap the lock, another properly behaving client comes
> > along and completes the DHT heal properly.
>
> Flushing DHT's waiting locks can affect application continuity too.
> Though locks requested by the rebalance process can be flushed to a
> certain extent without applications noticing any failures, there is no
> guarantee that locks requested in DHT_LAYOUT_HEAL_DOMAIN and
> DHT_FILE_MIGRATE_DOMAIN are issued only by the rebalance process.


I missed this point in my previous mail. Now I remember that we can use
frame->root->pid (being negative) to identify internal processes. Was this
the approach you followed to identify locks from the rebalance process?
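
For reference, a minimal sketch of that check (plain C, hypothetical helper
name; not code from the patch):

    #include <stdbool.h>
    #include <sys/types.h>

    /* GlusterFS-internal daemons (rebalance, self-heal, etc.) stamp their
     * requests with a negative frame->root->pid, while application I/O
     * carries a non-negative pid. A locks xlator code path could use a
     * check like this to treat internal and client locks differently. */
    static bool
    is_internal_request (pid_t pid)   /* pass frame->root->pid here */
    {
            return (pid < 0);
    }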


> These two domains are used for locks that synchronize among and between
> rebalance process(es) and client(s). So, there is an equal probability
> that these locks are requests from clients, and hence applications can
> see some file operations fail.
>
> In the case of pulling the plug on DHT_LAYOUT_HEAL_DOMAIN, dentry
> operations that depend on the layout can fail. These operations include
> create, link, unlink, symlink, mknod, mkdir and rename for
> files/directories within the directory on which the lock request failed.
>
> In the case of pulling the plug on DHT_FILE_MIGRATE_DOMAIN, renames of
> immediate subdirectories/files can fail.
>
>
> >
> > Richard
> >
> >
> > Sent from my iPhone
> >
> > On Jan 24, 2016, at 6:42 PM, Pranith Kumar Karampuri
> > <pkarampu at redhat.com> wrote:
> >
> > On 01/25/2016 02:17 AM, Richard Wareing wrote:
> >
> > Hello all,
> >
> > Just gave a talk at SCaLE 14x today and I mentioned our new locks
> > revocation feature, which has had a significant impact on our GFS
> > cluster reliability. As such I wanted to share the patch with the
> > community, so here's the bugzilla report:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1301401
> >
> > =====
> > Summary:
> > Mis-behaving brick clients (gNFSd, FUSE, gfAPI) can cause cluster
> > instability and eventual complete unavailability due to failures in
> > releasing entry/inode locks in a timely manner.
> >
> > Classic symptoms of this are increased brick (and/or gNFSd) memory
> > usage due to the high number of (lock request) frames piling up in the
> > processes. The failure mode results in bricks eventually slowing down
> > to a crawl due to swapping, or OOMing due to complete memory
> > exhaustion; during this period the entire cluster can begin to fail.
> > End-users will experience this as hangs on the filesystem, first in a
> > specific region of the filesystem and ultimately the entire filesystem,
> > as the offending brick begins to turn into a zombie (i.e. not quite
> > dead, but not quite alive either).
> >
> > Currently, these situations must be handled by an administrator
> > detecting & intervening via the "clear-locks" CLI command.
> > Unfortunately this doesn't scale to large numbers of clusters, and it
> > depends on correct (external) detection of the locks piling up (for
> > which there is little signal other than state dumps).
> >
> > This patch introduces two features to remedy this situation:
> >
> > 1. Monkey-unlocking - This is a feature targeted at developers (only!)
> > to help track down crashes due to stale locks, and to prove the utility
> > of the lock revocation feature. It does this by silently dropping 1% of
> > unlock requests, simulating bugs or mis-behaving clients.
> >
> > The feature is activated via:
> > features.locks-monkey-unlocking <on/off>
> >
> > You'll see the message
> > "[<timestamp>] W [inodelk.c:653:pl_inode_setlk] 0-groot-locks: MONKEY
> > LOCKING (forcing stuck lock)!"
> > in the logs, indicating a request has been dropped.
> >
> > 2. Lock revocation - Once enabled, this feature will revoke a
> > *contended* lock (i.e. if nobody else asks for the lock, we will not
> > revoke it) based either on the amount of time the lock has been held,
> > on how many other lock requests are waiting on the lock to be freed,
> > or on some combination of both. Clients which lose their locks will be
> > notified by receiving EAGAIN (sent back to their callback function).
> >
> > The feature is activated via these options:
> > features.locks-revocation-secs <integer; 0 to disable>
> > features.locks-revocation-clear-all [on/off]
> > features.locks-revocation-max-blocked <integer>
> >
> > Recommended settings are: 1800 seconds for a time-based timeout (give
> > clients the benefit of the doubt). Choosing a max-blocked value
> > requires some experimentation depending on your workload, but
> > generally values in the hundreds to low thousands work (it's normal
> > for many tens of locks to be taken out when files are being written at
> > high throughput).
> >
> > I really like this feature. One question though: self-heal and
> > rebalance domain locks are active until the self-heal/rebalance is
> > complete, which can take more than 30 minutes if the files are in the
> > TBs. I will try to see what we can do to handle these without
> > increasing revocation-secs too much. Maybe we can come up with
> > per-domain revocation timeouts. Comments are welcome.
> >
> > Pranith
> >
> > =====
> >
> > The patch supplied applies cleanly to the v3.7.6 release tag, and
> > probably to any 3.7.x release & master (the posix locks xlator is
> > rarely touched).
> >
> > Richard
>
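
To make the semantics above concrete: the revocation condition Richard
describes (revoke only *contended* locks, based on hold time and/or the
number of blocked requests) amounts to something like the sketch below. The
names and structure are illustrative assumptions, not code from the patch:

    #include <stdbool.h>
    #include <time.h>

    /* Sketch of the decision described above: a lock is only eligible for
     * revocation while other requests are blocked on it; it is then revoked
     * once it has been held longer than locks-revocation-secs, or once at
     * least locks-revocation-max-blocked requests are queued behind it. */
    static bool
    should_revoke (time_t granted_at, int blocked_count,
                   int revocation_secs, int max_blocked)
    {
            if (blocked_count == 0)
                    return false;   /* uncontended locks are never revoked */

            if (revocation_secs > 0 &&
                difftime (time (NULL), granted_at) >= revocation_secs)
                    return true;    /* held too long while others wait */

            if (max_blocked > 0 && blocked_count >= max_blocked)
                    return true;    /* too many waiters piled up */

            return false;
    }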



-- 
Raghavendra G