[Gluster-devel] Improvements in Quota Translator

Tue Apr 9 20:03:02 UTC 2013

On 04/09/2013 09:39 AM, Varun Shastry wrote:
> Hi Everyone,
> 
> As gluster quota has been facing some issues in its functionality, it
> is required to make it fool-proof, robust and reliable. Below are
> some of the major problems we are facing and the modifications
> proposed to overcome them.
> 
> Current implementation
> * Client side implementation of quota
>   - Not secure
>   - Increased traffic in updating the ctx
>   - Relying on xattrs updation through lookup calls
> * Problem with NFS mount
>   - Lack of lookups (handling through 'file handles')
> 
> So, a new design is proposed:
> 
> * Two levels of quota, soft and hard, similar to XFS's quota, are
> introduced. A message is logged on reaching the soft limit, and no
> more writes are allowed after the hard limit.

Not only should it be similar to XFS's quota, but we should actually be
able to have XFS do the enforcement if the user so chooses.  Ditto for
other local filesystems with similar-enough quota functionality.  In
those cases we'd be there only to help manage the local FS.
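
For concreteness, here is a minimal sketch of the soft/hard behaviour
described in the quoted paragraph, for the case where we do the
enforcement ourselves. The names (Verdict, check_write) and the choice of
">=" for the soft limit and ">" for the hard limit are assumptions made
for illustration, not anything taken from the existing quota translator.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    ALLOW_AND_LOG = "allow_and_log"   # soft limit reached: log, but let the write through
    DENY = "deny"                     # hard limit would be exceeded: refuse the write


def check_write(used: int, incoming: int, soft_limit: int, hard_limit: int) -> Verdict:
    """Decide what to do with a write of `incoming` bytes (hypothetical helper)."""
    if soft_limit > hard_limit:
        raise ValueError("soft limit must not exceed hard limit")
    projected = used + incoming
    if projected > hard_limit:        # assumption: the hard limit itself may still be filled exactly
        return Verdict.DENY
    if projected >= soft_limit:       # assumption: "reaching" the soft limit means >=
        return Verdict.ALLOW_AND_LOG
    return Verdict.ALLOW
```

When a local filesystem such as XFS does the enforcement, none of this
would run in our code path; we would only be configuring its limits.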

> * Quota is moved to the server side. A server-side implementation
> removes the dependence on the client for specific calls and protects
> the quota against clients mounting with a modified volfile.

Absolutely agree that this is required.

> To get the cluster view, a trusted quota client process will be
> spawned on a set of 'n' random bricks, containing only the cluster
> xlators, to aggregate the size on all the bricks of the volume. By
> issuing getxattr queries on the directories at a fixed time interval
> (say t secs), it updates the context of the quota xlator in the server
> graph by sending a setxattr with a key in the dict. The value of t
> depends on which list a directory is in, in descending order:
> 1. below soft limit, 2. above soft limit; it is also tunable.
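
As I read the quoted paragraph, the polling interval is chosen per
directory according to which list it is on, with directories below the
soft limit polled less often than those above it. A minimal sketch of
that reading follows; the function name and the default values of 60 and
5 seconds are hypothetical placeholders for the tunables, not values
from the proposal.

```python
def poll_interval(used: int, soft_limit: int,
                  below_soft_secs: float = 60.0,
                  above_soft_secs: float = 5.0) -> float:
    """Return how long to wait before re-aggregating a directory's usage.

    Both intervals are tunable; the only constraint assumed here is the
    "descending order" from the proposal: below-soft >= above-soft.
    """
    if below_soft_secs < above_soft_secs:
        raise ValueError("interval below the soft limit must be the longer one")
    if used >= soft_limit:
        return above_soft_secs   # close to the hard limit: poll frequently
    return below_soft_secs       # plenty of headroom: poll rarely
```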

Can you elaborate a bit on how this part is supposed to work?  What
we've talked about before (since CloudFS days) is that there would be a
"quota rebalancing daemon" that would observe when we're about to run
out of quota on one brick, and "borrow" quota from another brick, and so
on ad infinitum.  That sounds roughly like what you're suggesting,
except that there will be multiple such daemons active at once.  How do
they relate to one another?  Are they dividing the work among
themselves, using something like the same methods already in DHT and
proposed for parallel geo-replication?  What algorithms do they use to
decide when to intervene, and in what way?  A too-simple algorithm might
be prone to thrashing quota around as usage fluctuates, so we'll
probably need to build in some sort of damping function.
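
To illustrate what I mean by damping, here is one possible form: simple
hysteresis with a dead band between two thresholds. This is only a
sketch under my own assumptions, not an algorithm anyone has proposed;
BrickQuota, plan_transfer, the water marks and chunk_fraction are all
hypothetical names and values.

```python
import math
from dataclasses import dataclass


@dataclass
class BrickQuota:
    allotted: int   # bytes of the volume-wide quota currently assigned to this brick
    used: int       # bytes actually consumed on this brick


def plan_transfer(needy: BrickQuota, donor: BrickQuota,
                  high_water: float = 0.9, low_water: float = 0.6,
                  chunk_fraction: float = 0.5) -> int:
    """Return how many bytes of quota to move from donor to needy (0 = do nothing).

    A brick borrows only above high_water and lends only below low_water.
    Usage that fluctuates inside the band between the two triggers nothing,
    which is what keeps quota from being thrashed back and forth.
    """
    if needy.allotted <= 0 or donor.allotted <= 0:
        return 0
    if needy.used / needy.allotted < high_water:
        return 0                      # needy brick is not actually close to running out
    if donor.used / donor.allotted > low_water:
        return 0                      # donor has too little headroom to lend safely
    # Smallest allotment that still keeps the donor at or below low_water.
    donor_floor = math.ceil(donor.used / low_water)
    spare = donor.allotted - donor_floor
    # Hand over only part of the spare so the donor keeps some slack of its own.
    return max(0, int(spare * chunk_fraction))
```

With the defaults above, a brick at 950 of 1000 borrowing from one at 200
of 1000 would receive 333, which drops it to about 71% of its new
allotment, back inside the dead band, so it does not immediately ask again.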