[Gluster-devel] Improvements in Quota Translator
Varun Shastry
vshastry at redhat.com
Wed Apr 10 12:44:16 UTC 2013
Hi Jeff,
On Wednesday 10 April 2013 01:33 AM, Jeff Darcy wrote:
> On 04/09/2013 09:39 AM, Varun Shastry wrote:
>> Hi Everyone,
>>
>> As gluster quota has been facing some functionality issues, it
>> needs to be made fool-proof, robust, and reliable. Below are some
>> of the major problems we are facing and the modifications proposed
>> to overcome them.
>>
>> Current implementation:
>> * Client-side implementation of quota
>>   - Not secure
>>   - Increased traffic in updating the ctx
>>   - Relies on xattr updates through lookup calls
>> * Problem with NFS mounts - lack of lookups (handled through
>>   'file handles')
>>
>> So, the following new design is proposed:
>>
>> * A two-level quota implementation with soft and hard limits,
>> similar to XFS quota, is introduced. A message is logged on
>> reaching the soft limit, and no more writes are allowed after the
>> hard limit.
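
To make those semantics concrete, here is a minimal sketch of such a
check; the struct, function names, limits, and the -EDQUOT return
convention are illustrative, not the actual quota xlator code:

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>

    struct quota_limits {
        uint64_t used;       /* aggregated usage for the directory */
        uint64_t soft_limit; /* log a message when crossed */
        uint64_t hard_limit; /* fail writes when crossed */
    };

    static int
    quota_check_write (struct quota_limits *q, uint64_t delta)
    {
        if (q->used + delta > q->hard_limit)
            return -EDQUOT;   /* hard limit: deny the write */
        if (q->used + delta > q->soft_limit)
            fprintf (stderr, "quota: soft limit exceeded\n");
        return 0;             /* soft limit only logs; write allowed */
    }

    int
    main (void)
    {
        struct quota_limits q = { .used = 900, .soft_limit = 800,
                                  .hard_limit = 1024 };

        /* 900 + 100 = 1000: past soft (logs), under hard (allowed) */
        printf ("check: %d\n", quota_check_write (&q, 100));
        /* 900 + 200 = 1100: past hard, returns -EDQUOT */
        printf ("check: %d\n", quota_check_write (&q, 200));
        return 0;
    }
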
> Not only should it be similar to XFS's quota, but we should actually be
> able to have XFS do the enforcement if the user so chooses. Ditto for
> other local filesystems with similar-enough quota functionality. In
> those cases we'd be there only to help manage the local FS.
Since XFS doesn't allow hard links across directory tree quota
boundaries (link() fails with EXDEV), it would prevent gluster from
creating its ".glusterfs" hard-link entries. So Gluster quota has to
do both the accounting and the enforcement itself.
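
As a minimal illustration of that failure mode (the paths are made
up; it assumes /xfs/treeA and /xfs/treeB live under different
directory tree quota projects on the same XFS filesystem):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main (void)
    {
        /* Same XFS filesystem, different tree-quota project IDs. */
        if (link ("/xfs/treeA/file", "/xfs/treeB/link") == -1 &&
            errno == EXDEV)
            printf ("hard link rejected across tree quota "
                    "boundaries: %s\n", strerror (errno));
        return 0;
    }

This is exactly what would hit the ".glusterfs" hard links if we
delegated enforcement to XFS directory tree quota.
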
>
>> * Quota is moved to the server side. A server-side implementation
>> removes the dependency on the client for specific calls, and
>> protects the quota from being bypassed by mounting with a modified
>> volfile.
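
For context, in the server graph the quota xlator would sit directly
above the brick, out of reach of any client-edited volfile. A
hand-trimmed sketch of what such a stanza could look like (volume
name and the option name/format here are illustrative):

    volume myvol-quota
        type features/quota
        option limit-set /shared:10GB   # illustrative option/format
        subvolumes myvol-posix
    end-volume
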
> Absolutely agree that this is required.
>
>> To get the cluster view, a trusted quota client process will be
>> spawned on a set of 'n' randomly chosen bricks, containing only
>> the cluster xlators, to aggregate the size across all the bricks
>> of the volume. At a fixed time interval (say t secs) it queries
>> the directories with getxattrs, and updates the context of the
>> quota xlator in the server graph by sending a setxattr with a key
>> in the dict. The interval t depends on which list a directory is
>> on, decreasing from 1. below soft limit to 2. above soft limit;
>> and it is tunable.
> Can you elaborate a bit on how this part is supposed to work? What
> we've talked about before (since CloudFS days) is that there would be a
> "quota rebalancing daemon" that would observe when we're about to run
> out of quota on one brick, and "borrow" quota from another brick, and so
> on ad infinitum. That sounds roughly like what you're suggesting,
> except that there will be multiple such daemons active at once. How do
> they relate to one another? Are they dividing the work among
> themselves, using something like the same methods already in DHT and
> proposed for parallel geo-replication? What algorithms do they use to
> decide when to intervene, and in what way? A too-simple algorithm might
> be prone to thrashing quota around as usage fluctuates, so we'll
> probably need to build in some sort of damping function.
>
As you explained above, no, it's not the same approach. We don't
assign a fixed size to each brick and change it when one of them
reaches its limit. What we're thinking is this: moving quota from the
client to the server loses the cluster view, so the server side just
needs to know the cluster-wide disk consumption of the directories on
which limits are set. The gluster client process on the server side
(the trusted client) periodically queries the xattrs (quota sizes on
all the bricks) and aggregates them. By sending the aggregated sizes
(cluster-wide consumption) back through setxattrs, the quota xlator
in the server graph learns the cluster-wide consumption, and thereby
the server quota xlator enforces the quota.
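
A rough sketch of that aggregation loop, under the proposal's
assumptions; brick_getxattr()/server_setxattr(), the xattr keys, and
the interval values are placeholders for the real cluster-xlator
plumbing, not actual gluster APIs:

    #include <stdint.h>
    #include <unistd.h>

    #define BELOW_SOFT_INTERVAL 60  /* poll slowly under soft limit */
    #define ABOVE_SOFT_INTERVAL 5   /* poll faster once it is crossed */

    /* Stubs standing in for calls routed through the cluster
     * xlators; the xattr keys below are illustrative. */
    static uint64_t
    brick_getxattr (int brick, const char *dir, const char *key)
    {
        (void) brick; (void) dir; (void) key;
        return 0;   /* real code: getxattr() on the brick */
    }

    static void
    server_setxattr (const char *dir, const char *key, uint64_t value)
    {
        (void) dir; (void) key; (void) value;
        /* real code: setxattr() carrying the aggregate in a dict,
         * updating the server-side quota xlator's context */
    }

    static void
    aggregate_one (const char *dir, int nbricks, uint64_t soft_limit)
    {
        uint64_t total = 0;
        int      b;

        /* Recover the cluster-wide view lost when quota moved to
         * the server side: sum the per-brick sizes. */
        for (b = 0; b < nbricks; b++)
            total += brick_getxattr (b, dir, "quota.size");

        /* Push the aggregate back so every server-side quota
         * xlator can enforce against cluster-wide consumption. */
        server_setxattr (dir, "quota.aggregated-size", total);

        /* The tunable interval 't': directories over the soft
         * limit are re-polled sooner. */
        sleep (total > soft_limit ? ABOVE_SOFT_INTERVAL
                                  : BELOW_SOFT_INTERVAL);
    }

    int
    main (void)
    {
        for (;;)
            aggregate_one ("/vol/dir-with-limit", 4, 1UL << 30);
    }
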
- Varun Shastry