[Gluster-devel] Improvements in Quota Translator

Wed Apr 10 13:10:37 UTC 2013

On 04/10/2013 08:44 AM, Varun Shastry wrote:
> On Wednesday 10 April 2013 01:33 AM, Jeff Darcy wrote:
>> Not only should it be similar to XFS's quota, but we should
>> actually be able to have XFS do the enforcement if the user so
>> chooses.  Ditto for other local filesystems with similar-enough
>> quota functionality.  In those cases we'd be there only to help
>> manage the local FS.
> Since XFS doesn't allow hard links across directory tree quota
> boundaries (we get EXDEV), it would prevent gluster from creating
> ".glusterfs" directory entries. So Gluster quota does both accounting
> and enforcement of quota.

Well, that's certainly an unfortunate consequence of using links instead
of a log to track dirty files.  :(  One more reason to undo that damage,
I suppose, but that's a different thread.

> What we're thinking: moving quota from the client to the server loses
> its cluster view, so it just needs to know the cluster-wide disk
> resource allocation for the directories on which limits are set. The
> gluster client process on the server side (trusted client) will
> periodically query for the xattrs (quota sizes on all the bricks)
> and aggregate them. By sending the aggregated sizes (cluster-wide
> consumption) through setxattrs, quota in the server graph gets the
> cluster-wide quota consumption. The server quota xlator thereby
> enforces the quota.

Any kind of server-side enforcement requires a local (per-brick) quota
to enforce.  In its simplest form, the per-brick quota is the same as
the global quota.  The problem with this approach is that a client can
go *way* over its quota before it's "caught" - with N bricks, it could
use up to N times its quota in the worst case.  To avoid that, we need
to be at least slightly more clever.  For example, we could say that the
quota on each brick at any point in time is:

	current_brick_usage + (unused_global_quota / num_bricks)
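
To make the arithmetic concrete, here is a minimal sketch of that formula
in Python.  The function name, the argument names, and the sample numbers
are all hypothetical; nothing here is existing Gluster code.  It assumes
unused_global_quota means the global quota minus the sum of usage across
all bricks, and that every brick sees the same (fresh) totals.

```python
def brick_limit(current_brick_usage, global_quota, total_usage, num_bricks):
    """Per-brick limit: this brick's usage plus an equal share of what is
    still unused cluster-wide (assumed definition of unused_global_quota)."""
    unused_global_quota = max(global_quota - total_usage, 0)
    return current_brick_usage + unused_global_quota // num_bricks


# Hypothetical example: four bricks, global quota of 100 units.
usages = [30, 10, 20, 0]
global_quota = 100
limits = [
    brick_limit(u, global_quota, sum(usages), len(usages)) for u in usages
]
```

With these numbers the unused quota is 40, so each brick may grow by 10.
Even if every brick fills up to its limit, the total stays within the
global quota instead of reaching N times that amount, provided the usage
figures the bricks were given are still current.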

More sophisticated models are also possible, e.g. weighted assignment
for different-sized bricks or taking into account usage *trends* as well
as instantaneous usage.  Some of these, including the above example, can
be applied locally on each brick as long as they're given the few pieces
of information they need to do the calculation.  Others can't, or at
least not practically.  A design with calculations done on the bricks
might be sufficient in the near term, but to "future-proof" that design
I think the calculations should be moved to the trusted client(s) and
then communicated as "orders" to the bricks.
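
As a rough illustration of the "orders" idea, the sketch below has a
trusted client compute a per-brick limit for every brick and hand back a
table that could then be pushed out.  It uses capacity-weighted shares as
one possible weighting for different-sized bricks.  The function name, the
brick names, and the dictionary layout are assumptions made for the
example, and the step that actually sends the orders to the bricks is left
out.

```python
def compute_orders(brick_usage, brick_capacity, global_quota):
    """Return {brick: limit}, splitting the unused global quota among
    bricks in proportion to their capacity (hypothetical weighting)."""
    total_usage = sum(brick_usage.values())
    unused = max(global_quota - total_usage, 0)
    total_capacity = sum(brick_capacity.values())
    orders = {}
    for brick, used in brick_usage.items():
        # Integer division rounds down, so the limits never sum to more
        # than the global quota.
        share = unused * brick_capacity[brick] // total_capacity
        orders[brick] = used + share
    return orders


# Hypothetical example: brick-b is three times the size of brick-a.
usage = {"brick-a": 30, "brick-b": 10}
capacity = {"brick-a": 100, "brick-b": 300}
orders = compute_orders(usage, capacity, global_quota=100)
```

Because the bricks only receive a number to enforce, the policy inside
compute_orders can later be swapped for something trend-based without
changing anything on the brick side.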