[Gluster-devel] [Gluster-users] Quota list not reflecting disk usage
Manikandan Selvaganesh
mselvaga at redhat.com
Tue Feb 2 11:00:28 UTC 2016
Hi Steve,
As you have mentioned, if you are using a glusterfs version older than 3.7,
then you are doing it right. Unfortunately, that is the only way: either
manually clean up the xattrs before re-enabling quota, or wait for the cleanup
process to finish on its own (which can take quite some time depending on the
number of files). Otherwise quota enforcement/accounting can be thrown off.
Also, we could not find anything helpful in the logs. Thanks for pointing this
out. We are in the process of writing blogs and clearly documenting quota and
its internal workings. There is an initial blog[1] which we have written; more
blogs will follow.
With glusterfs-3.7, we have introduced "quota versioning". Whenever you enable
quota, we suffix a number (1..N) to the quota xattrs. Say you enable quota for
the first time: the xattr will look like "trusted.glusterfs.quota.size.1", and
all quota-related xattrs will carry that suffix. With the versioning patch[2],
when you disable quota and enable it again, it becomes
"trusted.glusterfs.quota.size.2" (and similarly for the other quota-related
xattrs). Quota accounting can then proceed independently on the new suffix
while the cleanup of the old xattrs runs in the background, which solves the
issue that you have.
[1] https://manikandanselvaganesh.wordpress.com/
[2] http://review.gluster.org/12386
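
For illustration, here is roughly how the versioned xattrs would look when
inspected on a brick directory after a disable/enable cycle (a sketch only:
the path is a placeholder, values are elided, and exact key names can vary
by version):

# getfattr -d -m 'trusted.glusterfs.quota' -e hex /mnt/brick/some-dir
trusted.glusterfs.quota.size.1=0x...   (previous generation, being cleaned up)
trusted.glusterfs.quota.size.2=0x...   (current generation, used for accounting)
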
--
Thanks & Regards,
Manikandan Selvaganesh.
----- Original Message -----
From: "Vijaikumar Mallikarjuna" <vmallika at redhat.com>
To: "Steve Dainard" <sdainard at spd1.com>
Cc: "Manikandan Selvaganesh" <mselvaga at redhat.com>
Sent: Tuesday, February 2, 2016 10:12:51 AM
Subject: Re: [Gluster-users] Quota list not reflecting disk usage
Hi Steve,
Sorry for the delay. Mani and I were busy with something else at work;
we will update you on this by EOD.
Many quota issues have been fixed in 3.7. Version numbers have also been
added to the quota xattrs, so when quota is disabled we don't need to clean
up the xattrs.
Thanks,
Vijay
On Tue, Feb 2, 2016 at 12:26 AM, Steve Dainard <sdainard at spd1.com> wrote:
> I haven't heard anything back on this thread so here's where I've landed:
>
> It appears that the quota xattrs are not being cleared when quotas
> are disabled, so when they are disabled and re-enabled the value for
> size is added to the previous size, making it appear that the 'Used'
> space is significantly greater than it should be. This seems like a
> bug, but I don't know what to file it against, or whether the logs I
> attached prove this.
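>
> One way to check for this kind of stale accounting would be to compare du
> against the stored quota size xattr on each brick (an untested sketch; the
> brick path is from my setup and the xattr name may differ by version):
>
> # du -sb /mnt/raid6-storage/storage/projects-CanSISE
> # getfattr -n trusted.glusterfs.quota.size -e hex \
>     /mnt/raid6-storage/storage/projects-CanSISE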
>
> Also, the documentation makes no mention of how the quota system
> works, or of what happens when quotas are enabled/disabled. There seems
> to be a background task for each operation:
> On enable: "/usr/bin/find . -exec /usr/bin/stat {} \ ;"
> On disable: setfattr is removing quota xattrs
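>
> To check by hand whether one of these is still running on a node, something
> like this should work (a rough sketch; the bracket trick just stops grep
> from matching itself):
>
> # ps aux | egrep '[f]ind|[s]etfattr'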
>
> The thing is, neither of these tasks is listed in 'gluster volume
> status <volume>', i.e.:
>
> Status of volume: storage
> Gluster process Port Online Pid
>
> ------------------------------------------------------------------------------
> Brick 10.0.231.50:/mnt/raid6-storage/storage 49156 Y 24899
> Brick 10.0.231.51:/mnt/raid6-storage/storage 49156 Y 2991
> Brick 10.0.231.52:/mnt/raid6-storage/storage 49156 Y 28853
> Brick 10.0.231.53:/mnt/raid6-storage/storage 49153 Y 2705
> NFS Server on localhost N/A N N/A
> Quota Daemon on localhost N/A Y 30066
> NFS Server on 10.0.231.52 N/A N N/A
> Quota Daemon on 10.0.231.52 N/A Y 24976
> NFS Server on 10.0.231.53 N/A N N/A
> Quota Daemon on 10.0.231.53 N/A Y 30334
> NFS Server on 10.0.231.51 N/A N N/A
> Quota Daemon on 10.0.231.51 N/A Y 15781
>
> Task Status of Volume storage
>
> ------------------------------------------------------------------------------
> ******There are no active volume tasks*******
>
> (I added the asterisks above)
> So without any visibility into these running tasks, or even knowing of
> their existence (they aren't documented), it becomes very difficult to
> know what's going on. On any reasonably large storage system these tasks
> take days to complete, and there should be some indication of that.
>
> Where I'm at right now:
> - I disabled the quotas on volume 'storage'
> - I started to manually remove xattrs (along the lines sketched below)
> until I realized there is an automated task doing this.
> - After waiting for 'ps aux | grep setfattr' to return nothing, I
> re-enabled quotas
> - I'm currently waiting for the stat tasks to complete
> - Once the entire filesystem has been stat'ed, I'm going to set limits
> again.
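>
> For reference, the manual removal I had started was roughly the following,
> run per brick (a sketch only; the exact set of quota xattr names depends
> on the glusterfs version):
>
> # find /mnt/raid6-storage/storage -exec setfattr -x \
>     trusted.glusterfs.quota.size {} \;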
>
> As a note, this is a pretty brutal process on a system with 140T of
> storage, and I can't imagine how much worse it would be if my nodes
> had more than 12 disks each, or if I were at PB scale.
>
> On Mon, Jan 25, 2016 at 12:31 PM, Steve Dainard <sdainard at spd1.com> wrote:
> > Here's a link to a tarball of one of the gluster hosts' logs:
> > https://dl.dropboxusercontent.com/u/21916057/gluster01.tar.gz
> >
> > I wanted to include past logs in case they were useful.
> >
> > Also, the volume I'm trying to get quotas working on is 'storage';
> > you'll notice I have a brick issue on a different volume, 'vm-storage'.
> >
> > In regards to the 3.7 upgrade: I'm a bit hesitant to move to the
> > current release; I prefer to stay on a stable release with maintenance
> > updates if possible.
> >
> > On Mon, Jan 25, 2016 at 12:09 PM, Manikandan Selvaganesh
> > <mselvaga at redhat.com> wrote:
> >> Hi Steve,
> >>
> >> Also, do you have any plans to upgrade to the latest version? With 3.7,
> >> we have refactored some of the approaches used in quota and marker, and
> >> that has fixed quite a few issues.
> >>
> >> --
> >> Thanks & Regards,
> >> Manikandan Selvaganesh.
> >>
> >> ----- Original Message -----
> >> From: "Manikandan Selvaganesh" <mselvaga at redhat.com>
> >> To: "Steve Dainard" <sdainard at spd1.com>
> >> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
> >> Sent: Tuesday, January 26, 2016 1:31:10 AM
> >> Subject: Re: [Gluster-users] Quota list not reflecting disk usage
> >>
> >> Hi Steve,
> >>
> >> Could you send us the glusterfs logs? They could help us debug the issue.
> >>
> >> --
> >> Thanks & Regards,
> >> Manikandan Selvaganesh.
> >>
> >> ----- Original Message -----
> >> From: "Steve Dainard" <sdainard at spd1.com>
> >> To: "Manikandan Selvaganesh" <mselvaga at redhat.com>
> >> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
> >> Sent: Tuesday, January 26, 2016 12:56:22 AM
> >> Subject: Re: [Gluster-users] Quota list not reflecting disk usage
> >>
> >> Something is seriously wrong with the quota output:
> >>
> >> # gluster volume quota storage list
> >> Path                             Hard-limit  Soft-limit  Used      Available  Soft-limit exceeded?  Hard-limit exceeded?
> >> ---------------------------------------------------------------------------------------------------------------------------
> >> /projects-CanSISE                10.0TB      80%         27.8TB    0Bytes     Yes                   Yes
> >> /data4/climate                   105.0TB     80%         307.1TB   0Bytes     Yes                   Yes
> >> /data4/forestry                  50.0GB      80%         61.9GB    0Bytes     Yes                   Yes
> >> /data4/projects                  800.0GB     80%         2.0TB     0Bytes     Yes                   Yes
> >> /data4/strays                    85.0GB      80%         230.5GB   0Bytes     Yes                   Yes
> >> /data4/gis                       2.2TB       80%         6.3TB     0Bytes     Yes                   Yes
> >> /data4/modperl                   1.0TB       80%         953.2GB   70.8GB     Yes                   No
> >> /data4/dem                       1.0GB       80%         0Bytes    1.0GB      No                    No
> >> /projects-hydrology-archive0     5.0TB       80%         14.4TB    0Bytes     Yes                   Yes
> >> /climate-downscale-idf-ec        7.5TB       80%         5.1TB     2.4TB      No                    No
> >> /climate-downscale-idf           5.0TB       80%         6.1TB     0Bytes     Yes                   Yes
> >> /home                            5.0TB       80%         11.8TB    0Bytes     Yes                   Yes
> >> /projects-hydrology-scratch0     7.0TB       80%         169.1GB   6.8TB      No                    No
> >> /projects-rci-scratch            10.0TB      80%         1.9TB     8.1TB      No                    No
> >> /projects-dataportal             1.0TB       80%         775.4GB   248.6GB    No                    No
> >> /modules                         1.0TB       80%         36.1GB    987.9GB    No                    No
> >> /data4/climate/downscale/CMIP5   65.0TB      80%         56.4TB    8.6TB      Yes                   No
> >>
> >> Gluster is listing 'Used' space of over 307TB on /data4/climate, but
> >> the volume capacity is only 146T.
> >>
> >> This has happened after disabling quotas on the volume, re-enabling
> >> quotas, and then setting quotas again. There was a lot of glusterfsd
> >> CPU usage afterwards, and now, 3 days later, the quotas I set were all
> >> missing except
> >>
> >> /data4/projects|800.0GB|2.0TB|0Bytes
> >>
> >> So I re-set the quotas and the output above is what I have.
> >>
> >> Prior to disabling quotas, this was the output:
> >> # gluster volume quota storage list
> >> Path                             Hard-limit  Soft-limit  Used      Available  Soft-limit exceeded?  Hard-limit exceeded?
> >> ---------------------------------------------------------------------------------------------------------------------------
> >> /data4/climate                   105.0TB     80%         151.6TB   0Bytes     Yes                   Yes
> >> /data4/forestry                  50.0GB      80%         45.4GB    4.6GB      Yes                   No
> >> /data4/projects                  800.0GB     80%         753.1GB   46.9GB     Yes                   No
> >> /data4/strays                    85.0GB      80%         80.8GB    4.2GB      Yes                   No
> >> /data4/gis                       2.2TB       80%         2.1TB     91.8GB     Yes                   No
> >> /data4/modperl                   1.0TB       80%         948.1GB   75.9GB     Yes                   No
> >> /data4/dem                       1.0GB       80%         0Bytes    1.0GB      No                    No
> >> /projects-CanSISE                10.0TB      80%         11.9TB    0Bytes     Yes                   Yes
> >> /projects-hydrology-archive0     5.0TB       80%         4.8TB     174.0GB    Yes                   No
> >> /climate-downscale-idf-ec        7.5TB       80%         5.0TB     2.5TB      No                    No
> >> /climate-downscale-idf           5.0TB       80%         3.8TB     1.2TB      No                    No
> >> /home                            5.0TB       80%         4.7TB     283.8GB    Yes                   No
> >> /projects-hydrology-scratch0     7.0TB       80%         95.9GB    6.9TB      No                    No
> >> /projects-rci-scratch            10.0TB      80%         1.7TB     8.3TB      No                    No
> >> /projects-dataportal             1.0TB       80%         775.4GB   248.6GB    No                    No
> >> /modules                         1.0TB       80%         14.6GB    1009.4GB   No                    No
> >> /data4/climate/downscale/CMIP5   65.0TB      80%         56.4TB    8.6TB      Yes                   No
> >>
> >> I was so focused on the /projects-CanSISE quota not being accurate
> >> that I missed that the 'Used' space on /data4/climate is listed higher
> >> than the total gluster volume capacity.
> >>
> >> On Mon, Jan 25, 2016 at 10:52 AM, Steve Dainard <sdainard at spd1.com>
> wrote:
> >>> Hi Manikandan
> >>>
> >>> I'm using 'du', not df, in this case.
> >>>
> >>> On Thu, Jan 21, 2016 at 9:20 PM, Manikandan Selvaganesh
> >>> <mselvaga at redhat.com> wrote:
> >>>> Hi Steve,
> >>>>
> >>>> If you would like the df utility to report disk usage taking quota
> >>>> limits into consideration, then you are expected to run the following
> >>>> command:
> >>>>
> >>>> 'gluster volume set VOLNAME quota-deem-statfs on'
> >>>>
> >>>> With older versions quota-deem-statfs is OFF by default; with the
> >>>> latest versions it is ON by default. When it is on, the quota hard
> >>>> limit set on a directory of the volume is reported as that directory's
> >>>> total disk space, and the df utility displays usage accordingly. This
> >>>> answers why there is a mismatch in the disk utility output.
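> >>>>
> >>>> For example (a sketch; the fuse mount path is hypothetical):
> >>>>
> >>>> # gluster volume set VOLNAME quota-deem-statfs on
> >>>> # df -h /mnt/volume/some-dir
> >>>>
> >>>> With quota-deem-statfs on, df's Size column for a directory with a
> >>>> quota reflects its hard limit rather than the backing volume capacity.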
> >>>>
> >>>> Next, regarding the quota mechanism and its accuracy: quota has a
> >>>> notion of timeouts. For performance reasons, quota caches directory
> >>>> sizes on the client. You can set a timeout indicating the maximum
> >>>> duration for which directory sizes in the cache are considered valid,
> >>>> from the time they are populated. By default the hard timeout is 5s
> >>>> and the soft timeout is 60s. A timeout of zero forces a fetch of
> >>>> directory sizes from the server for every operation that modifies
> >>>> file data, effectively disabling directory size caching on the
> >>>> client side. If you do not use a timeout of 0 (which we do not
> >>>> encourage, for performance reasons), then until you reach the soft
> >>>> limit the soft timeout applies, and operations are only synced every
> >>>> 60s; that can let usage exceed the specified hard limit. If you would
> >>>> like quota to enforce strictly, please run the following commands:
> >>>> 'gluster v quota VOLNAME hard-timeout 0s'
> >>>> 'gluster v quota VOLNAME soft-timeout 0s'
> >>>>
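> >>>> For example, for the volume in this thread (note that a 0s timeout
> >>>> adds load, since sizes are then fetched from the server on every
> >>>> write):
> >>>>
> >>>> 'gluster v quota storage hard-timeout 0s'
> >>>> 'gluster v quota storage soft-timeout 0s'
> >>>>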
> >>>> We appreciate your curiosity in exploring this; if you would like to
> >>>> know more about quota, please refer to [1].
> >>>>
> >>>> [1] http://gluster.readthedocs.org/en/release-3.7.0-1/Administrator%20Guide/Directory%20Quota/
> >>>>
> >>>> --
> >>>> Thanks & Regards,
> >>>> Manikandan Selvaganesh.
> >>>>
> >>>> ----- Original Message -----
> >>>> From: "Steve Dainard" <sdainard at spd1.com>
> >>>> To: "gluster-users at gluster.org List" <gluster-users at gluster.org>
> >>>> Sent: Friday, January 22, 2016 1:40:07 AM
> >>>> Subject: Re: [Gluster-users] Quota list not reflecting disk usage
> >>>>
> >>>> This is gluster 3.6.6.
> >>>>
> >>>> I've attempted to disable and re-enable quotas on the volume, but
> >>>> when I re-apply the quotas on each directory the same 'Used' value is
> >>>> present as before.
> >>>>
> >>>> Where is quotad getting its information from, and how can I clean
> >>>> up/regenerate that info?
> >>>>
> >>>> On Thu, Jan 21, 2016 at 10:07 AM, Steve Dainard <sdainard at spd1.com>
> wrote:
> >>>>> I have a distributed volume with quota's enabled:
> >>>>>
> >>>>> Volume Name: storage
> >>>>> Type: Distribute
> >>>>> Volume ID: 26d355cb-c486-481f-ac16-e25390e73775
> >>>>> Status: Started
> >>>>> Number of Bricks: 4
> >>>>> Transport-type: tcp
> >>>>> Bricks:
> >>>>> Brick1: 10.0.231.50:/mnt/raid6-storage/storage
> >>>>> Brick2: 10.0.231.51:/mnt/raid6-storage/storage
> >>>>> Brick3: 10.0.231.52:/mnt/raid6-storage/storage
> >>>>> Brick4: 10.0.231.53:/mnt/raid6-storage/storage
> >>>>> Options Reconfigured:
> >>>>> performance.cache-size: 1GB
> >>>>> performance.readdir-ahead: on
> >>>>> features.quota: on
> >>>>> diagnostics.brick-log-level: WARNING
> >>>>>
> >>>>> Here is a partial list of quotas:
> >>>>> # /usr/sbin/gluster volume quota storage list
> >>>>> Path                           Hard-limit  Soft-limit  Used     Available  Soft-limit exceeded?  Hard-limit exceeded?
> >>>>> ---------------------------------------------------------------------------------------------------------------------------
> >>>>> ...
> >>>>> /projects-CanSISE              10.0TB      80%         11.9TB   0Bytes     Yes                   Yes
> >>>>> ...
> >>>>>
> >>>>> If I du on that location I do not get 11.9TB of space used (fuse
> mount point):
> >>>>> [root at storage projects-CanSISE]# du -hs
> >>>>> 9.5T .
> >>>>>
> >>>>> Can someone provide an explanation of how the quota mechanism tracks
> >>>>> disk usage? How often does the quota mechanism check its accuracy?
> >>>>> And how could it get so far off?
> >>>>>
> >>>>> Can I get gluster to rescan that location and update the quota usage?
> >>>>>
> >>>>> Thanks,
> >>>>> Steve
> >>>> _______________________________________________
> >>>> Gluster-users mailing list
> >>>> Gluster-users at gluster.org
> >>>> http://www.gluster.org/mailman/listinfo/gluster-users
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users at gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-users
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>