[Gluster-users] Quota list not reflecting disk usage
Steve Dainard
sdainard at spd1.com
Wed Feb 10 20:18:19 UTC 2016
So after waiting out the process of disabling quotas, waiting for the
xattrs to be cleaned up, re-enabling quotas, waiting for the xattrs to
be created, and then applying quota limits, I'm running into the same
issue.
Yesterday at ~2pm one of the quotas was listed as:
/modules|100.0GB|18.3GB|81.7GB
I initiated a copy from that glusterfs fuse mount to another fuse
mount for a different volume, and now I'm seeing:
/modules|100.0GB|27.4GB|72.6GB
So an increase of about 9GB in reported usage.
There were no writes at all to this directory during or after the cp.
I did a bit of digging through the /modules directory on one of the
gluster nodes and created this spreadsheet:
https://docs.google.com/spreadsheets/d/1l_6ze68TCOcx6LEh9MFwmqPZ9bM-70CUlSM_8tpQ654/edit?usp=sharing
The /modules/R/3.2.2 directory quota value doesn't come close to
matching the du value.
Funny bit: there are TWO quota contribution attributes:
# getfattr -d -m quota -e hex 3.2.2
# file: 3.2.2
trusted.glusterfs.quota.242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3.contri=0x0000000009af6000
trusted.glusterfs.quota.c890be20-1bb9-4aec-a8d0-eacab0446f16.contri=0x0000000013fda800
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000013fda800
For reference, another directory /modules/R/2.14.2 has only one
contribution attribute:
# getfattr -d -m quota -e hex 2.14.2
# file: 2.14.2
trusted.glusterfs.quota.c890be20-1bb9-4aec-a8d0-eacab0446f16.contri=0x0000000000692800
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000000692800
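For reference, the two contri values on 3.2.2 decode to bytes easily enough
(bash printf handles the hex):
# printf '%d\n' 0x0000000009af6000
162488320    (~155MiB, presumably the stale 242dcfd9... contri)
# printf '%d\n' 0x0000000013fda800
335390720    (~320MiB, the c890be20... contri that matches quota.size)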
Questions:
1. Why wasn't the
trusted.glusterfs.quota.242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3.contri=0x0000000009af6000
cleaned up?
2A. How can I remove the old attributes from the filesystem on all gluster
nodes, and then force a re-calculation of contributions for the quota path
/modules once I've done that? (A rough sketch of what I have in mind is below.)
2B. Or am I stuck yet again removing quotas completely, waiting for
the automated setfattr to remove the xattrs for the
c890be20-1bb9-4aec-a8d0-eacab0446f16 ID, manually removing the attrs for
242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3, re-enabling quotas, waiting for the
xattrs to be generated, and then setting limits?
3. Shouldn't there be a command to re-trigger quota accounting on a
directory, which confirms the attrs are set correctly and checks that
the contribution attrs actually match disk usage?
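For 2A, what I have in mind is something like this (untested sketch; the fuse
mount path is hypothetical, the brick path is the one from volume info below):

# on each gluster node, remove the stale contri attr directly on the brick:
setfattr -x trusted.glusterfs.quota.242dcfd9-6aea-4cb8-beb2-c0ed91ad70d3.contri \
  /mnt/raid6-storage/storage/modules/R/3.2.2

# then, from a fuse mount, re-stat the tree the same way 'quota enable' does:
find /mnt/storage/modules -exec stat {} \; > /dev/null

But I have no idea whether that stat crawl actually recomputes the size/contri
values, hence question 3.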
On Tue, Feb 2, 2016 at 3:00 AM, Manikandan Selvaganesh
<mselvaga at redhat.com> wrote:
> Hi Steve,
>
> As you have mentioned, if you are using a glusterfs version older than 3.7,
> then you are doing it right. We are sorry to say, but unfortunately that's the only
> way (manually going and cleaning up the xattrs before enabling quota, or waiting for
> the process to complete itself, which can take quite some time depending on the
> number of files) that can be done so as not to mess up quota enforcing/accounting.
> Also, we could not find anything in the logs that could help us. Thanks for
> raising the point. We are in the process of writing blogs and clearly documenting
> quota and its internal workings. There is an initial blog[1] which we have written.
> More blogs will follow.
>
> With glusterfs-3.7, we have introduced something called "quota versioning".
> Whenever you enable quota, we suffix a number (1..N) to the quota xattrs;
> say you enable quota for the first time, the xattr will look like
> "trusted.glusterfs.quota.size.1", and all the quota-related xattrs will have
> that number suffixed. With the versioning patch[2], when you disable and
> enable quota again, the next time it will be "trusted.glusterfs.quota.size.2"
> (similarly for the other quota-related xattrs). So quota accounting can happen
> independently based on the suffix, and the cleanup process can go on
> independently, which solves the issue that you have.
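> For illustration (hypothetical gfid placeholder and values; the names simply
> follow the scheme described above), after a second enable a directory on a
> brick would carry xattrs along the lines of:
>
> # getfattr -d -m quota -e hex <dir>
> trusted.glusterfs.quota.<parent-gfid>.contri.2=0x0000000000692800
> trusted.glusterfs.quota.size.2=0x0000000000692800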
>
> [1] https://manikandanselvaganesh.wordpress.com/
>
> [2] http://review.gluster.org/12386
>
> --
> Thanks & Regards,
> Manikandan Selvaganesh.
>
> ----- Original Message -----
> From: "Vijaikumar Mallikarjuna" <vmallika at redhat.com>
> To: "Steve Dainard" <sdainard at spd1.com>
> Cc: "Manikandan Selvaganesh" <mselvaga at redhat.com>
> Sent: Tuesday, February 2, 2016 10:12:51 AM
> Subject: Re: [Gluster-users] Quota list not reflecting disk usage
>
> Hi Steve,
>
> Sorry for the delay. Mani and I were busy with something else at work;
> we will update you on this by EOD.
>
> Many quota issues have been fixed in 3.7; version numbers have also been added
> to the quota xattrs, so when quota is disabled we don't need to clean up the xattrs.
>
> Thanks,
> Vijay
>
>
> On Tue, Feb 2, 2016 at 12:26 AM, Steve Dainard <sdainard at spd1.com> wrote:
>
>> I haven't heard anything back on this thread so here's where I've landed:
>>
>> It appears that the quota xattrs are not being cleared when quotas
>> are disabled, so when they are disabled and re-enabled the value for
>> size is added to the previous size, making it appear that the 'Used'
>> space is significantly greater than it should be. This seems like a
>> bug, but I don't know what to file it against, or whether the logs I
>> attached prove this.
>>
>> Also, the documentation doesn't mention how the quota system
>> works, or what happens when quotas are enabled/disabled. There seems
>> to be a background task for both operations:
>> On enable: "/usr/bin/find . -exec /usr/bin/stat {} \;"
>> On disable: setfattr removing the quota xattrs
>>
>> The thing is, neither of these tasks is listed in 'gluster volume
>> status <volume>', i.e.:
>>
>> Status of volume: storage
>> Gluster process Port Online Pid
>>
>> ------------------------------------------------------------------------------
>> Brick 10.0.231.50:/mnt/raid6-storage/storage 49156 Y 24899
>> Brick 10.0.231.51:/mnt/raid6-storage/storage 49156 Y 2991
>> Brick 10.0.231.52:/mnt/raid6-storage/storage 49156 Y 28853
>> Brick 10.0.231.53:/mnt/raid6-storage/storage 49153 Y 2705
>> NFS Server on localhost N/A N N/A
>> Quota Daemon on localhost N/A Y 30066
>> NFS Server on 10.0.231.52 N/A N N/A
>> Quota Daemon on 10.0.231.52 N/A Y 24976
>> NFS Server on 10.0.231.53 N/A N N/A
>> Quota Daemon on 10.0.231.53 N/A Y 30334
>> NFS Server on 10.0.231.51 N/A N N/A
>> Quota Daemon on 10.0.231.51 N/A Y 15781
>>
>> Task Status of Volume storage
>>
>> ------------------------------------------------------------------------------
>> ******There are no active volume tasks*******
>>
>> (I added the asterisks above)
>> So without any visibility into these running tasks, or even knowing of
>> their existence (they're not documented), it becomes very difficult to
>> know what's going on. On any reasonably large storage system these tasks
>> take days to complete, and there should be some indication of this.
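>> The only way I've found to check on them is to look for the processes
>> themselves on each node, e.g. (a rough check, assuming the two commands
>> noted above):
>>
>> # still crawling after 'quota enable'?
>> ps aux | grep '[/]usr/bin/find'
>> # still cleaning up after 'quota disable'?
>> ps aux | grep '[s]etfattr'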
>>
>> Where I'm at right now:
>> - I disabled the quotas on volume 'storage'
>> - I started to manually remove xattrs until I realized there is an
>> automated task to do this.
>> - After waiting for 'ps aux | grep setfattr' to return nothing, I
>> re-enabled quotas
>> - I'm currently waiting for the stat tasks to complete
>> - Once the entire filesystem has been stat'ed, I'm going to set limits
>> again.
>>
>> As a note, this is a pretty brutal process on a system with 140T of
>> storage, and I can't imagine how much worse this would be if my nodes
>> had more than 12 disks each, or if I were at PB scale.
>>
>> On Mon, Jan 25, 2016 at 12:31 PM, Steve Dainard <sdainard at spd1.com> wrote:
>> > Here's a link to a tarball of one of the gluster hosts' logs:
>> > https://dl.dropboxusercontent.com/u/21916057/gluster01.tar.gz
>> >
>> > I wanted to include past logs in case they were useful.
>> >
>> > Also, the volume I'm trying to get quotas working on is 'storage';
>> > you'll notice I have a brick issue on a different volume, 'vm-storage'.
>> >
>> > In regards to the 3.7 upgrade: I'm a bit hesitant to move to the
>> > current release; I prefer to stay on a stable release with maintenance
>> > updates if possible.
>> >
>> > On Mon, Jan 25, 2016 at 12:09 PM, Manikandan Selvaganesh
>> > <mselvaga at redhat.com> wrote:
>> >> Hi Steve,
>> >>
>> >> Also, do you have any plans to upgrade to the latest version? With 3.7,
>> >> we have refactored some of the approaches used in quota and marker, and
>> >> that has fixed quite a few issues.
>> >>
>> >> --
>> >> Thanks & Regards,
>> >> Manikandan Selvaganesh.
>> >>
>> >> ----- Original Message -----
>> >> From: "Manikandan Selvaganesh" <mselvaga at redhat.com>
>> >> To: "Steve Dainard" <sdainard at spd1.com>
>> >> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
>> >> Sent: Tuesday, January 26, 2016 1:31:10 AM
>> >> Subject: Re: [Gluster-users] Quota list not reflecting disk usage
>> >>
>> >> Hi Steve,
>> >>
>> >> Could you send us the glusterfs logs? They could help us debug the issue.
>> >>
>> >> --
>> >> Thanks & Regards,
>> >> Manikandan Selvaganesh.
>> >>
>> >> ----- Original Message -----
>> >> From: "Steve Dainard" <sdainard at spd1.com>
>> >> To: "Manikandan Selvaganesh" <mselvaga at redhat.com>
>> >> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
>> >> Sent: Tuesday, January 26, 2016 12:56:22 AM
>> >> Subject: Re: [Gluster-users] Quota list not reflecting disk usage
>> >>
>> >> Something is seriously wrong with the quota output:
>> >>
>> >> # gluster volume quota storage list
>> >> Path                             Hard-limit  Soft-limit  Used      Available  Soft-limit exceeded?  Hard-limit exceeded?
>> >> ---------------------------------------------------------------------------------------------------------------------------
>> >> /projects-CanSISE                10.0TB      80%         27.8TB    0Bytes     Yes                   Yes
>> >> /data4/climate                   105.0TB     80%         307.1TB   0Bytes     Yes                   Yes
>> >> /data4/forestry                  50.0GB      80%         61.9GB    0Bytes     Yes                   Yes
>> >> /data4/projects                  800.0GB     80%         2.0TB     0Bytes     Yes                   Yes
>> >> /data4/strays                    85.0GB      80%         230.5GB   0Bytes     Yes                   Yes
>> >> /data4/gis                       2.2TB       80%         6.3TB     0Bytes     Yes                   Yes
>> >> /data4/modperl                   1.0TB       80%         953.2GB   70.8GB     Yes                   No
>> >> /data4/dem                       1.0GB       80%         0Bytes    1.0GB      No                    No
>> >> /projects-hydrology-archive0     5.0TB       80%         14.4TB    0Bytes     Yes                   Yes
>> >> /climate-downscale-idf-ec        7.5TB       80%         5.1TB     2.4TB      No                    No
>> >> /climate-downscale-idf           5.0TB       80%         6.1TB     0Bytes     Yes                   Yes
>> >> /home                            5.0TB       80%         11.8TB    0Bytes     Yes                   Yes
>> >> /projects-hydrology-scratch0     7.0TB       80%         169.1GB   6.8TB      No                    No
>> >> /projects-rci-scratch            10.0TB      80%         1.9TB     8.1TB      No                    No
>> >> /projects-dataportal             1.0TB       80%         775.4GB   248.6GB    No                    No
>> >> /modules                         1.0TB       80%         36.1GB    987.9GB    No                    No
>> >> /data4/climate/downscale/CMIP5   65.0TB      80%         56.4TB    8.6TB      Yes                   No
>> >>
>> >> Gluster is listing 'Used' space of over 307TB on /data4/climate, but
>> >> the volume capacity is only 146T.
>> >>
>> >> This has happened after disabling quotas on the volume, re-enabling
>> >> quotas, and then setting quotas again. There was a lot of glusterfsd
>> >> CPU usage afterwards, and now, 3 days later, the quotas I set were all
>> >> missing except:
>> >>
>> >> /data4/projects|800.0GB|2.0TB|0Bytes
>> >>
>> >> So I re-set the quotas and the output above is what I have.
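>> >> (Re-setting a limit is just, for example,
>> >> # gluster volume quota storage limit-usage /data4/projects 800GB
>> >> repeated for each path.)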
>> >>
>> >> Prior to disabling quotas, this was the output:
>> >> # gluster volume quota storage list
>> >> Path                             Hard-limit  Soft-limit  Used      Available  Soft-limit exceeded?  Hard-limit exceeded?
>> >> ---------------------------------------------------------------------------------------------------------------------------
>> >> /data4/climate                   105.0TB     80%         151.6TB   0Bytes     Yes                   Yes
>> >> /data4/forestry                  50.0GB      80%         45.4GB    4.6GB      Yes                   No
>> >> /data4/projects                  800.0GB     80%         753.1GB   46.9GB     Yes                   No
>> >> /data4/strays                    85.0GB      80%         80.8GB    4.2GB      Yes                   No
>> >> /data4/gis                       2.2TB       80%         2.1TB     91.8GB     Yes                   No
>> >> /data4/modperl                   1.0TB       80%         948.1GB   75.9GB     Yes                   No
>> >> /data4/dem                       1.0GB       80%         0Bytes    1.0GB      No                    No
>> >> /projects-CanSISE                10.0TB      80%         11.9TB    0Bytes     Yes                   Yes
>> >> /projects-hydrology-archive0     5.0TB       80%         4.8TB     174.0GB    Yes                   No
>> >> /climate-downscale-idf-ec        7.5TB       80%         5.0TB     2.5TB      No                    No
>> >> /climate-downscale-idf           5.0TB       80%         3.8TB     1.2TB      No                    No
>> >> /home                            5.0TB       80%         4.7TB     283.8GB    Yes                   No
>> >> /projects-hydrology-scratch0     7.0TB       80%         95.9GB    6.9TB      No                    No
>> >> /projects-rci-scratch            10.0TB      80%         1.7TB     8.3TB      No                    No
>> >> /projects-dataportal             1.0TB       80%         775.4GB   248.6GB    No                    No
>> >> /modules                         1.0TB       80%         14.6GB    1009.4GB   No                    No
>> >> /data4/climate/downscale/CMIP5   65.0TB      80%         56.4TB    8.6TB      Yes                   No
>> >>
>> >> I was so focused on the /projects-CanSISE quota not being accurate
>> >> that I missed that the 'Used' space on /data4/climate is listed higher
>> >> than the total gluster volume capacity.
>> >>
>> >> On Mon, Jan 25, 2016 at 10:52 AM, Steve Dainard <sdainard at spd1.com>
>> wrote:
>> >>> Hi Manikandan
>> >>>
>> >>> I'm using 'du', not 'df', in this case.
>> >>>
>> >>> On Thu, Jan 21, 2016 at 9:20 PM, Manikandan Selvaganesh
>> >>> <mselvaga at redhat.com> wrote:
>> >>>> Hi Steve,
>> >>>>
>> >>>> If you would like the df utility to report disk usage taking quota
>> >>>> limits into consideration, then you are expected to run the following
>> >>>> command:
>> >>>>
>> >>>> 'gluster volume set VOLNAME quota-deem-statfs on'
>> >>>>
>> >>>> on older versions, where quota-deem-statfs is OFF by default. With the
>> >>>> latest versions, quota-deem-statfs is ON by default. In that case, the
>> >>>> total disk space of the directory is taken to be the quota hard limit
>> >>>> set on the directory of the volume, and df reports accordingly. This
>> >>>> explains why there is a mismatch in the df output.
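>> >>>>
>> >>>> As an illustration (hypothetical mount point, values roughly echoing the
>> >>>> /modules limit above), with quota-deem-statfs on, running df against a
>> >>>> directory that has a 1TB hard limit reports that limit as the size:
>> >>>>
>> >>>> # df -h /mnt/storage/modules
>> >>>> Filesystem         Size  Used Avail Use% Mounted on
>> >>>> server:/storage    1.0T   37G  988G   4% /mnt/storage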
>> >>>>
>> >>>> Next, answering the question about the quota mechanism and its accuracy:
>> >>>> there is something called timeouts in quota. For performance reasons,
>> >>>> quota caches the directory sizes on the client. You can set a timeout
>> >>>> indicating the maximum duration for which directory sizes in the cache
>> >>>> are considered valid, from the time they are populated. By default the
>> >>>> hard timeout is 5s and the soft timeout is 60s. Setting a timeout of zero
>> >>>> forces directory sizes to be fetched from the server for every operation
>> >>>> that modifies file data, effectively disabling directory-size caching on
>> >>>> the client side. If you do not have a timeout of 0 (which we do not
>> >>>> encourage, for performance reasons), then until you reach the soft limit
>> >>>> the soft timeout is taken into consideration; operations are only synced
>> >>>> every 60s, and that can cause the usage to exceed the specified hard
>> >>>> limit. If you would like quota to enforce strictly, then please run the
>> >>>> following commands:
>> >>>>
>> >>>> 'gluster v quota VOLNAME hard-timeout 0s'
>> >>>> 'gluster v quota VOLNAME soft-timeout 0s'
>> >>>>
>> >>>> We appreciate your curiosity in exploring this; if you would like to
>> >>>> know more about quota, please refer to [1].
>> >>>>
>> >>>> [1] http://gluster.readthedocs.org/en/release-3.7.0-1/Administrator%20Guide/Directory%20Quota/
>> >>>>
>> >>>> --
>> >>>> Thanks & Regards,
>> >>>> Manikandan Selvaganesh.
>> >>>>
>> >>>> ----- Original Message -----
>> >>>> From: "Steve Dainard" <sdainard at spd1.com>
>> >>>> To: "gluster-users at gluster.org List" <gluster-users at gluster.org>
>> >>>> Sent: Friday, January 22, 2016 1:40:07 AM
>> >>>> Subject: Re: [Gluster-users] Quota list not reflecting disk usage
>> >>>>
>> >>>> This is gluster 3.6.6.
>> >>>>
>> >>>> I've attempted to disable and re-enable quotas on the volume, but
>> >>>> when I re-apply the quotas on each directory the same 'Used' value is
>> >>>> present as before.
>> >>>>
>> >>>> Where is quotad getting its information from, and how can I clean
>> >>>> up/regenerate that info?
>> >>>>
>> >>>> On Thu, Jan 21, 2016 at 10:07 AM, Steve Dainard <sdainard at spd1.com>
>> wrote:
>> >>>>> I have a distributed volume with quotas enabled:
>> >>>>>
>> >>>>> Volume Name: storage
>> >>>>> Type: Distribute
>> >>>>> Volume ID: 26d355cb-c486-481f-ac16-e25390e73775
>> >>>>> Status: Started
>> >>>>> Number of Bricks: 4
>> >>>>> Transport-type: tcp
>> >>>>> Bricks:
>> >>>>> Brick1: 10.0.231.50:/mnt/raid6-storage/storage
>> >>>>> Brick2: 10.0.231.51:/mnt/raid6-storage/storage
>> >>>>> Brick3: 10.0.231.52:/mnt/raid6-storage/storage
>> >>>>> Brick4: 10.0.231.53:/mnt/raid6-storage/storage
>> >>>>> Options Reconfigured:
>> >>>>> performance.cache-size: 1GB
>> >>>>> performance.readdir-ahead: on
>> >>>>> features.quota: on
>> >>>>> diagnostics.brick-log-level: WARNING
>> >>>>>
>> >>>>> Here is a partial list of quotas:
>> >>>>> # /usr/sbin/gluster volume quota storage list
>> >>>>> Path                             Hard-limit  Soft-limit  Used      Available  Soft-limit exceeded?  Hard-limit exceeded?
>> >>>>> ---------------------------------------------------------------------------------------------------------------------------
>> >>>>> ...
>> >>>>> /projects-CanSISE                10.0TB      80%         11.9TB    0Bytes     Yes                   Yes
>> >>>>> ...
>> >>>>>
>> >>>>> If I du on that location I do not get 11.9TB of space used (fuse mount point):
>> >>>>> [root at storage projects-CanSISE]# du -hs
>> >>>>> 9.5T .
>> >>>>>
>> >>>>> Can someone provide an explanation for how the quota mechanism tracks
>> >>>>> disk usage? How often does the quota mechanism check its accuracy? And
>> >>>>> how could it get so far off?
>> >>>>>
>> >>>>> Can I get gluster to rescan that location and update the quota usage?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Steve
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>