[Gluster-users] Wrong directory quota usage

João Baúto joao.bauto at neuro.fchampalimaud.org
Thu Aug 20 17:19:25 UTC 2020


Hi Srijan,

After a 3rd run of the quota_fsck script, the quotas got fixed! Working
normally again.
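
For reference, a quick way to confirm the fix on our side (the volume name is
an assumption here; it shows up as "tank" in the brick paths quoted below, and
/projectB is the directory that was affected):

# gluster volume quota tank list /projectB
# getfattr -n trusted.glusterfs.quota.dirty -e hex /tank/volume2/brick/projectB

The Used column should be back to a sane value, and the dirty xattr should read
0x3000 once the accounting has settled (0x3100, as in the getfattr output
further down, means the directory is still marked dirty).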

Thank you for your help!
*João Baúto*
---------------

*Scientific Computing and Software Platform*
Champalimaud Research
Champalimaud Center for the Unknown
Av. Brasília, Doca de Pedrouços
1400-038 Lisbon, Portugal
fchampalimaud.org <https://www.fchampalimaud.org/>


Srijan Sivakumar <ssivakum at redhat.com> wrote on Wednesday, 19/08/2020
at 18:04:

> Hi João,
>
> I'd recommend going with the disable/enable of the quota, as that would
> eventually do the same thing. It is a better option than manually changing
> the parameters in the said command.
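>
> Roughly (the volume name is a placeholder, and note that the limits have to
> be set again afterwards, since disabling quota removes them):
>
> # gluster volume quota <volname> disable
> # gluster volume quota <volname> enable
> # gluster volume quota <volname> limit-usage /projectB 100TB
>
> Enabling quota kicks off a fresh crawl of the whole filesystem, so on a
> volume this size it will take a while.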
>
> --
> Thanks and Regards,
>
> SRIJAN SIVAKUMAR
>
> Associate Software Engineer
>
> Red Hat
> <https://www.redhat.com>
>
> On Wed, Aug 19, 2020 at 8:12 PM João Baúto <
> joao.bauto at neuro.fchampalimaud.org> wrote:
>
>> Hi Srijan,
>>
>> Before I do the disable/enable, I just want to check something with you. On
>> the other cluster, where the crawl is running, I can see the find command
>> and also this one, which seems to be the one triggering the crawler (4
>> processes, one per brick on every node):
>>
>> /usr/sbin/glusterfs -s localhost --volfile-id
>> client_per_brick/tank.client.hostname.tank-volume1-brick.vol
>> --use-readdirp=yes --client-pid -100 -l
>> /var/log/glusterfs/quota_crawl/tank-volume1-brick.log
>> /var/run/gluster/tmp/mntYbIVwT
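>>
>> (As a side note, the per-brick log that command points at looks like a
>> reasonable place to watch the crawl progress, e.g.
>> # tail -f /var/log/glusterfs/quota_crawl/*.log )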
>>
>> Can I manually trigger this command?
>>
>> Thanks!
>> *João Baúto*
>> ---------------
>>
>> *Scientific Computing and Software Platform*
>> Champalimaud Research
>> Champalimaud Center for the Unknown
>> Av. Brasília, Doca de Pedrouços
>> 1400-038 Lisbon, Portugal
>> fchampalimaud.org <https://www.fchampalimaud.org/>
>>
>>
>> Srijan Sivakumar <ssivakum at redhat.com> wrote on Wednesday,
>> 19/08/2020 at 07:25:
>>
>>> Hi João,
>>>
>>> If the crawl is no longer running and the values are still not reflected
>>> properly, then the crawl process has ended abruptly.
>>>
>>> Yes, technically disabling and enabling the quota will trigger a crawl, but
>>> it would do a complete crawl of the filesystem, so it would take time and be
>>> resource-consuming. Usually disabling/enabling is the last resort when the
>>> accounting isn't reflected properly, but if you're going to merge these two
>>> clusters then you can probably go ahead with the merge and enable quota
>>> afterwards.
>>>
>>> --
>>> Thanks and Regards,
>>>
>>> SRIJAN SIVAKUMAR
>>>
>>> Associate Software Engineer
>>>
>>> Red Hat
>>> <https://www.redhat.com>
>>>
>>> On Wed, Aug 19, 2020 at 3:53 AM João Baúto <
>>> joao.bauto at neuro.fchampalimaud.org> wrote:
>>>
>>>> Hi Srijan,
>>>>
>>>> I didn't get any result with that command, so I went to our other cluster
>>>> (we are merging two clusters; the data is replicated) and activated the
>>>> quota feature on the same directory. Running the same command on each node,
>>>> I get output similar to yours, one process per brick I'm assuming:
>>>>
>>>> root     1746822  1.4  0.0 230324  2992 ?        S    23:06   0:04
>>>> /usr/bin/find . -exec /usr/bin/stat {} \;
>>>> root     1746858  5.3  0.0 233924  6644 ?        S    23:06   0:15
>>>> /usr/bin/find . -exec /usr/bin/stat {} \;
>>>> root     1746889  3.3  0.0 233592  6452 ?        S    23:06   0:10
>>>> /usr/bin/find . -exec /usr/bin/stat {} \;
>>>> root     1746930  3.1  0.0 230476  3232 ?        S    23:06   0:09
>>>> /usr/bin/find . -exec /usr/bin/stat {} \;
>>>>
>>>> At this point, is it easier to just disable and enable the feature and
>>>> force a new crawl? We don't mind a temporary increase in CPU and IO usage.
>>>>
>>>> Thank you again!
>>>> *João Baúto*
>>>> ---------------
>>>>
>>>> *Scientific Computing and Software Platform*
>>>> Champalimaud Research
>>>> Champalimaud Center for the Unknown
>>>> Av. Brasília, Doca de Pedrouços
>>>> 1400-038 Lisbon, Portugal
>>>> fchampalimaud.org <https://www.fchampalimaud.org/>
>>>>
>>>>
>>>> Srijan Sivakumar <ssivakum at redhat.com> wrote on Tuesday,
>>>> 18/08/2020 at 21:42:
>>>>
>>>>> Hi João,
>>>>>
>>>>> There isn't a straightforward way of tracking the crawl, but as gluster
>>>>> uses find and stat during the crawl, one can run the following command:
>>>>> # ps aux | grep find
>>>>>
>>>>> If the output is of the form
>>>>> "root    1513  0.0  0.1  127224  2636  ?        S    12:24   0:00
>>>>> /usr/bin/find . -exec /usr/bin/stat {} \;"
>>>>> then the crawl is still going on.
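>>>>>
>>>>> (A small convenience, not gluster-specific: bracketing the first letter of
>>>>> the pattern keeps the grep process itself out of the list, and piping to
>>>>> wc -l shows how many of the crawl processes are still running:
>>>>> # ps aux | grep '[f]ind . -exec /usr/bin/stat'
>>>>> # ps aux | grep '[f]ind . -exec /usr/bin/stat' | wc -l )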
>>>>>
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> SRIJAN SIVAKUMAR
>>>>>
>>>>> Associate Software Engineer
>>>>>
>>>>> Red Hat
>>>>> <https://www.redhat.com>
>>>>>
>>>>>
>>>>> On Wed, Aug 19, 2020 at 1:46 AM João Baúto <
>>>>> joao.bauto at neuro.fchampalimaud.org> wrote:
>>>>>
>>>>>> Hi Srijan,
>>>>>>
>>>>>> Is there a way of getting the status of the crawl process?
>>>>>> We are going to expand this cluster, adding 12 new bricks (around 500TB),
>>>>>> and we rely heavily on the quota feature to control the space usage for
>>>>>> each project. The crawl has been running since Saturday (nothing has
>>>>>> changed) and I'm unsure whether it will finish tomorrow or in weeks.
>>>>>>
>>>>>> Thank you!
>>>>>> *João Baúto*
>>>>>> ---------------
>>>>>>
>>>>>> *Scientific Computing and Software Platform*
>>>>>> Champalimaud Research
>>>>>> Champalimaud Center for the Unknown
>>>>>> Av. Brasília, Doca de Pedrouços
>>>>>> 1400-038 Lisbon, Portugal
>>>>>> fchampalimaud.org <https://www.fchampalimaud.org/>
>>>>>>
>>>>>>
>>>>>> Srijan Sivakumar <ssivakum at redhat.com> wrote on Sunday,
>>>>>> 16/08/2020 at 06:11:
>>>>>>
>>>>>>> Hi João,
>>>>>>>
>>>>>>> Yes, it'll take some time given the filesystem size, as it has to
>>>>>>> change the xattrs at each level and then crawl upwards.
>>>>>>>
>>>>>>> The stat is done by the script itself, so the crawl is initiated.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Srijan Sivakumar
>>>>>>>
>>>>>>> On Sun 16 Aug, 2020, 04:58 João Baúto, <
>>>>>>> joao.bauto at neuro.fchampalimaud.org> wrote:
>>>>>>>
>>>>>>>> Hi Srijan & Strahil,
>>>>>>>>
>>>>>>>> I ran the quota_fsck script mentioned in Hari's blog post on all
>>>>>>>> bricks, and it detected a lot of size mismatches.
>>>>>>>>
>>>>>>>> The script was executed as,
>>>>>>>>
>>>>>>>>    - python quota_fsck.py --sub-dir projectB --fix-issues
>>>>>>>>    /mnt/tank /tank/volume2/brick (on all nodes and bricks; a loop
>>>>>>>>    sketch follows below)
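>>>>>>>>
>>>>>>>>    Something along these lines on each node (the first brick path is a
>>>>>>>>    guess based on the volfile names; /tank/volume2/brick is the one
>>>>>>>>    shown above, and /mnt/tank is the mounted volume):
>>>>>>>>
>>>>>>>>    for b in /tank/volume1/brick /tank/volume2/brick; do
>>>>>>>>        python quota_fsck.py --sub-dir projectB --fix-issues /mnt/tank "$b"
>>>>>>>>    done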
>>>>>>>>
>>>>>>>> Here is a snippet from the script,
>>>>>>>>
>>>>>>>> Size Mismatch    /tank/volume2/brick/projectB {'parents':
>>>>>>>> {'00000000-0000-0000-0000-000000000001': {'contri_file_count':
>>>>>>>> 18446744073035296610L, 'contri_size': 18446645297413872640L,
>>>>>>>> 'contri_dir_count': 18446744073709527653L}}, 'version': '1', 'file_count':
>>>>>>>> 18446744073035296610L, 'dirty': False, 'dir_count': 18446744073709527653L,
>>>>>>>> 'size': 18446645297413872640L} 15204281691754
>>>>>>>> MARKING DIRTY: /tank/volume2/brick/projectB
>>>>>>>> stat on /mnt/tank/projectB
>>>>>>>> Files verified : 683223
>>>>>>>> Directories verified : 46823
>>>>>>>> Objects Fixed : 705230
>>>>>>>>
>>>>>>>> Checking the xattrs on the bricks, I can see the directory in
>>>>>>>> question marked as dirty:
>>>>>>>> # getfattr -d -m. -e hex /tank/volume2/brick/projectB
>>>>>>>> getfattr: Removing leading '/' from absolute path names
>>>>>>>> # file: tank/volume2/brick/projectB
>>>>>>>> trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c
>>>>>>>>
>>>>>>>> trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f372478000a7705
>>>>>>>> trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc
>>>>>>>>
>>>>>>>> trusted.glusterfs.mdata=0x010000000000000000000000005f3724750000000013ddf679000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0
>>>>>>>>
>>>>>>>> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea
>>>>>>>> trusted.glusterfs.quota.dirty=0x3100
>>>>>>>>
>>>>>>>> trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff
>>>>>>>>
>>>>>>>> trusted.glusterfs.quota.size.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea
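>>>>>>>>
>>>>>>>> As far as I understand it (and this is how the quota_fsck script parses
>>>>>>>> them, so treat this as a sketch), the quota size/contri xattrs pack three
>>>>>>>> big-endian 64-bit fields: size in bytes, file count and directory count.
>>>>>>>> They can be decoded in place with plain bash:
>>>>>>>>
>>>>>>>> x=00000ca6ccf7a80000000000000790a1000000000000b6ea  # quota.size.1, no 0x
>>>>>>>> printf 'size: %d bytes\nfiles: %d\ndirs: %d\n' \
>>>>>>>>     "$((16#${x:0:16}))" "$((16#${x:16:16}))" "$((16#${x:32:16}))"
>>>>>>>>
>>>>>>>> That works out to roughly 12.7 TiB, ~496k files and ~47k directories on
>>>>>>>> this brick; the directory count at least lines up with what the script
>>>>>>>> verified above.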
>>>>>>>>
>>>>>>>> Now, my question is: how do I trigger Gluster to recalculate the
>>>>>>>> quota for this directory? Is it automatic but just takes a while? The
>>>>>>>> quota list did change, but not to a good result.
>>>>>>>>
>>>>>>>> Path        Hard-limit  Soft-limit    Used       Available  Soft-limit exceeded?  Hard-limit exceeded?
>>>>>>>> /projectB   100.0TB     80%(80.0TB)   16383.9PB  190.1TB    No                    No
>>>>>>>>
>>>>>>>> I would like to avoid a quota disable/enable on the volume, as it
>>>>>>>> removes the configured limits.
>>>>>>>>
>>>>>>>> Thank you for all the help!
>>>>>>>> *João Baúto*
>>>>>>>> ---------------
>>>>>>>>
>>>>>>>> *Scientific Computing and Software Platform*
>>>>>>>> Champalimaud Research
>>>>>>>> Champalimaud Center for the Unknown
>>>>>>>> Av. Brasília, Doca de Pedrouços
>>>>>>>> 1400-038 Lisbon, Portugal
>>>>>>>> fchampalimaud.org <https://www.fchampalimaud.org/>
>>>>>>>>
>>>>>>>>
>>>>>>>> Srijan Sivakumar <ssivakum at redhat.com> wrote on Saturday,
>>>>>>>> 15/08/2020 at 11:57:
>>>>>>>>
>>>>>>>>> Hi João,
>>>>>>>>>
>>>>>>>>> What we're looking at here is a quota accounting error. I think
>>>>>>>>> you've already looked into the blog post by Hari and are using the
>>>>>>>>> script to fix the accounting.
>>>>>>>>> That should help you resolve this issue.
>>>>>>>>>
>>>>>>>>> Let me know if you face any issues while using it.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Srijan Sivakumar
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri 14 Aug, 2020, 17:10 João Baúto, <
>>>>>>>>> joao.bauto at neuro.fchampalimaud.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Strahil,
>>>>>>>>>>
>>>>>>>>>> I have tried removing the quota for that specific directory and
>>>>>>>>>> setting it again, but it didn't work (maybe it has to be a quota
>>>>>>>>>> disable and enable at the volume level). I'm currently testing a
>>>>>>>>>> solution by Hari with the quota_fsck.py script (https://medium.com/@
>>>>>>>>>> harigowtham/glusterfs-quota-fix-accounting-840df33fcd3a) and it's
>>>>>>>>>> detecting a lot of size mismatches in files.
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> *João Baúto*
>>>>>>>>>> ---------------
>>>>>>>>>>
>>>>>>>>>> *Scientific Computing and Software Platform*
>>>>>>>>>> Champalimaud Research
>>>>>>>>>> Champalimaud Center for the Unknown
>>>>>>>>>> Av. Brasília, Doca de Pedrouços
>>>>>>>>>> 1400-038 Lisbon, Portugal
>>>>>>>>>> fchampalimaud.org <https://www.fchampalimaud.org/>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Strahil Nikolov <hunter86_bg at yahoo.com> wrote on Friday,
>>>>>>>>>> 14/08/2020 at 10:16:
>>>>>>>>>>
>>>>>>>>>>> Hi João,
>>>>>>>>>>>
>>>>>>>>>>> Based on your output, it seems that the quota size is different
>>>>>>>>>>> on the two bricks.
>>>>>>>>>>>
>>>>>>>>>>> Have you tried removing the quota and then recreating it? Maybe
>>>>>>>>>>> that would be the easiest way to fix it.
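>>>>>>>>>>>
>>>>>>>>>>> Something like this, if I remember the syntax right (volume name is a
>>>>>>>>>>> placeholder):
>>>>>>>>>>>
>>>>>>>>>>> # gluster volume quota <volname> remove /projectB
>>>>>>>>>>> # gluster volume quota <volname> limit-usage /projectB 100TB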
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Strahil Nikolov
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 14 August 2020 at 4:35:14 GMT+03:00, "João Baúto" <
>>>>>>>>>>> joao.bauto at neuro.fchampalimaud.org> wrote:
>>>>>>>>>>> >Hi all,
>>>>>>>>>>> >
>>>>>>>>>>> >We have a 4-node distributed cluster with 2 bricks per node running
>>>>>>>>>>> >Gluster 7.7 + ZFS. We use directory quota to limit the space used
>>>>>>>>>>> >by our members on each project. Two days ago we noticed inconsistent
>>>>>>>>>>> >space usage reported by Gluster in the quota list.
>>>>>>>>>>> >
>>>>>>>>>>> >A small snippet of gluster volume quota vol list,
>>>>>>>>>>> >
>>>>>>>>>>> > Path        Hard-limit  Soft-limit    Used       Available  Soft-limit exceeded?  Hard-limit exceeded?
>>>>>>>>>>> >/projectA    5.0TB       80%(4.0TB)    3.1TB      1.9TB      No                    No
>>>>>>>>>>> >*/projectB   100.0TB     80%(80.0TB)   16383.4PB  740.9TB    No                    No*
>>>>>>>>>>> >/projectC    70.0TB      80%(56.0TB)   50.0TB     20.0TB     No                    No
>>>>>>>>>>> >
>>>>>>>>>>> >The total space available in the cluster is 360TB, the quota for
>>>>>>>>>>> >projectB is 100TB and, as you can see, it's reporting 16383.4PB used
>>>>>>>>>>> >and 740TB available (already decreased from 750TB).
>>>>>>>>>>> >
>>>>>>>>>>> >There was an issue in Gluster 3.x related to wrong directory quota
>>>>>>>>>>> >(https://lists.gluster.org/pipermail/gluster-users/2016-February/025305.html
>>>>>>>>>>> >and
>>>>>>>>>>> >https://lists.gluster.org/pipermail/gluster-users/2018-November/035374.html)
>>>>>>>>>>> >but it's marked as solved (not sure if the solution still applies).
>>>>>>>>>>> >
>>>>>>>>>>> >*On projectB*
>>>>>>>>>>> ># getfattr -d -m . -e hex projectB
>>>>>>>>>>> ># file: projectB
>>>>>>>>>>> >trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c
>>>>>>>>>>>
>>>>>>>>>>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f35e69800098ed9
>>>>>>>>>>> >trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc
>>>>>>>>>>>
>>>>>>>>>>> >trusted.glusterfs.mdata=0x010000000000000000000000005f355c59000000000939079f000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0
>>>>>>>>>>>
>>>>>>>>>>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000ab0f227a860000000000478e33acffffffffffffc112
>>>>>>>>>>> >trusted.glusterfs.quota.dirty=0x3000
>>>>>>>>>>>
>>>>>>>>>>> >trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff
>>>>>>>>>>>
>>>>>>>>>>> >trusted.glusterfs.quota.size.1=0x0000ab0f227a860000000000478e33acffffffffffffc112
>>>>>>>>>>> >
>>>>>>>>>>> >*On projectA*
>>>>>>>>>>> ># getfattr -d -m . -e hex projectA
>>>>>>>>>>> ># file: projectA
>>>>>>>>>>> >trusted.gfid=0x05b09ded19354c0eb544d22d4659582e
>>>>>>>>>>>
>>>>>>>>>>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f1aeb9f00044c64
>>>>>>>>>>> >trusted.glusterfs.dht=0xe1a4060c000000001fffffff3ffffffd
>>>>>>>>>>>
>>>>>>>>>>> >trusted.glusterfs.mdata=0x010000000000000000000000005f1ac6a10000000018f30a4e000000005c338fab0000000017a3135a000000005b0694fb000000001584a21b
>>>>>>>>>>>
>>>>>>>>>>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000067de3bbe20000000000000128610000000000033498
>>>>>>>>>>> >trusted.glusterfs.quota.dirty=0x3000
>>>>>>>>>>>
>>>>>>>>>>> >trusted.glusterfs.quota.limit-set.1=0x0000460000000000ffffffffffffffff
>>>>>>>>>>>
>>>>>>>>>>> >trusted.glusterfs.quota.size.1=0x0000067de3bbe20000000000000128610000000000033498
>>>>>>>>>>> >
>>>>>>>>>>> >Any idea on what's happening and how to fix it?
>>>>>>>>>>> >
>>>>>>>>>>> >Thanks!
>>>>>>>>>>> >*João Baúto*
>>>>>>>>>>> >---------------
>>>>>>>>>>> >
>>>>>>>>>>> >*Scientific Computing and Software Platform*
>>>>>>>>>>> >Champalimaud Research
>>>>>>>>>>> >Champalimaud Center for the Unknown
>>>>>>>>>>> >Av. Brasília, Doca de Pedrouços
>>>>>>>>>>> >1400-038 Lisbon, Portugal
>>>>>>>>>>> >fchampalimaud.org <https://www.fchampalimaud.org/>
>>>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>
>>>
>>>
>