[Gluster-users] Gluster 3.10.5: used disk size reported by quota and du mismatch

Mauro Tridici mauro.tridici at cmcc.it
Sat Sep 8 18:55:11 UTC 2018


Oops, sorry.
The attached ZIP file was quarantined by the mail security system (it seems that .zip and .docm files are not allowed).

So I am sending you the log files in tar.gz format instead.

Thank you,
Mauro


> On 8 Sep 2018, at 20:46, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
> 
> 
> Hi Hari, Hi Sanoj,
> 
> sorry to bother you again, but I have a problem similar to the one described (and solved) below.
> As you can see from the following output, there is a mismatch between the used disk size reported by quota and the value returned by the du command.
> 
> [root at s01 qc]# df -hT /tier2/ASC
> File system    Tipo            Dim. Usati Dispon. Uso% Montato su
> s01-stg:tier2  fuse.glusterfs   10T  4,5T    5,6T  45% /tier2
> 
> [root at s01 qc]# gluster volume quota tier2 list /ASC
>                   Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
> -------------------------------------------------------------------------------------------------------------------------------
> /ASC                                      10.0TB     99%(9.9TB)    4.4TB   5.6TB              No                   No
> 
> [root at s01 qc]# du -hs /tier2/ASC
> 1,9T	/tier2/ASC
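
A rough way to quantify the mismatch above is to convert both figures to bytes and subtract. The sketch below is only illustrative: the helper name to_bytes is made up here, and it assumes that both tools use 1024-based units and that a decimal comma (as printed by du in this locale) should be read as a decimal point.

```shell
#!/bin/sh
# to_bytes SIZE
# Convert a human-readable size such as "1,9T", "4.4TB" or "295G" to bytes.
# Assumption: units are binary (K = 1024, M = 1024^2, ...).
to_bytes() {
    printf '%s\n' "$1" | LC_ALL=C awk '
        {
            s = toupper($0)
            gsub(/,/, ".", s)                 # accept a decimal comma
            num = s
            sub(/[A-Z]+$/, "", num)           # strip the unit suffix
            unit = substr(s, length(num) + 1, 1)
            p = (unit == "") ? 0 : index("KMGTP", unit)
            printf "%.0f\n", num * (1024 ^ p)
        }'
}
```

For example, `echo $(( $(to_bytes 4.4TB) - $(to_bytes 1,9T) ))` puts the gap between the quota figure and the du figure at roughly 2.5 TiB, under the unit assumption stated above.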
> 
> I already ran the "quota_fsck_new-6.py" script to identify the folder to be fixed.
> Attached you can find the output produced by running a customized "check" script on each gluster server (s01, s02, s03 are the names of the servers).
> 
> check script:
> 
> #!/bin/bash
> 
> #set -xv
> 
> host=$(hostname)
> 
> for i in {1..12}
> do
>  ./quota_fsck_new-6.py --full-logs --sub-dir ASC /gluster/mnt$i/brick >> $host.log
> done
> 
> I don’t know how to read the log files in order to identify the problematic folder.
> Anyway, since the mismatch appeared after some files were deleted from the /tier2/ASC/primavera/cam directory, I assumed that the problem is there.
> So I tried to run the following customized fix script:
> 
> fix script:
> 
> #!/bin/bash
> 
> #set -xv
> 
> host=$(hostname)
> 
> for i in {1..12}
> do
>  setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 /gluster/mnt$i/brick/ASC/primavera/cam
>  setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 /gluster/mnt$i/brick/ASC/primavera
>  setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 /gluster/mnt$i/brick/ASC
> 
>  #after the setfattr step, comment out the setfattr lines, uncomment the du lines, and run the script again
> 
>  #du /gluster/mnt$i/brick/ASC/primavera/cam
>  #du /gluster/mnt$i/brick/ASC/primavera
>  #du /gluster/mnt$i/brick/ASC
> 
> done
> 
> Unfortunately, the problem is still there.
> I also tried the quota_fsck script with the --fix-issues option, but nothing changed.
> Could you please help me solve this issue?
> 
> Thank you very much in advance,
> Mauro
> 
> <logs.zip>
> 
>> On 11 Jul 2018, at 10:23, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>> 
>> Hi Hari, Hi Sanoj,
>> 
>> thank you very much for your patience and your support! 
>> The problem has been solved following your instructions :-)
>> 
>> N.B.: in order to reduce the running time, I executed the “du” command as follows:
>> 
>> for i in {1..12}
>> do
>>  du /gluster/mnt$i/brick/CSP/ans004/ftp
>> done
>> 
>> rather than on the whole "/gluster/mnt$i/brick" tree of each brick.
>> 
>> I hope that was the right approach :-)
>> 
>> Thank you again for helping me to solve this issue.
>> Have a good day.
>> Mauro
>> 
>> 
>>> On 11 Jul 2018, at 09:16, Hari Gowtham <hgowtham at redhat.com> wrote:
>>> 
>>> Hi,
>>> 
>>> There was an accounting issue in your setup.
>>> The directories ans004/ftp/CMCC-CM2-VHR4-CTR/atm/hist and ans004/ftp/CMCC-CM2-VHR4
>>> had wrong size values on them.
>>> 
>>> To fix it, you will have to set the dirty xattr (an internal gluster
>>> xattr) on these directories,
>>> which marks them so that their values are calculated again.
>>> Then do a du on the mount after setting the xattrs. This will do a
>>> stat that will
>>> calculate and update the right values.
>>> 
>>> To set the dirty xattr:
>>> setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 <path to the directory>
>>> This has to be done for both directories, one after the other, on each brick.
>>> Once done for all the bricks, issue the du command.
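
The procedure described above could be scripted roughly as follows. This is a sketch, not a tested tool: the function names and the DRY_RUN switch are made up for illustration, the brick layout /gluster/mnt<N>/brick is taken from the scripts elsewhere in this thread, and by default the commands are only printed so that they can be reviewed first.

```shell
#!/bin/sh
# run CMD ARGS...
# Print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# mark_dirty BRICK_COUNT RELATIVE_DIR...
# Set the quota dirty xattr on each given directory, on each local brick.
# Assumption: bricks live under /gluster/mnt1/brick ... /gluster/mntN/brick.
mark_dirty() {
    bricks="$1"
    shift
    i=1
    while [ "$i" -le "$bricks" ]; do
        for dir in "$@"; do
            run setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 \
                "/gluster/mnt$i/brick/$dir"
        done
        i=$((i + 1))
    done
}

# recount MOUNT RELATIVE_DIR...
# Walk the directories through the client mount so they get stat'ed again.
recount() {
    mount="$1"
    shift
    for dir in "$@"; do
        run du -s "$mount/$dir"
    done
}
```

With the defaults, `mark_dirty 12 ASC/primavera/cam ASC/primavera ASC` prints the setfattr commands for one server; setting DRY_RUN=0 would execute them (as root, on every server that holds bricks). `recount /tier2 ASC` is then meant to be run once against the client mount, since the instructions above say to do the du on the mount.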
>>> 
>>> Thanks to Sanoj for the guidance
>>> On Tue, Jul 10, 2018 at 6:37 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>> 
>>>> 
>>>> Hi Hari,
>>>> 
>>>> sorry for the late reply.
>>>> Yes, the gluster volume is a single volume that is spread across all 3 nodes and has 36 bricks.
>>>> 
>>>> Attached you can find a tar.gz file containing:
>>>> 
>>>> - gluster volume status command output;
>>>> - gluster volume info command output;
>>>> - the output of the following script (it generated 3 files, one per server: s01.log, s02.log, s03.log).
>>>> 
>>>> This is the “check.sh” script that was run on each server (servers are s01, s02, s03).
>>>> 
>>>> #!/bin/bash
>>>> 
>>>> #set -xv
>>>> 
>>>> host=$(hostname)
>>>> 
>>>> for i in {1..12}
>>>> do
>>>> ./quota_fsck_new-6.py --full-logs --sub-dir CSP/ans004 /gluster/mnt$i/brick >> $host.log
>>>> done
>>>> 
>>>> Many thanks,
>>>> Mauro
>>>> 
>>>> 
>>>> On 10 Jul 2018, at 12:12, Hari Gowtham <hgowtham at redhat.com> wrote:
>>>> 
>>>> Hi Mauro,
>>>> 
>>>> Can you send the gluster v status command output?
>>>> 
>>>> Is it a single volume that is spread across all 3 nodes and has 36 bricks?
>>>> If yes, you will have to run it on all the bricks.
>>>> 
>>>> In the command, use the sub-dir option if you are running it only for the
>>>> directory where the limit is set. Otherwise, if you are
>>>> running it on the brick mount path, you can remove it.
>>>> 
>>>> The full-log will consume a lot of space as it's going to record the
>>>> xattrs for each entry inside the path we are
>>>> running it on. This data is needed to cross-check and verify quota's
>>>> marker functionality.
>>>> 
>>>> To reduce resource consumption you can run it on one replica set alone
>>>> (if it's a replicate volume).
>>>> But it's better to run it on all the bricks if possible and if
>>>> the size consumed is fine with you.
>>>> 
>>>> Make sure you run it with the script link provided above by Sanoj (patch set 6).
>>>> On Tue, Jul 10, 2018 at 2:54 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>> 
>>>> 
>>>> 
>>>> Hi Hari,
>>>> 
>>>> thank you very much for your answer.
>>>> I will try to run the script mentioned above, pointing it at each backend brick.
>>>> 
>>>> So, if I understand correctly, since I have a gluster cluster composed of 3 nodes (with 12 bricks on each node), I have to run the script 36 times. Right?
>>>> 
>>>> You can find below the “df” command output executed on a cluster node:
>>>> 
>>>> /dev/mapper/cl_s01-gluster           100G   33M    100G   1% /gluster
>>>> /dev/mapper/gluster_vgd-gluster_lvd  9,0T  5,6T    3,5T  62% /gluster/mnt2
>>>> /dev/mapper/gluster_vge-gluster_lve  9,0T  5,7T    3,4T  63% /gluster/mnt3
>>>> /dev/mapper/gluster_vgj-gluster_lvj  9,0T  5,7T    3,4T  63% /gluster/mnt8
>>>> /dev/mapper/gluster_vgc-gluster_lvc  9,0T  5,6T    3,5T  62% /gluster/mnt1
>>>> /dev/mapper/gluster_vgl-gluster_lvl  9,0T  5,8T    3,3T  65% /gluster/mnt10
>>>> /dev/mapper/gluster_vgh-gluster_lvh  9,0T  5,7T    3,4T  64% /gluster/mnt6
>>>> /dev/mapper/gluster_vgf-gluster_lvf  9,0T  5,7T    3,4T  63% /gluster/mnt4
>>>> /dev/mapper/gluster_vgm-gluster_lvm  9,0T  5,4T    3,7T  60% /gluster/mnt11
>>>> /dev/mapper/gluster_vgn-gluster_lvn  9,0T  5,4T    3,7T  60% /gluster/mnt12
>>>> /dev/mapper/gluster_vgg-gluster_lvg  9,0T  5,7T    3,4T  64% /gluster/mnt5
>>>> /dev/mapper/gluster_vgi-gluster_lvi  9,0T  5,7T    3,4T  63% /gluster/mnt7
>>>> /dev/mapper/gluster_vgk-gluster_lvk  9,0T  5,8T    3,3T  65% /gluster/mnt9
>>>> 
>>>> I will run the following command and post the output here.
>>>> 
>>>> ./quota_fsck_new.py --full-logs --sub-dir /gluster/mnt{1..12}
>>>> 
>>>> Thank you again for your support.
>>>> Regards,
>>>> Mauro
>>>> 
>>>> On 10 Jul 2018, at 11:02, Hari Gowtham <hgowtham at redhat.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> There is no explicit command to back up all the quota limits, as far as I
>>>> understand. I need to look into this further.
>>>> But you can do the following to back them up and set them again.
>>>> "gluster volume quota <volname> list" will print all the quota
>>>> limits on that particular volume.
>>>> You will have to make a note of the directories and their respective limits.
>>>> Once noted down, you can disable quota on the volume and then enable it.
>>>> Once enabled, you will have to set each limit explicitly on the volume.
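
One possible way to "make a note" of the limits is to turn the list output directly into the commands needed to set them again. The sketch below is an illustration under several assumptions: quota_list_to_cmds is a made-up helper, it relies on the column layout of the list output shown in this thread, it re-creates only hard limits (not soft limits), and the listed sizes are rounded for display, so the generated commands should be reviewed before use.

```shell
#!/bin/sh
# quota_list_to_cmds VOLNAME
# Read the output of "gluster volume quota VOLNAME list" on stdin and print
# one "limit-usage" command per listed path.
# Assumption: column 1 is the path and column 2 is the hard limit.
quota_list_to_cmds() {
    awk -v vol="$1" '
        # Data rows start with an absolute path; header and ruler rows do not.
        substr($1, 1, 1) == "/" && NF >= 2 {
            printf "gluster volume quota %s limit-usage %s %s\n", vol, $1, $2
        }'
}
```

A possible use would be `gluster volume quota tier2 list | quota_list_to_cmds tier2 > restore-limits.sh`, run before quota is disabled. Paths containing spaces are not handled by this sketch.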
>>>> 
>>>> Before doing this, we suggest you try running the script
>>>> mentioned above with the backend brick path instead of the mount path.
>>>> You need to run this on the machines where the backend bricks are
>>>> located and not on the mount.
>>>> On Mon, Jul 9, 2018 at 9:01 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>> 
>>>> 
>>>> Hi Sanoj,
>>>> 
>>>> could you give me the command that I need in order to back up all quota limits?
>>>> If there is no solution for this kind of problem, I would like to try your “backup” suggestion.
>>>> 
>>>> Do you think that I should contact gluster developers too?
>>>> 
>>>> Thank you very much.
>>>> Regards,
>>>> Mauro
>>>> 
>>>> 
>>>> On 5 Jul 2018, at 09:56, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>> 
>>>> Hi Sanoj,
>>>> 
>>>> unfortunately the output of the command execution was not helpful.
>>>> 
>>>> [root at s01 ~]# find /tier2/CSP/ans004  | xargs getfattr -d -m. -e hex
>>>> [root at s01 ~]#
>>>> 
>>>> Do you have any other idea for finding the cause of the issue?
>>>> 
>>>> Thank you again,
>>>> Mauro
>>>> 
>>>> 
>>>> On 5 Jul 2018, at 09:08, Sanoj Unnikrishnan <sunnikri at redhat.com> wrote:
>>>> 
>>>> Hi Mauro,
>>>> 
>>>> A script issue meant that not all the necessary xattrs were captured.
>>>> Could you provide the xattrs with:
>>>> find /tier2/CSP/ans004  | xargs getfattr -d -m. -e hex
>>>> 
>>>> Meanwhile, if you are being impacted, you could do the following:
>>>> - back up the quota limits
>>>> - disable quota
>>>> - enable quota
>>>> - set the limits afresh.
>>>> 
>>>> Please capture the xattr values first, so that we can get to know what went wrong.
>>>> Regards,
>>>> Sanoj
>>>> 
>>>> 
>>>> On Tue, Jul 3, 2018 at 4:09 PM, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>> 
>>>> 
>>>> Dear Sanoj,
>>>> 
>>>> thank you very much for your support.
>>>> I just downloaded and executed the script you suggested.
>>>> 
>>>> This is the full command I executed:
>>>> 
>>>> ./quota_fsck_new.py --full-logs --sub-dir /tier2/CSP/ans004/ /gluster
>>>> 
>>>> Attached you can find the logs generated by the script.
>>>> What can I do now?
>>>> 
>>>> Thank you very much for your patience.
>>>> Mauro
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 3 Jul 2018, at 11:34, Sanoj Unnikrishnan <sunnikri at redhat.com> wrote:
>>>> 
>>>> Hi Mauro,
>>>> 
>>>> This may be an issue with the update of the backend xattrs.
>>>> To root-cause it further and provide a resolution, could you provide me with the logs from running the following fsck script?
>>>> https://review.gluster.org/#/c/19179/6/extras/quota/quota_fsck.py
>>>> 
>>>> Try running the script and reply with the logs generated.
>>>> 
>>>> Thanks,
>>>> Sanoj
>>>> 
>>>> 
>>>> On Mon, Jul 2, 2018 at 2:21 PM, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>> 
>>>> 
>>>> Dear Users,
>>>> 
>>>> I just noticed that, after some data deletions inside the "/tier2/CSP/ans004" folder, the amount of used disk space reported by the quota command doesn’t match the value reported by the du command.
>>>> Searching the web, it seems that this was a bug in previous versions of GlusterFS and that it has already been fixed.
>>>> In my case, unfortunately, the problem is still there.
>>>> 
>>>> How can I solve this issue? Is it possible to do it without downtime?
>>>> 
>>>> Thank you very much in advance,
>>>> Mauro
>>>> 
>>>> [root at s01 ~]# glusterfs -V
>>>> glusterfs 3.10.5
>>>> Repository revision: git://git.gluster.org/glusterfs.git
>>>> Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
>>>> GlusterFS comes with ABSOLUTELY NO WARRANTY.
>>>> It is licensed to you under your choice of the GNU Lesser
>>>> General Public License, version 3 or any later version (LGPLv3
>>>> or later), or the GNU General Public License, version 2 (GPLv2),
>>>> in all cases as published by the Free Software Foundation.
>>>> 
>>>> [root at s01 ~]# gluster volume quota tier2 list /CSP/ans004
>>>>                Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
>>>> -------------------------------------------------------------------------------------------------------------------------------
>>>> /CSP/ans004                                1.0TB     99%(1013.8GB)    3.9TB  0Bytes             Yes                  Yes
>>>> 
>>>> [root at s01 ~]# du -hs /tier2/CSP/ans004/
>>>> 295G /tier2/CSP/ans004/
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>> 
>>>> --
>>>> Regards,
>>>> Hari Gowtham.
>>> 
>>> 
>>> -- 
>>> Regards,
>>> Hari Gowtham.
>> 
>> 
>> 
> 
> 



-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs.tar.gz
Type: application/x-gzip
Size: 1240645 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20180908/c996fbfa/attachment-0001.gz>