[Gluster-users] Gluster 3.10.5: used disk size reported by quota and du mismatch
Mauro Tridici
mauro.tridici at cmcc.it
Sat Sep 8 18:46:21 UTC 2018
Hi Hari, Hi Sanoj,
Sorry to disturb you again, but I have a problem similar to the one described (and solved) below.
As you can see from the following output, there is a mismatch between the used disk size reported by quota and the value returned by the du command.
[root@s01 qc]# df -hT /tier2/ASC
Filesystem     Type            Size  Used  Avail  Use%  Mounted on
s01-stg:tier2  fuse.glusterfs  10T   4,5T  5,6T   45%   /tier2
[root@s01 qc]# gluster volume quota tier2 list /ASC
Path   Hard-limit  Soft-limit   Used    Available  Soft-limit exceeded?  Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/ASC   10.0TB      99%(9.9TB)   4.4TB   5.6TB      No                    No
[root@s01 qc]# du -hs /tier2/ASC
1,9T /tier2/ASC
I already ran the "quota_fsck_new-6.py" script to identify the folder to be fixed.
Attached you can find the output produced by running a customized "check" script on each gluster server (s01, s02, s03 are the names of the servers).
check script:
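#!/bin/bash
#set -xv
# Run the quota fsck script on the ASC sub-directory of each of the 12 local bricks
# and append the output to one log file per server.
host=$(hostname)
for i in {1..12}
do
    ./quota_fsck_new-6.py --full-logs --sub-dir ASC "/gluster/mnt$i/brick" >> "$host.log"
done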
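The same loop could be wrapped in a small function that prints the commands before running them and reports a missing brick directory. This is only a sketch: `run_check`, `FSCK`, `BRICK_ROOT` and `DRY_RUN` are hypothetical names introduced here, while the script name, its options and the brick layout are the ones used above. By default nothing is executed.

```shell
#!/bin/bash
# Sketch only: FSCK, BRICK_ROOT, DRY_RUN and run_check are hypothetical names.
FSCK=${FSCK:-./quota_fsck_new-6.py}
BRICK_ROOT=${BRICK_ROOT:-/gluster}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# run_check SUBDIR LOGFILE
# Runs (or prints) the fsck command for SUBDIR on bricks mnt1..mnt12.
run_check() {
    local subdir=$1 log=$2 i brick
    for i in $(seq 1 12); do
        brick="$BRICK_ROOT/mnt$i/brick"
        if [ "$DRY_RUN" = 1 ]; then
            echo "$FSCK --full-logs --sub-dir $subdir $brick >> $log"
        elif [ -d "$brick" ]; then
            "$FSCK" --full-logs --sub-dir "$subdir" "$brick" >> "$log"
        else
            echo "skipping missing brick: $brick" >&2
        fi
    done
}
```

For example, `run_check ASC "$(hostname).log"` prints the 12 commands, and `DRY_RUN=0 run_check ASC "$(hostname).log"` runs them.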
I don’t know how to read the log files in order to identify the problematic folder.
Anyway, since the mismatch appeared after some files were deleted from the /tier2/ASC/primavera/cam directory, I assumed that the problem was there.
So, I tried to run the following customized fix script:
fix script:
#!/bin/bash
#set -xv
host=$(hostname)
for i in {1..12}
do
    # step 1: mark the directories as dirty on every brick, deepest directory first
    setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 "/gluster/mnt$i/brick/ASC/primavera/cam"
    setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 "/gluster/mnt$i/brick/ASC/primavera"
    setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 "/gluster/mnt$i/brick/ASC"
    # step 2: after the setfattr step, comment out the setfattr lines, uncomment the du lines and run the script again
    #du "/gluster/mnt$i/brick/ASC/primavera/cam"
    #du "/gluster/mnt$i/brick/ASC/primavera"
    #du "/gluster/mnt$i/brick/ASC"
done
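The two steps of the fix script (mark the directories as dirty, then run du) could also be selected with an argument instead of by commenting lines in and out. The following is a sketch under assumptions, not a tested procedure: `quota_fix`, `run`, `BRICK_ROOT` and `DRY_RUN` are hypothetical names introduced here, and only the setfattr and du commands come from the script above. By default the commands are only printed.

```shell
#!/bin/bash
# Sketch only: quota_fix, run, BRICK_ROOT and DRY_RUN are hypothetical names.
BRICK_ROOT=${BRICK_ROOT:-/gluster}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# quota_fix MODE DIR...
#   MODE "mark":  set the dirty xattr on each DIR on bricks mnt1..mnt12
#   MODE "crawl": run du on each DIR on bricks mnt1..mnt12
# DIRs are relative to the brick root and are processed in the order given.
quota_fix() {
    local mode=$1 i dir path
    shift
    for i in $(seq 1 12); do
        for dir in "$@"; do
            path="$BRICK_ROOT/mnt$i/brick/$dir"
            case $mode in
                mark)  run setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 "$path" ;;
                crawl) run du "$path" ;;
                *)     echo "usage: quota_fix mark|crawl DIR..." >&2; return 2 ;;
            esac
        done
    done
}
```

For example, `quota_fix mark ASC/primavera/cam ASC/primavera ASC` on every server first, and only after that `quota_fix crawl ASC/primavera/cam ASC/primavera ASC`.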
Unfortunately, the problem is still there.
I also tried to use the quota-fsck script with the --fix-issues option, but nothing changed.
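To see what the quota xattrs look like on the bricks after these attempts, they could be dumped per brick with the getfattr options suggested earlier in this thread, restricted to the quota names. This is a sketch only: `show_quota_xattrs`, `BRICK_ROOT` and `DRY_RUN` are hypothetical names introduced here. It has to be run as root on each server, because trusted.* xattrs are only visible to root. By default the commands are only printed.

```shell
#!/bin/bash
# Sketch only: show_quota_xattrs, BRICK_ROOT and DRY_RUN are hypothetical names.
BRICK_ROOT=${BRICK_ROOT:-/gluster}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# show_quota_xattrs DIR
# Dumps (or prints the command to dump) the quota xattrs of DIR,
# given relative to the brick root, on bricks mnt1..mnt12.
show_quota_xattrs() {
    local dir=$1 i path
    for i in $(seq 1 12); do
        path="$BRICK_ROOT/mnt$i/brick/$dir"
        if [ "$DRY_RUN" = 1 ]; then
            echo "getfattr -d -m trusted.glusterfs.quota -e hex $path"
        else
            getfattr -d -m trusted.glusterfs.quota -e hex "$path"
        fi
    done
}
```

For example, `DRY_RUN=0 show_quota_xattrs ASC/primavera/cam > "$(hostname)-xattrs.log"` collects the values for one directory on one server.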
Could you please help me solve this issue?
Thank you very much in advance,
Mauro
> On 11 Jul 2018, at 10:23, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>
> Hi Hari, Hi Sanoj,
>
> thank you very much for your patience and your support!
> The problem has been solved following your instructions :-)
>
> N.B.: in order to reduce the running time, I executed the “du” command as follows:
>
> for i in {1..12}
> do
> du /gluster/mnt$i/brick/CSP/ans004/ftp
> done
>
> and not on each brick at "/gluster/mnt$i/brick" tree level.
>
> I hope it was a correct idea :-)
>
> Thank you again for helping me to solve this issue.
> Have a good day.
> Mauro
>
>
>> On 11 Jul 2018, at 09:16, Hari Gowtham <hgowtham at redhat.com> wrote:
>>
>> Hi,
>>
>> There was an accounting issue in your setup.
>> The directories ans004/ftp/CMCC-CM2-VHR4-CTR/atm/hist and ans004/ftp/CMCC-CM2-VHR4
>> had wrong size values on them.
>>
>> To fix it, you will have to set the dirty xattr (an internal gluster
>> xattr) on these directories, which marks them so that their values
>> are calculated again.
>> Then do a du on the mount after setting the xattrs. This will do a
>> stat that calculates and updates the right values.
>>
>> To set the dirty xattr:
>> setfattr -n trusted.glusterfs.quota.dirty -v 0x3100 <path to the directory>
>> This has to be done for both directories, one after the other, on each brick.
>> Once done for all the bricks, issue the du command.
>>
>> Thanks to Sanoj for the guidance
>> On Tue, Jul 10, 2018 at 6:37 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>
>>>
>>> Hi Hari,
>>>
>>> sorry for the late reply.
>>> Yes, the gluster volume is a single volume that is spread across all 3 nodes and has 36 bricks.
>>>
>>> Attached you can find a tar.gz file containing:
>>>
>>> - gluster volume status command output;
>>> - gluster volume info command output;
>>> - the output of the following script execution (it generated one log file per server: s01.log, s02.log, s03.log).
>>>
>>> This is the “check.sh” script that has been executed on each server (servers are s01, s02, s03).
>>>
>>> #!/bin/bash
>>>
>>> #set -xv
>>>
>>> host=$(hostname)
>>>
>>> for i in {1..12}
>>> do
>>> ./quota_fsck_new-6.py --full-logs --sub-dir CSP/ans004 /gluster/mnt$i/brick >> $host.log
>>> done
>>>
>>> Many thanks,
>>> Mauro
>>>
>>>
>>> On 10 Jul 2018, at 12:12, Hari Gowtham <hgowtham at redhat.com> wrote:
>>>
>>> Hi Mauro,
>>>
>>> Can you send the gluster v status command output?
>>>
>>> Is it a single volume that is spread across all 3 nodes and has 36 bricks?
>>> If yes, you will have to run it on all the bricks.
>>>
>>> In the command, use the sub-dir option if you are running it only for the
>>> directory where the limit is set. If you are
>>> running it on the brick mount path, you can remove it.
>>>
>>> The full-logs option will consume a lot of space, as it's going to record the
>>> xattrs for each entry inside the path we are
>>> running it on. This data is needed to cross-check and verify quota's
>>> marker functionality.
>>>
>>> To reduce resource consumption you can run it on one replica set alone
>>> (if it's a replicate volume).
>>> But it's better to run it on all the bricks, if possible and if
>>> the size consumed is fine with you.
>>>
>>> Make sure you run it with the script link provided above by Sanoj (patch set 6).
>>> On Tue, Jul 10, 2018 at 2:54 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>
>>>
>>>
>>> Hi Hari,
>>>
>>> thank you very much for your answer.
>>> I will try to use the script mentioned above, pointing it to each backend brick.
>>>
>>> So, if I understand correctly, since I have a gluster cluster composed of 3 nodes (with 12 bricks on each node), I have to execute the script 36 times. Right?
>>>
>>> You can find below the “df” command output executed on a cluster node:
>>>
>>> /dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster
>>> /dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2
>>> /dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3
>>> /dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,7T 3,4T 63% /gluster/mnt8
>>> /dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1
>>> /dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 65% /gluster/mnt10
>>> /dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,7T 3,4T 64% /gluster/mnt6
>>> /dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4
>>> /dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,7T 60% /gluster/mnt11
>>> /dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,7T 60% /gluster/mnt12
>>> /dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,7T 3,4T 64% /gluster/mnt5
>>> /dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,7T 3,4T 63% /gluster/mnt7
>>> /dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 65% /gluster/mnt9
>>>
>>> I will execute the following command and post the output here.
>>>
>>> ./quota_fsck_new.py --full-logs --sub-dir /gluster/mnt{1..12}
>>>
>>> Thank you again for your support.
>>> Regards,
>>> Mauro
>>>
>>> On 10 Jul 2018, at 11:02, Hari Gowtham <hgowtham at redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> There is no explicit command to back up all the quota limits, as per my
>>> understanding. I need to look into this further.
>>> But you can do the following to back them up and set them again.
>>> "gluster volume quota <volname> list" prints all the quota
>>> limits on that particular volume.
>>> You will have to make a note of the directories with their respective limits.
>>> Once noted down, you can disable quota on the volume and then enable it.
>>> Once enabled, you will have to set each limit explicitly on the volume.
>>>
>>> Before doing this, we suggest you try running the script
>>> mentioned above with the backend brick path instead of the mount path.
>>> You need to run this on the machines where the backend bricks are
>>> located and not on the mount.
>>> On Mon, Jul 9, 2018 at 9:01 PM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>
>>>
>>> Hi Sanoj,
>>>
>>> could you provide me with the command that I need in order to back up all quota limits?
>>> If there is no solution for this kind of problem, I would like to try to follow your “backup” suggestion.
>>>
>>> Do you think that I should contact gluster developers too?
>>>
>>> Thank you very much.
>>> Regards,
>>> Mauro
>>>
>>>
>>> On 5 Jul 2018, at 09:56, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>
>>> Hi Sanoj,
>>>
>>> unfortunately the output of the command was not helpful.
>>>
>>> [root@s01 ~]# find /tier2/CSP/ans004 | xargs getfattr -d -m. -e hex
>>> [root@s01 ~]#
>>>
>>> Do you have any other idea for identifying the cause of the issue?
>>>
>>> Thank you again,
>>> Mauro
>>>
>>>
>>> On 5 Jul 2018, at 09:08, Sanoj Unnikrishnan <sunnikri at redhat.com> wrote:
>>>
>>> Hi Mauro,
>>>
>>> Due to an issue in the script, not all the necessary xattrs were captured.
>>> Could you provide the xattrs with:
>>> find /tier2/CSP/ans004 | xargs getfattr -d -m. -e hex
>>>
>>> Meanwhile, if you are being impacted, you could do the following:
>>> 1. back up the quota limits
>>> 2. disable quota
>>> 3. enable quota
>>> 4. freshly set the limits.
>>>
>>> Please capture the xattr values first, so that we can find out what went wrong.
>>> Regards,
>>> Sanoj
>>>
>>>
>>> On Tue, Jul 3, 2018 at 4:09 PM, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>
>>>
>>> Dear Sanoj,
>>>
>>> thank you very much for your support.
>>> I just downloaded and executed the script you suggested.
>>>
>>> This is the full command I executed:
>>>
>>> ./quota_fsck_new.py --full-logs --sub-dir /tier2/CSP/ans004/ /gluster
>>>
>>> Attached you can find the logs generated by the script.
>>> What can I do now?
>>>
>>> Thank you very much for your patience.
>>> Mauro
>>>
>>>
>>>
>>>
>>> On 3 Jul 2018, at 11:34, Sanoj Unnikrishnan <sunnikri at redhat.com> wrote:
>>>
>>> Hi Mauro,
>>>
>>> This may be an issue with the update of the backend xattrs.
>>> To find the root cause and provide a resolution, could you provide me with the logs produced by running the following fsck script?
>>> https://review.gluster.org/#/c/19179/6/extras/quota/quota_fsck.py
>>>
>>> Try running the script and get back to me with the logs generated.
>>>
>>> Thanks,
>>> Sanoj
>>>
>>>
>>> On Mon, Jul 2, 2018 at 2:21 PM, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>
>>>
>>> Dear Users,
>>>
>>> I just noticed that, after some data deletions inside the "/tier2/CSP/ans004" folder, the amount of used disk space reported by the quota command doesn’t match the value reported by the du command.
>>> Searching the web, it seems that this was a bug in previous versions of GlusterFS and that it has already been fixed.
>>> In my case, unfortunately, the problem still seems to be present.
>>>
>>> How can I solve this issue? Is it possible to do it without downtime?
>>>
>>> Thank you very much in advance,
>>> Mauro
>>>
>>> [root@s01 ~]# glusterfs -V
>>> glusterfs 3.10.5
>>> Repository revision: git://git.gluster.org/glusterfs.git
>>> Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
>>> GlusterFS comes with ABSOLUTELY NO WARRANTY.
>>> It is licensed to you under your choice of the GNU Lesser
>>> General Public License, version 3 or any later version (LGPLv3
>>> or later), or the GNU General Public License, version 2 (GPLv2),
>>> in all cases as published by the Free Software Foundation.
>>>
>>> [root@s01 ~]# gluster volume quota tier2 list /CSP/ans004
>>> Path          Hard-limit  Soft-limit      Used    Available  Soft-limit exceeded?  Hard-limit exceeded?
>>> -------------------------------------------------------------------------------------------------------------------------------
>>> /CSP/ans004   1.0TB       99%(1013.8GB)   3.9TB   0Bytes     Yes                   Yes
>>>
>>> [root@s01 ~]# du -hs /tier2/CSP/ans004/
>>> 295G /tier2/CSP/ans004/
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>> --
>>> Regards,
>>> Hari Gowtham.
>>
>>
>> --
>> Regards,
>> Hari Gowtham.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs.zip
Type: application/zip
Size: 1242472 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20180908/b6566b7a/attachment-0001.zip>