[Gluster-users] GlusterFS 3.5.3 - untar: very poor performance

Geoffrey Letessier geoffrey.letessier at cnrs.fr
Mon Jun 29 19:40:52 UTC 2015


Hello Vijay,

I’m really sorry to bother you, but the situation is really critical for our research jobs. Indeed, since this morning, due to the previously described situation, we have decided to stop all production and data access until your script fixes the problem.

After rebooting our storage cluster this morning (French time), there are no more runaway processes, CPU usage is back to normal and the quotas no longer seem to grow (though they still contain big errors: > 1TB, if not much more); but several quotas have not been computed for a couple of hours, as you can read below:
[root at lucifer ~]# gluster volume quota vol_home list
                  Path                   Hard-limit Soft-limit   Used  Available
--------------------------------------------------------------------------------
/derreumaux_team                          11.0TB       80%      0Bytes  11.0TB
/baaden_team                              20.0TB       80%      15.1TB   4.9TB
/sterpone_team                            14.0TB       80%      0Bytes  14.0TB
/amyloid_team                              7.0TB       80%       6.4TB 577.5GB
/amyloid_team/nguyen                       4.0TB       80%       3.7TB 312.7GB
/sacquin_team                             10.0TB       80%      0Bytes  10.0TB
/simlab_team                               5.0TB       80%       1.3TB   3.7TB
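
For reference, one sanity check I can think of (this is only a guess on my side) is to confirm that the quota accounting daemon came back up on the storage nodes after the reboot, e.g.:

[root at lucifer ~]# gluster volume status vol_home      # the auxiliary daemons (NFS, self-heal, quota) should be listed here
[root at lucifer ~]# ps aux | grep '[q]uotad'            # is a quotad process present on the node?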


I don’t know your working hours in India, but I think the end of your day has passed, right? I’m really sorry to push you, but we are currently completely under pressure because it’s not a good period to stop scientific computation and production.

Thanks in advance for your script and for your help. Is there something I can do to accelerate the script development (code it myself, or something like that)?

Nice evening (or night).
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr

On 29 June 2015 at 08:43, Vijaikumar M <vmallika at redhat.com> wrote:

> 
> 
> On Sunday 28 June 2015 01:34 PM, Geoffrey Letessier wrote:
>> Hello,
>> 
>> @Krutika: Thanks for transferring my issue.
>> 
>> Everything is becoming completely crazy; other quotas are exploding. Indeed, after removing my previous failing quota, some other quotas have grown, as you can read below:
>> 
>> [root at lucifer ~]# gluster volume quota vol_home list
>>                   Path                   Hard-limit Soft-limit   Used  Available
>> --------------------------------------------------------------------------------
>> /baaden_team                              20.0TB       90%      15.1TB   4.9TB
>> /sterpone_team                            14.0TB       90%      25.5TB  0Bytes
>> /simlab_team                               5.0TB       90%       1.3TB   3.7TB
>> /sacquin_team                             10.0TB       90%       8.3TB   1.7TB
>> /admin_team                                1.0TB       90%      17.0GB 1007.0GB
>> /amyloid_team                              7.0TB       90%       6.4TB 577.5GB
>> /amyloid_team/nguyen                       4.0TB       90%       3.7TB 312.7GB
>> 
>> 
>> [root at lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/sterpone_team
>> cl-storage1: 3,1T /export/brick_home/brick1/sterpone_team
>> cl-storage1: 2,3T /export/brick_home/brick2/sterpone_team
>> cl-storage3: 2,7T /export/brick_home/brick1/sterpone_team
>> cl-storage3: 2,9T /export/brick_home/brick2/sterpone_team
>> => ~11TB (not 25.5TB!!!)
>> 
>> 
>> [root at lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/baaden_team
>> cl-storage1: 4,2T /export/brick_home/brick1/baaden_team
>> cl-storage3: 3,7T /export/brick_home/brick1/baaden_team
>> cl-storage1: 3,6T /export/brick_home/brick2/baaden_team
>> cl-storage3: 3,5T /export/brick_home/brick2/baaden_team
>> => ~15TB (not 14TB).
>> 
>> Etc.
>> 
>> Could you please help me solve this issue urgently? The situation is blocking, and I have to keep production stopped until it is resolved.
>> 
>> Do you think upgrading the storage cluster to GlusterFS 3.7.1 (the latest version) could fix the problem?
> 
> We need to fix this issue manually: we have to find the directories whose quota size is miscalculated and fix the metadata on the bricks. We are writing an automated script for this and will provide it by end of day, IST.
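> 
> (Roughly, finding the miscalculated directories means comparing, on each brick, the size recorded in the quota marker xattr with the real on-disk size. A sketch only, not the promised script, assuming the 3.5-style single-value trusted.glusterfs.quota.size xattr and the brick paths from your mail:)
> 
> # run on every storage node, against the brick directories
> for d in /export/brick_home/brick*/*_team; do
>     acct=$(getfattr --absolute-names -n trusted.glusterfs.quota.size -e hex "$d" 2>/dev/null \
>            | awk -F= '/quota.size/ {print $2}')
>     disk=$(du -sb "$d" | awk '{print $1}')
>     printf '%s accounted=%d on-disk=%d\n' "$d" "$((acct))" "$disk"
> done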
> 
> Thanks,
> Vijay
> 
> 
>> 
>> Thanks in advance,
>> Geoffrey
>> ------------------------------------------------------
>> Geoffrey Letessier
>> Responsable informatique & ingénieur système
>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>> Institut de Biologie Physico-Chimique
>> 13, rue Pierre et Marie Curie - 75005 Paris
>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>> 
>> On 27 June 2015 at 08:13, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>> 
>>> Copying Vijai and Raghavendra for help...
>>> 
>>> -Krutika
>>> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr>
>>> To: "Krutika Dhananjay" <kdhananj at redhat.com>
>>> Sent: Saturday, June 27, 2015 2:13:52 AM
>>> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
>>> 
>>> Hi Krutika,
>>> 
>>> Since I re-enabled the quota feature on my volume vol_home, one defined quota has gone crazy… and it’s a very, very big problem for us.
>>> 
>>> All day long, after re-enabling it, I watched the reported used space keep growing (without any user I/O going on):
>>> 
>>> [root at lucifer ~]# gluster volume quota vol_home list|grep derreumaux_team
>>> /derreumaux_team                          14.0TB       80%      13.7TB 357.2GB
>>> [root at lucifer ~]# gluster volume quota vol_home list /derreumaux_team
>>>                   Path                   Hard-limit Soft-limit   Used  Available
>>> --------------------------------------------------------------------------------
>>> /derreumaux_team                          14.0TB       80%      13.1TB 874.1GB
>>> [root at lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/derreumaux_team
>>> cl-storage3: 590G /export/brick_home/brick1/derreumaux_team
>>> cl-storage3: 611G /export/brick_home/brick2/derreumaux_team
>>> cl-storage1: 567G /export/brick_home/brick1/derreumaux_team
>>> cl-storage1: 564G /export/brick_home/brick2/derreumaux_team
>>> 
>>> As you can see from these three commands, I get three different results; worse, the quota system is very, very far from the real disk usage (13.7TB <> 13.1TB <<>> ~2.3TB).
>>> 
>>> Can you please help fix this very quickly? The whole group is completely blocked by an exceeded quota.
>>> 
>>> Thank you so much in advance,
>>> Have a nice weekend,
>>> Geoffrey
>>> ------------------------------------------------------
>>> Geoffrey Letessier
>>> Responsable informatique & ingénieur système
>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>> 
>>> On 26 June 2015 at 10:29, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>> 
>>> No but if you are saying it is 3.5.3 rpm version, then that bug does not exist there.
>>> And still it is weird how you are seeing such bad performance. :-/
>>> Anything suspicious in the logs?
>>> 
>>> -Krutika
>>> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr>
>>> To: "Krutika Dhananjay" <kdhananj at redhat.com>
>>> Sent: Friday, June 26, 2015 1:27:16 PM
>>> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
>>> 
>>> No, it’s the 3.5.3 RPM version I found in your repository (published in November 2014).
>>> So, do you suggest I simply upgrade all servers and clients to the new 3.5.4 version? Wouldn’t it be better to upgrade the whole system (servers and clients) to 3.7.1?
>>> 
>>> Geoffrey
>>> ------------------------------------------------------
>>> Geoffrey Letessier
>>> Responsable informatique & ingénieur système
>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>> 
>>> On 26 June 2015 at 09:03, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>> 
>>> Also, are you running the 3.5.3 rpms on the clients, or is it a patched version with more fixes on top of 3.5.3?
>>> The reason I ask is that there was a performance issue introduced after 3.5.3 and fixed in 3.5.4 in the replication module. I'm wondering if that could be causing the issue you are experiencing.
>>> 
>>> -Krutika
>>> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr>
>>> To: "Krutika Dhananjay" <kdhananj at redhat.com>
>>> Sent: Friday, June 26, 2015 10:05:26 AM
>>> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
>>> 
>>> Hi Krutika,
>>> 
>>> Oops, I disabled the quota feature without saving the configuration. Could you tell me how to retrieve the quota limit list information?
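>>> 
>>> (If the limits simply have to be re-applied one by one once quota is enabled again, I suppose it would be something like the following; the paths and sizes here are only illustrative and would come from an old `list` output:)
>>> 
>>> gluster volume quota vol_home enable
>>> gluster volume quota vol_home limit-usage /baaden_team 20TB
>>> gluster volume quota vol_home limit-usage /derreumaux_team 14TB
>>> # ...and so on for the other team directories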
>>> 
>>> I’m gonna test the untar in the meantime.
>>> 
>>> Geoffrey
>>> ------------------------------------------------------
>>> Geoffrey Letessier
>>> Responsable informatique & ingénieur système
>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>> 
>>> On 26 June 2015 at 04:56, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>> 
>>> Hi,
>>> 
>>> So I tried the kernel src tree untar locally on a plain replicate (1x2) volume and it took 7m30s on average. This was on VMs, with no RDMA and no quota enabled.
>>> Could you try the same thing on a volume without quota to see if it makes a difference to the perf?
>>> 
>>> -Krutika
>>> 
>>> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr>
>>> To: "Krutika Dhananjay" <kdhananj at redhat.com>
>>> Sent: Wednesday, June 24, 2015 10:21:13 AM
>>> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
>>> 
>>> Hi Krutika,
>>> 
>>> OK, thank you very much by advance.
>>> Concerning the quota system, are you in touch with Vijaykumar? I have been waiting for an answer for a couple of days now, if not more.
>>> 
>>> One more time, thank you.
>>> Have a nice day (in France it’s 6:50 AM).
>>> Geoffrey
>>> -----------------------------------------------
>>> Geoffrey Letessier
>>> 
>>> Responsable informatique & ingénieur système
>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>>> 
>>> On 24 June 2015 at 05:55, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>> 
>>> Ok so for anything related to replication, I could help you out.
>>> But for quota, it would be better to ask Vijaikumar Mallikarjuna or Raghavendra G on the mailing list.
>>> I used to work on quota a long time back, but I am not in touch with the component anymore and do not know of the latest changes to it.
>>> For the performance issue, I will try linux kernel src untar on my machines and let you know what I find.
>>> 
>>> -Krutika
>>> 
>>> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr>
>>> To: "Krutika Dhananjay" <kdhananj at redhat.com>
>>> Sent: Monday, June 22, 2015 9:00:52 PM
>>> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
>>> 
>>> Hi Krutika,
>>> 
>>> Sorry for the delay, but I was in meetings all day.
>>> 
>>> Good to hear from you as well. :)
>>> ;-)
>>> So you are seeing this bad performance only in 3.5.3? Any other releases you tried this test on, where the results were much better with replication?
>>> Yes, but I’m not sure my issue concerns only this specific release. A few days ago, the untar (with the same version of GlusterFS) took around 8 minutes; now it takes around 32 minutes. 8 was already too much, but what about 32? :)
>>> 
>>> That said, my problem only concerns small files, because if I play with dd (or other tools) on a single big file everything is OK (client write throughput: ~1GB/s => ~500MB/s on each replica).
>>> 
>>> If I run my bench on my distributed-only volume I get good performance (untar: ~1m44s, etc.).
>>> 
>>> In addition, I don’t know if it matters, but I have some trouble with the GlusterFS group quota: there are a lot of mismatches between the quota size and the actual file size, and a lot of "quota xattrs not found" messages from the quota-verify glusterfs tool. You can find an extract of the quota-verify output in the attachment.
>>> 
>>> If so, could you please let me know? Meanwhile let me try the untar myself on my vms to see what could be causing the perf issue.
>>> OK, thanks. 
>>> 
>>> See you,
>>> Geoffrey
>>> ------------------------------------------------------
>>> Geoffrey Letessier
>>> Responsable informatique & ingénieur système
>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>> 
>>> On 22 June 2015 at 11:35, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>> 
>>> Hi Geoffrey,
>>> 
>>> Good to hear from you as well. :)
>>> Ok so you say disabling write-behind does not help. Makes me wonder what the problem could be.
>>> So you are seeing this bad performance only in 3.5.3? Any other releases you tried this test on, where the results were much better with replication?
>>> If so, could you please let me know? Meanwhile let me try the untar myself on my vms to see what could be causing the perf issue.
>>> 
>>> -Krutika
>>> 
>>> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr>
>>> To: "Krutika Dhananjay" <kdhananj at redhat.com>
>>> Sent: Monday, June 22, 2015 10:14:26 AM
>>> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
>>> 
>>> Hi Krutika,
>>> 
>>> It’s good to read you again :)
>>> 
>>> Here are my answers:
>>> 1- Could you remind me how to tell whether self-heal is currently in progress? I don’t see anything special, neither a mount point (except the /var/run/gluster/vol_home one) nor a dedicated process; but maybe I’m looking in the wrong place.
>>> 2- OK, I just disabled the write-behind parameter and reran the bench. I’ll let you know more when I get to my office (I’m still at home at the moment).
>>> 
>>> See you, and thank you for helping.
>>> Geoffrey
>>> -----------------------------------------------
>>> Geoffrey Letessier
>>> 
>>> Responsable informatique & ingénieur système
>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>>> 
>>> On 22 June 2015 at 04:35, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>> 
>>> Hi Geoffrey,
>>> 
>>> 1. Was self-heal also in progress while I/O was happening on the volume?
>>> 2. Also, there seem to be quite a few fsyncs, which could possibly have slowed things down a bit. Could you disable write-behind and get the time stats one more time, to rule out the possibility that write-behind’s presence causes out-of-order writes and thereby increases the number of fsyncs issued by the replication module? (See the sketch just below.)
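>>> 
>>> For 1., pending self-heal entries can be listed with the heal command, and for 2., write-behind is a regular volume option; a minimal sketch (volume name taken from your earlier mails):
>>> 
>>> # 1. check whether self-heal currently has pending entries on vol_home
>>> gluster volume heal vol_home info
>>> # 2. turn write-behind off for the test run, then back on afterwards
>>> gluster volume set vol_home performance.write-behind off
>>> #    ...re-run the untar bench...
>>> gluster volume set vol_home performance.write-behind on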
>>> 
>>> -Krutika
>>> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr>
>>> To: gluster-users at gluster.org
>>> Sent: Saturday, June 20, 2015 6:04:40 AM
>>> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar:                                                           very poor performance
>>> 
>>> Re,
>>> 
>>> For comparison, here is the output of the same script run on a distributed-only volume (2 servers of the 4 previously described, 2 bricks each):
>>> #######################################################
>>> ################ UNTAR time consumed  ################
>>> #######################################################
>>> 
>>> 
>>> real 1m44.698s
>>> user 0m8.891s
>>> sys 0m8.353s
>>> 
>>> #######################################################
>>> #################  DU time consumed  ##################
>>> #######################################################
>>> 
>>> 554M linux-4.1-rc6
>>> 
>>> real 0m21.062s
>>> user 0m0.100s
>>> sys 0m1.040s
>>> 
>>> #######################################################
>>> #################  FIND time consumed  ################
>>> #######################################################
>>> 
>>> 52663
>>> 
>>> real 0m21.325s
>>> user 0m0.104s
>>> sys 0m1.054s
>>> 
>>> #######################################################
>>> #################  GREP time consumed  ################
>>> #######################################################
>>> 
>>> 7952
>>> 
>>> real 0m43.618s
>>> user 0m0.922s
>>> sys 0m3.626s
>>> 
>>> #######################################################
>>> #################  TAR time consumed  #################
>>> #######################################################
>>> 
>>> 
>>> real 0m50.577s
>>> user 0m29.745s
>>> sys 0m4.086s
>>> 
>>> #######################################################
>>> #################  RM time consumed  ##################
>>> #######################################################
>>> 
>>> 
>>> real 0m41.133s
>>> user 0m0.171s
>>> sys 0m2.522s
>>> 
>>> The performance is amazingly different!
>>> 
>>> Geoffrey
>>> -----------------------------------------------
>>> Geoffrey Letessier
>>> 
>>> Responsable informatique & ingénieur système
>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>>> 
>>> On 20 June 2015 at 02:12, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:
>>> 
>>> Dear all,
>>> 
>>> I just noticed that IO operations on the main volume of my HPC cluster have become impressively poor.
>>> 
>>> Doing some file operations on a compressed Linux kernel source archive (roughly 80MB, containing about 52,000 files), the untar operation alone can take more than half an hour, as you can read below:
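>>> 
>>> (For reference, the figures below come from a simple timing script; this is a minimal sketch of the idea, not the attached mybench.sh, and the archive name and grep pattern are only placeholders:)
>>> 
>>> time tar xf linux-4.1-rc6.tar.xz                        # UNTAR
>>> time du -sh linux-4.1-rc6                               # DU
>>> time find linux-4.1-rc6 | wc -l                         # FIND
>>> time grep -r 'MODULE_LICENSE' linux-4.1-rc6 | wc -l     # GREP
>>> time tar cf linux-4.1-rc6.tar linux-4.1-rc6             # TAR
>>> time rm -rf linux-4.1-rc6                               # RM
>>> 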
>>> #######################################################
>>> ################ UNTAR time consumed ################
>>> #######################################################
>>> 
>>> 
>>> real 32m42.967s
>>> user 0m11.783s
>>> sys 0m15.050s
>>> 
>>> #######################################################
>>> #################  DU time consumed  ##################
>>> #######################################################
>>> 
>>> 557M linux-4.1-rc6
>>> 
>>> real 0m25.060s
>>> user 0m0.068s
>>> sys 0m0.344s
>>> 
>>> #######################################################
>>> #################  FIND time consumed  ################
>>> #######################################################
>>> 
>>> 52663
>>> 
>>> real 0m25.687s
>>> user 0m0.084s
>>> sys 0m0.387s
>>> 
>>> #######################################################
>>> #################  GREP time consumed  ################
>>> #######################################################
>>> 
>>> 7952
>>> 
>>> real 2m15.890s
>>> user 0m0.887s
>>> sys 0m2.777s
>>> 
>>> #######################################################
>>> #################  TAR time consumed  #################
>>> #######################################################
>>> 
>>> 
>>> real 1m5.551s
>>> user 0m26.536s
>>> sys 0m2.609s
>>> 
>>> #######################################################
>>> #################  RM time consumed  ##################
>>> #######################################################
>>> 
>>> 
>>> real 2m51.485s
>>> user 0m0.167s
>>> sys 0m1.663s
>>> 
>>> For information, this volume is a distributed-replicated one composed of 4 servers with 2 bricks each. Each brick is a 12-drive RAID6 vdisk with good native performance (around 1.2GB/s).
>>> 
>>> In comparison, when I use dd to generate a 100GB file on the same volume, my write throughput is around 1GB/s (client side) and 500MB/s (server side) because of replication:
>>> Client side:
>>> [root at node056 ~]# ifstat -i ib0
>>>         ib0
>>>  KB/s in KB/s out
>>>  3251.45 1.09e+06
>>>  3139.80 1.05e+06
>>>  3185.29 1.06e+06
>>>  3293.84 1.09e+06
>>> ...
>>> 
>>> Server side:
>>> [root at lucifer ~]# ifstat -i ib0
>>>         ib0
>>>  KB/s in KB/s out
>>> 561818.1 1746.42
>>> 560020.3 1737.92
>>> 526337.1 1648.20
>>> 513972.7 1613.69
>>> ...
>>> 
>>> DD command:
>>> [root at node056 ~]# dd if=/dev/zero of=/home/root/test.dd bs=1M count=100000
>>> 100000+0 records in
>>> 100000+0 records out
>>> 104857600000 bytes (105 GB) copied, 202.99 s, 517 MB/s
>>> 
>>> So this issue doesn’t seem to come from the network (which is InfiniBand in this case).
>>> 
>>> You can find in attachments a set of files:
>>>  - mybench.sh: the bench script
>>>  - benches.txt: output of my "bench"
>>>  - profile.txt: gluster volume profile during the "bench"
>>>  - vol_status.txt: gluster volume status
>>>  - vol_info.txt: gluster volume info
>>> 
>>> Can someone help me fix this? It is very critical because this volume is on an HPC cluster in production.
>>> 
>>> Thanks in advance,
>>> Geoffrey
>>> -----------------------------------------------
>>> Geoffrey Letessier
>>> 
>>> Responsable informatique & ingénieur système
>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>>> <benches.txt>
>>> <mybench.sh>
>>> <profile.txt>
>>> <vol_info.txt>
>>> <vol_status.txt>
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>> 
>> 
> 


