[Gluster-users] Extremely slow du

mohammad kashif kashif.alig at gmail.com
Wed Jul 12 09:09:21 UTC 2017


Hi Vijay

Thanks. It would be great if someone could go through the configuration
options. Is there any reference document where all of these options are
described in detail?

I was mainly worried about the very slow lookups, so I only ran du on a
certain directory which contains a lot of small files (200K). The lookup
time improved dramatically. I didn't do any proper benchmarking.

Gluster 3.8 without any optimization

time du -ksh binno/
3.7G    binno/

real    117m45.733s
user    0m1.635s
sys     0m6.430s

Gluster 3.11 with optimization

time du -ksh binno/
3.7G    binno/

real    2m5.595s
user    0m0.767s
sys     0m4.437s
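
A caveat on these timings: I didn't benchmark properly, and repeated du
runs can look much faster purely because of caching. Anyone reproducing
the numbers may want to flush the kernel caches on the client before each
timed run; a rough sketch (the mount path is illustrative):

    sync                                 # flush dirty pages first
    echo 3 > /proc/sys/vm/drop_caches    # drop pagecache, dentries and inodes
    time du -ksh /mnt/atlasglust/binno/  # then time the metadata-heavy du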


I have also enabled profiling.
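
(One common way to do this, for the server-side view, is the standard
gluster CLI; a sketch, with the volume name taken from the volume info
quoted below:

    gluster volume profile atlasglust start   # start per-fop latency collection
    gluster volume profile atlasglust info    # print cumulative and interval stats
)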

Before update

Fop           Call Count    Avg-Latency    Min-Latency    Max-Latency
---           ----------    -----------    -----------    -----------
STAT                 153       90.72 us        5.00 us      666.00 us
STATFS                 3      677.67 us      620.00 us      709.00 us
OPENDIR              149     1213.81 us      519.00 us    28777.00 us
LOOKUP               552     8493.01 us        3.00 us    79689.00 us
READDIRP            3518     5351.76 us       11.00 us   341877.00 us
FORGET          10050351           0 us           0 us           0 us
RELEASE          9062130           0 us           0 us           0 us
RELEASEDIR          5395           0 us           0 us           0 us


After update


Interval 8 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              2  RELEASEDIR
      0.08     118.00 us     113.00 us     123.00 us              2      STATFS
      0.13     190.00 us     189.00 us     191.00 us              2      LOOKUP
      0.29     422.00 us     422.00 us     422.00 us              2     OPENDIR
     99.49   28539.60 us    1698.00 us   48655.00 us             10    READDIRP
      0.00       0.00 us       0.00 us       0.00 us           5217      UPCALL
      0.00       0.00 us       0.00 us       0.00 us           5217   CI_FORGET

    Duration: 22 seconds
   Data Read: 0 bytes
Data Written: 0 bytes


I am not sure how to interpret the profiling results, as I don't
understand them fully.


Thanks


Kashif






On Tue, Jul 11, 2017 at 4:22 PM, Vijay Bellur <vbellur at redhat.com> wrote:

> Hi Kashif,
>
> Thank you for your feedback! Do you have some data on the nature of
> performance improvement observed with 3.11 in the new setup?
>
> Adding Raghavendra and Poornima for validation of configuration and help
> with identifying why certain files disappeared from the mount point after
> enabling readdir-optimize.
>
> Regards,
> Vijay
>
>
> On 07/11/2017 11:06 AM, mohammad kashif wrote:
>
>> Hi Vijay and Experts
>>
>> I didn't want to experiment with my production setup, so I started a
>> parallel system with two servers and around 80TB of storage. I first
>> configured it with gluster 3.8 and had the same lookup performance issue.
>> Then I upgraded to 3.11 as you suggested and it made a huge improvement
>> in lookup time. I also did some more optimization as suggested in other
>> threads.
>> Now I am going to update my production server. I am planning to use the
>> following optimization options (a sketch of applying them follows the
>> list); it would be very useful if you could point out any inconsistency
>> or suggest some other options. My production setup has 5 servers
>> consisting of 400TB of storage and around 80 million files of varying
>> sizes.
>>
>> Options Reconfigured:
>> server.event-threads: 4
>> client.event-threads: 4
>> cluster.lookup-optimize: on
>> cluster.readdir-optimize: off
>> performance.client-io-threads: on
>> performance.cache-size: 1GB
>> performance.parallel-readdir: on
>> performance.md-cache-timeout: 600
>> performance.cache-invalidation: on
>> performance.stat-prefetch: on
>> features.cache-invalidation-timeout: 600
>> features.cache-invalidation: on
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> auth.allow: 163.1.136.*
>> diagnostics.latency-measurement: on
>> diagnostics.count-fop-hits: on
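>>
>> (Each option above can be applied with the standard 'gluster volume set'
>> command; a minimal sketch, with the volume name as a placeholder:
>>
>>     gluster volume set <volname> cluster.lookup-optimize on
>>     gluster volume set <volname> performance.parallel-readdir on
>>     gluster volume set <volname> performance.md-cache-timeout 600
>> )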
>>
>> I found that setting cluster.readdir-optimize to 'on' made some files
>> disappear from the client!
>>
>> Thanks
>>
>> Kashif
>>
>>
>>
>> On Sun, Jun 18, 2017 at 4:57 PM, Vijay Bellur <vbellur at redhat.com> wrote:
>>
>>     Hi Mohammad,
>>
>>     A lot of time is being spent in addressing metadata calls as
>>     expected. Can you consider testing out with 3.11 with md-cache [1]
>>     and readdirp [2] improvements?
>>
>>     Adding Poornima and Raghavendra who worked on these enhancements to
>>     help out further.
>>
>>     Thanks,
>>     Vijay
>>
>>     [1] https://gluster.readthedocs.io/en/latest/release-notes/3.9.0/
>>
>>     [2] https://github.com/gluster/glusterfs/issues/166
>>
>>     On Fri, Jun 16, 2017 at 2:49 PM, mohammad kashif
>>     <kashif.alig at gmail.com> wrote:
>>
>>         Hi Vijay
>>
>>         Did you manage to look into the gluster profile logs ?
>>
>>         Thanks
>>
>>         Kashif
>>
>>         On Mon, Jun 12, 2017 at 11:40 AM, mohammad kashif
>>         <kashif.alig at gmail.com> wrote:
>>
>>             Hi Vijay
>>
>>             I have enabled client profiling and used this script
>>             https://github.com/bengland2/gluster-profile-analysis/blob/master/gvp-client.sh
>>             to extract data. I am attaching output files. I don't have
>>             any reference data to compare with my output. Hopefully you
>>             can make some sense out of it.
>>
>>             On Sat, Jun 10, 2017 at 10:47 AM, Vijay Bellur
>>             <vbellur at redhat.com> wrote:
>>
>>                 Would it be possible for you to turn on client profiling
>>                 and then run du? Instructions for turning on client
>>                 profiling can be found at [1]. Providing the client
>>                 profile information can help us figure out where the
>>                 latency could be stemming from.
>>
>>                 Regards,
>>                 Vijay
>>
>>                 [1] https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Performance%20Testing/#client-side-profiling
>>
>>                 On Fri, Jun 9, 2017 at 7:22 PM, mohammad kashif
>>                 <kashif.alig at gmail.com> wrote:
>>
>>                     Hi Vijay
>>
>>                     Thanks for your quick response. I am using gluster
>>                     3.8.11 on CentOS 7 servers:
>>                     glusterfs-3.8.11-1.el7.x86_64
>>
>>                     Clients are CentOS 6, but I tested with a CentOS 7
>>                     client as well and the results didn't change.
>>
>>                     gluster volume info
>>                     Volume Name: atlasglust
>>                     Type: Distribute
>>                     Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
>>                     Status: Started
>>                     Snapshot Count: 0
>>                     Number of Bricks: 5
>>                     Transport-type: tcp
>>                     Bricks:
>>                     Brick1: pplxgluster01.x.y.z:/glusteratlas/brick001/gv0
>>                     Brick2: pplxgluster02.x.y.z:/glusteratlas/brick002/gv0
>>                     Brick3: pplxgluster03.x.y.z:/glusteratlas/brick003/gv0
>>                     Brick4: pplxgluster04.x.y.z:/glusteratlas/brick004/gv0
>>                     Brick5: pplxgluster05.x.y.z:/glusteratlas/brick005/gv0
>>                     Options Reconfigured:
>>                     nfs.disable: on
>>                     performance.readdir-ahead: on
>>                     transport.address-family: inet
>>                     auth.allow: x.y.z
>>
>>                     I am not using directory quota.
>>
>>                     Please let me know if you require some more info
>>
>>                     Thanks
>>
>>                     Kashif
>>
>>
>>
>>                     On Fri, Jun 9, 2017 at 2:34 PM, Vijay Bellur
>>                     <vbellur at redhat.com> wrote:
>>
>>                         Can you please provide more details about your
>>                         volume configuration and the version of gluster
>>                         that you are using?
>>
>>                         Regards,
>>                         Vijay
>>
>>                         On Fri, Jun 9, 2017 at 5:35 PM, mohammad kashif
>>                         <kashif.alig at gmail.com> wrote:
>>
>>                             Hi
>>
>>                             I have just moved our 400 TB HPC storage
>>                             from Lustre to Gluster. It is part of a
>>                             research institute, and users have files
>>                             ranging from very small to big (a few KB
>>                             to 20GB). Our setup consists of 5 servers,
>>                             each with 96TB of RAID 6 disks. All servers
>>                             are connected through 10G ethernet, but not
>>                             all clients. Gluster volumes are distributed
>>                             without any replication. There are
>>                             approximately 80 million files in the file
>>                             system. I am mounting using glusterfs on
>>                             the clients (see the sketch below).
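>>
>>                             (The client mount is the standard glusterfs
>>                             FUSE mount, roughly like this; the mount
>>                             point is illustrative:
>>
>>                             mount -t glusterfs \
>>                                 pplxgluster01.x.y.z:/atlasglust /data
>>                             )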
>>
>>                             I have copied everything from Lustre to
>>                             Gluster, but the old file system still
>>                             exists so I can compare.
>>
>>                             The problem I am facing is extremely slow
>>                             du on even a small directory. Also, the time
>>                             taken is substantially different each time.
>>                             I tried du from the same client on a
>>                             particular directory twice and got these
>>                             results:
>>
>>                             time du -sh /data/aa/bb/cc
>>                             3.7G /data/aa/bb/cc
>>                             real 7m29.243s
>>                             user 0m1.448s
>>                             sys 0m7.067s
>>
>>                             time du -sh /data/aa/bb/cc
>>                             3.7G      /data/aa/bb/cc
>>                             real 16m43.735s
>>                             user 0m1.097s
>>                             sys 0m5.802s
>>
>>                             16m and 7m are too long for a 3.7G
>>                             directory. I must mention that the directory
>>                             contains a huge number of files (208736).
>>
>>                             But running du on the same directory on the
>>                             old data gives this result:
>>
>>                             time du -sh /olddata/aa/bb/cc
>>                             4.0G /olddata/aa/bb/cc
>>                             real 3m1.255s
>>                             user 0m0.755s
>>                             sys 0m38.099s
>>
>>                             Much better if I run the same command again:
>>
>>                             time du -sh /olddata/aa/bb/cc
>>                             4.0G /olddata/aa/bb/cc
>>                             real 0m8.309s
>>                             user 0m0.313s
>>                             sys 0m7.755s
>>
>>                             Is there anything I can do to improve this
>>                             performance? I would also like to hear from
>>                             someone who is running the same kind of
>>                             setup.
>>
>>                             Thanks
>>
>>                             Kashif
>>
>>
>>
>>                             _______________________________________________
>>                             Gluster-users mailing list
>>                             Gluster-users at gluster.org
>>                             http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>>
>>
>>
>>
>>
>