[Gluster-users] Extremely slow du

Tue Jul 11 15:22:17 UTC 2017

Hi Kashif,

Thank you for your feedback! Do you have any data on the performance
improvement you observed with 3.11 in the new setup?

Adding Raghavendra and Poornima for validation of configuration and help 
with identifying why certain files disappeared from the mount point 
after enabling readdir-optimize.

Regards,
Vijay

On 07/11/2017 11:06 AM, mohammad kashif wrote:
> Hi Vijay and Experts
>
> I didn't want to experiment with my production setup, so I started a
> parallel system with two servers and around 80TB of storage. I first
> configured it with gluster 3.8 and saw the same lookup performance issue.
> I then upgraded to 3.11 as you suggested, and it made a huge improvement
> in lookup time. I also applied some further optimizations suggested in
> other threads.
> Now I am going to update my production servers. I am planning to use the
> following optimization options; it would be very useful if you could
> point out any inconsistencies or suggest other options. My production
> setup has 5 servers with 400TB of storage and around 80 million files
> of varying sizes.
>
> Options Reconfigured:
> server.event-threads: 4
> client.event-threads: 4
> cluster.lookup-optimize: on
> cluster.readdir-optimize: off
> performance.client-io-threads: on
> performance.cache-size: 1GB
> performance.parallel-readdir: on
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> auth.allow: 163.1.136.*
> diagnostics.latency-measurement: on
> diagnostics.count-fop-hits: on
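
A minimal sketch of how the tuning options above could be applied from a script, using the standard `gluster volume set VOLNAME OPTION VALUE` command. It assumes the production volume is the `atlasglust` volume shown in the earlier `gluster volume info` output. The `apply_options` helper and the `DRY_RUN` switch are hypothetical names for illustration, not part of gluster, and the order of the options is only a cautious guess.

```shell
#!/bin/sh
# Sketch only: VOLUME defaults to the volume name quoted in this thread.
VOLUME="${VOLUME:-atlasglust}"
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 would execute them.
DRY_RUN="${DRY_RUN:-1}"

apply_options() {
    # Read "key: value" lines on stdin and issue one "volume set" per line.
    while IFS=': ' read -r key value; do
        [ -n "$key" ] || continue
        if [ "$DRY_RUN" = "1" ]; then
            echo "gluster volume set $VOLUME $key $value"
        else
            # stdin is redirected so the CLI cannot consume the option list
            gluster volume set "$VOLUME" "$key" "$value" </dev/null
        fi
    done
}

# Tuning options copied from the list above (readdir-ahead is placed before
# parallel-readdir as a precaution, which is an assumption).
apply_options <<'EOF'
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
cluster.readdir-optimize: off
performance.client-io-threads: on
performance.cache-size: 1GB
performance.readdir-ahead: on
performance.parallel-readdir: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
EOF
```

Running it as-is only prints the thirteen commands, which makes it easy to review them before running with `DRY_RUN=0` on a server.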
>
> I found that setting cluster.readdir-optimize to 'on' made some files
> disappear from the client!
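
One way to pin down which entries vanish might be to capture the directory listing on the client with the option off, capture it again with the option on, and compare the two. The sketch below uses only `find`, `sort` and `comm`; the function names and any file paths are hypothetical.

```shell
#!/bin/sh
snapshot_listing() {
    # $1 = directory on the client mount, $2 = file to write the sorted listing to
    (cd "$1" && find . | LC_ALL=C sort) > "$2"
}

missing_entries() {
    # $1 = listing taken with the option off, $2 = listing taken with it on.
    # Prints entries present in the first listing but absent from the second.
    LC_ALL=C comm -23 "$1" "$2"
}
```

For example: run `snapshot_listing /data/aa/bb/cc /tmp/off.txt`, set cluster.readdir-optimize to on, run `snapshot_listing /data/aa/bb/cc /tmp/on.txt`, then `missing_entries /tmp/off.txt /tmp/on.txt` lists whatever is no longer visible.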
>
> Thanks
>
> Kashif
>
>
>
> On Sun, Jun 18, 2017 at 4:57 PM, Vijay Bellur <vbellur at redhat.com> wrote:
>
>     Hi Mohammad,
>
>     A lot of time is being spent in addressing metadata calls as
>     expected. Can you consider testing out with 3.11 with md-cache [1]
>     and readdirp [2] improvements?
>
>     Adding Poornima and Raghavendra who worked on these enhancements to
>     help out further.
>
>     Thanks,
>     Vijay
>
>     [1] https://gluster.readthedocs.io/en/latest/release-notes/3.9.0/
>
>     [2] https://github.com/gluster/glusterfs/issues/166
>
>     On Fri, Jun 16, 2017 at 2:49 PM, mohammad kashif
>     <kashif.alig at gmail.com> wrote:
>
>         Hi Vijay
>
>         Did you manage to look into the gluster profile logs?
>
>         Thanks
>
>         Kashif
>
>         On Mon, Jun 12, 2017 at 11:40 AM, mohammad kashif
>         <kashif.alig at gmail.com> wrote:
>
>             Hi Vijay
>
>             I have enabled client profiling and used this script
>             https://github.com/bengland2/gluster-profile-analysis/blob/master/gvp-client.sh
>             to extract data. I am attaching output files. I don't have
>             any reference data to compare with my output. Hopefully you
>             can make some sense out of it.
>
>             On Sat, Jun 10, 2017 at 10:47 AM, Vijay Bellur
>             <vbellur at redhat.com> wrote:
>
>                 Would it be possible for you to turn on client profiling
>                 and then run du? Instructions for turning on client
>                 profiling can be found at [1]. Providing the client
>                 profile information can help us figure out where the
>                 latency could be stemming from.
>
>                 Regards,
>                 Vijay
>
>                 [1] https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Performance%20Testing/#client-side-profiling
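
A rough sketch of the sequence being asked for, based on my reading of the guide linked at [1]; please check the exact steps against that page. It assumes the volume is `atlasglust`, that the client mount point is `/data` (the thread only shows paths under it), and that the `trusted.io-stats-dump` extended attribute triggers the client-side dump. The `run` and `profile_du` helpers and the dump file names are hypothetical.

```shell
#!/bin/sh
VOLUME="${VOLUME:-atlasglust}"      # volume name quoted in this thread
MOUNT="${MOUNT:-/data}"             # assumed client mount point
TARGET="${TARGET:-/data/aa/bb/cc}"  # directory used in the du timings
DRY_RUN="${DRY_RUN:-1}"             # 1 = print commands only, 0 = execute

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

profile_du() {
    # On a server: turn on profiling counters for the volume.
    run gluster volume profile "$VOLUME" start
    # On the client: request an io-stats dump before and after the du run.
    run setfattr -n trusted.io-stats-dump -v io-stats-pre.txt "$MOUNT"
    run du -sh "$TARGET"
    run setfattr -n trusted.io-stats-dump -v io-stats-post.txt "$MOUNT"
}

profile_du
```

In dry-run mode this only prints the four commands. The first is meant for a server and the rest for the client, so in practice they would be run by hand on the appropriate hosts.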
>
>                 On Fri, Jun 9, 2017 at 7:22 PM, mohammad kashif
>                 <kashif.alig at gmail.com> wrote:
>
>                     Hi Vijay
>
>                     Thanks for your quick response. I am using gluster
>                     3.8.11 on CentOS 7 servers:
>                     glusterfs-3.8.11-1.el7.x86_64
>
>                     The clients are CentOS 6, but I also tested with a
>                     CentOS 7 client and the results didn't change.
>
>                     gluster volume info
>                     Volume Name: atlasglust
>                     Type: Distribute
>                     Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
>                     Status: Started
>                     Snapshot Count: 0
>                     Number of Bricks: 5
>                     Transport-type: tcp
>                     Bricks:
>                     Brick1: pplxgluster01.x.y.z:/glusteratlas/brick001/gv0
>                     Brick2: pplxgluster02.x.y.z:/glusteratlas/brick002/gv0
>                     Brick3: pplxgluster03.x.y.z:/glusteratlas/brick003/gv0
>                     Brick4: pplxgluster04.x.y.z:/glusteratlas/brick004/gv0
>                     Brick5: pplxgluster05.x.y.z:/glusteratlas/brick005/gv0
>                     Options Reconfigured:
>                     nfs.disable: on
>                     performance.readdir-ahead: on
>                     transport.address-family: inet
>                     auth.allow: x.y.z
>
>                     I am not using directory quota.
>
>                     Please let me know if you require some more info
>
>                     Thanks
>
>                     Kashif
>
>
>
>                     On Fri, Jun 9, 2017 at 2:34 PM, Vijay Bellur
>                     <vbellur at redhat.com> wrote:
>
>                         Can you please provide more details about your
>                         volume configuration and the version of gluster
>                         that you are using?
>
>                         Regards,
>                         Vijay
>
>                         On Fri, Jun 9, 2017 at 5:35 PM, mohammad kashif
>                         <kashif.alig at gmail.com> wrote:
>
>                             Hi
>
>                             I have just moved our 400 TB HPC storage
>                             from lustre to gluster. It is part of a
>                             research institute, and users have files
>                             ranging from very small to very large (a few
>                             KB to 20GB). Our setup consists of 5 servers,
>                             each with 96TB of RAID 6 disks. All servers
>                             are connected through 10G ethernet, but not
>                             all clients are. Gluster volumes are
>                             distributed without any replication. There
>                             are approximately 80 million files in the
>                             file system.
>                             I am mounting using glusterfs on the clients.
>
>                             I have copied everything from lustre to
>                             gluster, but the old file system still
>                             exists, so I can compare.
>
>                             The problem I am facing is that du is
>                             extremely slow, even on a small directory.
>                             The time taken also differs substantially
>                             from run to run. I ran du twice from the same
>                             client on a particular directory and got
>                             these results.
>
>                             time du -sh /data/aa/bb/cc
>                             3.7G /data/aa/bb/cc
>                             real 7m29.243s
>                             user 0m1.448s
>                             sys 0m7.067s
>
>                             time du -sh /data/aa/bb/cc
>                             3.7G      /data/aa/bb/cc
>                             real 16m43.735s
>                             user 0m1.097s
>                             sys 0m5.802s
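
Since the two runs differ so much, it may help to repeat the measurement several times in one go. A minimal sketch, assuming a POSIX shell; `time_du` is a hypothetical helper, and it reports whole seconds only, which is coarse but adequate for runs that take minutes.

```shell
#!/bin/sh
time_du() {
    # $1 = directory to measure, $2 = number of runs (default 2)
    dir=$1
    runs=${2:-2}
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        du -s "$dir" >/dev/null
        end=$(date +%s)
        # Elapsed wall-clock time for this run, in whole seconds.
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}
```

For example, `time_du /data/aa/bb/cc 5` would print five elapsed times for the directory above, which makes the spread between runs easier to see.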
>
>                             16m and 7m are too long for a 3.7G
>                             directory. I should mention that the
>                             directory contains a huge number of files
>                             (208736).
>
>                             However, running du on the same directory on
>                             the old data gives this result:
>
>                             time du -sh /olddata/aa/bb/cc
>                             4.0G /olddata/aa/bb/cc
>                             real 3m1.255s
>                             user 0m0.755s
>                             sys 0m38.099s
>
>                             It is much better if I run the same command again:
>
>                             time du -sh /olddata/aa/bb/cc
>                             4.0G /olddata/aa/bb/cc
>                             real 0m8.309s
>                             user 0m0.313s
>                             sys 0m7.755s
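
Dividing each wall-clock time by the file count gives a rough per-file cost, which is easier to compare than the raw totals. The sketch below uses the four `real` times quoted above and the 208736 file count. Applying that same count to the old lustre copy is an assumption, since the thread only gives it for the gluster directory. `per_file_ms` is a hypothetical helper.

```shell
#!/bin/sh
per_file_ms() {
    # $1 = minutes, $2 = seconds, $3 = number of files
    # Prints the average wall-clock time per file in milliseconds.
    awk -v m="$1" -v s="$2" -v n="$3" \
        'BEGIN { printf "%.2f\n", (m * 60 + s) * 1000 / n }'
}

per_file_ms 7 29.243 208736    # gluster, first run  -> 2.15
per_file_ms 16 43.735 208736   # gluster, second run -> 4.81
per_file_ms 3 1.255 208736     # lustre, first run   -> 0.87
per_file_ms 0 8.309 208736     # lustre, second run  -> 0.04
```

On these numbers gluster spends roughly 2 to 5 ms per file, against under 1 ms for the first lustre run, which fits a cost that scales with the number of files rather than with the 3.7G of data.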
>
>                             Is there anything I can do to improve this
>                             performance? I would also like to hear from
>                             anyone who is running the same kind of setup.
>
>                             Thanks
>
>                             Kashif
>
>
>
>                             _______________________________________________
>                             Gluster-users mailing list
>                             Gluster-users at gluster.org
>                             http://lists.gluster.org/mailman/listinfo/gluster-users