[Gluster-users] Extremely slow du
mohammad kashif
kashif.alig at gmail.com
Wed Jul 12 09:09:21 UTC 2017
Hi Vijay
Thanks. It would be great if someone could go through the configuration
options. Is there a reference document where all of these options are
described in detail?
I was mainly worried about the very slow lookups, so I only ran du on a
particular directory which contains a lot of small files (about 200K). The
lookup time improved dramatically. I didn't do any proper benchmarking.
Gluster 3.8, without any optimization:
time du -ksh binno/
3.7G binno/
real 117m45.733s
user 0m1.635s
sys 0m6.430s
Gluster 3.11, with the optimizations applied:
time du -ksh binno/
3.7G binno/
real 2m5.595s
user 0m0.767s
sys 0m4.437s
I have also enabled profiling.
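For reference, the profile data below was gathered with Gluster's built-in
server-side profiling, roughly as follows (a sketch only, assuming the
volume name atlasglust from the volume info further down this thread):

gluster volume profile atlasglust start   # begin collecting per-brick FOP statistics
gluster volume profile atlasglust info    # print cumulative and per-interval tables (like the ones below)
gluster volume profile atlasglust stop    # stop collecting once done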
Before the update:

Fop          Call Count   Avg-Latency   Min-Latency    Max-Latency
---          ----------   -----------   -----------    -----------
STAT                153      90.72 us       5.00 us      666.00 us
STATFS                3     677.67 us     620.00 us      709.00 us
OPENDIR             149    1213.81 us     519.00 us    28777.00 us
LOOKUP              552    8493.01 us       3.00 us    79689.00 us
READDIRP           3518    5351.76 us      11.00 us   341877.00 us
FORGET         10050351          0 us          0 us           0 us
RELEASE         9062130          0 us          0 us           0 us
RELEASEDIR         5395          0 us          0 us           0 us
After the update (Interval 8 Stats):

%-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
---------   -----------   -----------   -----------   ------------   ---
     0.00       0.00 us       0.00 us       0.00 us              2   RELEASEDIR
     0.08     118.00 us     113.00 us     123.00 us              2   STATFS
     0.13     190.00 us     189.00 us     191.00 us              2   LOOKUP
     0.29     422.00 us     422.00 us     422.00 us              2   OPENDIR
    99.49   28539.60 us    1698.00 us   48655.00 us             10   READDIRP
     0.00       0.00 us       0.00 us       0.00 us           5217   UPCALL
     0.00       0.00 us       0.00 us       0.00 us           5217   CI_FORGET
Duration: 22 seconds
Data Read: 0 bytes
Data Written: 0 bytes
I am not sure how to interpret the profiling results correctly.
Thanks
Kashif
On Tue, Jul 11, 2017 at 4:22 PM, Vijay Bellur <vbellur at redhat.com> wrote:
> Hi Kashif,
>
> Thank you for your feedback! Do you have some data on the nature of
> performance improvement observed with 3.11 in the new setup?
>
> Adding Raghavendra and Poornima for validation of configuration and help
> with identifying why certain files disappeared from the mount point after
> enabling readdir-optimize.
>
> Regards,
> Vijay
>
>
> On 07/11/2017 11:06 AM, mohammad kashif wrote:
>
>> Hi Vijay and Experts
>>
>> I didn't want to experiment with my production setup, so I started a
>> parallel system with two servers and around 80TB of storage. I first
>> configured it with Gluster 3.8 and had the same lookup performance issue.
>> Then I upgraded to 3.11 as you suggested and it made a huge improvement
>> in lookup time. I also did some more optimizations as suggested in other
>> threads.
>> Now I am going to update my production servers. I am planning to use the
>> following optimization options (a sketch of the corresponding volume-set
>> commands follows the list); it would be very useful if you could point
>> out any inconsistency or suggest other options. My production setup has
>> 5 servers consisting of 400TB of storage and around 80 million files of
>> varying sizes.
>>
>> Options Reconfigured:
>> server.event-threads: 4
>> client.event-threads: 4
>> cluster.lookup-optimize: on
>> cluster.readdir-optimize: off
>> performance.client-io-threads: on
>> performance.cache-size: 1GB
>> performance.parallel-readdir: on
>> performance.md-cache-timeout: 600
>> performance.cache-invalidation: on
>> performance.stat-prefetch: on
>> features.cache-invalidation-timeout: 600
>> features.cache-invalidation: on
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> auth.allow: 163.1.136.*
>> diagnostics.latency-measurement: on
>> diagnostics.count-fop-hits: on
>>
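>> Applying these comes down to a series of volume-set calls, roughly like
>> the following (a sketch only; it assumes the production volume is also
>> named atlasglust, as in the volume info quoted earlier in this thread):
>>
>> gluster volume set atlasglust cluster.lookup-optimize on
>> gluster volume set atlasglust performance.parallel-readdir on
>> gluster volume set atlasglust performance.md-cache-timeout 600
>> gluster volume set atlasglust performance.cache-invalidation on
>> gluster volume set atlasglust features.cache-invalidation on
>> gluster volume set atlasglust features.cache-invalidation-timeout 600
>> gluster volume set atlasglust server.event-threads 4
>> gluster volume set atlasglust client.event-threads 4
>> # ...and so on for the remaining options; 'gluster volume get atlasglust all'
>> # should show the effective values afterwards.
>>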
>> I found that setting cluster.readdir-optimize to 'on' made some files
>> disappear from the client!
>>
>> Thanks
>>
>> Kashif
>>
>>
>>
>> On Sun, Jun 18, 2017 at 4:57 PM, Vijay Bellur <vbellur at redhat.com> wrote:
>>
>> Hi Mohammad,
>>
>> A lot of time is being spent in addressing metadata calls, as
>> expected. Can you consider testing with 3.11, which has md-cache [1]
>> and readdirp [2] improvements?
>>
>> Adding Poornima and Raghavendra who worked on these enhancements to
>> help out further.
>>
>> Thanks,
>> Vijay
>>
>> [1] https://gluster.readthedocs.io/en/latest/release-notes/3.9.0/
>>
>> [2] https://github.com/gluster/glusterfs/issues/166
>>
>> On Fri, Jun 16, 2017 at 2:49 PM, mohammad kashif <kashif.alig at gmail.com> wrote:
>>
>> Hi Vijay
>>
>> Did you manage to look into the gluster profile logs ?
>>
>> Thanks
>>
>> Kashif
>>
>> On Mon, Jun 12, 2017 at 11:40 AM, mohammad kashif <kashif.alig at gmail.com> wrote:
>>
>> Hi Vijay
>>
>> I have enabled client profiling and used this script
>> https://github.com/bengland2/gluster-profile-analysis/blob/master/gvp-client.sh
>> to extract data. I am attaching output files. I don't have
>> any reference data to compare with my output. Hopefully you
>> can make some sense out of it.
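>>
>> For anyone following along: client-side profiling is enabled roughly as
>> below. This is only a sketch; it assumes the volume is named atlasglust
>> and the FUSE mount point is /data, and the exact dump file location can
>> vary between versions.
>>
>> gluster volume set atlasglust diagnostics.latency-measurement on
>> gluster volume set atlasglust diagnostics.count-fop-hits on
>> # run the workload (e.g. du) on the client, then ask the client to dump
>> # its io-stats by setting a virtual xattr on the mount point:
>> setfattr -n trusted.io-stats-dump -v /tmp/client-profile.txt /data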
>>
>> On Sat, Jun 10, 2017 at 10:47 AM, Vijay Bellur <vbellur at redhat.com> wrote:
>>
>> Would it be possible for you to turn on client profiling
>> and then run du? Instructions for turning on client
>> profiling can be found at [1]. Providing the client
>> profile information can help us figure out where the
>> latency could be stemming from.
>>
>> Regards,
>> Vijay
>>
>> [1] https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Performance%20Testing/#client-side-profiling
>>
>> On Fri, Jun 9, 2017 at 7:22 PM, mohammad kashif <kashif.alig at gmail.com> wrote:
>>
>> Hi Vijay
>>
>> Thanks for your quick response. I am using Gluster 3.8.11 on CentOS 7
>> servers (glusterfs-3.8.11-1.el7.x86_64).
>>
>> The clients are CentOS 6, but I tested with a CentOS 7 client as well
>> and the results didn't change.
>>
>> gluster volume info
>>
>> Volume Name: atlasglust
>> Type: Distribute
>> Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 5
>> Transport-type: tcp
>> Bricks:
>> Brick1: pplxgluster01.x.y.z:/glusteratlas/brick001/gv0
>> Brick2: pplxgluster02.x.y.z:/glusteratlas/brick002/gv0
>> Brick3: pplxgluster03.x.y.z:/glusteratlas/brick003/gv0
>> Brick4: pplxgluster04.x.y.z:/glusteratlas/brick004/gv0
>> Brick5: pplxgluster05.x.y.z:/glusteratlas/brick005/gv0
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> auth.allow: x.y.z
>>
>> I am not using directory quota.
>>
>> Please let me know if you require some more info
>>
>> Thanks
>>
>> Kashif
>>
>>
>>
>> On Fri, Jun 9, 2017 at 2:34 PM, Vijay Bellur <vbellur at redhat.com> wrote:
>>
>> Can you please provide more details about your
>> volume configuration and the version of gluster
>> that you are using?
>>
>> Regards,
>> Vijay
>>
>> On Fri, Jun 9, 2017 at 5:35 PM, mohammad kashif <kashif.alig at gmail.com> wrote:
>>
>> Hi
>>
>> I have just moved our 400 TB HPC storage from Lustre to Gluster. It is
>> part of a research institute, and users have files ranging from very
>> small to large (a few KB to 20 GB). Our setup consists of 5 servers,
>> each with 96TB of RAID 6 disks. All servers are connected through 10G
>> Ethernet, but not all clients are. Gluster volumes are distributed
>> without any replication. There are approximately 80 million files in
>> the file system. I am mounting with the glusterfs FUSE client on the
>> clients.
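>>
>> Roughly, the volume was created and mounted along the following lines.
>> This is only an illustrative sketch: the brick paths and server names
>> are placeholders, and the volume name atlasglust is taken from the
>> volume info quoted elsewhere in this thread.
>>
>> gluster volume create atlasglust transport tcp \
>>     server01:/bricks/brick001/gv0 server02:/bricks/brick002/gv0 \
>>     server03:/bricks/brick003/gv0 server04:/bricks/brick004/gv0 \
>>     server05:/bricks/brick005/gv0    # no 'replica' keyword: pure distribute
>> gluster volume start atlasglust
>> # on each client:
>> mount -t glusterfs server01:/atlasglust /data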
>>
>> I have copied everything from Lustre to Gluster, but the old file
>> system still exists so I can compare.
>>
>> The problem I am facing is extremely slow du, even on a small
>> directory. Also, the time taken is substantially different each time.
>> I tried du from the same client on a particular directory twice and
>> got these results:
>>
>> time du -sh /data/aa/bb/cc
>> 3.7G /data/aa/bb/cc
>> real 7m29.243s
>> user 0m1.448s
>> sys 0m7.067s
>>
>> time du -sh /data/aa/bb/cc
>> 3.7G /data/aa/bb/cc
>> real 16m43.735s
>> user 0m1.097s
>> sys 0m5.802s
>>
>> 16m and 7m are too long for a 3.7G directory. I must mention that the
>> directory contains a huge number of files (208,736).
>>
>> But running du on the same directory on the old file system gives
>> this result:
>>
>> time du -sh /olddata/aa/bb/cc
>> 4.0G /olddata/aa/bb/cc
>> real 3m1.255s
>> user 0m0.755s
>> sys 0m38.099s
>>
>> It is much better if I run the same command again:
>>
>> time du -sh /olddata/aa/bb/cc
>> 4.0G /olddata/aa/bb/cc
>> real 0m8.309s
>> user 0m0.313s
>> sys 0m7.755s
>>
>> Is there anything I can do to improve this performance? I would also
>> like to hear from someone who is running the same kind of setup.
>>
>> Thanks
>>
>> Kashif
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users