[Gluster-users] performance.cache-size for high-RAM clients/servers, other tweaks for performance, and improvements to Gluster docs

Ravishankar N ravishankar at redhat.com
Wed Apr 18 06:16:52 UTC 2018



On 04/18/2018 10:35 AM, Artem Russakovskii wrote:
> Hi Ravi,
>
> Could you please expand on how these would help?
>
> By forcing full here, we move the logic from the CPU to network, thus 
> decreasing CPU utilization, is that right?
Yes. The 'diff' algorithm uses the rchecksum FOP, which computes a
sha256 checksum and can be CPU-intensive. So setting 'full' does, in
effect, shift the load from the CPU to the network. If your average
file size is small, it makes sense to copy the entire file instead of
computing checksums.
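
To make the trade-off concrete, here is a minimal Python sketch. It is
not GlusterFS code: the 128 KiB block size and the helper names
heal_full and heal_diff are assumptions made up for illustration. It
only counts how many bytes each strategy would hash and how many it
would send.

```python
import hashlib

BLOCK = 128 * 1024  # hypothetical block size, chosen only for illustration


def heal_full(source: bytes) -> dict:
    """'full': copy the whole file, no checksums computed."""
    return {"bytes_hashed": 0, "bytes_sent": len(source)}


def heal_diff(source: bytes, sink: bytes) -> dict:
    """'diff': checksum each block on both sides, send only mismatches."""
    hashed = sent = 0
    for off in range(0, len(source), BLOCK):
        src_blk = source[off:off + BLOCK]
        snk_blk = sink[off:off + BLOCK]
        hashed += len(src_blk) + len(snk_blk)
        if hashlib.sha256(src_blk).digest() != hashlib.sha256(snk_blk).digest():
            sent += len(src_blk)
    return {"bytes_hashed": hashed, "bytes_sent": sent}


# A 1 MiB file where only the last block differs on the sink.
source = bytes(1024 * 1024)
sink = source[:-BLOCK] + b"\x01" * BLOCK
print(heal_full(source))
print(heal_diff(source, sink))
```

In this model 'diff' sends far less data for a large, mostly intact
file, but it hashes every byte on both sides. For a small file there is
little to save on the network, so the hashing is mostly overhead.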

> This is assuming the CPU and disk utilization are caused by the differ 
> and not by lstat and other calls or something.
>
>     Option: cluster.data-self-heal-algorithm
>     Default Value: (null)
>     Description: Select between "full", "diff". The "full" algorithm
>     copies the entire file from source to sink. The "diff" algorithm
>     copies to sink only those blocks whose checksums don't match with
>     those of source. If no option is configured the option is chosen
>     dynamically as follows: If the file does not exist on one of the
>     sinks or empty file exists or if the source file size is about the
>     same as page size the entire file will be read and written i.e
>     "full" algo, otherwise "diff" algo is chosen.
>
>
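
The dynamic choice described in the option text above can be sketched
as follows. This is an illustration, not the actual AFR code: the
function name pick_heal_algorithm, the 4096-byte page size, and the
reading of "about the same as page size" as "no larger than one page"
are all assumptions.

```python
from typing import Optional


def pick_heal_algorithm(configured: Optional[str],
                        sink_size: Optional[int],
                        source_size: int,
                        page_size: int = 4096) -> str:
    """Return 'full' or 'diff' following the option description.

    sink_size is None when the file does not exist on the sink.
    """
    if configured in ("full", "diff"):
        return configured          # an explicit setting always wins
    if sink_size is None or sink_size == 0:
        return "full"              # missing or empty file on the sink
    if source_size <= page_size:   # assumption: "about the same as page size"
        return "full"
    return "diff"


print(pick_heal_algorithm(None, None, 10**6))
print(pick_heal_algorithm(None, 500, 10**6))
```
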
> I really have no idea what this means and how/why it would help. Any 
> more info on this option?

https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.8/granular-entry-self-healing.md 
should help.
Regards,
Ravi

>     Option: cluster.granular-entry-heal
>     Default Value: no
>     Description: If this option is enabled, self-heal will resort to
>     granular way of recording changelogs and doing entry self-heal.
>
>
> Thank you.
>
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror 
> <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net <http://beerpla.net/> | +ArtemRussakovskii 
> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR 
> <http://twitter.com/ArtemR>
>
> On Tue, Apr 17, 2018 at 9:58 PM, Ravishankar N <ravishankar at redhat.com 
> <mailto:ravishankar at redhat.com>> wrote:
>
>
>     On 04/18/2018 10:14 AM, Artem Russakovskii wrote:
>>     Following up here on a related issue that is very serious for us.
>>
>>     I took down one of the 4 replicate gluster servers for
>>     maintenance today. There are 2 gluster volumes totaling about
>>     600GB. Not that much data. After the server comes back online, it
>>     starts auto healing and pretty much all operations on gluster
>>     freeze for many minutes.
>>
>>     For example, I was trying to run an ls -alrt in a folder with
>>     7300 files, and it took a good 15-20 minutes before returning.
>>
>>     During this time, I can see iostat show 100% utilization on the
>>     brick, heal status takes many minutes to return, glusterfsd uses
>>     up tons of CPU (I saw it spike to 600%). gluster already has
>>     massive performance issues for me, but healing after a 4-hour
>>     downtime is on another level of bad perf.
>>
>>     For example, this command took many minutes to run:
>>
>>     gluster volume heal androidpolice_data3 info summary
>>     Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
>>     Status: Connected
>>     Total Number of entries: 91
>>     Number of entries in heal pending: 90
>>     Number of entries in split-brain: 0
>>     Number of entries possibly healing: 1
>>
>>     Brick forge:/mnt/forge_block4/androidpolice_data3
>>     Status: Connected
>>     Total Number of entries: 87
>>     Number of entries in heal pending: 86
>>     Number of entries in split-brain: 0
>>     Number of entries possibly healing: 1
>>
>>     Brick hive:/mnt/hive_block4/androidpolice_data3
>>     Status: Connected
>>     Total Number of entries: 87
>>     Number of entries in heal pending: 86
>>     Number of entries in split-brain: 0
>>     Number of entries possibly healing: 1
>>
>>     Brick citadel:/mnt/citadel_block4/androidpolice_data3
>>     Status: Connected
>>     Total Number of entries: 0
>>     Number of entries in heal pending: 0
>>     Number of entries in split-brain: 0
>>     Number of entries possibly healing: 0
>>
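
When the summary is long, it can help to reduce it to one number per
brick. A minimal sketch, assuming the output has been captured as text
in the layout shown above; the function name pending_per_brick is made
up for this example and the sample is shortened to two bricks.

```python
import re

SAMPLE = """\
Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
Status: Connected
Total Number of entries: 91
Number of entries in heal pending: 90
Number of entries in split-brain: 0
Number of entries possibly healing: 1

Brick citadel:/mnt/citadel_block4/androidpolice_data3
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
"""


def pending_per_brick(text: str) -> dict:
    """Map each brick to its 'heal pending' count."""
    result = {}
    brick = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif brick is not None:
            m = re.match(r"Number of entries in heal pending:\s*(\d+)", line)
            if m:
                result[brick] = int(m.group(1))
    return result


counts = pending_per_brick(SAMPLE)
print(counts, sum(counts.values()))
```
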
>>
>>     Statistics showed a diminishing number of failed heals:
>>     ...
>>     Ending time of crawl: Tue Apr 17 21:13:08 2018
>>
>>     Type of crawl: INDEX
>>     No. of entries healed: 2
>>     No. of entries in split-brain: 0
>>     No. of heal failed entries: 102
>>
>>     Starting time of crawl: Tue Apr 17 21:13:09 2018
>>
>>     Ending time of crawl: Tue Apr 17 21:14:30 2018
>>
>>     Type of crawl: INDEX
>>     No. of entries healed: 4
>>     No. of entries in split-brain: 0
>>     No. of heal failed entries: 91
>>
>>     Starting time of crawl: Tue Apr 17 21:14:31 2018
>>
>>     Ending time of crawl: Tue Apr 17 21:15:34 2018
>>
>>     Type of crawl: INDEX
>>     No. of entries healed: 0
>>     No. of entries in split-brain: 0
>>     No. of heal failed entries: 88
>>     ...
>>
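
The "diminishing number of failed heals" can be checked mechanically. A
small sketch, assuming the statistics output is available as text in
the layout quoted above; failed_counts and is_shrinking are
hypothetical helper names.

```python
import re

SAMPLE = """\
Type of crawl: INDEX
No. of entries healed: 2
No. of entries in split-brain: 0
No. of heal failed entries: 102

Type of crawl: INDEX
No. of entries healed: 4
No. of entries in split-brain: 0
No. of heal failed entries: 91

Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 88
"""


def failed_counts(text: str) -> list:
    """Return the 'heal failed' count of each crawl, in order of appearance."""
    return [int(n) for n in re.findall(r"No\. of heal failed entries:\s*(\d+)", text)]


def is_shrinking(counts: list) -> bool:
    """True when no crawl reports more failures than the one before it."""
    return all(b <= a for a, b in zip(counts, counts[1:]))


counts = failed_counts(SAMPLE)
print(counts, is_shrinking(counts))
```
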
>>     Eventually, everything heals and goes back to at least where the
>>     roof isn't on fire anymore.
>>
>>     The server stats and volume options were given in one of the
>>     previous replies to this thread.
>>
>>     Any ideas or things I could run and show the output of to help
>>     diagnose? I'm also very open to working with someone on the team
>>     on a live debugging session if there's interest.
>
>     It is likely that self-heal is causing the CPU spike, due to the
>     flood of lookups, locks and checksum FOPs that the
>     self-heal daemon (shd) sends to the bricks.
>     There's a script to control shd's CPU usage using cgroups, which
>     should help in regulating self-heal traffic:
>     https://review.gluster.org/#/c/18404/ (see
>     extras/control-cpu-load.sh)
>     Other self-heal related volume options that you could change:
>     set 'cluster.data-self-heal-algorithm' to 'full' and
>     'cluster.granular-entry-heal' to 'enable'. `gluster volume set help`
>     should give you more information about these options.
>     Thanks,
>     Ravi
>
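
For reference, the two suggestions above could be applied with commands
like the ones this sketch prints. It only builds the command lines and
does not run them. The helper name heal_tuning_commands is made up, and
toggling granular entry heal through the `volume heal` sub-command
rather than `volume set` is an assumption to verify against
`gluster volume help` on your version.

```python
import shlex


def heal_tuning_commands(volume: str) -> list:
    """Build (but do not run) the CLI invocations for the two suggestions."""
    return [
        ["gluster", "volume", "set", volume,
         "cluster.data-self-heal-algorithm", "full"],
        # Assumption: granular entry heal is toggled through the heal
        # sub-command; check `gluster volume help` on your version.
        ["gluster", "volume", "heal", volume,
         "granular-entry-heal", "enable"],
    ]


for argv in heal_tuning_commands("androidpolice_data3"):
    print(shlex.join(argv))
```
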
>
>>
>>     Thank you.
>>
>>
>>     Sincerely,
>>     Artem
>>
>>
>>     On Tue, Apr 10, 2018 at 9:56 AM, Artem Russakovskii
>>     <archon810 at gmail.com <mailto:archon810 at gmail.com>> wrote:
>>
>>         Hi Vlad,
>>
>>         I actually saw that post already and even asked a question 4
>>         days ago
>>         (https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode#comment1172497_540917).
>>         The accepted answer also seems to go against your suggestion
>>         to enable direct-io-mode as it says it should be disabled for
>>         better performance when used just for file accesses.
>>
>>         It'd be great if someone from the Gluster team chimed in
>>         about this thread.
>>
>>
>>         Sincerely,
>>         Artem
>>
>>
>>         On Tue, Apr 10, 2018 at 7:01 AM, Vlad Kopylov
>>         <vladkopy at gmail.com <mailto:vladkopy at gmail.com>> wrote:
>>
>>             I wish I knew, or could get a detailed description of,
>>             those options myself.
>>             Here is a discussion of direct-io-mode:
>>             https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode
>>             Like you, I ran tests on a large volume of files and
>>             found that the main delays are in attribute calls, so I
>>             ended up with those mount options to improve performance.
>>             I discovered those options basically by googling this
>>             users list, where people share their tests.
>>             I'm not sure I share your optimism: rather than
>>             upgrading, I downgraded to 3.12 and have no dir view issue
>>             now, though I had to recreate the cluster and re-add
>>             bricks with existing data.
>>
>>             On Tue, Apr 10, 2018 at 1:47 AM, Artem Russakovskii
>>             <archon810 at gmail.com <mailto:archon810 at gmail.com>> wrote:
>>
>>                 Hi Vlad,
>>
>>                 I'm using only localhost: mounts.
>>
>>                 Can you please explain what effect each option has on
>>                 performance issues shown in my posts?
>>                 "negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5"
>>                 From what I remember, direct-io-mode=enable didn't
>>                 make a difference in my tests, but I suppose I can
>>                 try again. The explanations about direct-io-mode are
>>                 quite confusing on the web in various guides, saying
>>                 enabling it could make performance worse in some
>>                 situations and better in others due to OS file cache.
>>
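
As an illustration of where those options would go, here is a small
sketch that formats an /etc/fstab entry from a list of mount options.
It does not say which options help; fstab_line is a hypothetical
helper, and the volume name and mount point are taken from the fstab
entry quoted further down in this thread.

```python
def fstab_line(volume: str, mountpoint: str, options: list,
               server: str = "localhost") -> str:
    """Format one /etc/fstab entry for a GlusterFS FUSE mount."""
    opts = ",".join(["defaults", "_netdev"] + options)
    return f"{server}:/{volume} {mountpoint} glusterfs {opts} 0 0"


# Option string exactly as quoted above; whether each one helps is untested here.
QUOTED = "negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5"

print(fstab_line("apkmirror_data1", "/mnt/apkmirror_data1", QUOTED.split(",")))
```
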
>>                 There are also these gluster volume settings, adding
>>                 to the confusion:
>>                 Option: performance.strict-o-direct
>>                 Default Value: off
>>                 Description: This option when set to off, ignores the
>>                 O_DIRECT flag.
>>
>>                 Option: performance.nfs.strict-o-direct
>>                 Default Value: off
>>                 Description: This option when set to off, ignores the
>>                 O_DIRECT flag.
>>
>>                 Re: 4.0. I moved to 4.0 after finding out that it
>>                 fixes the disappearing dirs bug related to
>>                 cluster.readdir-optimize if you remember
>>                 (http://lists.gluster.org/pipermail/gluster-users/2018-April/033830.html).
>>                 I was already on 3.13 by then, and 4.0 resolved the
>>                 issue. It's been stable for me so far, thankfully.
>>
>>
>>                 Sincerely,
>>                 Artem
>>
>>
>>                 On Mon, Apr 9, 2018 at 10:38 PM, Vlad Kopylov
>>                 <vladkopy at gmail.com <mailto:vladkopy at gmail.com>> wrote:
>>
>>                     You definitely need to add mount options in
>>                     /etc/fstab; use the ones from here:
>>                     http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html
>>
>>                     I also went with local mounts to improve
>>                     performance.
>>
>>                     Also, the 3.12 or 3.10 branches would be preferable
>>                     for production.
>>
>>                     On Fri, Apr 6, 2018 at 4:12 AM, Artem
>>                     Russakovskii <archon810 at gmail.com
>>                     <mailto:archon810 at gmail.com>> wrote:
>>
>>                         Hi again,
>>
>>                         I'd like to expand on the performance issues
>>                         and plead for help. Here's one case which
>>                         shows these odd hiccups:
>>                         https://i.imgur.com/CXBPjTK.gifv
>>
>>                         In this GIF where I switch back and forth
>>                         between copy operations on 2 servers, I'm
>>                         copying a 10GB dir full of .apk and image files.
>>
>>                         On server "hive" I'm copying straight from
>>                         the main disk to an attached volume block
>>                         (xfs). As you can see, the transfers are
>>                         relatively speedy and don't hiccup.
>>                         On server "citadel" I'm copying the same set
>>                         of data to a 4-replicate gluster which uses
>>                         block storage as a brick. As you can see,
>>                         performance is much worse, and there are
>>                         frequent pauses for many seconds where
>>                         nothing seems to be happening - just freezes.
>>
>>                         All 4 servers have the same specs, and all of
>>                         them have performance issues with gluster and
>>                         no such issues when raw xfs block storage is
>>                         used.
>>
>>                         hive has long finished copying the data,
>>                         while citadel is barely chugging along and is
>>                         expected to take probably half an hour to an
>>                         hour. I have over 1TB of data to migrate, at
>>                         which point if we went live, I'm not even
>>                         sure gluster would be able to keep up instead
>>                         of bringing the machines and services down.
>>
>>
>>
>>                         Here's the cluster config, though it didn't
>>                         seem to make any difference performance-wise
>>                         before I applied the customizations vs after.
>>
>>                         Volume Name: apkmirror_data1
>>                         Type: Replicate
>>                         Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
>>                         Status: Started
>>                         Snapshot Count: 0
>>                         Number of Bricks: 1 x 4 = 4
>>                         Transport-type: tcp
>>                         Bricks:
>>                         Brick1: nexus2:/mnt/nexus2_block1/apkmirror_data1
>>                         Brick2: forge:/mnt/forge_block1/apkmirror_data1
>>                         Brick3: hive:/mnt/hive_block1/apkmirror_data1
>>                         Brick4: citadel:/mnt/citadel_block1/apkmirror_data1
>>                         Options Reconfigured:
>>                         cluster.quorum-count: 1
>>                         cluster.quorum-type: fixed
>>                         network.ping-timeout: 5
>>                         network.remote-dio: enable
>>                         performance.rda-cache-limit: 256MB
>>                         performance.readdir-ahead: on
>>                         performance.parallel-readdir: on
>>                         network.inode-lru-limit: 500000
>>                         performance.md-cache-timeout: 600
>>                         performance.cache-invalidation: on
>>                         performance.stat-prefetch: on
>>                         features.cache-invalidation-timeout: 600
>>                         features.cache-invalidation: on
>>                         cluster.readdir-optimize: on
>>                         performance.io-thread-count: 32
>>                         server.event-threads: 4
>>                         client.event-threads: 4
>>                         performance.read-ahead: off
>>                         cluster.lookup-optimize: on
>>                         performance.cache-size: 1GB
>>                         cluster.self-heal-daemon: enable
>>                         transport.address-family: inet
>>                         nfs.disable: on
>>                         performance.client-io-threads: on
>>
>>
>>                         The mounts are done as follows in /etc/fstab:
>>                         /dev/disk/by-id/scsi-0Linode_Volume_citadel_block1 /mnt/citadel_block1 xfs defaults 0 2
>>                         localhost:/apkmirror_data1 /mnt/apkmirror_data1 glusterfs defaults,_netdev 0 0
>>
>>                         I'm really not sure if direct-io-mode mount
>>                         tweaks would do anything here, what the value
>>                         should be set to, and what it is by default.
>>
>>                         The OS is OpenSUSE 42.3, 64-bit. 80GB of RAM,
>>                         20 CPUs, hosted by Linode.
>>
>>                         I'd really appreciate any help in the matter.
>>
>>                         Thank you.
>>
>>
>>                         Sincerely,
>>                         Artem
>>
>>
>>                         On Thu, Apr 5, 2018 at 11:13 PM, Artem
>>                         Russakovskii <archon810 at gmail.com
>>                         <mailto:archon810 at gmail.com>> wrote:
>>
>>                             Hi,
>>
>>                             I'm trying to squeeze performance out of
>>                             gluster on 4 machines (80GB RAM, 20 CPUs
>>                             each) where Gluster runs on attached block
>>                             storage (Linode) as 4 replicate bricks,
>>                             and so far everything I've tried results in
>>                             sub-optimal performance.
>>
>>                             There are many files (several million,
>>                             mostly images), and many operations
>>                             take minutes. Copying multiple files
>>                             (even small ones) suddenly freezes
>>                             for seconds at a time, then continues;
>>                             iostat frequently shows large r_await and
>>                             w_await values with 100% utilization for
>>                             the attached block device, etc.
>>
>>                             But anyway, there are many guides out
>>                             there for small-file performance
>>                             improvements, but more explanation is
>>                             needed, and I think more tweaks should be
>>                             possible.
>>
>>                             My question today is
>>                             about performance.cache-size. Is this a
>>                             size of cache in RAM? If so, how do I
>>                             view the current cache size to see if it
>>                             gets full and I should increase its size?
>>                             Is it advisable to bump it up if I have
>>                             many tens of gigs of RAM free?
>>
>>
>>
>>                             More generally, in the last 2 months
>>                             since I first started working with
>>                             gluster and set a production system live,
>>                             I've been feeling frustrated because
>>                             Gluster has a lot of poorly-documented
>>                             and confusing options. I really wish
>>                             documentation could be improved with
>>                             examples and better explanations.
>>
>>                             Specifically, it'd be absolutely amazing
>>                             if the docs offered a strategy for
>>                             setting each value and ways of
>>                             determining more optimal values. For
>>                             example, for performance.cache-size, if
>>                             it said something like "run command abc
>>                             to see your current cache size, and if
>>                             it's hurting, up it, but be aware that
>>                             it's limited by RAM," it'd be already a
>>                             huge improvement to the docs. And so on
>>                             with other options.
>>
>>
>>
>>                             The gluster team is quite helpful on this
>>                             mailing list, but in a reactive rather
>>                             than proactive way. Perhaps it's tunnel
>>                             vision once you've worked on a project
>>                             for so long where less technical
>>                             explanations and even proper
>>                             documentation of options takes a back
>>                             seat, but I encourage you to be more
>>                             proactive about helping us understand and
>>                             optimize Gluster.
>>
>>                             Thank you.
>>
>>                             Sincerely,
>>                             Artem
>>
>>
>>
>>
>>                         _______________________________________________
>>                         Gluster-users mailing list
>>                         Gluster-users at gluster.org
>>                         <mailto:Gluster-users at gluster.org>
>>                         http://lists.gluster.org/mailman/listinfo/gluster-users
>>                         <http://lists.gluster.org/mailman/listinfo/gluster-users>
>>
>
>
