[Gluster-users] performance.cache-size for high-RAM clients/servers, other tweaks for performance, and improvements to Gluster docs
Ravishankar N
ravishankar at redhat.com
Wed Apr 18 04:58:12 UTC 2018
On 04/18/2018 10:14 AM, Artem Russakovskii wrote:
> Following up here on a related issue that is very serious for us.
>
> I took down one of the 4 replicate gluster servers for maintenance
> today. There are 2 gluster volumes totaling about 600GB - not that
> much data. After the server came back online, it started auto-healing,
> and pretty much all operations on gluster froze for many minutes.
>
> For example, I was trying to run an ls -alrt in a folder with 7300
> files, and it took a good 15-20 minutes before returning.
>
> During this time, iostat shows 100% utilization on the brick, heal
> status takes many minutes to return, and glusterfsd uses up tons of
> CPU (I saw it spike to 600%). gluster already has massive performance
> issues for me, but healing after a 4-hour downtime is on another level
> of bad performance.
>
> For example, this command took many minutes to run:
>
> gluster volume heal androidpolice_data3 info summary
> Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
> Status: Connected
> Total Number of entries: 91
> Number of entries in heal pending: 90
> Number of entries in split-brain: 0
> Number of entries possibly healing: 1
>
> Brick forge:/mnt/forge_block4/androidpolice_data3
> Status: Connected
> Total Number of entries: 87
> Number of entries in heal pending: 86
> Number of entries in split-brain: 0
> Number of entries possibly healing: 1
>
> Brick hive:/mnt/hive_block4/androidpolice_data3
> Status: Connected
> Total Number of entries: 87
> Number of entries in heal pending: 86
> Number of entries in split-brain: 0
> Number of entries possibly healing: 1
>
> Brick citadel:/mnt/citadel_block4/androidpolice_data3
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
>
> Statistics showed a diminishing number of failed heals:
> ...
> Ending time of crawl: Tue Apr 17 21:13:08 2018
>
> Type of crawl: INDEX
> No. of entries healed: 2
> No. of entries in split-brain: 0
> No. of heal failed entries: 102
>
> Starting time of crawl: Tue Apr 17 21:13:09 2018
>
> Ending time of crawl: Tue Apr 17 21:14:30 2018
>
> Type of crawl: INDEX
> No. of entries healed: 4
> No. of entries in split-brain: 0
> No. of heal failed entries: 91
>
> Starting time of crawl: Tue Apr 17 21:14:31 2018
>
> Ending time of crawl: Tue Apr 17 21:15:34 2018
>
> Type of crawl: INDEX
> No. of entries healed: 0
> No. of entries in split-brain: 0
> No. of heal failed entries: 88
> ...
>
> Eventually, everything heals and gets back to a point where at least
> the roof isn't on fire anymore.
>
> The server stats and volume options were given in one of the previous
> replies to this thread.
>
> Any ideas or things I could run and show the output of to help
> diagnose? I'm also very open to working with someone on the team on a
> live debugging session if there's interest.
It is likely that self-heal is causing the CPU spike, due to the flood
of lookup, lock, and checksum fops that the self-heal daemon sends to
the bricks.
There's a script that uses cgroups to control shd's CPU usage; it
should help in regulating self-heal traffic:
https://review.gluster.org/#/c/18404/ (see extras/control-cpu-load.sh)
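If you'd rather do it by hand, here's a rough sketch of the cgroups
(v1) approach that the script automates (the cgroup name and quota
below are illustrative; substitute the actual shd PID on your node):

  # Find the self-heal daemon's PID:
  ps aux | grep glustershd

  # Create a cgroup capped at ~25% of one CPU
  # (25ms of CPU time per 100ms period):
  mkdir -p /sys/fs/cgroup/cpu/gluster_shd
  echo 25000 > /sys/fs/cgroup/cpu/gluster_shd/cpu.cfs_quota_us
  echo 100000 > /sys/fs/cgroup/cpu/gluster_shd/cpu.cfs_period_us

  # Move the shd process into the cgroup:
  echo <SHD_PID> > /sys/fs/cgroup/cpu/gluster_shd/tasks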
Other self-heal-related volume options that you could change are setting
'cluster.data-self-heal-algorithm' to 'full' and 'granular-entry-heal'
to 'enable'. `gluster volume set help` should give you more information
about these options.
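For example (substituting your volume name; depending on the release,
granular entry heal may need to be toggled via the heal command instead
of volume set - `gluster volume set help` will tell you):

  gluster volume set androidpolice_data3 cluster.data-self-heal-algorithm full
  gluster volume heal androidpolice_data3 granular-entry-heal enable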
Thanks,
Ravi
>
> Thank you.
>
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net <http://beerpla.net/> | +ArtemRussakovskii
> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
> <http://twitter.com/ArtemR>
>
> On Tue, Apr 10, 2018 at 9:56 AM, Artem Russakovskii
> <archon810 at gmail.com> wrote:
>
> Hi Vlad,
>
> I actually saw that post already and even asked a question 4 days
> ago
> (https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode#comment1172497_540917).
> The accepted answer also seems to go against your suggestion to
> enable direct-io-mode, as it says it should be disabled for better
> performance when gluster is used just for file access.
>
> It'd be great if someone from the Gluster team chimed in about
> this thread.
>
>
> Sincerely,
> Artem
>
>
> On Tue, Apr 10, 2018 at 7:01 AM, Vlad Kopylov
> <vladkopy at gmail.com> wrote:
>
> I wish I knew, or was able to get, a detailed description of those
> options myself.
> Here is direct-io-mode:
> https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode
> Like you, I ran tests on a large volume of files and found that
> the main delays are in attribute calls, which is how I ended up
> with those mount options for performance.
> I discovered those options basically by googling this user list,
> where people share their tests.
> Not sure I share your optimism; rather than going up,
> I downgraded to 3.12 and have no directory-view issue now, though
> I had to recreate the cluster and re-add bricks with
> existing data.
>
> On Tue, Apr 10, 2018 at 1:47 AM, Artem Russakovskii
> <archon810 at gmail.com> wrote:
>
> Hi Vlad,
>
> I'm using only localhost: mounts.
>
> Can you please explain what effect each option has on the
> performance issues shown in my posts?
> "negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5"
> From what I remember, direct-io-mode=enable didn't make a
> difference in my tests, but I suppose I can try again. The
> explanations of direct-io-mode in various guides on the web
> are quite confusing, saying enabling it could make performance
> worse in some situations and better in others, due to the OS
> file cache.
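> For reference, here's how those options would look as a single
> /etc/fstab entry (a hypothetical line based on my volume and
> mount point; not something I've tested):
>
> localhost:/apkmirror_data1 /mnt/apkmirror_data1 glusterfs defaults,_netdev,negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5 0 0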
>
> There are also these gluster volume settings, adding to
> the confusion:
> Option: performance.strict-o-direct
> Default Value: off
> Description: This option when set to off, ignores the
> O_DIRECT flag.
>
> Option: performance.nfs.strict-o-direct
> Default Value: off
> Description: This option when set to off, ignores the
> O_DIRECT flag.
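> (To check what a given volume is actually using, I believe
> there's `gluster volume get`, e.g.:
>
> gluster volume get apkmirror_data1 performance.strict-o-direct
>
> though that shows the value, not an explanation of it.)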
>
> Re: 4.0: I moved to 4.0 after finding out that it fixes
> the disappearing-dirs bug related to
> cluster.readdir-optimize, if you remember
> (http://lists.gluster.org/pipermail/gluster-users/2018-April/033830.html).
> I was already on 3.13 by then, and 4.0 resolved the issue.
> It's been stable for me so far, thankfully.
>
>
> Sincerely,
> Artem
>
>
> On Mon, Apr 9, 2018 at 10:38 PM, Vlad Kopylov
> <vladkopy at gmail.com> wrote:
>
> You definitely need mount options in /etc/fstab;
> use the ones from here:
> http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html
>
> I went with using local mounts to achieve
> performance as well.
>
> Also, the 3.12 or 3.10 branches would be preferable for
> production.
>
> On Fri, Apr 6, 2018 at 4:12 AM, Artem Russakovskii
> <archon810 at gmail.com> wrote:
>
> Hi again,
>
> I'd like to expand on the performance issues and
> plead for help. Here's one case which shows these
> odd hiccups: https://i.imgur.com/CXBPjTK.gifv.
>
> In this GIF, where I switch back and forth between
> copy operations on 2 servers, I'm copying a 10GB
> directory full of .apk and image files.
>
> On server "hive" I'm copying straight from the
> main disk to an attached volume block (xfs). As
> you can see, the transfers are relatively speedy
> and don't hiccup.
> On server "citadel" I'm copying the same set of
> data to a 4-replicate gluster which uses block
> storage as a brick. As you can see, performance is
> much worse, and there are frequent pauses for many
> seconds where nothing seems to be happening - just
> freezes.
>
> All 4 servers have the same specs, and all of them
> have performance issues with gluster and no such
> issues when raw xfs block storage is used.
>
> hive has long finished copying the data, while
> citadel is barely chugging along and will
> probably take another half an hour to an hour. I
> have over 1TB of data to migrate, and if we went
> live with that, I'm not even sure gluster would
> be able to keep up rather than bringing the
> machines and services down.
>
>
>
> Here's the cluster config, though it didn't seem
> to make any performance difference before vs.
> after I applied the customizations.
>
> Volume Name: apkmirror_data1
> Type: Replicate
> Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: nexus2:/mnt/nexus2_block1/apkmirror_data1
> Brick2: forge:/mnt/forge_block1/apkmirror_data1
> Brick3: hive:/mnt/hive_block1/apkmirror_data1
> Brick4: citadel:/mnt/citadel_block1/apkmirror_data1
> Options Reconfigured:
> cluster.quorum-count: 1
> cluster.quorum-type: fixed
> network.ping-timeout: 5
> network.remote-dio: enable
> performance.rda-cache-limit: 256MB
> performance.readdir-ahead: on
> performance.parallel-readdir: on
> network.inode-lru-limit: 500000
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> cluster.readdir-optimize: on
> performance.io-thread-count: 32
> server.event-threads: 4
> client.event-threads: 4
> performance.read-ahead: off
> cluster.lookup-optimize: on
> performance.cache-size: 1GB
> cluster.self-heal-daemon: enable
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: on
>
>
> The mounts are done as follows in /etc/fstab:
> /dev/disk/by-id/scsi-0Linode_Volume_citadel_block1
> /mnt/citadel_block1 xfs defaults 0 2
> localhost:/apkmirror_data1 /mnt/apkmirror_data1
> glusterfs defaults,_netdev 0 0
>
> I'm really not sure if direct-io-mode mount tweaks
> would do anything here, what the value should be
> set to, and what it is by default.
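> If I understand the docs correctly, it's toggled at mount
> time, something like this (hypothetical; I haven't
> verified it on my setup):
>
> mount -t glusterfs -o direct-io-mode=enable localhost:/apkmirror_data1 /mnt/apkmirror_data1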
>
> The OS is OpenSUSE 42.3, 64-bit. 80GB of RAM, 20
> CPUs, hosted by Linode.
>
> I'd really appreciate any help in the matter.
>
> Thank you.
>
>
> Sincerely,
> Artem
>
>
> On Thu, Apr 5, 2018 at 11:13 PM, Artem
> Russakovskii <archon810 at gmail.com> wrote:
>
> Hi,
>
> I'm trying to squeeze performance out of
> gluster on four 80GB-RAM, 20-CPU machines where
> Gluster runs on attached block storage
> (Linode) as 4 replicate bricks, and so far
> everything I've tried results in sub-optimal
> performance.
>
> There are many files - mostly images, several
> million of them - and many operations take
> minutes. Copying multiple files (even small
> ones) suddenly freezes for seconds at a time,
> then continues; iostat frequently shows large
> r_await and w_await values with 100% utilization
> for the attached block device; and so on.
>
> Anyway, there are many guides out there
> for small-file performance improvements, but
> more explanation is needed, and I think more
> tweaks should be possible.
>
> My question today is
> about performance.cache-size. Is this the size
> of a cache in RAM? If so, how do I view the
> current cache usage to see whether it gets full
> and I should increase its size? Is it advisable
> to bump it up if I have many tens of gigs of
> RAM free?
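> For what it's worth, getting and setting the option itself
> seems straightforward (4GB below is just an illustrative
> value); what I can't find is a way to see the cache's
> current *usage*:
>
> gluster volume get apkmirror_data1 performance.cache-size
> gluster volume set apkmirror_data1 performance.cache-size 4GB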
>
>
>
> More generally, in the last 2 months since I
> first started working with gluster and set a
> production system live, I've been feeling
> frustrated because Gluster has a lot of
> poorly-documented and confusing options. I
> really wish the documentation could be improved
> with examples and better explanations.
>
> Specifically, it'd be absolutely amazing if
> the docs offered a strategy for setting each
> value and ways of determining more optimal
> values. For example,
> for performance.cache-size, if the docs said
> something like "run command abc to see your
> current cache size, and if it's hurting, up
> it, but be aware that it's limited by RAM,"
> that alone would be a huge improvement. And
> so on with other options.
>
>
>
> The gluster team is quite helpful on this
> mailing list, but in a reactive rather than
> proactive way. Perhaps it's the tunnel vision
> that sets in once you've worked on a project
> for so long, where less-technical explanations
> and even proper documentation of options take
> a back seat, but I encourage you to be more
> proactive about helping us understand and
> optimize Gluster.
>
> Thank you.
>
> Sincerely,
> Artem
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users