[Gluster-users] performance.cache-size for high-RAM clients/servers, other tweaks for performance, and improvements to Gluster docs

Wed Apr 18 04:54:44 UTC 2018

Just saw a recently posted issue by Serkan Çoban that looks very similar:
http://lists.gluster.org/pipermail/gluster-users/2018-April/033915.html

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
<https://plus.google.com/+ArtemRussakovskii> | @ArtemR
<http://twitter.com/ArtemR>

On Tue, Apr 17, 2018 at 9:44 PM, Artem Russakovskii <archon810 at gmail.com>
wrote:

> Following up here on a related issue that is very serious for us.
>
> I took down one of the 4 replicate gluster servers for maintenance today.
> There are 2 gluster volumes totaling about 600GB. Not that much data. After
> the server came back online, it started auto-healing, and pretty much all
> operations on gluster froze for many minutes.
>
> For example, I was trying to run an ls -alrt in a folder with 7300 files,
> and it took a good 15-20 minutes before returning.
>
> During this time, iostat showed 100% utilization on the brick, heal status
> took many minutes to return, and glusterfsd used a lot of CPU (I saw it
> spike to 600%). gluster already has massive performance issues for me, but
> healing after a 4-hour downtime is on another level of bad performance.
>
> For example, this command took many minutes to run:
>
> gluster volume heal androidpolice_data3 info summary
> Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
> Status: Connected
> Total Number of entries: 91
> Number of entries in heal pending: 90
> Number of entries in split-brain: 0
> Number of entries possibly healing: 1
>
> Brick forge:/mnt/forge_block4/androidpolice_data3
> Status: Connected
> Total Number of entries: 87
> Number of entries in heal pending: 86
> Number of entries in split-brain: 0
> Number of entries possibly healing: 1
>
> Brick hive:/mnt/hive_block4/androidpolice_data3
> Status: Connected
> Total Number of entries: 87
> Number of entries in heal pending: 86
> Number of entries in split-brain: 0
> Number of entries possibly healing: 1
>
> Brick citadel:/mnt/citadel_block4/androidpolice_data3
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
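A minimal Python sketch for tracking heal progress from the summary output above without reading it by eye. It assumes the text format shown in this message (two of the four bricks are copied here for brevity); the function names are hypothetical and not part of any gluster tooling.

```python
# Sample copied from the heal summary quoted above (two bricks only).
HEAL_SUMMARY = """\
Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
Status: Connected
Total Number of entries: 91
Number of entries in heal pending: 90
Number of entries in split-brain: 0
Number of entries possibly healing: 1

Brick citadel:/mnt/citadel_block4/androidpolice_data3
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
"""


def parse_heal_summary(text):
    """Return {brick: {field: value}} parsed from heal summary text."""
    bricks = {}
    current = None
    for raw in text.splitlines():
        # Tolerate email quote markers ("> ") in front of each line.
        line = raw.lstrip("> ").strip()
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            # Counters become ints; anything else (e.g. Status) stays text.
            bricks[current][key.strip()] = int(value) if value.isdigit() else value
    return bricks


def total_pending(bricks):
    """Sum the 'heal pending' counters across all parsed bricks."""
    return sum(b.get("Number of entries in heal pending", 0) for b in bricks.values())


if __name__ == "__main__":
    parsed = parse_heal_summary(HEAL_SUMMARY)
    print("pending entries across bricks:", total_pending(parsed))
```

Feeding it the saved output of successive runs would show whether the pending count is actually shrinking.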
>
> Statistics showed a diminishing number of failed heals:
> ...
> Ending time of crawl: Tue Apr 17 21:13:08 2018
>
> Type of crawl: INDEX
> No. of entries healed: 2
> No. of entries in split-brain: 0
> No. of heal failed entries: 102
>
> Starting time of crawl: Tue Apr 17 21:13:09 2018
>
> Ending time of crawl: Tue Apr 17 21:14:30 2018
>
> Type of crawl: INDEX
> No. of entries healed: 4
> No. of entries in split-brain: 0
> No. of heal failed entries: 91
>
> Starting time of crawl: Tue Apr 17 21:14:31 2018
>
> Ending time of crawl: Tue Apr 17 21:15:34 2018
>
> Type of crawl: INDEX
> No. of entries healed: 0
> No. of entries in split-brain: 0
> No. of heal failed entries: 88
> ...
>
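The downward trend in the crawl statistics above can be checked mechanically. This is a small Python sketch that assumes the statistics text looks exactly like the excerpt in this message; the helper names are hypothetical.

```python
import re

# The three crawl records quoted above, timestamps omitted.
CRAWL_STATS = """\
Type of crawl: INDEX
No. of entries healed: 2
No. of entries in split-brain: 0
No. of heal failed entries: 102

Type of crawl: INDEX
No. of entries healed: 4
No. of entries in split-brain: 0
No. of heal failed entries: 91

Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 88
"""


def failed_per_crawl(text):
    """Return the 'heal failed entries' counter of each crawl, in order."""
    return [int(n) for n in re.findall(r"No\. of heal failed entries:\s*(\d+)", text)]


def is_non_increasing(values):
    """True if each value is less than or equal to the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    failed = failed_per_crawl(CRAWL_STATS)
    print(failed, "shrinking" if is_non_increasing(failed) else "not shrinking")
```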
> Eventually, everything heals and goes back to at least where the roof
> isn't on fire anymore.
>
> The server stats and volume options were given in one of the previous
> replies to this thread.
>
> Any ideas or things I could run and show the output of to help diagnose?
> I'm also very open to working with someone on the team on a live debugging
> session if there's interest.
>
> Thank you.
>
>
> Sincerely,
> Artem
>
>
> On Tue, Apr 10, 2018 at 9:56 AM, Artem Russakovskii <archon810 at gmail.com>
> wrote:
>
>> Hi Vlad,
>>
>> I actually saw that post already and even asked a question 4 days ago
>> (https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode#comment1172497_540917).
>> The accepted answer also seems to go against your suggestion to enable
>> direct-io-mode, as it says it should be disabled for better performance
>> when used just for file accesses.
>>
>> It'd be great if someone from the Gluster team chimed in about this
>> thread.
>>
>>
>> Sincerely,
>> Artem
>>
>>
>> On Tue, Apr 10, 2018 at 7:01 AM, Vlad Kopylov <vladkopy at gmail.com> wrote:
>>
>>> I wish I knew, or were able to get, a detailed description of those
>>> options myself.
>>> Here is direct-io-mode:
>>> https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode
>>> Same as you, I ran tests on a large volume of files and found that the
>>> main delays are in attribute calls, so I ended up with those mount
>>> options to improve performance.
>>> I discovered those options basically by googling this user list, where
>>> people share their tests.
>>> I'm not sure I share your optimism: rather than going up, I downgraded
>>> to 3.12 and have no dir view issue now, though I had to recreate the
>>> cluster and re-add bricks with existing data.
>>>
>>> On Tue, Apr 10, 2018 at 1:47 AM, Artem Russakovskii <archon810 at gmail.com> wrote:
>>>
>>>> Hi Vlad,
>>>>
>>>> I'm using only localhost: mounts.
>>>>
>>>> Can you please explain what effect each option has on the performance
>>>> issues shown in my posts?
>>>> "negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5"
>>>> From what I remember, direct-io-mode=enable didn't make a difference in my
>>>> tests, but I suppose I can try again. The explanations of direct-io-mode
>>>> in various guides on the web are quite confusing, saying that enabling it
>>>> could make performance worse in some situations and better in others due
>>>> to the OS file cache.
>>>>
>>>> There are also these gluster volume settings, adding to the confusion:
>>>> Option: performance.strict-o-direct
>>>> Default Value: off
>>>> Description: This option when set to off, ignores the O_DIRECT flag.
>>>>
>>>> Option: performance.nfs.strict-o-direct
>>>> Default Value: off
>>>> Description: This option when set to off, ignores the O_DIRECT flag.
>>>>
>>>> Re: 4.0. I moved to 4.0 after finding out that it fixes the
>>>> disappearing dirs bug related to cluster.readdir-optimize if you remember (
>>>> http://lists.gluster.org/pipermail/gluster-users/2018-April/033830.html).
>>>> I was already on 3.13 by then, and 4.0 resolved the issue. It's been stable
>>>> for me so far, thankfully.
>>>>
>>>>
>>>> Sincerely,
>>>> Artem
>>>>
>>>>
>>>> On Mon, Apr 9, 2018 at 10:38 PM, Vlad Kopylov <vladkopy at gmail.com>
>>>> wrote:
>>>>
>>>>> You definitely need to add mount options to /etc/fstab.
>>>>> Use the ones from here:
>>>>> http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html
>>>>>
>>>>> I went with local mounts to achieve performance as well.
>>>>>
>>>>> Also, the 3.12 or 3.10 branches would be preferable for production.
>>>>>
>>>>> On Fri, Apr 6, 2018 at 4:12 AM, Artem Russakovskii <
>>>>> archon810 at gmail.com> wrote:
>>>>>
>>>>>> Hi again,
>>>>>>
>>>>>> I'd like to expand on the performance issues and plead for help.
>>>>>> Here's one case which shows these odd hiccups:
>>>>>> https://i.imgur.com/CXBPjTK.gifv
>>>>>>
>>>>>> In this GIF where I switch back and forth between copy operations on
>>>>>> 2 servers, I'm copying a 10GB dir full of .apk and image files.
>>>>>>
>>>>>> On server "hive" I'm copying straight from the main disk to an
>>>>>> attached volume block (xfs). As you can see, the transfers are relatively
>>>>>> speedy and don't hiccup.
>>>>>> On server "citadel" I'm copying the same set of data to a 4-replicate
>>>>>> gluster which uses block storage as a brick. As you can see, performance is
>>>>>> much worse, and there are frequent pauses for many seconds where nothing
>>>>>> seems to be happening - just freezes.
>>>>>>
>>>>>> All 4 servers have the same specs, and all of them have performance
>>>>>> issues with gluster and no such issues when raw xfs block storage is used.
>>>>>>
>>>>>> hive has long finished copying the data, while citadel is barely
>>>>>> chugging along and is expected to take probably half an hour to an hour. I
>>>>>> have over 1TB of data to migrate, at which point if we went live, I'm not
>>>>>> even sure gluster would be able to keep up instead of bringing the machines
>>>>>> and services down.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Here's the cluster config, though it didn't seem to make any
>>>>>> difference performance-wise before I applied the customizations vs after.
>>>>>>
>>>>>> Volume Name: apkmirror_data1
>>>>>> Type: Replicate
>>>>>> Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 1 x 4 = 4
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: nexus2:/mnt/nexus2_block1/apkmirror_data1
>>>>>> Brick2: forge:/mnt/forge_block1/apkmirror_data1
>>>>>> Brick3: hive:/mnt/hive_block1/apkmirror_data1
>>>>>> Brick4: citadel:/mnt/citadel_block1/apkmirror_data1
>>>>>> Options Reconfigured:
>>>>>> cluster.quorum-count: 1
>>>>>> cluster.quorum-type: fixed
>>>>>> network.ping-timeout: 5
>>>>>> network.remote-dio: enable
>>>>>> performance.rda-cache-limit: 256MB
>>>>>> performance.readdir-ahead: on
>>>>>> performance.parallel-readdir: on
>>>>>> network.inode-lru-limit: 500000
>>>>>> performance.md-cache-timeout: 600
>>>>>> performance.cache-invalidation: on
>>>>>> performance.stat-prefetch: on
>>>>>> features.cache-invalidation-timeout: 600
>>>>>> features.cache-invalidation: on
>>>>>> cluster.readdir-optimize: on
>>>>>> performance.io-thread-count: 32
>>>>>> server.event-threads: 4
>>>>>> client.event-threads: 4
>>>>>> performance.read-ahead: off
>>>>>> cluster.lookup-optimize: on
>>>>>> performance.cache-size: 1GB
>>>>>> cluster.self-heal-daemon: enable
>>>>>> transport.address-family: inet
>>>>>> nfs.disable: on
>>>>>> performance.client-io-threads: on
>>>>>>
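To put performance.cache-size in perspective, here is a minimal Python sketch that reads "Options Reconfigured" lines like the ones above and expresses the cache size as a share of the 80GB of RAM mentioned in this message. The unit table assumes binary multiples (1GB = 2^30 bytes), which the thread does not confirm, and the helper names are hypothetical.

```python
# A few of the reconfigured options listed above.
VOLUME_OPTIONS = """\
performance.rda-cache-limit: 256MB
performance.io-thread-count: 32
performance.cache-size: 1GB
"""

# Assumption: sizes use binary multiples; this is not confirmed by the thread.
UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}


def parse_options(text):
    """Return {option: value} from 'key: value' lines, ignoring quote markers."""
    options = {}
    for raw in text.splitlines():
        key, sep, value = raw.lstrip("> ").partition(":")
        if sep:
            options[key.strip()] = value.strip()
    return options


def size_to_bytes(value):
    """Convert strings such as '256MB' or '1GB' to a byte count."""
    value = value.strip().upper()
    for suffix, factor in UNITS.items():
        if value.endswith(suffix):
            return int(float(value[: -len(suffix)]) * factor)
    return int(value)  # plain number of bytes


if __name__ == "__main__":
    opts = parse_options(VOLUME_OPTIONS)
    cache = size_to_bytes(opts["performance.cache-size"])
    ram = size_to_bytes("80GB")  # RAM per server, as stated in this message
    print(f"cache-size is {cache / ram:.2%} of RAM")
```

This only compares configured numbers; it does not show how much of the cache is actually in use.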
>>>>>>
>>>>>> The mounts are done as follows in /etc/fstab:
>>>>>> /dev/disk/by-id/scsi-0Linode_Volume_citadel_block1 /mnt/citadel_block1 xfs defaults 0 2
>>>>>> localhost:/apkmirror_data1 /mnt/apkmirror_data1 glusterfs defaults,_netdev 0 0
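For illustration, a short Python sketch that builds the glusterfs fstab line with the mount options quoted in the April 10 reply appended to the existing ones. Whether these options help is not established in this thread, so treat the result as something to test. The helper name is hypothetical.

```python
def fstab_entry(source, mountpoint, fstype, options, dump=0, passno=0):
    """Join the six fstab fields into one line; options are comma-separated."""
    return f"{source} {mountpoint} {fstype} {','.join(options)} {dump} {passno}"


# Options already present in the fstab entry above.
BASE_OPTIONS = ["defaults", "_netdev"]

# Options quoted in the April 10 reply; their benefit is unverified here.
SUGGESTED_OPTIONS = [
    "negative-timeout=10",
    "attribute-timeout=30",
    "fopen-keep-cache",
    "direct-io-mode=enable",
    "fetch-attempts=5",
]

line = fstab_entry(
    "localhost:/apkmirror_data1",
    "/mnt/apkmirror_data1",
    "glusterfs",
    BASE_OPTIONS + SUGGESTED_OPTIONS,
)

if __name__ == "__main__":
    print(line)
```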
>>>>>>
>>>>>> I'm really not sure if direct-io-mode mount tweaks would do anything
>>>>>> here, what the value should be set to, and what it is by default.
>>>>>>
>>>>>> The OS is openSUSE Leap 42.3, 64-bit. Each server has 80GB of RAM and
>>>>>> 20 CPUs and is hosted by Linode.
>>>>>>
>>>>>> I'd really appreciate any help in the matter.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>> Sincerely,
>>>>>> Artem
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 5, 2018 at 11:13 PM, Artem Russakovskii <
>>>>>> archon810 at gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm trying to squeeze performance out of gluster on 4 machines, each with
>>>>>>> 80GB of RAM and 20 CPUs, where Gluster runs on attached block storage
>>>>>>> (Linode) with 4 replicate bricks, and so far everything I've tried results
>>>>>>> in sub-optimal performance.
>>>>>>>
>>>>>>> There are many files (several million, mostly images), and many
>>>>>>> operations take minutes. Copying multiple files, even small ones,
>>>>>>> suddenly freezes for seconds at a time and then continues, and iostat
>>>>>>> frequently shows large r_await and w_await values with 100% utilization
>>>>>>> for the attached block device.
>>>>>>>
>>>>>>> But anyway, there are many guides out there for small-file
>>>>>>> performance improvements, but more explanation is needed, and I think more
>>>>>>> tweaks should be possible.
>>>>>>>
>>>>>>> My question today is about performance.cache-size. Is this the size of a
>>>>>>> cache in RAM? If so, how do I view the current cache usage to see whether
>>>>>>> it fills up and I should increase its size? Is it advisable to bump it up
>>>>>>> if I have many tens of gigs of RAM free?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> More generally, in the last 2 months since I first started working
>>>>>>> with gluster and set a production system live, I've been feeling frustrated
>>>>>>> because Gluster has a lot of poorly-documented and confusing options. I
>>>>>>> really wish documentation could be improved with examples and better
>>>>>>> explanations.
>>>>>>>
>>>>>>> Specifically, it'd be absolutely amazing if the docs offered a
>>>>>>> strategy for setting each value and ways of determining more optimal
>>>>>>> values. For example, for performance.cache-size, if it said something like
>>>>>>> "run command abc to see your current cache size, and if it's hurting, up
>>>>>>> it, but be aware that it's limited by RAM," it'd be already a huge
>>>>>>> improvement to the docs. And so on with other options.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The gluster team is quite helpful on this mailing list, but in a
>>>>>>> reactive rather than proactive way. Perhaps it's tunnel vision that sets
>>>>>>> in once you've worked on a project for so long, where less technical
>>>>>>> explanations and even proper documentation of options take a back seat,
>>>>>>> but I encourage you to be more proactive about helping us understand and
>>>>>>> optimize Gluster.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Artem
>>>>>>>
>>>>>>>