[Gluster-users] Tiered volume performance degrades badly after a volume stop/start or system restart.

Jeff Byers jbyers.sfly at gmail.com
Thu Feb 1 22:35:23 UTC 2018


The problem was simple: the sqlite3 DB connection parameters
were only being set on a newly created DB, not when an existing
DB was opened. Apparently the sqlite3 default parameters are
not ideal. A patch is in Bug 1540376.
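
For the curious, the gist of the fix is to apply the connection
parameters on every open of the counter DB, not only when the file is
first created. Below is a minimal sketch of that idea; it is
illustrative only, and the PRAGMA values shown are assumptions, not
necessarily the ones the actual patch uses:

    #include <sqlite3.h>
    #include <stdio.h>

    /* Sketch only: re-apply connection-scoped PRAGMAs on every open,
     * whether or not the .db file already exists. */
    static int open_counter_db(const char *path, sqlite3 **db)
    {
        char *err = NULL;
        const char *pragmas = "PRAGMA synchronous = OFF;"
                              "PRAGMA cache_size = 4096;"
                              "PRAGMA temp_store = MEMORY;";

        if (sqlite3_open(path, db) != SQLITE_OK)
            return -1;

        /* These settings live on the connection, not in the file, so
         * skipping them when the DB already exists leaves sqlite3
         * running with its default (much slower) behavior. */
        if (sqlite3_exec(*db, pragmas, NULL, NULL, &err) != SQLITE_OK) {
            fprintf(stderr, "PRAGMA setup failed: %s\n", err ? err : "?");
            sqlite3_free(err);
            sqlite3_close(*db);
            *db = NULL;
            return -1;
        }
        return 0;
    }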

On Thu, Feb 1, 2018 at 9:32 AM, Jeff Byers <jbyers.sfly at gmail.com> wrote:
> This problem appears to be related to the sqlite3 DB files
> that are used for the tiering file access counters, stored on
> each hot and cold tier brick in .glusterfs/<volname>.db.
>
> When the tier is first created, these DB files do not exist,
> they are created, and everything works fine.
>
> On a stop/start or service restart, the .db files are already
> present, albeit empty, since I have neither cluster.write-freq-threshold
> nor cluster.read-freq-threshold set, so features.record-counters is
> off and nothing should be going into the DB.
>
> I've found that if I delete these .db files after the volume
> stop, but before the volume start, the tiering performance is
> normal, not degraded. Of course all of the history in these DB
> files is lost. Not sure what other ramifications there are to
> deleting these .db files.
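>
> In case it helps with reproducing, here is a minimal sketch of that
> workaround (the brick paths and volume name are made-up placeholders):
> remove the per-brick counter DBs while the volume is stopped, e.g.:
>
>     #include <stdio.h>
>     #include <unistd.h>
>
>     int main(void)
>     {
>         /* Placeholder brick paths/volume name; substitute your own.
>          * Run only after 'gluster volume stop' and before the
>          * corresponding 'gluster volume start'. */
>         const char *db_files[] = {
>             "/bricks/ssd-hot/.glusterfs/tiervol.db",
>             "/bricks/hdd-cold/.glusterfs/tiervol.db",
>         };
>
>         for (unsigned i = 0; i < sizeof(db_files) / sizeof(db_files[0]); i++) {
>             if (unlink(db_files[i]) != 0)
>                 perror(db_files[i]);
>         }
>         return 0;
>     }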
>
> When I did have one of the freq-threshold settings set, I did
> see a record get added to the file, so the sqlite3 DB is
> working to some degree.
>
> The sqlite3 version I have installed is sqlite-3.6.20-1.el6_7.2.x86_64.
>
> On Tue, Jan 30, 2018 at 10:17 PM, Vlad Kopylov <vladkopy at gmail.com> wrote:
>> Tested it in two different environments lately with exactly the same results.
>> I was trying to get better read performance from local mounts holding
>> hundreds of thousands of maildir email files by using an SSD hot tier,
>> hoping that stat reads of the .glusterfs files would improve, since those
>> do migrate to the hot tier.
>> After seeing what you described for 24 hours, and confirming that all
>> movement between the tiers was done, I killed it.
>> Here are my volume settings - maybe they will be useful for spotting
>> conflicting ones.
>>
>> cluster.shd-max-threads: 12
>> performance.rda-cache-limit: 128MB
>> cluster.readdir-optimize: on
>> cluster.read-hash-mode: 0
>> performance.strict-o-direct: on
>> cluster.lookup-unhashed: auto
>> performance.nl-cache: on
>> performance.nl-cache-timeout: 600
>> cluster.lookup-optimize: on
>> client.event-threads: 8
>> performance.client-io-threads: on
>> performance.md-cache-timeout: 600
>> server.event-threads: 8
>> features.cache-invalidation: on
>> features.cache-invalidation-timeout: 600
>> performance.stat-prefetch: on
>> performance.cache-invalidation: on
>> network.inode-lru-limit: 90000
>> performance.cache-refresh-timeout: 10
>> performance.enable-least-priority: off
>> performance.cache-size: 2GB
>> cluster.nufa: on
>> cluster.choose-local: on
>> server.outstanding-rpc-limit: 128
>>
>> fuse mounting defaults,_netdev,negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5
>>
>> On Tue, Jan 30, 2018 at 6:29 PM, Jeff Byers <jbyers.sfly at gmail.com> wrote:
>>> I am fighting this issue:
>>>
>>>   Bug 1540376 – Tiered volume performance degrades badly after a
>>> volume stop/start or system restart.
>>>   https://bugzilla.redhat.com/show_bug.cgi?id=1540376
>>>
>>> Does anyone have any ideas on what might be causing this, and
>>> what a fix or work-around might be?
>>>
>>> Thanks!
>>>
>>> ~ Jeff Byers ~
>>>
>>> Tiered volume performance degrades badly after a volume
>>> stop/start or system restart.
>>>
>>> The degradation is very significant, making the performance of
>>> an SSD hot tiered volume a fraction of what it was with the
>>> HDD before tiering.
>>>
>>> Stopping and starting the tiered volume causes the problem to
>>> appear, as does stopping and starting the Gluster services.
>>>
>>> Nothing in the tier is being promoted or demoted: the volume
>>> starts empty, a file is written, then read, then deleted. The
>>> file(s) only ever exist on the hot tier.
>>>
>>> This affects GlusterFS FUSE mounts, and also NFSv3 NFS mounts.
>>> The problem has been reproduced in two test lab environments.
>>> The issue was first seen using GlusterFS 3.7.18, and retested
>>> with the same result using GlusterFS 3.12.3.
>>>
>>> I'm using the default tiering settings, no adjustments.
>>>
>>> Nothing of any significance appears to be reported in
>>> the GlusterFS logs.
>>>
>>> Summary:
>>>
>>> Before SSD tiering, HDD performance on a FUSE mount was 130.87
>>> MB/sec writes, 128.53 MB/sec reads.
>>>
>>> After SSD tiering, performance on a FUSE mount was 199.99
>>> MB/sec writes, 257.28 MB/sec reads.
>>>
>>> After GlusterFS volume stop/start, SSD tiering performance on
>>> FUSE mount was 35.81 MB/sec writes, 37.33 MB/sec reads. A very
>>> significant reduction in performance.
>>>
>>> Detaching and reattaching the SSD tier restores the good
>>> tiered performance.
>>>
>>> ~ Jeff Byers ~
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> --
> ~ Jeff Byers ~



-- 
~ Jeff Byers ~