[Gluster-users] Blocking IO when hot tier promotion daemon runs

Tom Fite tomfite at gmail.com
Wed Jan 10 15:33:51 UTC 2018


I should add that additional testing has shown that only access to files is
held up; IO is not interrupted for existing transfers. I think this points
to the heat metadata in the sqlite DB for the tier. Is it possible that a
table is temporarily locked while the promotion daemon runs, so that the
calls to update the access count on files are blocked?
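
In case it helps anyone else check the same theory, this is roughly what I've
been poking at on one of the bricks. The location and name of the per-brick
heat DB below are assumptions on my part, so adjust to whatever is actually
on disk:

# locate the per-brick sqlite DB (path and file name are guesses)
find /data/brick1/gv0/.glusterfs -maxdepth 1 -name '*.db'

# see which processes currently have the DB open
fuser -v /data/brick1/gv0/.glusterfs/<name>.db

# check the journal mode; without WAL, a long promotion query and the
# access-count updates could block each other on the same table
sqlite3 /data/brick1/gv0/.glusterfs/<name>.db 'PRAGMA journal_mode;'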


On Wed, Jan 10, 2018 at 10:17 AM, Tom Fite <tomfite at gmail.com> wrote:

> The sizes of the files are extremely varied: there are millions of small
> (<1 MB) files and thousands of files larger than 1 GB.
>
> Attached are the tier logs for gluster1 and gluster2. They are full of
> "demotion failed" messages, which also shows up in the status (a quick way
> to count them follows the status output):
>
> [root at pod-sjc1-gluster1 gv0]# gluster volume tier gv0 status
> Node                 Promoted files       Demoted files        Status           Run time in h:m:s
> ---------            ---------            ---------            ---------        ---------
> localhost            25940                0                    in progress      112:21:49
> pod-sjc1-gluster2    0                    2917154              in progress      112:21:49
>
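> For what it's worth, here's a rough count of those failures (the path below
> is just a placeholder, point it at whichever tier log you're reading):
>
> # count "demotion failed" entries in the tier log
> grep -c "demotion failed" /path/to/gv0-tier.log
>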
> Is it normal for promotions to happen only on one server and demotions
> only on the other, rather than on both?
>
> Volume info:
>
> [root at pod-sjc1-gluster1 ~]# gluster volume info
>
> Volume Name: gv0
> Type: Distributed-Replicate
> Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196
> Status: Started
> Snapshot Count: 13
> Number of Bricks: 3 x 2 = 6
> Transport-type: tcp
> Bricks:
> Brick1: pod-sjc1-gluster1:/data/brick1/gv0
> Brick2: pod-sjc1-gluster2:/data/brick1/gv0
> Brick3: pod-sjc1-gluster1:/data/brick2/gv0
> Brick4: pod-sjc1-gluster2:/data/brick2/gv0
> Brick5: pod-sjc1-gluster1:/data/brick3/gv0
> Brick6: pod-sjc1-gluster2:/data/brick3/gv0
> Options Reconfigured:
> performance.cache-refresh-timeout: 60
> performance.stat-prefetch: on
> server.allow-insecure: on
> performance.flush-behind: on
> performance.rda-cache-limit: 32MB
> network.tcp-window-size: 1048576
> performance.nfs.io-threads: on
> performance.write-behind-window-size: 4MB
> performance.nfs.write-behind-window-size: 512MB
> performance.io-cache: on
> performance.quick-read: on
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.cache-invalidation: on
> performance.md-cache-timeout: 600
> network.inode-lru-limit: 90000
> performance.cache-size: 4GB
> server.event-threads: 16
> client.event-threads: 16
> features.barrier: disable
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: on
> cluster.lookup-optimize: on
> server.outstanding-rpc-limit: 1024
> auto-delete: enable
>
>
> # gluster volume status
> Status of volume: gv0
> Gluster process                                TCP Port  RDMA Port  Online  Pid
> --------------------------------------------------------------------------------
> Hot Bricks:
> Brick pod-sjc1-gluster2:/data/hot_tier/gv0     49219     0          Y       26714
> Brick pod-sjc1-gluster1:/data/hot_tier/gv0     49199     0          Y       21325
> Cold Bricks:
> Brick pod-sjc1-gluster1:/data/brick1/gv0       49152     0          Y       3178
> Brick pod-sjc1-gluster2:/data/brick1/gv0       49152     0          Y       4818
> Brick pod-sjc1-gluster1:/data/brick2/gv0       49153     0          Y       3186
> Brick pod-sjc1-gluster2:/data/brick2/gv0       49153     0          Y       4829
> Brick pod-sjc1-gluster1:/data/brick3/gv0       49154     0          Y       3194
> Brick pod-sjc1-gluster2:/data/brick3/gv0       49154     0          Y       4840
> Tier Daemon on localhost                       N/A       N/A        Y       20313
> Self-heal Daemon on localhost                  N/A       N/A        Y       32023
> Tier Daemon on pod-sjc1-gluster1               N/A       N/A        Y       24758
> Self-heal Daemon on pod-sjc1-gluster2          N/A       N/A        Y       12349
>
> Task Status of Volume gv0
> --------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> On Tue, Jan 9, 2018 at 10:33 PM, Hari Gowtham <hgowtham at redhat.com> wrote:
>
>> Hi,
>>
>> Can you send the volume info, the volume status output, and the tier logs?
>> I also need to know the size of the files that are being stored.
>>
>> On Tue, Jan 9, 2018 at 9:51 PM, Tom Fite <tomfite at gmail.com> wrote:
>> > I've recently enabled an SSD-backed 2 TB hot tier on my 150 TB
>> > distributed-replicated volume (2 servers, 3 bricks per server).
>> >
>> > I'm seeing IO get blocked across all client FUSE threads for 10 to 15
>> > seconds while the promotion daemon runs. I see the 'glustertierpro'
>> > thread jump to 99% CPU usage on both boxes when these delays occur,
>> > and they happen every 25 minutes (my tier-promote-frequency setting).
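>> >
>> > In case anyone wants to watch for the same thing, I'm just looking at
>> > per-thread CPU on the tier daemon while the promotion run fires (the PID
>> > placeholder below is the Tier Daemon pid from 'gluster volume status'):
>> >
>> > # per-thread view; the glustertierpro thread pegs near 99% when the
>> > # promotion run starts
>> > top -H -p <tier-daemon-pid>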
>> >
>> > I suspect this has something to do with the heat database in sqlite;
>> > maybe something is getting locked while it runs the query to determine
>> > which files to promote. My volume contains approximately 18 million
>> > files.
>> >
>> > Has anybody else seen this? I suspect that these delays will get worse
>> > as I add more files to my volume, which will cause significant problems.
>> >
>> > Here are my hot tier settings:
>> >
>> > # gluster volume get gv0 all | grep tier
>> > cluster.tier-pause                      off
>> > cluster.tier-promote-frequency          1500
>> > cluster.tier-demote-frequency           3600
>> > cluster.tier-mode                       cache
>> > cluster.tier-max-promote-file-size      10485760
>> > cluster.tier-max-mb                     64000
>> > cluster.tier-max-files                  100000
>> > cluster.tier-query-limit                100
>> > cluster.tier-compact                    on
>> > cluster.tier-hot-compact-frequency      86400
>> > cluster.tier-cold-compact-frequency     86400
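>> >
>> > For anyone trying to reproduce this, the non-default values above can be
>> > set with plain 'gluster volume set', e.g.:
>> >
>> > gluster volume set gv0 cluster.tier-promote-frequency 1500
>> > gluster volume set gv0 cluster.tier-demote-frequency 3600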
>> >
>> > # gluster volume get gv0 all | grep threshold
>> > cluster.write-freq-threshold            2
>> > cluster.read-freq-threshold             5
>> >
>> > # gluster volume get gv0 all | grep watermark
>> > cluster.watermark-hi                    92
>> > cluster.watermark-low                   75
>> >
>>
>>
>>
>> --
>> Regards,
>> Hari Gowtham.
>>
>
>

