[Gluster-users] gluster tiering errors

Milind Changire mchangir at redhat.com
Fri Oct 27 04:17:41 UTC 2017


Herb,
I'm trying to weed out issues here.

So, I can see that quota is turned *on*, and I'd like you to check the quota
settings and test the system's behavior *with quota turned off*.
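Roughly, the commands I have in mind (volume name as a placeholder; these need to be run against your cluster, so treat this as a sketch):

```shell
# Inspect the current quota limits and usage on the volume
gluster volume quota <vol> list

# Temporarily disable quota to see whether promotions start succeeding
gluster volume quota <vol> disable

# Re-enable once the test is done; if memory serves, limits have to be
# configured again after a disable/enable cycle, so note them down first
gluster volume quota <vol> enable
```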

Although the file that failed migration was only 29K, I'm being a bit
paranoid while weeding out issues.

Are you still facing tiering errors?
I saw your response to Alex about disk space consumption, but found it a bit
ambiguous with respect to the current state of affairs.
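For context: in cache mode, as I understand it, the tier promotes freely below the low watermark, promotes/demotes probabilistically between the two, and demotes aggressively above the high watermark. The watermarks are percentages of hot tier capacity, so with your settings the thresholds work out roughly as below (illustrative arithmetic only; the 2800 GB capacity is an assumption based on your two ~1.4 TB NVMe replica pairs):

```shell
# Illustrative only: watermark thresholds as a fraction of hot tier capacity.
# capacity_gb is assumed from the df output, not queried from gluster.
capacity_gb=2800

echo "low watermark (75%): $(( capacity_gb * 75 / 100 )) GB"
echo "high watermark (90%): $(( capacity_gb * 90 / 100 )) GB"
```

At 39%/36% used, your hot tier bricks are well below the low watermark, which is why the ENOSPC errors during promotion are surprising.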

--
Milind



On Tue, Oct 24, 2017 at 11:34 PM, Herb Burnswell <
herbert.burnswell at gmail.com> wrote:

> Milind - Thank you for the response..
>
> >> What are the high and low watermarks for the tier set at?
>
> # gluster volume get <vol> cluster.watermark-hi
> Option                                  Value
> ------                                  -----
> cluster.watermark-hi                    90
>
>
> # gluster volume get <vol> cluster.watermark-low
> Option                                  Value
> ------                                  -----
> cluster.watermark-low                   75
>
>
>
> >> What is the size of the file that failed to migrate as per the
> following tierd log:
>
> >> [2017-10-19 17:52:07.519614] I [MSGID: 109038]
> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
> failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)
>
> The file was a word doc @ 29K in size.
>
> >>If possible, a *gluster volume info* would also help, instead of going
> to and fro with questions.
>
> # gluster vol info
>
> Volume Name: ctdb
> Type: Replicate
> Volume ID: f679c476-e0dd-4f3a-9813-1b26016b5384
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: <node1>:/mnt/ctdb_local/brick
> Brick2: <node2>:/mnt/ctdb_local/brick
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
>
> Volume Name: <vol>
> Type: Tier
> Volume ID: 7710ed2f-775e-4dd9-92ad-66407c72b0ad
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 8
> Transport-type: tcp
> Hot Tier :
> Hot Tier Type : Distributed-Replicate
> Number of Bricks: 2 x 2 = 4
> Brick1: <node2>:/mnt/brick_nvme1/brick
> Brick2: <node1>:/mnt/brick_nvme2/brick
> Brick3: <node2>:/mnt/brick_nvme2/brick
> Brick4: <node1>:/mnt/brick_nvme1/brick
> Cold Tier:
> Cold Tier Type : Distributed-Replicate
> Number of Bricks: 2 x 2 = 4
> Brick5: <node1>:/mnt/brick1/brick
> Brick6: <node2>:/mnt/brick2/brick
> Brick7: <node1>:/mnt/brick2/brick
> Brick8: <node2>:/mnt/brick1/brick
> Options Reconfigured:
> cluster.lookup-optimize: on
> client.event-threads: 4
> server.event-threads: 4
> performance.write-behind-window-size: 4MB
> performance.cache-size: 16GB
> features.quota-deem-statfs: on
> features.inode-quota: on
> features.quota: on
> nfs.disable: on
> transport.address-family: inet
> features.ctr-enabled: on
> cluster.tier-mode: cache
> performance.io-cache: off
> performance.quick-read: off
> cluster.tier-max-files: 1000000
>
>
> HB
>
>
>
>
> On Sun, Oct 22, 2017 at 8:41 AM, Milind Changire <mchangir at redhat.com>
> wrote:
>
>> Herb,
>> What are the high and low watermarks for the tier set at?
>>
>> # gluster volume get <vol> cluster.watermark-hi
>>
>> # gluster volume get <vol> cluster.watermark-low
>>
>> What is the size of the file that failed to migrate as per the following
>> tierd log:
>>
>> [2017-10-19 17:52:07.519614] I [MSGID: 109038]
>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>> failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)
>>
>> If possible, a *gluster volume info* would also help, instead of going
>> to and fro with questions.
>>
>> --
>> Milind
>>
>>
>>
>> On Fri, Oct 20, 2017 at 12:42 AM, Herb Burnswell <
>> herbert.burnswell at gmail.com> wrote:
>>
>>> All,
>>>
>>> I am new to gluster and have some questions/concerns about some tiering
>>> errors that I see in the log files.
>>>
>>> OS: CentOs 7.3.1611
>>> Gluster version: 3.10.5
>>> Samba version: 4.6.2
>>>
>>> I see the following (scrubbed):
>>>
>>> Node 1 /var/log/glusterfs/tier/<vol>/tierd.log:
>>>
>>> [2017-10-19 17:52:07.519614] I [MSGID: 109038]
>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>>> failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)
>>> [2017-10-19 17:52:07.525110] E [MSGID: 109011]
>>> [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for
>>> path=/path/to/<file>
>>> [2017-10-19 17:52:07.526088] E [MSGID: 109023]
>>> [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht:
>>> failed to create <file> on <vol>-hot-dht [Input/output error]
>>> [2017-10-19 17:52:07.526111] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file]
>>> 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
>>> [2017-10-19 17:52:07.527214] E [MSGID: 109037]
>>> [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>
>>> [No space left on device]
>>> [2017-10-19 17:52:07.527244] I [MSGID: 109038]
>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>>> failed for <file>(gfid:fb4411c4-a387-4e5f-a2b7-897633ef4aa8)
>>> [2017-10-19 17:52:07.533510] E [MSGID: 109011]
>>> [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for
>>> path=/path/to/<file>
>>> [2017-10-19 17:52:07.534434] E [MSGID: 109023]
>>> [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht:
>>> failed to create <file> on <vol>-hot-dht [Input/output error]
>>> [2017-10-19 17:52:07.534453] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file]
>>> 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
>>> [2017-10-19 17:52:07.535570] E [MSGID: 109037]
>>> [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>
>>> [No space left on device]
>>> [2017-10-19 17:52:07.535594] I [MSGID: 109038]
>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>>> failed for <file>(gfid:fba421e7-0500-47c4-bf67-10a40690e13d)
>>> [2017-10-19 17:52:07.541363] E [MSGID: 109011]
>>> [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for
>>> path=/path/to/<file>
>>> [2017-10-19 17:52:07.542296] E [MSGID: 109023]
>>> [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht:
>>> failed to create <file> on <vol>-hot-dht [Input/output error]
>>> [2017-10-19 17:52:07.542357] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file]
>>> 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
>>> [2017-10-19 17:52:07.543480] E [MSGID: 109037]
>>> [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>
>>> [No space left on device]
>>> [2017-10-19 17:52:07.543521] I [MSGID: 109038]
>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>>> failed for <file>(gfid:fe6799e1-42e6-43e5-a7eb-ac8facfcbc9f)
>>> [2017-10-19 17:52:07.549959] E [MSGID: 109011]
>>> [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for
>>> path=/path/to/<file>
>>> [2017-10-19 17:52:07.550901] E [MSGID: 109023]
>>> [dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht:
>>> failed to create <file> on <vol>-hot-dht [Input/output error]
>>> [2017-10-19 17:52:07.550922] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file]
>>> 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
>>> [2017-10-19 17:52:07.551896] E [MSGID: 109037]
>>> [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>
>>> [No space left on device]
>>> [2017-10-19 17:52:07.551917] I [MSGID: 109038]
>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>>> failed for <file>(gfid:ffe3a3f2-b170-43f0-a9fb-97c78e3173eb)
>>> [2017-10-19 17:52:07.551945] E [MSGID: 109037] [tier.c:2565:tier_run]
>>> 0-<vol>-tier-dht: Promotion failed
>>>
>>> Node 1 /var/log/samba/glusterfs-<vol>-pool.log:
>>>
>>> [2017-10-18 17:13:41.481860] E [MSGID: 114031]
>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-0: remote
>>> operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994)
>>> [Invalid argument]
>>> [2017-10-18 17:13:41.481860] E [MSGID: 114031]
>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-1: remote
>>> operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994)
>>> [Invalid argument]
>>> [2017-10-18 17:13:41.485916] E [MSGID: 109089]
>>> [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task]
>>> 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf1ff570, flags=00) on file
>>> 7d89b9a8-3e5d-4f28-9e57-039fe4416994 @ <vol>-cold-dht [Invalid argument]
>>> [2017-10-18 17:13:41.488223] E [MSGID: 114031]
>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-0: remote
>>> operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994)
>>> [Invalid argument]
>>> [2017-10-18 17:13:41.488235] E [MSGID: 114031]
>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-1: remote
>>> operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994)
>>> [Invalid argument]
>>> [2017-10-18 17:13:41.489060] E [MSGID: 109089]
>>> [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task]
>>> 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf1feb50, flags=00) on file
>>> 7d89b9a8-3e5d-4f28-9e57-039fe4416994 @ <vol>-cold-dht [Invalid argument]
>>> [2017-10-18 17:13:42.339936] E [MSGID: 114031]
>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote
>>> operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b)
>>> [Invalid argument]
>>> [2017-10-18 17:13:42.339988] E [MSGID: 114031]
>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-5: remote
>>> operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b)
>>> [Invalid argument]
>>> [2017-10-18 17:13:42.343769] E [MSGID: 109089]
>>> [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task]
>>> 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf2012c0, flags=00) on file
>>> 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
>>> [2017-10-18 17:13:42.345374] E [MSGID: 114031]
>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote
>>> operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b)
>>> [Invalid argument]
>>> [2017-10-18 17:13:42.345401] E [MSGID: 114031]
>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-5: remote
>>> operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b)
>>> [Invalid argument]
>>> [2017-10-18 17:13:42.346259] E [MSGID: 109089]
>>> [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task]
>>> 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf201130, flags=00) on file
>>> 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
>>> [2017-10-18 17:13:59.541591] E [MSGID: 108006]
>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-0: All subvolumes are
>>> down. Going offline until atleast one of them comes back up.
>>> [2017-10-18 17:13:59.541748] E [MSGID: 108006]
>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-1: All subvolumes are
>>> down. Going offline until atleast one of them comes back up.
>>> [2017-10-18 17:13:59.541887] E [MSGID: 108006]
>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-2: All subvolumes are
>>> down. Going offline until atleast one of them comes back up.
>>> [2017-10-18 17:13:59.541977] E [MSGID: 108006]
>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-3: All subvolumes are
>>> down. Going offline until atleast one of them comes back up.
>>>
>>> Node 2 /var/log/gluster/tier/<vol>/tierd.log:
>>>
>>> [2017-10-16 15:54:08.662873] I [MSGID: 109038]
>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>>> failed for <file>(gfid:fffd714e-b2d2-42d3-a31f-72673276e3d0)
>>> [2017-10-16 16:00:07.201584] I [MSGID: 109038]
>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>>> failed for <file>(gfid:f10365e1-747b-4985-97b9-8b5dc61ac464)
>>> [2017-10-16 16:00:07.372559] I [MSGID: 109038]
>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>>> failed for <file>(gfid:f95f17bf-b696-44cd-aae0-d8ac38149aa5)
>>> [2017-10-16 16:06:06.880522] I [MSGID: 109038]
>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>>> failed for <file>(gfid:ec451f6c-8971-4f9b-a04f-00f96db9b46a)
>>> [2017-10-16 16:06:08.062080] I [MSGID: 109038]
>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>>> failed for <file>(gfid:e658cd70-3f6d-4b25-8d9f-0d4c24d3ec5d)
>>> [2017-10-16 16:06:08.288298] I [MSGID: 109038]
>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>>> failed for <file>(gfid:f22df67a-88e5-4fae-aab0-b00e04f9a6e1)
>>> [2017-10-18 15:55:06.446416] I [MSGID: 109028]
>>> [dht-rebalance.c:4792:gf_defrag_status_get] 0-glusterfs: Rebalance is
>>> in progress. Time taken is 1376671.00 secs
>>> [2017-10-18 15:55:06.446433] I [MSGID: 109028]
>>> [dht-rebalance.c:4796:gf_defrag_status_get] 0-glusterfs: Files
>>> migrated: 0, size: 0, lookups: 47887089, failures: 3594, skipped: 0
>>> [2017-10-19 00:00:00.501576] I [MSGID: 109038]
>>> [tier.c:2391:tier_prepare_compact] 0-<vol>-tier-dht: Start compaction
>>> on cold tier
>>> [2017-10-19 00:00:00.502016] I [MSGID: 109038]
>>> [tier.c:2403:tier_prepare_compact] 0-<vol>-tier-dht: End compaction on
>>> cold tier
>>> [2017-10-19 00:00:00.501608] I [MSGID: 109038]
>>> [tier.c:2391:tier_prepare_compact] 0-<vol>-tier-dht: Start compaction
>>> on cold tier
>>> [2017-10-19 00:00:00.502076] I [MSGID: 109038]
>>> [tier.c:2403:tier_prepare_compact] 0-<vol>-tier-dht: End compaction on
>>> cold tier
>>> [2017-10-19 16:03:49.522991] I [MSGID: 109028]
>>> [dht-rebalance.c:4792:gf_defrag_status_get] 0-glusterfs: Rebalance is
>>> in progress. Time taken is 1463594.00 secs
>>> [2017-10-19 16:03:49.523017] I [MSGID: 109028]
>>> [dht-rebalance.c:4796:gf_defrag_status_get] 0-glusterfs: Files
>>> migrated: 0, size: 0, lookups: 52790654, failures: 3594, skipped: 0
>>>
>>> Node 2 /var/log/samba/glusterfs-<vol>-pool.log:
>>>
>>> [2017-10-18 16:49:09.218062] E [MSGID: 114031]
>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote
>>> operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b)
>>> [Invalid argument]
>>> [2017-10-18 16:49:09.218254] E [MSGID: 109089]
>>> [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task]
>>> 0-<vol>-tier-dht: Failed to open the fd (0x7f009b36bac0, flags=00) on file
>>> 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
>>> [2017-10-18 16:49:09.222783] E [MSGID: 108006]
>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-0: All subvolumes are
>>> down. Going offline until atleast one of them comes back up.
>>> [2017-10-18 16:49:09.222912] E [MSGID: 108006]
>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-1: All subvolumes are
>>> down. Going offline until atleast one of them comes back up.
>>> [2017-10-18 16:49:09.223079] E [MSGID: 108006]
>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-2: All subvolumes are
>>> down. Going offline until atleast one of them comes back up.
>>> [2017-10-18 16:49:09.223200] E [MSGID: 108006]
>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-3: All subvolumes are
>>> down. Going offline until atleast one of them comes back up.
>>>
>>> Status:
>>>
>>> # gluster vol tier <vol> status
>>>
>>> Node       Promoted files   Demoted files   Status        run time in h:m:s
>>> ---------  --------------   -------------   -----------   -----------------
>>> Node1      190861           0               in progress   408:34:13
>>> Node2      0                0               in progress   408:34:14
>>>
>>> Hot tier bricks:
>>>
>>> # df -h
>>>
>>> /dev/mapper/vg_bricks-brick_nvme1  1.4T  551G  883G  39%  /mnt/brick_nvme1
>>> /dev/mapper/vg_bricks-brick_nvme2  1.4T  512G  922G  36%  /mnt/brick_nvme2
>>>
>>>
>>> Can anyone point me in the right direction as to what may be going on?
>>> Any guidance is greatly appreciated.
>>>
>>> Thanks in advance,
>>>
>>> HB
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>
>> --
>> Milind
>>
>>
>
>



-- 
Milind