[Gluster-users] gluster tiering errors

Herb Burnswell herbert.burnswell at gmail.com
Fri Oct 27 19:33:19 UTC 2017


Milind - Thank you for your help; I appreciate it.

It appears that tiering behaves the same with quota turned off.  Volume info:

# gluster vol info <vol>

Volume Name: <vol>
Type: Tier
Volume ID: 7710ed2f-775e-4dd9-92ad-66407c72b0ad
Status: Started
Snapshot Count: 0
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: <node2>:/mnt/brick_nvme1/brick
Brick2: <node1>:/mnt/brick_nvme2/brick
Brick3: <node2>:/mnt/brick_nvme2/brick
Brick4: <node1>:/mnt/brick_nvme1/brick
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: <node1>:/mnt/brick1/brick
Brick6: <node2>:/mnt/brick2/brick
Brick7: <node1>:/mnt/brick2/brick
Brick8: <node2>:/mnt/brick1/brick
Options Reconfigured:
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
performance.write-behind-window-size: 4MB
performance.cache-size: 16GB
features.inode-quota: off
features.quota: off
nfs.disable: on
transport.address-family: inet
features.ctr-enabled: on
cluster.tier-mode: cache
performance.io-cache: off
performance.quick-read: off
cluster.tier-max-files: 1000000

Errors in /var/log/glusterfs/tier/<vol>/tierd.log on node1 after turning
off quota:

[2017-10-27 18:38:08.880502] E [MSGID: 109011]
[dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for
path=/path/to/83540503.jpg
[2017-10-27 18:38:08.880686] E [MSGID: 109023]
[dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht:
failed to create /path/to/83540503.jpg on <vol>-hot-dht [Input/output error]
[2017-10-27 18:38:08.880717] E [MSGID: 0]
[dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed
on - <vol>-hot-dht for file - /path/to/83540503.jpg
[2017-10-27 18:38:08.881101] E [MSGID: 109037]
[tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate
/path/to/83540503.jpg  [No space left on device]
[2017-10-27 18:38:08.881145] I [MSGID: 109038]
[tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
failed for 83540503.jpg(gfid:00cf352a-0a21-42d3-91ae-fe6fc63fac9d)
[2017-10-27 18:38:08.891692] E [MSGID: 109011]
[dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for
path=/path/to/152640504.jpg
[2017-10-27 18:38:08.891876] E [MSGID: 109023]
[dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht:
failed to create /path/to/152640504.jpg on <vol>-hot-dht [Input/output
error]
[2017-10-27 18:38:08.891899] E [MSGID: 0]
[dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed
on - <vol>-hot-dht for file - /path/to/152640504.jpg
[2017-10-27 18:38:08.920077] E [MSGID: 109037]
[tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate
/path/to/152640504.jpg  [No space left on device]
[2017-10-27 18:38:08.920121] I [MSGID: 109038]
[tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
failed for 152640504.jpg(gfid:0436b8b5-2e15-411e-acfa-a5870cf125bf)
[2017-10-27 18:38:08.952939] E [MSGID: 109011]
[dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for
path=/path/to/89240318.jpg
[2017-10-27 18:38:08.953121] E [MSGID: 109023]
[dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht:
failed to create /path/to/89240318.jpg on <vol>-hot-dht [Input/output error]
[2017-10-27 18:38:08.953147] E [MSGID: 0]
[dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed
on - <vol>-hot-dht for file - /path/to/89240318.jpg
[2017-10-27 18:38:08.959510] E [MSGID: 109037]
[tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate
/path/to/89240318.jpg  [No space left on device]
[2017-10-27 18:38:08.959560] I [MSGID: 109038]
[tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
failed for 89240318.jpg(gfid:1143c9bb-ea79-4c15-ad03-97a611d53135)
[2017-10-27 18:38:08.986665] E [MSGID: 109011]
[dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for
path=/path/to/106056906.jpg
[2017-10-27 18:38:08.986871] E [MSGID: 109023]
[dht-rebalance.c:757:__dht_rebalance_create_dst_file] 0-<vol>-tier-dht:
failed to create /path/to/106056906.jpg on <vol>-hot-dht [Input/output
error]
[2017-10-27 18:38:08.986904] E [MSGID: 0]
[dht-rebalance.c:1696:dht_migrate_file] 0-<vol>-tier-dht: Create dst failed
on - <vol>-hot-dht for file - /path/to/106056906.jpg
[2017-10-27 18:38:08.991468] E [MSGID: 109037]
[tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate
/path/to/106056906.jpg  [No space left on device]
[2017-10-27 18:38:08.991505] I [MSGID: 109038]
[tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
failed for 106056906.jpg(gfid:07f5e5d4-315f-4299-a62f-6bd8f159c89d)
[2017-10-27 18:38:09.025433] E [MSGID: 109011]
[dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for
path=/path/to/114649988.jpg
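
Because the same group of messages repeats for every file, a tally per
message ID can make the pattern easier to see than the raw log.  The
following is a minimal Python sketch, not a Gluster tool.  It assumes each
entry in the real tierd.log sits on a single line (the excerpt above is
wrapped by the mail client) and uses the
"[timestamp] LEVEL [MSGID: n] [file:line:function]" layout shown above.
The name tally_tierd_log is hypothetical.

```python
import re
from collections import Counter

# Header of one log entry, as seen in the excerpt:
# [timestamp] LEVEL [MSGID: n] [source.c:line:function]
ENTRY_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} [\d:.]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<src>[^\]:]+):\d+:(?P<func>[^\]]+)\]"
)


def tally_tierd_log(lines):
    """Count log entries per (level, MSGID, function).

    `lines` is any iterable of strings, for example an open file object.
    Lines that do not start a log entry are ignored.
    """
    counts = Counter()
    for line in lines:
        match = ENTRY_RE.search(line)
        if match:
            key = (match["level"], int(match["msgid"]), match["func"])
            counts[key] += 1
    return counts
```

To use it, pass an open file, e.g. tally_tierd_log(open(path, errors="replace")),
and print counts.most_common() to list the most frequent entries first.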

I wanted to add a couple of data points here:

- Most (95%) of the log output is written on node1 of the 2-node cluster.

     The tierd.log file on node1 is 588M in size due to all of the failure
errors.  The tierd.log file on node2 is only ~205K in size.
     I believe I posted earlier that all promoted files are listed on node1:

     # gluster vol tier <vol> status
     Node       Promoted files   Demoted files   Status        run time in h:m:s
     ---------  ---------        ---------       ---------     ---------
     <node2>    0                0               in progress   601:37:43
     <node1>    271966           0               in progress   601:37:42

     Is this expected behavior?

- We are sharing the data (the same share) via SMB and AFP so that it can be
accessed by PCs and Macs.  The Macs use AFP because they have so much
difficulty with SMB and network file shares.

     I know the Macs create all kinds of 'special' files when working on
the share.  Could there be a problem with certain files and tiering?  For
example (from node2 tierd.log):

     [2017-10-26 19:30:08.147159] I [MSGID: 109038]
[tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
failed for .DS_Store(gfid:db430070-b9c5-4bd2-b4c6-a347b838a97e)
     [2017-10-26 22:28:08.218565] I [MSGID: 109038]
[tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
failed for .DS_Store(gfid:f745bea6-04bd-4904-8237-1bd7c9c92f5b)
     [2017-10-26 22:28:08.221909] I [MSGID: 109038]
[tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
failed for .DS_Store(gfid:bed73314-8740-4822-9fb7-95257434e283)
     [2017-10-26 22:28:08.223767] I [MSGID: 109038]
[tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
failed for .DS_Store(gfid:bf1df49b-c264-449d-9bc6-65bcfd48fa4e)

     The .DS_Store files are Mac-specific files.

     Since users work directly off of the share, are there potential
problems with tiering and locks?  I do see warnings (on node1 tierd.log):

     [2017-10-27 18:30:08.719976] W [MSGID: 109023]
[dht-rebalance.c:639:__is_file_migratable] 0-<vol>-tier-dht: Migrate file
failed: /path/to/file.ai: File has locks. Skipping file migration
     [2017-10-27 18:32:08.483971] W [MSGID: 109023]
[dht-rebalance.c:639:__is_file_migratable] 0-<vol>-tier-dht: Migrate file
failed: /path/to/file-v1.ai: File has locks. Skipping file migration


- The directory structure (built up over many years) has spaces in the names
of files and folders, sometimes, I'm finding, even at the end of a file name.

     Could spaces in names of files and folders be causing issues with
tiering?
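
Whether such names affect tiering is left open in this thread.  To gauge how
many there are, a minimal Python sketch that lists every file or folder whose
name starts or ends with whitespace could look like this.  The function name
is hypothetical, and running it against a client mount of the volume rather
than a brick directory is an assumption, made so that Gluster's internal
.glusterfs directory on the bricks is not walked.

```python
import os


def names_with_edge_whitespace(root):
    """Yield paths under `root` whose last component has leading or
    trailing whitespace (spaces, tabs, and so on)."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # str.strip() removes whitespace at both ends; any difference
            # means the name starts or ends with whitespace.
            if name != name.strip():
                yield os.path.join(dirpath, name)
```

Printing each yielded path with repr() makes the trailing space visible in
the output.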


I'm still not sure where the [No space left on device] messages are coming
from, as there do not appear to be any space issues.  Even before I
turned off quota on the volume, the sizing appeared to be fine:


# gluster vol quota <vol> list
Path     Hard-limit   Soft-limit     Used     Available   Soft-limit exceeded?   Hard-limit exceeded?
------------------------------------------------------------------------------------------------------
/path1   500.0GB      80%(400.0GB)   1.9MB    500.0GB     No                     No
/path2   25.0TB       80%(20.0TB)    19.2TB   5.8TB       No                     No
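
The figures in the quota listing can be cross-checked with simple arithmetic:
the soft limit is a percentage of the hard limit, and the available space is
the hard limit minus the usage.  The Python sketch below is only an
illustration of that arithmetic.  The function name quota_state is
hypothetical, and treating "exceeded" as strictly greater than the limit is
an assumption, not something taken from the Gluster sources.

```python
def quota_state(hard_limit, soft_pct, used):
    """Derive the quota columns from hard limit, soft-limit % and usage.

    All sizes must be in the same unit (TB in the example below).
    """
    soft_limit = hard_limit * soft_pct / 100
    return {
        "soft_limit": soft_limit,
        "available": max(hard_limit - used, 0),
        # Assumption: a limit counts as exceeded only when usage is above it.
        "soft_exceeded": used > soft_limit,
        "hard_exceeded": used > hard_limit,
    }


# /path2 from the listing: 25.0TB hard limit, 80% soft limit, 19.2TB used.
path2 = quota_state(25.0, 80, 19.2)
print(path2["soft_limit"], round(path2["available"], 1), path2["soft_exceeded"])
```

For /path2 this gives a 20.0TB soft limit and about 5.8TB available, which
matches the listing; usage of 19.2TB is below the soft limit, although only
by 0.8TB.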


I will have some time this weekend to take the shares offline.  Are there
any steps I can take to clean up the hot tier, resync, or other, to ensure
all is in a good state?

Thanks in advance..

HB





On Thu, Oct 26, 2017 at 9:17 PM, Milind Changire <mchangir at redhat.com>
wrote:

> Herb,
> I'm trying to weed out issues here.
>
> So, I can see quota turned *on* and would like you to check the quota
> settings and test to see system behavior *if quota is turned off*.
>
> Although the file size that failed migration was 29K, I'm being a bit
> paranoid while weeding out issues.
>
> Are you still facing tiering errors ?
> I can see your response to Alex with the disk space consumption and found
> it a bit ambiguous w.r.t. state of affairs.
>
> --
> Milind
>
>
>
> On Tue, Oct 24, 2017 at 11:34 PM, Herb Burnswell <
> herbert.burnswell at gmail.com> wrote:
>
>> Milind - Thank you for the response..
>>
>> >> What are the high and low watermarks for the tier set at ?
>>
>> # gluster volume get <vol> cluster.watermark-hi
>> Option                                  Value
>> ------                                  -----
>> cluster.watermark-hi                    90
>>
>> # gluster volume get <vol> cluster.watermark-low
>> Option                                  Value
>> ------                                  -----
>> cluster.watermark-low                   75
>>
>>
>>
>> >> What is the size of the file that failed to migrate as per the
>> following tierd log:
>>
>> >> [2017-10-19 17:52:07.519614] I [MSGID: 109038]
>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>> failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)
>>
>> The file was a word doc @ 29K in size.
>>
>> >>If possible, a *gluster volume info* would also help, instead of going
>> to and fro with questions.
>>
>> # gluster vol info
>>
>> Volume Name: ctdb
>> Type: Replicate
>> Volume ID: f679c476-e0dd-4f3a-9813-1b26016b5384
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: <node1>:/mnt/ctdb_local/brick
>> Brick2: <node2>:/mnt/ctdb_local/brick
>> Options Reconfigured:
>> nfs.disable: on
>> transport.address-family: inet
>>
>> Volume Name: <vol>
>> Type: Tier
>> Volume ID: 7710ed2f-775e-4dd9-92ad-66407c72b0ad
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 8
>> Transport-type: tcp
>> Hot Tier :
>> Hot Tier Type : Distributed-Replicate
>> Number of Bricks: 2 x 2 = 4
>> Brick1: <node2>:/mnt/brick_nvme1/brick
>> Brick2: <node1>:/mnt/brick_nvme2/brick
>> Brick3: <node2>:/mnt/brick_nvme2/brick
>> Brick4: <node1>:/mnt/brick_nvme1/brick
>> Cold Tier:
>> Cold Tier Type : Distributed-Replicate
>> Number of Bricks: 2 x 2 = 4
>> Brick5: <node1>:/mnt/brick1/brick
>> Brick6: <node2>:/mnt/brick2/brick
>> Brick7: <node1>:/mnt/brick2/brick
>> Brick8: <node2>:/mnt/brick1/brick
>> Options Reconfigured:
>> cluster.lookup-optimize: on
>> client.event-threads: 4
>> server.event-threads: 4
>> performance.write-behind-window-size: 4MB
>> performance.cache-size: 16GB
>> features.quota-deem-statfs: on
>> features.inode-quota: on
>> features.quota: on
>> nfs.disable: on
>> transport.address-family: inet
>> features.ctr-enabled: on
>> cluster.tier-mode: cache
>> performance.io-cache: off
>> performance.quick-read: off
>> cluster.tier-max-files: 1000000
>>
>>
>> HB
>>
>>
>>
>>
>> On Sun, Oct 22, 2017 at 8:41 AM, Milind Changire <mchangir at redhat.com>
>> wrote:
>>
>>> Herb,
>>> What are the high and low watermarks for the tier set at ?
>>>
>>> # gluster volume get <vol> cluster.watermark-hi
>>>
>>> # gluster volume get <vol> cluster.watermark-low
>>>
>>> What is the size of the file that failed to migrate as per the following
>>> tierd log:
>>>
>>> [2017-10-19 17:52:07.519614] I [MSGID: 109038]
>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht: Promotion
>>> failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)
>>>
>>> If possible, a *gluster volume info* would also help, instead of going
>>> to and fro with questions.
>>>
>>> --
>>> Milind
>>>
>>>
>>>
>>> On Fri, Oct 20, 2017 at 12:42 AM, Herb Burnswell <
>>> herbert.burnswell at gmail.com> wrote:
>>>
>>>> All,
>>>>
>>>> I am new to gluster and have some questions/concerns about some tiering
>>>> errors that I see in the log files.
>>>>
>>>> OS: CentOS 7.3.1611
>>>> Gluster version: 3.10.5
>>>> Samba version: 4.6.2
>>>>
>>>> I see the following (scrubbed):
>>>>
>>>> Node 1 /var/log/glusterfs/tier/<vol>/tierd.log:
>>>>
>>>> [2017-10-19 17:52:07.519614] I [MSGID: 109038]
>>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht:
>>>> Promotion failed for <file>(gfid:edaf97e1-02e0-4838-9d26-71ea3aab22fb)
>>>> [2017-10-19 17:52:07.525110] E [MSGID: 109011]
>>>> [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for
>>>> path=/path/to/<file>
>>>> [2017-10-19 17:52:07.526088] E [MSGID: 109023]
>>>> [dht-rebalance.c:757:__dht_rebalance_create_dst_file]
>>>> 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output
>>>> error]
>>>> [2017-10-19 17:52:07.526111] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file]
>>>> 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
>>>> [2017-10-19 17:52:07.527214] E [MSGID: 109037]
>>>> [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>
>>>> [No space left on device]
>>>> [2017-10-19 17:52:07.527244] I [MSGID: 109038]
>>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht:
>>>> Promotion failed for <file>(gfid:fb4411c4-a387-4e5f-a2b7-897633ef4aa8)
>>>> [2017-10-19 17:52:07.533510] E [MSGID: 109011]
>>>> [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for
>>>> path=/path/to/<file>
>>>> [2017-10-19 17:52:07.534434] E [MSGID: 109023]
>>>> [dht-rebalance.c:757:__dht_rebalance_create_dst_file]
>>>> 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output
>>>> error]
>>>> [2017-10-19 17:52:07.534453] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file]
>>>> 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
>>>> [2017-10-19 17:52:07.535570] E [MSGID: 109037]
>>>> [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>
>>>> [No space left on device]
>>>> [2017-10-19 17:52:07.535594] I [MSGID: 109038]
>>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht:
>>>> Promotion failed for <file>(gfid:fba421e7-0500-47c4-bf67-10a40690e13d)
>>>> [2017-10-19 17:52:07.541363] E [MSGID: 109011]
>>>> [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for
>>>> path=/path/to/<file>
>>>> [2017-10-19 17:52:07.542296] E [MSGID: 109023]
>>>> [dht-rebalance.c:757:__dht_rebalance_create_dst_file]
>>>> 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output
>>>> error]
>>>> [2017-10-19 17:52:07.542357] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file]
>>>> 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
>>>> [2017-10-19 17:52:07.543480] E [MSGID: 109037]
>>>> [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>
>>>> [No space left on device]
>>>> [2017-10-19 17:52:07.543521] I [MSGID: 109038]
>>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht:
>>>> Promotion failed for <file>(gfid:fe6799e1-42e6-43e5-a7eb-ac8facfcbc9f)
>>>> [2017-10-19 17:52:07.549959] E [MSGID: 109011]
>>>> [dht-common.c:7188:dht_create] 0-<vol>-hot-dht: no subvolume in layout for
>>>> path=/path/to/<file>
>>>> [2017-10-19 17:52:07.550901] E [MSGID: 109023]
>>>> [dht-rebalance.c:757:__dht_rebalance_create_dst_file]
>>>> 0-<vol>-tier-dht: failed to create <file> on <vol>-hot-dht [Input/output
>>>> error]
>>>> [2017-10-19 17:52:07.550922] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file]
>>>> 0-<vol>-tier-dht: Create dst failed on - <vol>-hot-dht for file - <file>
>>>> [2017-10-19 17:52:07.551896] E [MSGID: 109037]
>>>> [tier.c:969:tier_migrate_link] 0-<vol>-tier-dht: Failed to migrate <file>
>>>> [No space left on device]
>>>> [2017-10-19 17:52:07.551917] I [MSGID: 109038]
>>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht:
>>>> Promotion failed for <file>(gfid:ffe3a3f2-b170-43f0-a9fb-97c78e3173eb)
>>>> [2017-10-19 17:52:07.551945] E [MSGID: 109037] [tier.c:2565:tier_run]
>>>> 0-<vol>-tier-dht: Promotion failed
>>>>
>>>> Node 1 /var/log/samba/glusterfs-<vol>-pool.log:
>>>>
>>>> [2017-10-18 17:13:41.481860] E [MSGID: 114031]
>>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-0: remote
>>>> operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994)
>>>> [Invalid argument]
>>>> [2017-10-18 17:13:41.481860] E [MSGID: 114031]
>>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-1: remote
>>>> operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994)
>>>> [Invalid argument]
>>>> [2017-10-18 17:13:41.485916] E [MSGID: 109089]
>>>> [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task]
>>>> 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf1ff570, flags=00) on file
>>>> 7d89b9a8-3e5d-4f28-9e57-039fe4416994 @ <vol>-cold-dht [Invalid
>>>> argument]
>>>> [2017-10-18 17:13:41.488223] E [MSGID: 114031]
>>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-0: remote
>>>> operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994)
>>>> [Invalid argument]
>>>> [2017-10-18 17:13:41.488235] E [MSGID: 114031]
>>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-1: remote
>>>> operation failed. Path: /pool/testing (7d89b9a8-3e5d-4f28-9e57-039fe4416994)
>>>> [Invalid argument]
>>>> [2017-10-18 17:13:41.489060] E [MSGID: 109089]
>>>> [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task]
>>>> 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf1feb50, flags=00) on file
>>>> 7d89b9a8-3e5d-4f28-9e57-039fe4416994 @ <vol>-cold-dht [Invalid
>>>> argument]
>>>> [2017-10-18 17:13:42.339936] E [MSGID: 114031]
>>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote
>>>> operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b)
>>>> [Invalid argument]
>>>> [2017-10-18 17:13:42.339988] E [MSGID: 114031]
>>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-5: remote
>>>> operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b)
>>>> [Invalid argument]
>>>> [2017-10-18 17:13:42.343769] E [MSGID: 109089]
>>>> [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task]
>>>> 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf2012c0, flags=00) on file
>>>> 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
>>>> [2017-10-18 17:13:42.345374] E [MSGID: 114031]
>>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote
>>>> operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b)
>>>> [Invalid argument]
>>>> [2017-10-18 17:13:42.345401] E [MSGID: 114031]
>>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-5: remote
>>>> operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b)
>>>> [Invalid argument]
>>>> [2017-10-18 17:13:42.346259] E [MSGID: 109089]
>>>> [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task]
>>>> 0-<vol>-tier-dht: Failed to open the fd (0x7f02bf201130, flags=00) on file
>>>> 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
>>>> [2017-10-18 17:13:59.541591] E [MSGID: 108006]
>>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-0: All subvolumes are
>>>> down. Going offline until atleast one of them comes back up.
>>>> [2017-10-18 17:13:59.541748] E [MSGID: 108006]
>>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-1: All subvolumes are
>>>> down. Going offline until atleast one of them comes back up.
>>>> [2017-10-18 17:13:59.541887] E [MSGID: 108006]
>>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-2: All subvolumes are
>>>> down. Going offline until atleast one of them comes back up.
>>>> [2017-10-18 17:13:59.541977] E [MSGID: 108006]
>>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-3: All subvolumes are
>>>> down. Going offline until atleast one of them comes back up.
>>>>
>>>> Node 2 /var/log/glusterfs/tier/<vol>/tierd.log:
>>>>
>>>> [2017-10-16 15:54:08.662873] I [MSGID: 109038]
>>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht:
>>>> Promotion failed for <file>(gfid:fffd714e-b2d2-42d3-a31f-72673276e3d0)
>>>> [2017-10-16 16:00:07.201584] I [MSGID: 109038]
>>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht:
>>>> Promotion failed for <file>(gfid:f10365e1-747b-4985-97b9-8b5dc61ac464)
>>>> [2017-10-16 16:00:07.372559] I [MSGID: 109038]
>>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht:
>>>> Promotion failed for <file>(gfid:f95f17bf-b696-44cd-aae0-d8ac38149aa5)
>>>> [2017-10-16 16:06:06.880522] I [MSGID: 109038]
>>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht:
>>>> Promotion failed for <file>(gfid:ec451f6c-8971-4f9b-a04f-00f96db9b46a)
>>>> [2017-10-16 16:06:08.062080] I [MSGID: 109038]
>>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht:
>>>> Promotion failed for <file>(gfid:e658cd70-3f6d-4b25-8d9f-0d4c24d3ec5d)
>>>> [2017-10-16 16:06:08.288298] I [MSGID: 109038]
>>>> [tier.c:1169:tier_migrate_using_query_file] 0-<vol>-tier-dht:
>>>> Promotion failed for <file>(gfid:f22df67a-88e5-4fae-aab0-b00e04f9a6e1)
>>>> [2017-10-18 15:55:06.446416] I [MSGID: 109028]
>>>> [dht-rebalance.c:4792:gf_defrag_status_get] 0-glusterfs: Rebalance is
>>>> in progress. Time taken is 1376671.00 secs
>>>> [2017-10-18 15:55:06.446433] I [MSGID: 109028]
>>>> [dht-rebalance.c:4796:gf_defrag_status_get] 0-glusterfs: Files
>>>> migrated: 0, size: 0, lookups: 47887089, failures: 3594, skipped: 0
>>>> [2017-10-19 00:00:00.501576] I [MSGID: 109038]
>>>> [tier.c:2391:tier_prepare_compact] 0-<vol>-tier-dht: Start compaction
>>>> on cold tier
>>>> [2017-10-19 00:00:00.502016] I [MSGID: 109038]
>>>> [tier.c:2403:tier_prepare_compact] 0-<vol>-tier-dht: End compaction on
>>>> cold tier
>>>> [2017-10-19 00:00:00.501608] I [MSGID: 109038]
>>>> [tier.c:2391:tier_prepare_compact] 0-<vol>-tier-dht: Start compaction
>>>> on cold tier
>>>> [2017-10-19 00:00:00.502076] I [MSGID: 109038]
>>>> [tier.c:2403:tier_prepare_compact] 0-<vol>-tier-dht: End compaction on
>>>> cold tier
>>>> [2017-10-19 16:03:49.522991] I [MSGID: 109028]
>>>> [dht-rebalance.c:4792:gf_defrag_status_get] 0-glusterfs: Rebalance is
>>>> in progress. Time taken is 1463594.00 secs
>>>> [2017-10-19 16:03:49.523017] I [MSGID: 109028]
>>>> [dht-rebalance.c:4796:gf_defrag_status_get] 0-glusterfs: Files
>>>> migrated: 0, size: 0, lookups: 52790654, failures: 3594, skipped: 0
>>>>
>>>> Node 2 /var/log/samba/glusterfs-<vol>-pool.log:
>>>>
>>>> [2017-10-18 16:49:09.218062] E [MSGID: 114031]
>>>> [client-rpc-fops.c:443:client3_3_open_cbk] 0-<vol>-client-4: remote
>>>> operation failed. Path: /pool (34d76e11-412f-4bc6-9a3e-b1f89658f13b)
>>>> [Invalid argument]
>>>> [2017-10-18 16:49:09.218254] E [MSGID: 109089]
>>>> [dht-helper.c:517:dht_check_and_open_fd_on_subvol_task]
>>>> 0-<vol>-tier-dht: Failed to open the fd (0x7f009b36bac0, flags=00) on file
>>>> 34d76e11-412f-4bc6-9a3e-b1f89658f13b @ <vol>-hot-dht [Invalid argument]
>>>> [2017-10-18 16:49:09.222783] E [MSGID: 108006]
>>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-0: All subvolumes are
>>>> down. Going offline until atleast one of them comes back up.
>>>> [2017-10-18 16:49:09.222912] E [MSGID: 108006]
>>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-1: All subvolumes are
>>>> down. Going offline until atleast one of them comes back up.
>>>> [2017-10-18 16:49:09.223079] E [MSGID: 108006]
>>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-2: All subvolumes are
>>>> down. Going offline until atleast one of them comes back up.
>>>> [2017-10-18 16:49:09.223200] E [MSGID: 108006]
>>>> [afr-common.c:4808:afr_notify] 0-<vol>-replicate-3: All subvolumes are
>>>> down. Going offline until atleast one of them comes back up.
>>>>
>>>> Status:
>>>>
>>>> # gluster vol tier <vol> status
>>>>
>>>> Node       Promoted files   Demoted files   Status        run time in h:m:s
>>>> ---------  ---------        ---------       ---------     ---------
>>>> Node1      190861           0               in progress   408:34:13
>>>> Node2      0                0               in progress   408:34:14
>>>>
>>>> Hot tier bricks:
>>>>
>>>> # df -h
>>>>
>>>> /dev/mapper/vg_bricks-brick_nvme1             1.4T  551G  883G  39% /mnt/brick_nvme1
>>>> /dev/mapper/vg_bricks-brick_nvme2             1.4T  512G  922G  36% /mnt/brick_nvme2
>>>>
>>>>
>>>> Can anyone point me in the right direction as to what may be going on?
>>>> Any guidance is greatly appreciated.
>>>>
>>>> Thanks in advance,
>>>>
>>>> HB
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>
>>>
>>>
>>> --
>>> Milind
>>>
>>>
>>
>>
>
>
>
> --
> Milind
>
>