[Bugs] [Bug 1464495] [Remove-brick] Hardlink migration fails with " migrate-data failed for $file [Unknown error 109023]" errors in rebalance logs

bugzilla at redhat.com bugzilla at redhat.com
Tue Jun 27 05:43:15 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1464495



--- Comment #4 from Susant Kumar Palai <spalai at redhat.com> ---
Description of problem:
=======================
With a large dataset of files and hardlinks, remove-brick migration fails for a
few files, throwing the errors below in the rebalance logs. Because of the
migration failure, a few files are left on the decommissioned bricks, so those
files will be lost from the mount point if the remove-brick is committed.

[2017-06-20 10:55:29.207269] I [MSGID: 109045]
[dht-common.c:2015:dht_lookup_everywhere_cbk] 0-distrep-dht: attempting
deletion of stale linkfile /fl13739 on distrep-readdir-ahead-1 (hashed subvol
is distrep-readdir-ahead-3)
[2017-06-20 10:55:29.215749] I [MSGID: 109069]
[dht-common.c:1327:dht_lookup_unlink_cbk] 0-distrep-dht: lookup_unlink returned
with op_ret -> 0 and op-errno -> 0 for /fl13739
[2017-06-20 10:55:29.235774] I [dht-rebalance.c:1514:dht_migrate_file]
0-distrep-dht: /fl13739: attempting to move from distrep-readdir-ahead-0 to
distrep-readdir-ahead-3
[2017-06-20 10:55:29.243632] I [dht-rebalance.c:403:gf_defrag_handle_hardlink]
0-distrep-dht: Attempting to migrate hardlink fl13739 with gfid
be689428-c8e4-45f4-9871-95ddf9e31719 from distrep-readdir-ahead-0 ->
distrep-readdir-ahead-3
[2017-06-20 10:55:29.255513] W [MSGID: 114031]
[client-rpc-fops.c:2777:client3_3_link_cbk] 0-distrep-client-2: remote
operation failed: (/fl13739 -> /fl13739) [Stale file handle]
[2017-06-20 10:55:29.262225] W [MSGID: 114031]
[client-rpc-fops.c:2777:client3_3_link_cbk] 0-distrep-client-3: remote
operation failed: (/fl13739 -> /fl13739) [Stale file handle]
[2017-06-20 10:55:29.266772] E [MSGID: 109084]
[dht-rebalance.c:459:gf_defrag_handle_hardlink] 0-distrep-dht: link of fl13739
-> be689428-c8e4-45f4-9871-95ddf9e31719 failed on  subvol
distrep-readdir-ahead-1 [Stale file handle]
[2017-06-20 10:55:29.266967] W [MSGID: 109023]
[dht-rebalance.c:522:__check_file_has_hardlink] 0-distrep-dht: Migrate file
failed:/fl13739: failed to migrate file with link
[2017-06-20 10:55:29.272951] E [MSGID: 116]
[dht-rebalance.c:2667:gf_defrag_migrate_single_file] 0-distrep-dht:
migrate-data failed for /fl13739 [Unknown error 109023]
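To collect every path that hit this failure, the rebalance log can be filtered
for the "migrate-data failed for" message. The following is only a sketch: the
function name failed_migrations is made up for illustration, and it assumes the
message format matches the excerpt above. It also re-joins wrapped entries, so
it works on excerpts pasted from mail as well as on the raw log.

```shell
#!/bin/sh
# failed_migrations (hypothetical helper): print the unique paths for which
# migrate-data failed. Reads the named log files, or stdin if none are given.
failed_migrations() {
    # A new log entry starts with a "[YYYY-" timestamp; any other line is
    # treated as a continuation of the previous entry and appended to it.
    awk '
        /^\[[0-9][0-9][0-9][0-9]-/ { if (entry != "") print entry; entry = $0; next }
        { entry = entry " " $0 }
        END { if (entry != "") print entry }
    ' "$@" |
    # Keep only the path between the message text and the trailing [error].
    sed -n 's/.*migrate-data failed for \(.*\) \[[^]]*\]$/\1/p' |
    sort -u
}
```

Usage, assuming the rebalance log is in its usual location (adjust the path for
your setup):
failed_migrations /var/log/glusterfs/<VOLNAME>-rebalance.log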

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create a distributed-replicate volume and start it.
2) Enable brick multiplexing ("cluster.brick-multiplex") and set the options
below to enable parallel readdirp:
gluster volume set <VOLNAME> performance.parallel-readdir on
gluster volume set <VOLNAME> rda-cache-limit 10MB
3) CIFS-mount the volume on multiple clients.
4) Perform the tasks below simultaneously from multiple clients:
     a) From client-1, create the files:
        for i in {1..20000};do touch f$i;done
     b) From client-2, create hard links for the created files:
        for i in {1..20000};do ln f$i fl$i;done
     c) From client-3, change the permissions of the created files:
        for i in {1..20000};do chmod 660 f$i;done
     d) From client-4, do a continuous lookup from two terminals.
5) While the tasks in step-4 are in progress, add a few bricks to the volume and
start rebalance.
6) Wait till step-4 and step-5 complete.
7) Now, remove the bricks added in step-5 (with continuous lookups from
multiple clients).

Remove-brick completed with many failures, and a few files are left on the
decommissioned bricks.
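
One way to see what is still sitting on a decommissioned brick before
committing is to walk the brick directory on the server and skip the internal
.glusterfs tree. This is a rough sketch only: leftover_files is a made-up name,
and the brick path is whatever was passed to remove-brick. It does not filter
out DHT linkto files (zero-byte, mode ---------T), so those will also be listed.

```shell
#!/bin/sh
# leftover_files (hypothetical helper): list regular files remaining under a
# brick directory, ignoring the internal .glusterfs tree.
# Note: DHT linkto files are NOT filtered out here and will show up as well.
leftover_files() {
    brick=$1
    # Prune .glusterfs so its gfid hardlinks are not reported; print every
    # other regular file found under the brick.
    find "$brick" -path "$brick/.glusterfs" -prune -o -type f -print
}
```

Usage: leftover_files <BRICK-PATH> | wc -l
An empty listing (apart from linkto files) suggests nothing is left behind on
that brick.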

Actual results:
===============
Remove-brick fails to migrate a few files.

Expected results:
=================
All files should be migrated without any errors or issues during remove-brick.
