[Bugs] [Bug 1572585] New: Remove-brick failed on Distributed volume while rm -rf is in-progress

bugzilla at redhat.com bugzilla at redhat.com
Fri Apr 27 11:23:16 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1572585

            Bug ID: 1572585
           Summary: Remove-brick failed on Distributed volume while rm -rf
                    is in-progress
           Product: Red Hat Gluster Storage
           Version: 3.4
         Component: distribute
          Severity: medium
          Assignee: nbalacha at redhat.com
          Reporter: tdesala at redhat.com
        QA Contact: tdesala at redhat.com
                CC: bugs at gluster.org, rhs-bugs at redhat.com,
                    sankarshan at redhat.com, spalai at redhat.com,
                    storage-qa-internal at redhat.com
        Depends On: 1572581



+++ This bug was initially created as a clone of Bug #1572581 +++

Description of problem:
The remove-brick operation fails while rm -rf is in progress on the mount.


How reproducible:
Always

Steps to Reproduce:
1. Create data (I used a Linux kernel untar)
2. Start the remove-brick process
3. Issue rm -rf * on the mount


[root at vm3 upstream]# gvi

Volume Name: test1
Type: Distribute
Volume ID: de28535e-1873-429a-a5ef-9dc4814b6b93
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: vm3:/extraspace/brick/1
Brick2: vm3:/extraspace/brick/2
Brick3: vm3:/extraspace/brick/3
Brick4: vm3:/extraspace/brick/4
Brick5: vm3:/extraspace/brick/5
Brick6: vm3:/extraspace/brick/6
Brick7: vm3:/extraspace/brick/7
Options Reconfigured:
performance.client-io-threads: on
transport.address-family: inet
nfs.disable: on

Error messages from remove-brick log:
ile-fpga-irq.txt lookup failed [Stale file handle]
[2018-04-27 10:35:46.072971] W [MSGID: 114031]
[client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-3: remote
operation failed [No such file or directory]
[2018-04-27 10:35:46.073518] E [MSGID: 109023]
[dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file
failed:
/linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm6345-l1-intc.txt
lookup failed [Stale file handle]
[2018-04-27 10:35:46.073661] E [MSGID: 109023]
[dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file
failed:
/linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2836-l1-intc.txt
lookup failed [Stale file handle]
[2018-04-27 10:35:46.073802] W [MSGID: 114031]
[client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-2: remote
operation failed [No such file or directory]
[2018-04-27 10:35:46.073874] W [MSGID: 114031]
[client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-1: remote
operation failed [No such file or directory]
[2018-04-27 10:35:46.073900] W [MSGID: 114031]
[client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-4: remote
operation failed [No such file or directory]
[2018-04-27 10:35:46.074660] W [MSGID: 114031]
[client-rpc-fops.c:1009:client3_3_setxattr_cbk] 0-test1-client-0: remote
operation failed [No such file or directory]
[2018-04-27 10:35:46.074979] E [MSGID: 109023]
[dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file
failed:
/linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2835-armctrl-ic.txt
lookup failed [Stale file handle]
[2018-04-27 10:35:46.075267] E [MSGID: 109023]
[dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file
failed:
/linux-4.16/Documentation/devicetree/bindings/interrupt-controller/arm,vic.txt
lookup failed [Stale file handle]
[2018-04-27 10:35:46.075768] E [MSGID: 109023]
[dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file
failed:
/linux-4.16/Documentation/devicetree/bindings/interrupt-controller/atmel,aic.txt
lookup failed [Stale file handle]
[2018-04-27 10:35:46.076057] E [MSGID: 109023]
[dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file
failed:
/linux-4.16/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm7038-l1-intc.txt
lookup failed [Stale file handle]
[2018-04-27 10:35:46.076779] E [dht-rebalance.c:3497:gf_defrag_settle_hash]
0-test1-dht: fix layout on /linux-4.16/Documentation/devicetree/bindings/media
failed
[2018-04-27 10:35:46.076794] E [MSGID: 109110]
[dht-rebalance.c:3926:gf_defrag_fix_layout] 0-test1-dht: Settle hash failed for
/linux-4.16/Documentation/devicetree/bindings/media
[2018-04-27 10:35:46.076957] E [MSGID: 109016]
[dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for
/linux-4.16/Documentation/devicetree/bindings/media
[2018-04-27 10:35:46.077211] E [MSGID: 109016]
[dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for
/linux-4.16/Documentation/devicetree/bindings
[2018-04-27 10:35:46.077336] E [MSGID: 109016]
[dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for
/linux-4.16/Documentation/devicetree
[2018-04-27 10:35:46.077525] E [MSGID: 109016]
[dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for
/linux-4.16/Documentation
[2018-04-27 10:35:46.077656] E [MSGID: 109016]
[dht-rebalance.c:3840:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for
/linux-4.16
[2018-04-27 10:35:46.078032] E [MSGID: 109023]
[dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file
failed:
/linux-4.16/Documentation/devicetree/bindings/interrupt-controller/cdns,xtensa-pic.txt
lookup failed [Stale file handle]
[2018-04-27 10:35:46.078413] E [MSGID: 109023]
[dht-rebalance.c:2658:gf_defrag_migrate_single_file] 0-test1-dht: Migrate file
failed:
/linux-4.16/Documentation/devicetree/bindings/interrupt-controller/cirrus,clps711x-intc.txt
lookup failed [Stale file handle]
[2018-04-27 10:35:46.080317] I [MSGID: 109028]
[dht-rebalance.c:5088:gf_defrag_status_get] 0-test1-dht: Rebalance is failed.
Time taken is 104.00 secs
[2018-04-27 10:35:46.080337] I [MSGID: 109028]
[dht-rebalance.c:5092:gf_defrag_status_get] 0-test1-dht: Files migrated: 729,
size: 2475036, lookups: 2179, failures: 10, skipped: 0
[2018-04-27 10:35:46.082286] W [glusterfsd.c:1367:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7e25) [0x7f8368341e25]
-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xde) [0x40a3cb]
-->/usr/local/sbin/glusterfs(cleanup_and_exit+0x88) [0x408875] ) 0-: received
signum (15), shutting down
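
When comparing runs, it can help to pull the counters out of the status summary logged by gf_defrag_status_get. The following is a minimal sketch, not part of GlusterFS: the function name parse_rebalance_summary is hypothetical, and it assumes the wrapped log line has been rejoined and that only the summary portion (from "Files migrated" onward) is passed in.

```python
import re

# Summary text as it appears in the log above, with the wrapped line rejoined.
# Only the counter portion is included; timestamps and MSGID are left out.
SUMMARY = ("Files migrated: 729, size: 2475036, lookups: 2179, "
           "failures: 10, skipped: 0")


def parse_rebalance_summary(text):
    """Return the 'name: integer' counters found in a status summary.

    Hypothetical helper for illustration; keys are lower-cased names.
    """
    pairs = re.findall(r"([A-Za-z ]+):\s*(\d+)", text)
    return {name.strip().lower(): int(value) for name, value in pairs}


counters = parse_rebalance_summary(SUMMARY)
print(counters["failures"], counters["skipped"])
```

A non-zero "failures" counter is what accompanies the "Rebalance is failed" status in the excerpt above.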

--- Additional comment from Susant Kumar Palai on 2018-04-27 07:08:47 EDT ---

The fix-layout code path already has checks that ignore ENOENT and ESTALE
errors, but the same checks were missing from the gf_defrag_settle_hash
function. Since the directory in question was deleted as part of rm -rf *,
settle_hash failed.

Debug log snippet:
<[2018-04-27 11:00:25.437436] E [dht-rebalance.c:3620:gf_defrag_settle_hash]
0-test1-dht: fix layout on
/linux-4.16/Documentation/devicetree/bindings/display/panel failed, error :2>
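
The idea described in this comment can be illustrated with a small sketch. This is not the GlusterFS code (which is C) and not the patch under review; settle_hash, fix_layout and IGNORABLE are hypothetical names. It only shows the pattern of treating ENOENT (error 2 in the snippet above) and ESTALE as "the directory is already gone" rather than as a failure.

```python
import errno

# Errors that mean the entry vanished underneath us, for example because
# a concurrent rm -rf removed the directory.
IGNORABLE = {errno.ENOENT, errno.ESTALE}


def settle_hash(path, fix_layout):
    """Hypothetical stand-in for a settle-hash step.

    fix_layout is any callable that takes a path and returns 0 on success
    or a positive errno value on failure. Returns 0 if the step succeeded
    or the failure can be ignored, otherwise the errno value.
    """
    err = fix_layout(path)
    if err in IGNORABLE:
        # Directory already deleted: nothing left to settle, not a failure.
        return 0
    return err


# A directory removed mid-run is tolerated; a real I/O error is still reported.
print(settle_hash("/some/dir", lambda p: errno.ENOENT))
print(settle_hash("/some/dir", lambda p: errno.EIO) == errno.EIO)
```

With this pattern a deleted directory no longer propagates up as a fix-layout failure, while other errors still do.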

--- Additional comment from Worker Ant on 2018-04-27 07:14:48 EDT ---

REVIEW: https://review.gluster.org/19945 (dht: gf_defrag_settle_hash should
ignore ENOENT and ESTALE error) posted (#2) for review on master by Susant
Palai


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1572581
[Bug 1572581] Remove-brick failed on Distributed volume while rm -rf is
in-progress