[Bugs] [Bug 1806551] New: remove-brick hiding file content

bugzilla at redhat.com
Mon Feb 24 14:06:46 UTC 2020


https://bugzilla.redhat.com/show_bug.cgi?id=1806551

            Bug ID: 1806551
           Summary: remove-brick hiding file content
           Product: GlusterFS
           Version: 6
          Hardware: x86_64
                OS: Linux
            Status: NEW
         Component: disperse
          Assignee: bugs at gluster.org
          Reporter: g.amedick at uni-luebeck.de
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community



Description of problem:

Hi,

I have a GlusterFS distributed dispersed volume (20 x (4 + 2)) with 9
production servers and tried to remove a subvolume (disperse-12). I used the
command

gluster volume remove-brick OMICS $bricks

When I started the rebalance, all bricks of the volume were somewhere between
60 and 70% full.
When I returned to the office on Monday, the bricks of the subvolume I wanted to
remove were at 7%. All other subvolumes located on the same servers as
disperse-12 had dropped, too; some of them were down to 33%. The bricks of
three other servers were hitting 90%. The bricks of the last three servers
are still at about 65%, give or take a bit. The status of the remove-brick
command showed ~50,000 failures per server, and I found a LOT of log entries
like this:

[2020-02-17 10:02:47.971011] I [dht-rebalance.c:1589:dht_migrate_file] 0-OMICS-dht: $FILE: attempting to move from OMICS-disperse-0 to OMICS-disperse-10
[2020-02-17 10:02:47.997915] W [MSGID: 0] [dht-rebalance.c:1026:__dht_check_free_space] 0-OMICS-dht: Write will cross min-free-disk for file - $FILE on subvol - OMICS-disperse-10. Looking for new subvol
[2020-02-17 10:02:47.997970] I [MSGID: 0] [dht-rebalance.c:1082:__dht_check_free_space] 0-OMICS-dht: new target found - OMICS-disperse-1 for file - $FILE
[2020-02-17 10:02:48.192873] I [MSGID: 0] [dht-rebalance.c:1788:dht_migrate_file] 0-OMICS-dht: destination for file - $FILE is changed to - OMICS-disperse-1
[2020-02-17 10:02:48.407606] E [MSGID: 109023] [dht-rebalance.c:2055:dht_migrate_file] 0-OMICS-dht: failed to set xattr on $FILE in OMICS-disperse-10 [Operation not supported]
[2020-02-17 10:02:48.414374] E [MSGID: 109023] [dht-rebalance.c:2874:gf_defrag_migrate_single_file] 0-OMICS-dht: migrate-data failed for $FILE [Operation not supported]

disperse-10 is one of the subvolumes that had hit 90%.
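
To see which subvolumes the "failed to set xattr" errors point at, the log
entries above could be tallied roughly as follows. This is only a sketch: it
assumes each entry sits on a single line in the rebalance log (the mail wraps
them) and that the message text matches the excerpt above. The names
FAILED_XATTR, count_failed_xattr and sample are made up for illustration.

```python
import re
from collections import Counter

# Assumed format, taken from the excerpt in this report: the MSGID 109023
# error emitted by dht_migrate_file, with the log entry on one line.
FAILED_XATTR = re.compile(
    r"E \[MSGID: 109023\] \[dht-rebalance\.c:\d+:dht_migrate_file\] "
    r"\S+: failed to set xattr on (?P<path>.+) in (?P<subvol>\S+)"
)


def count_failed_xattr(lines):
    """Count 'failed to set xattr' errors per subvolume named in the message."""
    counts = Counter()
    for line in lines:
        match = FAILED_XATTR.search(line)
        if match:
            counts[match.group("subvol")] += 1
    return counts


# Two entries from the excerpt, joined back into single lines.
sample = [
    "[2020-02-17 10:02:47.971011] I [dht-rebalance.c:1589:dht_migrate_file] "
    "0-OMICS-dht: $FILE: attempting to move from OMICS-disperse-0 to "
    "OMICS-disperse-10",
    "[2020-02-17 10:02:48.407606] E [MSGID: 109023] "
    "[dht-rebalance.c:2055:dht_migrate_file] 0-OMICS-dht: failed to set xattr "
    "on $FILE in OMICS-disperse-10 [Operation not supported]",
]

if __name__ == "__main__":
    for subvol, n in count_failed_xattr(sample).most_common():
        print(subvol, n)
```

On a real log, the lines of the file could be passed in place of sample, for
example with open(path) as the iterable.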


If I look for $FILE on the bricks of the first server, I find copies on both
subvol disperse-0 and subvol disperse-1, and those on subvol disperse-1 look
weird (bricks 0100 and 0101 belong to subvol disperse-0, bricks 0102 and 0103
are part of subvol disperse-1):

# ls -lah $BRICKS/$FILE
-rw-r--r-- 2 $USER $GROUP 3.5K Feb 13 07:47 $BRICK0100/$FILE
-rw-r--r-- 2 $USER $GROUP 3.5K Feb 13 07:47 $BRICK0101/$FILE
-rw-r--r-- 2 $USER $GROUP    0 Feb 17 11:02 $BRICK0102/$FILE
-rw-r--r-- 2 $USER $GROUP    0 Feb 17 11:02 $BRICK0103/$FILE

The copies on bricks 0102 and 0103 look broken to me. They have zero size and
no content, but their permissions are wrong for a linkfile.
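
For reference, a DHT linkfile on a brick normally shows up as a zero-byte
entry with only the sticky bit set (---------T) and a
trusted.glusterfs.dht.linkto xattr. A check along those lines might look like
the sketch below. The helper names are invented for illustration, and
inspect_brick_file assumes Linux and root access, since trusted.* xattrs are
not readable otherwise.

```python
import os
import stat

LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def looks_like_linkfile(mode, size, has_linkto):
    """Return True if an entry has the usual shape of a DHT linkfile:
    zero bytes, sticky bit as the only mode bit, and the linkto xattr."""
    perms = stat.S_IMODE(mode)  # permission and special bits only
    return size == 0 and perms == stat.S_ISVTX and has_linkto


def inspect_brick_file(path):
    """Check a file directly on a brick (assumes Linux, run as root)."""
    st = os.lstat(path)
    try:
        os.getxattr(path, LINKTO_XATTR)
        has_linkto = True
    except OSError:
        # xattr missing, or not readable without sufficient privileges
        has_linkto = False
    return looks_like_linkfile(st.st_mode, st.st_size, has_linkto)


if __name__ == "__main__":
    # Shape of the entries on bricks 0102 and 0103 in the listing above:
    # zero bytes but mode -rw-r--r--. The xattr state is not known from
    # the report, so both cases are shown.
    for has_linkto in (False, True):
        print(looks_like_linkfile(stat.S_IFREG | 0o644, 0, has_linkto))
```

By this check the zero-byte entries in the listing would not count as
linkfiles whatever their xattrs are, because of the -rw-r--r-- mode.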

When I look at the files from the client side, some still have content. But
some are empty and report a file size of 0.


Version-Release number of selected component (if applicable):
GlusterFS 6.6-1 (installed via Gluster deb mirror on Debian Stretch)


How reproducible:
Didn't dare to try again.

Actual results:
I suddenly have files that have lost their content and bricks that are hitting
90% while others have plenty of room.

Expected results:
All files should remain the way they were.

Additional info:

More of a question, really. Is it possible to restore the file content?
