[Bugs] [Bug 1806551] New: remove-brick hiding file content
bugzilla at redhat.com
Mon Feb 24 14:06:46 UTC 2020
https://bugzilla.redhat.com/show_bug.cgi?id=1806551
Bug ID: 1806551
Summary: remove-brick hiding file content
Product: GlusterFS
Version: 6
Hardware: x86_64
OS: Linux
Status: NEW
Component: disperse
Assignee: bugs at gluster.org
Reporter: g.amedick at uni-luebeck.de
CC: bugs at gluster.org
Target Milestone: ---
Classification: Community
Description of problem:
Hi,
I have a GlusterFS distributed dispersed volume (20 x (4 + 2)) with 9
production servers and tried to remove a subvolume (disperse-12). I used the
command
gluster volume remove-brick OMICS $bricks
When I started the rebalance, all bricks of the volume were somewhere between
60 and 70% full.
When I returned to the office on Monday, the bricks of the subvolume I wanted
to remove were at 7%. The bricks of all other subvolumes located on the same
servers as disperse-12 had dropped, too; some of them were down to 33%. The
bricks of three other servers were hitting 90%. The bricks of the last three
servers are still at about 65%, give or take a bit. The status of the
remove-brick command showed ~50,000 failures per server, and I found a LOT of
log entries like this:
[2020-02-17 10:02:47.971011] I [dht-rebalance.c:1589:dht_migrate_file] 0-OMICS-dht: $FILE: attempting to move from OMICS-disperse-0 to OMICS-disperse-10
[2020-02-17 10:02:47.997915] W [MSGID: 0] [dht-rebalance.c:1026:__dht_check_free_space] 0-OMICS-dht: Write will cross min-free-disk for file - $FILE on subvol - OMICS-disperse-10. Looking for new subvol
[2020-02-17 10:02:47.997970] I [MSGID: 0] [dht-rebalance.c:1082:__dht_check_free_space] 0-OMICS-dht: new target found - OMICS-disperse-1 for file - $FILE
[2020-02-17 10:02:48.192873] I [MSGID: 0] [dht-rebalance.c:1788:dht_migrate_file] 0-OMICS-dht: destination for file - $FILE is changed to - OMICS-disperse-1
[2020-02-17 10:02:48.407606] E [MSGID: 109023] [dht-rebalance.c:2055:dht_migrate_file] 0-OMICS-dht: failed to set xattr on $FILE in OMICS-disperse-10 [Operation not supported]
[2020-02-17 10:02:48.414374] E [MSGID: 109023] [dht-rebalance.c:2874:gf_defrag_migrate_single_file] 0-OMICS-dht: migrate-data failed for $FILE [Operation not supported]
disperse-10 is one of the subvolumes that had hit 90%.
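
To see which destination subvolumes account for the failures, the error
entries can be tallied per subvolume. The following is a minimal sketch, not a
GlusterFS tool: it assumes each entry occupies a single line in the rebalance
log and that the message wording matches the "failed to set xattr" entries
quoted in this report. The function name count_xattr_failures is made up for
illustration.

```python
import re
from collections import Counter

# Matches the "failed to set xattr" error entries quoted in this report.
# The pattern is an assumption derived from those entries only; other
# GlusterFS versions may word the message differently.
XATTR_FAILURE = re.compile(
    r"\] E \[MSGID: 109023\] .*failed to set xattr on (?P<path>.+) "
    r"in (?P<subvol>\S+) \[(?P<reason>[^\]]+)\]\s*$"
)


def count_xattr_failures(lines):
    """Return a Counter mapping (subvolume, reason) to the number of matches."""
    counts = Counter()
    for line in lines:
        match = XATTR_FAILURE.search(line)
        if match:
            counts[(match.group("subvol"), match.group("reason"))] += 1
    return counts
```

Passing an open log file object as `lines` works, since the function only
iterates over it line by line. Entries such as "migrate-data failed" or
"attempting to move" are ignored, so each failed file is counted once.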
If I look for $FILE on the bricks of the first server, I find copies on both
subvol disperse-0 and subvol disperse-1, and those on subvol disperse-1 look
weird (bricks 0100 and 0101 belong to subvol disperse-0, bricks 0102 and 0103
are part of subvol disperse-1):
# ls -lah $BRICKS/$FILE
-rw-r--r-- 2 $USER $GROUP 3.5K Feb 13 07:47 $BRICK0100/$FILE
-rw-r--r-- 2 $USER $GROUP 3.5K Feb 13 07:47 $BRICK0101/$FILE
-rw-r--r-- 2 $USER $GROUP 0 Feb 17 11:02 $BRICK0102/$FILE
-rw-r--r-- 2 $USER $GROUP 0 Feb 17 11:02 $BRICK0103/$FILE
The copies on bricks 0102 and 0103 look broken to me. The files have zero size
and no content, but their permissions are wrong for a linkfile.
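
The distinction drawn here can be checked mechanically. The following is a
rough sketch, assuming the usual DHT linkfile convention: a zero-length file
whose permission bits are exactly the sticky bit (shown by ls as ---------T).
A real linkfile also carries a trusted.glusterfs.dht.linkto xattr, which this
sketch does not inspect. The function names and return labels are made up for
illustration.

```python
import os
import stat


def classify_brick_file(mode, size):
    """Classify a file on a brick from its st_mode and st_size.

    Assumption: a DHT linkfile is zero-length and its permission bits are
    exactly the sticky bit (mode 1000).
    """
    perms = stat.S_IMODE(mode)  # permission bits incl. setuid/setgid/sticky
    if size == 0 and perms == stat.S_ISVTX:
        return "linkfile-like"
    if size == 0:
        return "empty, not linkfile-like"
    return "has data"


def classify_path(path):
    """Apply classify_brick_file to a path on a brick, not following symlinks."""
    st = os.lstat(path)
    return classify_brick_file(st.st_mode, st.st_size)
```

Under these assumptions, a zero-byte file with -rw-r--r-- permissions, like
the ones listed for bricks 0102 and 0103, comes out as "empty, not
linkfile-like".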
When I look at the files from the client side, some still have content. But
some are empty and report a file size of 0.
Version-Release number of selected component (if applicable):
GlusterFS 6.6-1 (installed via Gluster deb mirror on Debian Stretch)
How reproducible:
Didn't dare to try again.
Actual results:
I suddenly have files that have lost their content and bricks that are hitting
90% while others have plenty of room.
Expected results:
All files should remain the way they were.
Additional info:
More of a question, really. Is it possible to restore the file content?