remove-brick seems to delete file content

Gudrun Mareike Amedick g.amedick at uni-luebeck.de
Mon Feb 17 11:05:09 UTC 2020


I'm currently removing a few bricks from a distributed dispersed volume using gluster volume remove-brick; I'm running GlusterFS 6.6. It triggered a
rebalance that is supposed to move the data off those bricks. This morning, it had ~50,000 failures on each server. I found a whole bunch of
log entries like this:

[2020-02-17 10:02:47.971011] I [dht-rebalance.c:1589:dht_migrate_file] 0-OMICS-dht: $FILE: attempting to move from OMICS-disperse-0 to OMICS-disperse-
[2020-02-17 10:02:47.997915] W [MSGID: 0] [dht-rebalance.c:1026:__dht_check_free_space] 0-OMICS-dht: Write will cross min-free-disk for file - $FILE
on subvol - OMICS-disperse-10. Looking for new subvol
[2020-02-17 10:02:47.997970] I [MSGID: 0] [dht-rebalance.c:1082:__dht_check_free_space] 0-OMICS-dht: new target found - OMICS-disperse-1 for file -
[2020-02-17 10:02:48.192873] I [MSGID: 0] [dht-rebalance.c:1788:dht_migrate_file] 0-OMICS-dht: destination for file - $FILE is changed to - OMICS-
[2020-02-17 10:02:48.407606] E [MSGID: 109023] [dht-rebalance.c:2055:dht_migrate_file] 0-OMICS-dht: failed to set xattr on $FILE in OMICS-disperse-10
[Operation not supported]
[2020-02-17 10:02:48.414374] E [MSGID: 109023] [dht-rebalance.c:2874:gf_defrag_migrate_single_file] 0-OMICS-dht: migrate-data failed for $FILE
[Operation not supported]

The bricks for subvol disperse-10 have indeed hit 90% usage during the rebalance. Usage on subvol disperse-1 is much lower.
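
For anyone wanting to tally failures like these, here is a minimal Python sketch that groups error-level entries by MSGID and reporting function. It is not part of GlusterFS: the regular expression is an assumption inferred only from the excerpt above, and the name count_errors is made up for illustration. Wrapped continuation lines such as "[Operation not supported]" are simply skipped.

```python
import re
from collections import Counter

# Assumed line layout, inferred from the excerpt above (not an official spec):
# [timestamp] LEVEL [MSGID: n] [source.c:line:function] domain: message
# The MSGID field is optional because some lines in the excerpt lack it.
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+(?P<msg>.*)$"
)


def count_errors(lines):
    """Count 'E' level entries, keyed by (msgid, function name).

    Lines that do not match the assumed layout are ignored.
    """
    counts = Counter()
    for raw in lines:
        match = LOG_RE.match(raw.strip())
        if match and match.group("level") == "E":
            counts[(match.group("msgid"), match.group("func"))] += 1
    return counts
```

It could be fed an open log file, e.g. count_errors(open(path)), where path points at the rebalance log on the server in question.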

If I look for $FILE on the bricks, I find copies on both subvol disperse-0 and subvol disperse-1, and those on subvol disperse-1 look weird (bricks
0100 and 0101 belong to subvol disperse-0, bricks 0102 and 0103 are part of subvol disperse-1):

# ls -lah $BRICKS/$FILE
-rw-r--r-- 2 $USER $GROUP 3.5K Feb 13 07:47 $BRICK0100/$FILE
-rw-r--r-- 2 $USER $GROUP 3.5K Feb 13 07:47 $BRICK0101/$FILE
-rw-r--r-- 2 $USER $GROUP    0 Feb 17 11:02 $BRICK0102/$FILE
-rw-r--r-- 2 $USER $GROUP    0 Feb 17 11:02 $BRICK0103/$FILE

This doesn't look like a linkfile.
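
As a rough way to check this across many files, the following Python sketch applies the usual heuristics for a DHT linkfile on a brick: zero size, a permission field of only the sticky bit (shown as ---------T by ls), and the trusted.glusterfs.dht.linkto extended attribute. The function names are hypothetical, and the xattr check only works when run as root on Linux, since trusted.* attributes are hidden from other users.

```python
import os
import stat

# Extended attribute that DHT sets on linkfiles to name the real subvolume.
LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def looks_like_linkfile(mode, size, xattr_names=None):
    """Heuristic check based on stat data and (optionally) xattr names.

    mode and size come from lstat(); xattr_names is a list of attribute
    names, or None to skip the xattr check entirely.
    """
    only_sticky_bit = stat.S_IMODE(mode) == stat.S_ISVTX
    if not (only_sticky_bit and size == 0):
        return False
    if xattr_names is None:
        return True
    return LINKTO_XATTR in xattr_names


def check_path(path):
    """Apply the heuristic to a file on a brick (hypothetical helper)."""
    st = os.lstat(path)
    try:
        names = os.listxattr(path, follow_symlinks=False)
    except (OSError, AttributeError):
        names = None  # xattrs unavailable: fall back to mode and size only
    return looks_like_linkfile(st.st_mode, st.st_size, names)
```

The zero-byte files in the listing above have mode -rw-r--r--, so this check classifies them as not being linkfiles, which matches the observation.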

Some of those files are empty on the client side, some aren't. But since those aren't my files, I can't tell for sure whether they are supposed to look
empty. The empty ones report a file size of 0 (du -h $FILE) from the client side, but they do have a size (and content) on the server side in their original
subvolume, so I'm guessing they shouldn't be empty :(

I stopped the remove-brick operation because this looked weird. Is this supposed to happen? Or is the rebalance screwing up when trying to move things to a
brick that's already full?
I'm removing the subvolume disperse-12. Is it intended that data from subvol disperse-0 is being moved?
Should I open a bug report?
And, most importantly, are those weird non-linkfile-but-empty files going to be a problem, and if so, how do I get rid of them safely? Can I restore
the content of those files that are currently shown as empty?

Thanks in advance and kind regards,

Gudrun Amedick
