[Bugs] [Bug 1346854] New: Disk failed, can't (cleanly) remove brick.
bugzilla at redhat.com
Wed Jun 15 13:12:47 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1346854
Bug ID: 1346854
Summary: Disk failed, can't (cleanly) remove brick.
Product: GlusterFS
Version: 3.7.11
Component: core
Severity: medium
Assignee: bugs at gluster.org
Reporter: phil at solidstatescientific.com
CC: bugs at gluster.org
Created attachment 1168372
--> https://bugzilla.redhat.com/attachment.cgi?id=1168372&action=edit
compressed (bzip2) tarball of /var/log/glusterfs on server with failed disk
Description of problem:
The following is cut-n-paste from an email to gluster-users at gluster.org. I
received a reply saying it looked like a bug and would I please submit a BZ.
So here it is.
---- vvvv ---- Begin email cut-n-paste ---- vvvv ----
Just started trying gluster, to decide if we want to put it into production.
Running version 3.7.11-1
Replicated, distributed volume, two servers, 20 bricks per server:
[root at storinator1 ~]# gluster volume status gv0
Status of volume: gv0
Gluster process                               TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick storinator1:/export/brick1/gv0          49153     0          Y       2554
Brick storinator2:/export/brick1/gv0          49153     0          Y       9686
Brick storinator1:/export/brick2/gv0          49154     0          Y       2562
Brick storinator2:/export/brick2/gv0          49154     0          Y       9708
Brick storinator1:/export/brick3/gv0          49155     0          Y       2568
Brick storinator2:/export/brick3/gv0          49155     0          Y       9692
Brick storinator1:/export/brick4/gv0          49156     0          Y       2574
Brick storinator2:/export/brick4/gv0          49156     0          Y       9765
Brick storinator1:/export/brick5/gv0          49173     0          Y       16901
Brick storinator2:/export/brick5/gv0          49173     0          Y       9727
Brick storinator1:/export/brick6/gv0          49174     0          Y       16920
Brick storinator2:/export/brick6/gv0          49174     0          Y       9733
Brick storinator1:/export/brick7/gv0          49175     0          Y       16939
Brick storinator2:/export/brick7/gv0          49175     0          Y       9739
Brick storinator1:/export/brick8/gv0          49176     0          Y       16958
Brick storinator2:/export/brick8/gv0          49176     0          Y       9703
Brick storinator1:/export/brick9/gv0          49177     0          Y       16977
Brick storinator2:/export/brick9/gv0          49177     0          Y       9713
Brick storinator1:/export/brick10/gv0         49178     0          Y       16996
Brick storinator2:/export/brick10/gv0         49178     0          Y       9718
Brick storinator1:/export/brick11/gv0         49179     0          Y       17015
Brick storinator2:/export/brick11/gv0         49179     0          Y       9746
Brick storinator1:/export/brick12/gv0         49180     0          Y       17034
Brick storinator2:/export/brick12/gv0         49180     0          Y       9792
Brick storinator1:/export/brick13/gv0         49181     0          Y       17053
Brick storinator2:/export/brick13/gv0         49181     0          Y       9755
Brick storinator1:/export/brick14/gv0         49182     0          Y       17072
Brick storinator2:/export/brick14/gv0         49182     0          Y       9767
Brick storinator1:/export/brick15/gv0         49183     0          Y       17091
Brick storinator2:/export/brick15/gv0         N/A       N/A        N       N/A
Brick storinator1:/export/brick16/gv0         49184     0          Y       17110
Brick storinator2:/export/brick16/gv0         49184     0          Y       9791
Brick storinator1:/export/brick17/gv0         49185     0          Y       17129
Brick storinator2:/export/brick17/gv0         49185     0          Y       9756
Brick storinator1:/export/brick18/gv0         49186     0          Y       17148
Brick storinator2:/export/brick18/gv0         49186     0          Y       9766
Brick storinator1:/export/brick19/gv0         49187     0          Y       17167
Brick storinator2:/export/brick19/gv0         49187     0          Y       9745
Brick storinator1:/export/brick20/gv0         49188     0          Y       17186
Brick storinator2:/export/brick20/gv0         49188     0          Y       9783
NFS Server on localhost                       2049      0          Y       17206
Self-heal Daemon on localhost                 N/A       N/A        Y       17214
NFS Server on storinator2                     2049      0          Y       9657
Self-heal Daemon on storinator2               N/A       N/A        Y       9677

Task Status of Volume gv0
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 28c733e9-d618-44fc-873f-405d3b29a609
Status               : completed
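For reference, the layout above is a 20 x 2 distributed-replicate volume, with
brick N on storinator1 paired with brick N on storinator2. A volume like that
would be created with something along these lines (not the exact command that
was used; the middle brick pairs are elided here):
  gluster volume create gv0 replica 2 \
      storinator1:/export/brick1/gv0 storinator2:/export/brick1/gv0 \
      storinator1:/export/brick2/gv0 storinator2:/export/brick2/gv0 \
      ...
      storinator1:/export/brick20/gv0 storinator2:/export/brick20/gv0
  gluster volume start gv0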
Wouldn't you know it, within a week or two of pulling the hardware together and
getting gluster installed and configured, a disk dies. Note the dead process
for brick15 on server storinator2.
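(The dead brick should also be visible on storinator2 itself, for instance with
something like the following; the log file name is assumed to follow the usual
convention of the brick path with slashes turned into dashes:
  ps ax | grep '[g]lusterfsd' | grep brick15                # no brick process left for brick15
  tail /var/log/glusterfs/bricks/export-brick15-gv0.log     # last errors from the failed brick
The full /var/log/glusterfs from that server is in the attachment.)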
I would like to remove (not replace) the failed brick (and its replica). (I
don't have a spare disk handy, and there's plenty of room on the other bricks.)
But gluster doesn't seem to want to remove a brick if the brick is dead:
[root at storinator1 ~]# gluster volume remove-brick gv0 storinator{1..2}:/export/brick15/gv0 start
volume remove-brick start: failed: Staging failed on storinator2. Error: Found stopped brick storinator2:/export/brick15/gv0
So what do I do? I can't remove the brick while the brick is bad, but I want
to remove the brick *because* the brick is bad. Bit of a Catch-22.
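For comparison, my understanding is that against a healthy brick pair the
removal would go through the usual start/status/commit sequence, roughly:
  gluster volume remove-brick gv0 storinator{1..2}:/export/brick15/gv0 start
  gluster volume remove-brick gv0 storinator{1..2}:/export/brick15/gv0 status
  gluster volume remove-brick gv0 storinator{1..2}:/export/brick15/gv0 commit
It's the start step (the one that would migrate data off the pair) that is
refused here because one of the two bricks is down.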
Thanks in advance for any help you can give.
---- ^^^^ ---- End email cut-n-paste ---- ^^^^ ----
Version-Release number of selected component (if applicable):
Huh?
How reproducible:
I haven't a clue how readily reproducible this is.
Steps to Reproduce:
1. Create a volume like the one described above.
2. Wait for a disk to fail. (You will, of course, want to force/fake a disk
failure if there's a way to do so; one possible way is sketched after this list.)
3. Attempt to remove the failed brick and its replica.
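One way to fake the failed-disk state for step 2 (so that the brick shows up as
stopped, the way brick15 does above) would presumably be to kill that brick's
glusterfsd process by hand; its PID is in the status output:
  gluster volume status gv0 storinator2:/export/brick15/gv0    # note the Pid column
  kill -9 <pid-of-that-brick-process>                          # run on storinator2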
Actual results:
Can't (cleanly) remove failed brick and its replica
Expected results:
Can (cleanly) remove failed brick and its replica
Additional info:
No idea if I got the component right; it's a guess based on my very limited
understanding of the gluster architecture.
Got the job done in a roundabout way: I tried a remove-brick force. That
worked, but of course it left the data from the removed bricks missing from the
volume. Since the replica brick on storinator1 was still sound, though, I was
able to copy the removed replica's contents back into the volume through its
mount point. This cumbersome but effective workaround is the reason I did not
select a higher Severity for this bug report.
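Spelled out, that workaround looked roughly like this (the mount point and the
use of rsync are illustrative rather than a transcript; the --exclude keeps
gluster's internal .glusterfs directory from being copied back into the volume):
  gluster volume remove-brick gv0 storinator{1..2}:/export/brick15/gv0 force
  mount -t glusterfs storinator1:/gv0 /mnt/gv0                       # client mount of the volume
  rsync -a --exclude=.glusterfs /export/brick15/gv0/ /mnt/gv0/       # surviving replica on storinator1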