[Gluster-users] recovering from a replace-brick gone wrong

Danny Webb Danny.Webb at thehutgroup.com
Tue Jul 25 15:25:29 UTC 2017


Hi All,

I have a 4-node cluster running version 3.9.0-2 on CentOS 7 with a
4-brick distribute-replicate (replica 2) volume on it.  I use the
cluster to provide shared volumes in a virtual environment, since our
primary storage only serves block storage.
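
For context, the volume layout looks roughly like this (the hostnames
and brick paths below are placeholders rather than the real ones):

  # gluster volume info gv_cdn_001

  Volume Name: gv_cdn_001
  Type: Distributed-Replicate
  Number of Bricks: 2 x 2 = 4
  Transport-type: tcp
  Bricks:
  Brick1: node1:/bricks/gv_cdn_001/brick
  Brick2: node2:/bricks/gv_cdn_001/brick
  Brick3: node3:/bricks/gv_cdn_001/brick
  Brick4: node4:/bricks/gv_cdn_001/brick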

For some reason I decided to create the bricks for this volume directly
on the block devices rather than abstracting them with LVM for easier
space management.  The bricks have now surpassed 90% utilisation and we
have started seeing increased load on one of the clients and on two of
the nodes / bricks, most likely due to DHT lookups.
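
(For what it's worth, this is roughly how I've been checking brick
utilisation; the brick path here is a placeholder:

  # gluster volume status gv_cdn_001 detail | egrep 'Brick|Disk Space'
  # df -h /bricks/gv_cdn_001
)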

In an effort to rework the bricks and migrate them to LVM-backed mounts
I issued a replace-brick command to move one brick from its direct
mount to a new, empty LVM mount (roughly the command sketched just
below the log excerpt).  Immediately after I issued this command, load
jumped to ~150 on the two clients (high-throughput Apache servers) even
though CPU utilisation was minimal.  The clients were logging a flood
of metadata self-heals:

[2017-07-23 20:38:04.803241] I [MSGID: 108026] [afr-self-heal-common.c:1077:afr_log_selfheal] 2-gv_cdn_001-replicate-1: Completed metadata selfheal on b44e9eb5-f886-4222-940d-593866c210ff. sources=[0]  sinks=
[2017-07-23 20:38:04.803736] I [MSGID: 108026] [afr-self-heal-common.c:1077:afr_log_selfheal] 2-gv_cdn_001-replicate-1: Completed metadata selfheal on 97ad48bc-8873-4700-9f82-47130cd031a1. sources=[0]  sinks=
[2017-07-23 20:38:04.837770] I [MSGID: 108026] [afr-self-heal-common.c:1077:afr_log_selfheal] 2-gv_cdn_001-replicate-1: Completed metadata selfheal on 041e58e2-9afe-4f43-ba0b-d11e80b5053b. sources=[0]  sinks=
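
For reference, the replace-brick command I issued was along these lines
(the volume name is real, the hostname and brick paths are just
placeholders for the actual ones):

  # gluster volume replace-brick gv_cdn_001 \
        node2:/bricks/gv_cdn_001_direct/brick \
        node2:/bricks/gv_cdn_001_lvm/brick \
        commit force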

In order to "fix" the situation I had to kill off the new brick
process, so I'm now left with a distribute-replicate volume in which
one of the replica sets is degraded.
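
(What I actually did was look up the PID of the new brick and kill that
glusterfsd process by hand, roughly:

  # gluster volume status gv_cdn_001    (to find the new brick's PID)
  # kill <pid>
)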

Is there any way I can re-add the old brick to get back to a normal
working state?  If I do this migration again I'd probably just do a
direct dd of the ext4 file system onto the new LVM-backed mount while
the brick was offline.
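
Concretely I mean something along these lines, with the brick process
stopped first (the device names and paths here are made up, and I'd
grow the filesystem afterwards since the new LV would be larger):

  # kill <pid-of-old-brick-glusterfsd>      (take the brick offline)
  # umount /bricks/gv_cdn_001               (so the copy is consistent)
  # dd if=/dev/sdb1 of=/dev/vg_bricks/gv_cdn_001 bs=4M conv=fsync
  # e2fsck -f /dev/vg_bricks/gv_cdn_001
  # resize2fs /dev/vg_bricks/gv_cdn_001     (grow ext4 into the larger LV)
  # mount /dev/vg_bricks/gv_cdn_001 /bricks/gv_cdn_001
  # gluster volume start gv_cdn_001 force   (restart the brick process)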

Cheers,

Danny
Danny Webb
Senior Linux and Virtualisation Engineer
The Hut Group <http://www.thehutgroup.com/>

Email: Danny.Webb at thehutgroup.com



