[Bugs] [Bug 1322145] Glusterd fails to restart after replacing a failed GlusterFS node and a volume has a snapshot

bugzilla at redhat.com bugzilla at redhat.com
Thu Mar 16 20:44:15 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1322145



--- Comment #21 from Ben Werthmann <ben at apcera.com> ---
The reported case differs slightly from the problem statement in 16907.


Problem:

 - Deploy gluster on 3 nodes, one brick each, one replicated volume; add peers
by IP address
 - Create a snapshot
 - Lose one server
 - Add a replacement peer and new brick with a new IP address
 - replace-brick the missing brick onto the new server (wait for replication to
finish)
 - peer detach the old server
 - After doing the above steps (sketched in commands below), glusterd fails to
restart.
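
For reference, the sequence above maps roughly onto the following CLI calls.
Hostnames, volume, snapshot and brick paths are placeholders, not the exact
values from our deployment:

    # 3-node replica volume, one brick per node, peers probed by IP
    gluster volume create <volname> replica 3 \
        <node1-ip>:/bricks/<volname> <node2-ip>:/bricks/<volname> <node3-ip>:/bricks/<volname>
    gluster volume start <volname>
    gluster snapshot create <snapname> <volname>
    # node3 is lost; a replacement peer with a new IP is probed in
    gluster peer probe <new-node-ip>
    gluster volume replace-brick <volname> <node3-ip>:/bricks/<volname> \
        <new-node-ip>:/bricks/<volname> commit force
    gluster volume heal <volname> info     # wait for the heal to finish
    gluster peer detach <node3-ip>         # 'force' may be needed for a dead peer
    # after this point, restarting glusterd fails on every node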

Our expectations:

 - Glusterd starts and remains started in the above state, even if there are
issues with a volume and/or related snapshots. [1]
 - The non-snapshot volume would start and a heal would kick off (to catch up
from where 'replace-brick <volumename> <oldbrick> <newbrick> commit force' left
off); see the commands sketched after this list.
 - A procedure exists for replacing a failed server/brick while keeping our
snapshots. [2]
 - Limitations of snapshots are documented, such as having to delete all of
your snapshots to replace a failed node. Is this true in all cases, or is it
something specific to what we are doing?
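
To make the first two expectations concrete, after the replacement we would
expect commands along these lines to keep working (names are placeholders):

    systemctl restart glusterd             # should succeed on every node
    gluster volume status <volname>        # primary volume started
    gluster volume heal <volname> info     # heal catching up on the new brick
    gluster snapshot list <volname>        # snapshots may be degraded, but they
                                           # should not keep glusterd from starting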

While recovering snapshots should be possible, our priority is [1]. Having
gluster enter a state where *none* of the glusterd processes can start is a
significant risk. Functional glusterd should be able to service/start the
primary volume, even if the snapshot volumes enter an unusable state. 

 [1] Should issues with one volume prevent other volumes from starting due to
glusterd crashing? Is this by design? Please elaborate on this behavior. If a
well-meaning individual restarts the gluster services or reboots the gluster
servers for troubleshooting, the result is a cluster-wide outage of glusterd,
which means no new client connections.

 [2] In theory, it should be possible to recover Gluster snapshots based on
lvm-thin. I think we'd just need to "replay the snapshot history" on a new
thin-lv. The process could be something like (a rough command-level sketch
follows the list):
   1. Create a new thin-LV
   2. replace-brick the oldest snapshot, create an LVM snapshot, and update the
gluster references for the snapshot volume to point at the new snapshot
   3. Repeat with the next snapshot until the head of the history is reached
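
A very rough sketch of the LVM side of that idea, assuming the brick sits on a
thin LV in a volume group named vg0 with a thin pool named thinpool (all names
made up). The gluster-side step of repointing a snapshot volume's brick at the
new LVM snapshot has, as far as we know, no public CLI today, so it appears
only as a comment:

    # new thin LV to back the replacement brick
    lvcreate -V 100G -T vg0/thinpool -n brick_new
    mkfs.xfs /dev/vg0/brick_new
    mount /dev/vg0/brick_new /bricks/<volname>

    # "replay" the snapshot history, oldest first
    lvcreate -s vg0/brick_new -n snap_oldest   # thin snapshot of the new LV
    # ...repoint the oldest gluster snapshot volume's brick at snap_oldest
    #    (no public gluster command for this step that we are aware of)
    # repeat for the next-oldest snapshot until the head of the history is reached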

This is probably a gross oversimplification of the problem, but it seems that
recovering snapshots should be possible.


Aside comment on 16907:

 - What's the recovery path when a peer has failed and something is preventing
the removal of snapshots [3] before replacing the brick? Generally, by the time
we need to perform the "recover a failed gluster server" task, something else
has already gone wrong.


[3] Different reasons why snapshot removal has failed for us:
- Bugs with dm-thin/LVM tools
- Gluster not responding to "tpool_metadata is at low water mark" events,
leading to a thin pool wedged in a read-only state
- Poor interaction with netfilter's conntrack
