[Bugs] [Bug 1322145] Glusterd fails to restart after replacing a failed GlusterFS node and a volume has a snapshot

bugzilla at redhat.com bugzilla at redhat.com
Wed Mar 22 18:42:13 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1322145

Gaurav Yadav <gyadav at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(gyadav at redhat.com |
                   |)                           |



--- Comment #29 from Gaurav Yadav <gyadav at redhat.com> ---
The fix is made as per comment #8 (a rough command-level sketch of the steps
follows the list below):
- deploy gluster with three servers, one brick each, one volume replicated across all 3
- create a snapshot
- lose one server
- add a replacement peer and new brick with a new IP address
- replace-brick the missing brick onto the new server (wait for replication to finish)
- force remove the old server
- verify everything is working as expected
- restart _any_ server in the cluster, without failure
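
For reference, an approximate sequence of CLI commands for the above steps;
the hostnames, brick paths, volume name and snapshot name (server1-4,
/bricks/brick1, repvol, snap1) are placeholders of mine, not taken from the
bug report, and the snapshot step assumes the bricks are on thinly
provisioned LVM as gluster snapshots require:

    # 3-way replicated volume, one brick per server
    gluster volume create repvol replica 3 \
        server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/brick1
    gluster volume start repvol

    # create a snapshot of the volume
    gluster snapshot create snap1 repvol

    # server3 is lost; add server4 as a replacement peer
    gluster peer probe server4

    # move the missing brick onto the new server, then wait for heal to finish
    gluster volume replace-brick repvol server3:/bricks/brick1 \
        server4:/bricks/brick1 commit force
    gluster volume heal repvol info

    # force remove the dead server from the trusted pool
    gluster peer detach server3 force

    # restarting glusterd on any node should now succeed
    systemctl restart glusterd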


Explanation for the case mentioned in comment #28:

glusterd does not start after executing the above test case because, when the
replace-brick command is executed, glusterd updates the brick paths in the
volume's vol files, but it does not change them in the snapshot's vol files,
the reason being that the snapshot was created at a "point in time".
Now, while restarting the service, glusterd sees the snap volume, but while
restoring it, it tries to get the brick info from the node that has already
been detached from the cluster. That info is no longer present, so glusterd
fails to load.
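
One way to see the stale references described above (my own suggestion, not
part of the bug report; the metadata layout under /var/lib/glusterd/ can vary
by GlusterFS version) is to grep the on-disk volume and snapshot definitions
for the old hostname:

    # regular volume definitions: brick paths were updated by replace-brick,
    # so the old hostname should no longer appear here
    grep -r server3 /var/lib/glusterd/vols/repvol/

    # snapshot definitions: still reference the brick on the detached server,
    # which is what trips up glusterd on restart
    grep -r server3 /var/lib/glusterd/snaps/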

With the fix, glusterd iterates through all the snap volumes' bricks, and if
it finds any brick hosted on the peer being detached, it disallows the peer
detach, which is the best possible solution.
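
In other words, with the fix in place, the "force remove the old server" step
above is expected to be rejected while a snapshot still references a brick on
that peer (I am not quoting the exact error text, and whether "force"
overrides the check is not asserted here); presumably the snapshot has to be
deleted before the detach can go through, e.g.:

    # expected to be refused while snap1 still has a brick on server3
    gluster peer detach server3 force

    # deleting the snapshot first should allow the detach to proceed
    gluster snapshot delete snap1
    gluster peer detach server3 force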

-- 
You are receiving this mail because:
You are on the CC list for the bug.

