[Bugs] [Bug 1322145] New: Glusterd fails to restart after replacing a failed GlusterFS node and a volume has a snapshot

Tue Mar 29 21:36:27 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1322145

            Bug ID: 1322145
           Summary: Glusterd fails to restart after replacing a failed
                    GlusterFS node and a volume has a snapshot
           Product: GlusterFS
           Version: 3.7.8
         Component: snapshot
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: ben at apcera.com
                CC: bugs at gluster.org

Description of problem:

Glusterd fails to restart after one cluster member has been replaced when a
volume has snapshots. The volume is replica 3, striped and distributed, and
runs on a 6-node cluster.

On the new host, stop and then start glusterd. Glusterd is unable to start
because it expects to mount an LVM snapshot that does not exist locally.

Version-Release number of selected component (if applicable):
3.7.8

How reproducible:
1:1

Steps to Reproduce:
1. gluster volume replace-brick $vol $failed_peer $new_peer:$new_brick commit force
2. Stop the gluster daemons on the new host.
3. Start glusterd on that host again.

Actual results:
Gluster startup fails:
The message "I [MSGID: 106498] [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0" repeated 4 times between [2016-03-02 21:54:25.326406] and [2016-03-02 21:54:25.369229]
[2016-03-02 21:54:28.226630] I [MSGID: 106544] [glusterd.c:159:glusterd_uuid_init] 0-management: retrieved UUID: 7a79537b-4389-4e04-93f9-275fc438268b
[2016-03-02 21:54:28.227741] E [MSGID: 106187] [glusterd-store.c:3310:glusterd_resolve_snap_bricks] 0-management: resolve brick failed in restore
[2016-03-02 21:54:28.227770] E [MSGID: 106186] [glusterd-store.c:4297:glusterd_resolve_all_bricks] 0-management: resolving the snap bricks failed for snap: apcfs-default_GMT-2016.02.26-15.42.14
[2016-03-02 21:54:28.227853] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2016-03-02 21:54:28.227877] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
[2016-03-02 21:54:28.227895] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2016-03-02 21:54:28.233362] W [glusterfsd.c:1236:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xcd) [0x7f056f2da1fd] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x126) [0x7f056f2da0d6] -->/usr/sbin/glusterd(cleanup_and_exit+0x69) [0x7f056f2d9709] ) 0-: received signum (0), shutting down
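
For triage, the error chain in the log above can be isolated with a small filter. The following is a minimal sketch, not part of GlusterFS: the function names (join_wrapped, error_entries) are hypothetical, and the regular expressions are assumptions based only on the entry layout seen in this report ([timestamp] LEVEL [MSGID: n] [file:line:function] domain: message). It rejoins entries that were wrapped across lines and keeps only the E-level ones.

```python
import re
import sys

# Assumed start of a log entry: "[YYYY-MM-DD HH:MM:SS.usec] L [" where L is the level.
START_RE = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\] [A-Z] \[")

# Assumed full entry layout, inferred from the entries in this report.
ENTRY_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"          # MSGID is optional
    r"\[(?P<src>[^\]:]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s*(?P<msg>.*)"
)


def join_wrapped(text):
    """Rejoin entries that were wrapped across several lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if START_RE.match(line) or not entries:
            entries.append(line)          # a new entry begins here
        else:
            entries[-1] += " " + line     # continuation of the previous entry
    return entries


def error_entries(text):
    """Return parsed E-level entries as dicts; unparseable entries are skipped."""
    result = []
    for entry in join_wrapped(text):
        match = ENTRY_RE.match(entry)
        if match and match.group("level") == "E":
            result.append(match.groupdict())
    return result


if __name__ == "__main__":
    # Usage: python this_script.py < glusterd.log
    for e in error_entries(sys.stdin.read()):
        print(e["ts"], e["msgid"], e["func"], e["msg"])
```

Entries that do not follow the assumed layout, such as the "repeated 4 times" summary line or the W entry carrying a call stack, are skipped by this sketch rather than parsed.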

Expected results:
Glusterd should start. 

Additional info:
