[Bugs] [Bug 1322145] Glusterd fails to restart after replacing a failed GlusterFS node and a volume has a snapshot

Wed Mar 30 04:04:40 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1322145

Atin Mukherjee <amukherj at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amukherj at redhat.com,
                   |                            |ben at apcera.com
              Flags|                            |needinfo?(ben at apcera.com)

--- Comment #1 from Atin Mukherjee <amukherj at redhat.com> ---
(In reply to Ben Werthmann from comment #0)
> Description of problem:
> 
> Glusterd fails to restart after replacing one cluster member and a volume
> has snapshots. Volume is 3 replica, striped and distributed and runs on a 6
> node cluster. 
When you say you replaced a cluster member, do you mean a brick or a peer? Did
the replace-brick command succeed? Could you attach the complete glusterd log
file along with cmd_history.log?
> 
> Take the new host, stop and start gluster.  Gluster will be unable to start
> because it expects to be able to mount a LVM snapshot that doesn't exist
> locally.
> 
> Version-Release number of selected component (if applicable):
> 3.7.8
> 
> How reproducible:
> 1:1
> 
> Steps to Reproduce:
> 1. gluster volume replace-brick $vol $failed_peer $new_peer:$new_brick commit force
> 2. stop the gluster daemons on a host 
> 3.
> 
> Actual results:
> Gluster startup fails:
> The message "I [MSGID: 106498]
> [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] 0-management:
> connect returned 0" repeated 4 times between [2016-03-02 21:54:25.326406]
> and [2016-03-02 21:54:25.369229]
> [2016-03-02 21:54:28.226630] I [MSGID: 106544]
> [glusterd.c:159:glusterd_uuid_init] 0-management: retrieved UUID:
> 7a79537b-4389-4e04-93f9-275fc438268b
> [2016-03-02 21:54:28.227741] E [MSGID: 106187]
> [glusterd-store.c:3310:glusterd_resolve_snap_bricks] 0-management: resolve
> brick failed in restore
> [2016-03-02 21:54:28.227770] E [MSGID: 106186]
> [glusterd-store.c:4297:glusterd_resolve_all_bricks] 0-management: resolving
> the snap bricks failed for snap: apcfs-default_GMT-2016.02.26-15.42.14
> [2016-03-02 21:54:28.227853] E [MSGID: 101019] [xlator.c:433:xlator_init]
> 0-management: Initialization of volume 'management' failed, review your
> volfile again
> [2016-03-02 21:54:28.227877] E [graph.c:322:glusterfs_graph_init]
> 0-management: initializing translator failed
> [2016-03-02 21:54:28.227895] E [graph.c:661:glusterfs_graph_activate]
> 0-graph: init failed
> [2016-03-02 21:54:28.233362] W [glusterfsd.c:1236:cleanup_and_exit]
> (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xcd) [0x7f056f2da1fd]
> -->/usr/sbin/glusterd(glusterfs_process_volfp+0x126) [0x7f056f2da0d6]
> -->/usr/sbin/glusterd(cleanup_and_exit+0x69) [0x7f056f2d9709] ) 0-: received
> signum (0), shutting down
> 
> Expected results:
> Glusterd should start. 
> 
> Additional info:
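
For triage, the error-level lines in the quoted log can be pulled out
mechanically. The following is a minimal Python sketch and is not part of the
original report. The helper name glusterd_errors and the regular expression
are assumptions, derived only from the line layout visible in the excerpt
above (timestamp, severity letter, optional MSGID, source location, message).
It expects unwrapped log lines, as they appear in the log file itself rather
than in the mail-wrapped quote.

```python
import re

# Assumed layout, inferred from the quoted excerpt only:
# [timestamp] LEVEL [MSGID: n] [file.c:line:function] message
# The MSGID field is optional (the graph.c lines above do not carry one).
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<msg>.*)$"
)


def glusterd_errors(log_text):
    """Return one dict per error-level ('E') line found in log_text."""
    errors = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match and match.group("level") == "E":
            errors.append(match.groupdict())
    return errors


# Three lines taken from the report above, rejoined onto single lines.
SAMPLE = """\
[2016-03-02 21:54:28.226630] I [MSGID: 106544] [glusterd.c:159:glusterd_uuid_init] 0-management: retrieved UUID: 7a79537b-4389-4e04-93f9-275fc438268b
[2016-03-02 21:54:28.227741] E [MSGID: 106187] [glusterd-store.c:3310:glusterd_resolve_snap_bricks] 0-management: resolve brick failed in restore
[2016-03-02 21:54:28.227877] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
"""

if __name__ == "__main__":
    for err in glusterd_errors(SAMPLE):
        print(err["ts"], err["msgid"], err["func"], err["msg"])
```

Running the same function over the full glusterd log would show whether
glusterd_resolve_snap_bricks is the first failure in the startup sequence or
a consequence of an earlier error.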
