[Gluster-users] Failed snapshot clone leaving undeletable orphaned volume on a single peer

Avra Sengupta asengupt at redhat.com
Mon Feb 20 06:51:40 UTC 2017


Hi D,

It seems you tried to clone a snapshot while that snapshot was not 
activated. The following logs suggest as much:
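For reference, the clone only works against an activated snapshot, so the intended sequence would look roughly like the sketch below (snapshot and volume names are taken from your command history; the `run` wrapper just prints each command so the order can be reviewed safely before executing anything for real):

```shell
#!/bin/sh
# Dry-run sketch of the activate-then-clone sequence. The names below come
# from the thread; "run" only echoes the command instead of executing it.
run() { echo "+ $*"; }

# 1. Activate the snapshot first, so all of its bricks are running.
run gluster snapshot activate data-bck_GMT-2017.02.09-14.15.43

# 2. Only then clone it into a new volume.
run gluster snapshot clone data-teste data-bck_GMT-2017.02.09-14.15.43

# 3. The cloned volume is created in a stopped state and must be started.
run gluster volume start data-teste
```

Dropping the `echo` from `run` would execute the commands for real.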

**** Logs Start ****
[2017-02-15 13:05:20.103423] E [MSGID: 106122] 
[glusterd-snapshot.c:2397:glusterd_snapshot_clone_prevalidate] 
0-management: Failed to pre validate
[2017-02-15 13:05:20.103464] E [MSGID: 106443] 
[glusterd-snapshot.c:2413:glusterd_snapshot_clone_prevalidate] 
0-management: One or more bricks are not running. Please run snapshot 
status command to see brick status.
Please start the stopped brick and then issue snapshot clone command
**** Logs End ****

However, in this scenario the cloned volume should not be left in an 
inconsistent state. I will try to reproduce this and see whether it's a 
bug. Meanwhile, could you please answer the following queries:
1. How many nodes were in the cluster?
2. How many bricks does the snapshot data-bck_GMT-2017.02.09-14.15.43 have?
3. Was the snapshot clone command issued from a node which did not have 
any bricks for the snapshot data-bck_GMT-2017.02.09-14.15.43?
4. I see you tried to delete the new cloned volume. Did it end up in 
this state after the failure to create the clone, or after the failure 
to delete it?

If you want to remove the half-baked volume from the cluster, please 
proceed with the following steps:
1. Bring down glusterd by running the following command on all nodes:
$ systemctl stop glusterd
Verify that glusterd is down by running the following command on all 
nodes:
$ systemctl status glusterd
2. Delete the following directory from all the nodes on which it exists:
/var/lib/glusterd/vols/data-teste
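The steps above can be sketched as a script. The node names (s0..s3, from your logs) and ssh-based access are assumptions about your setup, and restarting glusterd at the end is implied rather than stated above; the `run` wrapper only prints each command, so this is a dry run until the `echo` is removed:

```shell
#!/bin/sh
# Dry-run sketch of the cleanup procedure. NODES and VOL are assumptions
# based on the thread; adjust for your cluster before running for real.
NODES="s0 s1 s2 s3"
VOL="data-teste"
run() { echo "+ $*"; }

# 1. Stop glusterd on every node, then verify it is actually down.
for n in $NODES; do
    run ssh "$n" systemctl stop glusterd
    run ssh "$n" systemctl status glusterd
done

# 2. Remove the orphaned volume definition wherever it exists.
for n in $NODES; do
    run ssh "$n" rm -rf "/var/lib/glusterd/vols/$VOL"
done

# 3. Bring glusterd back up on all nodes (implied final step).
for n in $NODES; do
    run ssh "$n" systemctl start glusterd
done
```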

Regards,
Avra

On 02/16/2017 08:01 PM, Gambit15 wrote:
> Hey guys,
>  I tried to create a new volume from a cloned snapshot yesterday, 
> however something went wrong during the process & I'm now stuck with 
> the new volume being created on the server I ran the commands on (s0), 
> but not on the rest of the peers. I'm unable to delete this new volume 
> from the server, as it doesn't exist on the peers.
>
> What do I do?
> Any insights into what may have gone wrong?
>
> CentOS 7.3.1611
> Gluster 3.8.8
>
> The command history & extract from etc-glusterfs-glusterd.vol.log are 
> included below.
>
> gluster volume list
> gluster snapshot list
> gluster snapshot clone data-teste data-bck_GMT-2017.02.09-14.15.43
> gluster volume status data-teste
> gluster volume delete data-teste
> gluster snapshot create teste data
> gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04
> gluster snapshot status
> gluster snapshot activate teste_GMT-2017.02.15-12.44.04
> gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04
>
>
> [2017-02-15 12:43:21.667403] I [MSGID: 106499] 
> [glusterd-handler.c:4349:__glusterd_handle_status_volume] 
> 0-management: Received status volume req for volume data-teste
> [2017-02-15 12:43:21.682530] E [MSGID: 106301] 
> [glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of 
> operation 'Volume Status' failed on localhost : Volume data-teste is 
> not started
> [2017-02-15 12:43:43.633031] I [MSGID: 106495] 
> [glusterd-handler.c:3128:__glusterd_handle_getwd] 0-glusterd: Received 
> getwd req
> [2017-02-15 12:43:43.640597] I [run.c:191:runner_log] 
> (-->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcc4b2) 
> [0x7ffb396a14b2] 
> -->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcbf65) 
> [0x7ffb396a0f65] -->/lib64/libglusterfs.so.0(runner_log+0x115) 
> [0x7ffb44ec31c5] ) 0-management: Ran script: 
> /var/lib/glusterd/hooks/1/delete/post/S57glusterfind-delete-post 
> --volname=data-teste
> [2017-02-15 13:05:20.103423] E [MSGID: 106122] 
> [glusterd-snapshot.c:2397:glusterd_snapshot_clone_prevalidate] 
> 0-management: Failed to pre validate
> [2017-02-15 13:05:20.103464] E [MSGID: 106443] 
> [glusterd-snapshot.c:2413:glusterd_snapshot_clone_prevalidate] 
> 0-management: One or more bricks are not running. Please run snapshot 
> status command to see brick status.
> Please start the stopped brick and then issue snapshot clone command
> [2017-02-15 13:05:20.103481] W [MSGID: 106443] 
> [glusterd-snapshot.c:8563:glusterd_snapshot_prevalidate] 0-management: 
> Snapshot clone pre-validation failed
> [2017-02-15 13:05:20.103492] W [MSGID: 106122] 
> [glusterd-mgmt.c:167:gd_mgmt_v3_pre_validate_fn] 0-management: 
> Snapshot Prevalidate Failed
> [2017-02-15 13:05:20.103503] E [MSGID: 106122] 
> [glusterd-mgmt.c:884:glusterd_mgmt_v3_pre_validate] 0-management: Pre 
> Validation failed for operation Snapshot on local node
> [2017-02-15 13:05:20.103514] E [MSGID: 106122] 
> [glusterd-mgmt.c:2243:glusterd_mgmt_v3_initiate_snap_phases] 
> 0-management: Pre Validation Failed
> [2017-02-15 13:05:20.103531] E [MSGID: 106027] 
> [glusterd-snapshot.c:8118:glusterd_snapshot_clone_postvalidate] 
> 0-management: unable to find clone data-teste volinfo
> [2017-02-15 13:05:20.103542] W [MSGID: 106444] 
> [glusterd-snapshot.c:9063:glusterd_snapshot_postvalidate] 
> 0-management: Snapshot create post-validation failed
> [2017-02-15 13:05:20.103561] W [MSGID: 106121] 
> [glusterd-mgmt.c:351:gd_mgmt_v3_post_validate_fn] 0-management: 
> postvalidate operation failed
> [2017-02-15 13:05:20.103572] E [MSGID: 106121] 
> [glusterd-mgmt.c:1660:glusterd_mgmt_v3_post_validate] 0-management: 
> Post Validation failed for operation Snapshot on local node
> [2017-02-15 13:05:20.103582] E [MSGID: 106122] 
> [glusterd-mgmt.c:2363:glusterd_mgmt_v3_initiate_snap_phases] 
> 0-management: Post Validation Failed
> [2017-02-15 13:11:15.862858] W [MSGID: 106057] 
> [glusterd-snapshot-utils.c:410:glusterd_snap_volinfo_find] 
> 0-management: Snap volume 
> c3ceae3889484e96ab8bed69593cf6d3.s0.run-gluster-snaps-c3ceae3889484e96ab8bed69593cf6d3-brick1-data-brick 
> not found [Invalid argument]
> [2017-02-15 13:11:16.314759] I [MSGID: 106143] 
> [glusterd-pmap.c:250:pmap_registry_bind] 0-pmap: adding brick 
> /run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick 
> on port 49452
> [2017-02-15 13:11:16.316090] I 
> [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management: setting 
> frame-timeout to 600
> [2017-02-15 13:11:16.348867] W [MSGID: 106057] 
> [glusterd-snapshot-utils.c:410:glusterd_snap_volinfo_find] 
> 0-management: Snap volume 
> c3ceae3889484e96ab8bed69593cf6d3.s0.run-gluster-snaps-c3ceae3889484e96ab8bed69593cf6d3-brick6-data-arbiter 
> not found [Invalid argument]
> [2017-02-15 13:11:16.558878] I [MSGID: 106143] 
> [glusterd-pmap.c:250:pmap_registry_bind] 0-pmap: adding brick 
> /run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter on 
> port 49453
> [2017-02-15 13:11:16.559883] I 
> [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management: setting 
> frame-timeout to 600
> [2017-02-15 13:11:23.279721] E [MSGID: 106030] 
> [glusterd-snapshot.c:4736:glusterd_take_lvm_snapshot] 0-management: 
> taking snapshot of the brick 
> (/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick) of 
> device /dev/mapper/v0.dc0.cte--g0-c3ceae3889484e96ab8bed69593cf6d3_0 
> failed
> [2017-02-15 13:11:23.279790] E [MSGID: 106030] 
> [glusterd-snapshot.c:5135:glusterd_take_brick_snapshot] 0-management: 
> Failed to take snapshot of brick 
> s0:/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick
> [2017-02-15 13:11:23.279806] E [MSGID: 106030] 
> [glusterd-snapshot.c:6484:glusterd_take_brick_snapshot_task] 
> 0-management: Failed to take backend snapshot for brick 
> s0:/run/gluster/snaps/data-teste/brick1/data/brick volume(data-teste)
> [2017-02-15 13:11:23.286678] E [MSGID: 106030] 
> [glusterd-snapshot.c:4736:glusterd_take_lvm_snapshot] 0-management: 
> taking snapshot of the brick 
> (/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter) 
> of device 
> /dev/mapper/v0.dc0.cte--g0-c3ceae3889484e96ab8bed69593cf6d3_1 failed
> [2017-02-15 13:11:23.286735] E [MSGID: 106030] 
> [glusterd-snapshot.c:5135:glusterd_take_brick_snapshot] 0-management: 
> Failed to take snapshot of brick 
> s0:/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter
> [2017-02-15 13:11:23.286749] E [MSGID: 106030] 
> [glusterd-snapshot.c:6484:glusterd_take_brick_snapshot_task] 
> 0-management: Failed to take backend snapshot for brick 
> s0:/run/gluster/snaps/data-teste/brick6/data/arbiter volume(data-teste)
> [2017-02-15 13:11:23.286793] E [MSGID: 106030] 
> [glusterd-snapshot.c:6626:glusterd_schedule_brick_snapshot] 
> 0-management: Failed to create snapshot
> [2017-02-15 13:11:23.286813] E [MSGID: 106441] 
> [glusterd-snapshot.c:6796:glusterd_snapshot_clone_commit] 
> 0-management: Failed to take backend snapshot data-teste
> [2017-02-15 13:11:25.530666] E [MSGID: 106442] 
> [glusterd-snapshot.c:8308:glusterd_snapshot] 0-management: Failed to 
> clone snapshot
> [2017-02-15 13:11:25.530721] W [MSGID: 106123] 
> [glusterd-mgmt.c:272:gd_mgmt_v3_commit_fn] 0-management: Snapshot 
> Commit Failed
> [2017-02-15 13:11:25.530735] E [MSGID: 106123] 
> [glusterd-mgmt.c:1427:glusterd_mgmt_v3_commit] 0-management: Commit 
> failed for operation Snapshot on local node
> [2017-02-15 13:11:25.530749] E [MSGID: 106123] 
> [glusterd-mgmt.c:2304:glusterd_mgmt_v3_initiate_snap_phases] 
> 0-management: Commit Op Failed
> [2017-02-15 13:11:25.532312] E [MSGID: 106027] 
> [glusterd-snapshot.c:8118:glusterd_snapshot_clone_postvalidate] 
> 0-management: unable to find clone data-teste volinfo
> [2017-02-15 13:11:25.532339] W [MSGID: 106444] 
> [glusterd-snapshot.c:9063:glusterd_snapshot_postvalidate] 
> 0-management: Snapshot create post-validation failed
> [2017-02-15 13:11:25.532353] W [MSGID: 106121] 
> [glusterd-mgmt.c:351:gd_mgmt_v3_post_validate_fn] 0-management: 
> postvalidate operation failed
> [2017-02-15 13:11:25.532367] E [MSGID: 106121] 
> [glusterd-mgmt.c:1660:glusterd_mgmt_v3_post_validate] 0-management: 
> Post Validation failed for operation Snapshot on local node
> [2017-02-15 13:11:25.532381] E [MSGID: 106122] 
> [glusterd-mgmt.c:2363:glusterd_mgmt_v3_initiate_snap_phases] 
> 0-management: Post Validation Failed
> [2017-02-15 13:29:53.779020] E [MSGID: 106062] 
> [glusterd-snapshot-utils.c:2391:glusterd_snap_create_use_rsp_dict] 
> 0-management: failed to get snap UUID
> [2017-02-15 13:29:53.779073] E [MSGID: 106099] 
> [glusterd-snapshot-utils.c:2507:glusterd_snap_use_rsp_dict] 
> 0-glusterd: Unable to use rsp dict
> [2017-02-15 13:29:53.779096] E [MSGID: 106108] 
> [glusterd-mgmt.c:1305:gd_mgmt_v3_commit_cbk_fn] 0-management: Failed 
> to aggregate response from node/brick
> [2017-02-15 13:29:53.779136] E [MSGID: 106116] 
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit 
> failed on s3. Please check log file for details.
> [2017-02-15 13:29:54.136196] E [MSGID: 106116] 
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit 
> failed on s1. Please check log file for details.
> The message "E [MSGID: 106108] 
> [glusterd-mgmt.c:1305:gd_mgmt_v3_commit_cbk_fn] 0-management: Failed 
> to aggregate response from node/brick" repeated 2 times between 
> [2017-02-15 13:29:53.779096] and [2017-02-15 13:29:54.535080]
> [2017-02-15 13:29:54.535098] E [MSGID: 106116] 
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit 
> failed on s2. Please check log file for details.
> [2017-02-15 13:29:54.535320] E [MSGID: 106123] 
> [glusterd-mgmt.c:1490:glusterd_mgmt_v3_commit] 0-management: Commit 
> failed on peers
> [2017-02-15 13:29:54.535370] E [MSGID: 106123] 
> [glusterd-mgmt.c:2304:glusterd_mgmt_v3_initiate_snap_phases] 
> 0-management: Commit Op Failed
> [2017-02-15 13:29:54.539708] E [MSGID: 106116] 
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Post 
> Validation failed on s1. Please check log file for details.
> [2017-02-15 13:29:54.539797] E [MSGID: 106116] 
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Post 
> Validation failed on s3. Please check log file for details.
> [2017-02-15 13:29:54.539856] E [MSGID: 106116] 
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Post 
> Validation failed on s2. Please check log file for details.
> [2017-02-15 13:29:54.540224] E [MSGID: 106121] 
> [glusterd-mgmt.c:1713:glusterd_mgmt_v3_post_validate] 0-management: 
> Post Validation failed on peers
> [2017-02-15 13:29:54.540256] E [MSGID: 106122] 
> [glusterd-mgmt.c:2363:glusterd_mgmt_v3_initiate_snap_phases] 
> 0-management: Post Validation Failed
> The message "E [MSGID: 106062] 
> [glusterd-snapshot-utils.c:2391:glusterd_snap_create_use_rsp_dict] 
> 0-management: failed to get snap UUID" repeated 2 times between 
> [2017-02-15 13:29:53.779020] and [2017-02-15 13:29:54.535075]
> The message "E [MSGID: 106099] 
> [glusterd-snapshot-utils.c:2507:glusterd_snap_use_rsp_dict] 
> 0-glusterd: Unable to use rsp dict" repeated 2 times between 
> [2017-02-15 13:29:53.779073] and [2017-02-15 13:29:54.535078]
> [2017-02-15 13:31:14.285666] I [MSGID: 106488] 
> [glusterd-handler.c:1537:__glusterd_handle_cli_get_volume] 
> 0-management: Received get vol req
> [2017-02-15 13:32:17.827422] E [MSGID: 106027] 
> [glusterd-handler.c:4670:glusterd_get_volume_opts] 0-management: 
> Volume cluster.locking-scheme does not exist
> [2017-02-15 13:34:02.635762] E [MSGID: 106116] 
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre 
> Validation failed on s1. Volume data-teste does not exist
> [2017-02-15 13:34:02.635838] E [MSGID: 106116] 
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre 
> Validation failed on s2. Volume data-teste does not exist
> [2017-02-15 13:34:02.635889] E [MSGID: 106116] 
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre 
> Validation failed on s3. Volume data-teste does not exist
> [2017-02-15 13:34:02.636092] E [MSGID: 106122] 
> [glusterd-mgmt.c:947:glusterd_mgmt_v3_pre_validate] 0-management: Pre 
> Validation failed on peers
> [2017-02-15 13:34:02.636132] E [MSGID: 106122] 
> [glusterd-mgmt.c:2009:glusterd_mgmt_v3_initiate_all_phases] 
> 0-management: Pre Validation Failed
> [2017-02-15 13:34:20.313228] E [MSGID: 106153] 
> [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed 
> on s2. Error: Volume data-teste does not exist
> [2017-02-15 13:34:20.313320] E [MSGID: 106153] 
> [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed 
> on s1. Error: Volume data-teste does not exist
> [2017-02-15 13:34:20.313377] E [MSGID: 106153] 
> [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed 
> on s3. Error: Volume data-teste does not exist
> [2017-02-15 13:34:36.796455] E [MSGID: 106153] 
> [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed 
> on s1. Error: Volume data-teste does not exist
> [2017-02-15 13:34:36.796830] E [MSGID: 106153] 
> [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed 
> on s3. Error: Volume data-teste does not exist
> [2017-02-15 13:34:36.796896] E [MSGID: 106153] 
> [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed 
> on s2. Error: Volume data-teste does not exist
>
> Many thanks!
>  D
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

