[Gluster-users] Failed snapshot clone leaving undeletable orphaned volume on a single peer

Gambit15 dougti+gluster at gmail.com
Mon Feb 20 13:21:40 UTC 2017


Hi Avra,

On 20 February 2017 at 02:51, Avra Sengupta <asengupt at redhat.com> wrote:

> Hi D,
>
> It seems you tried to take a clone of a snapshot, when that snapshot was
> not activated.
>

Correct. As my command history shows, I then noticed the issue, checked
the snapshot's status, and activated it. I included those steps in the
history just to clear up any doubts when reading the logs.

> However in this scenario, the cloned volume should not be in an
> inconsistent state. I will try to reproduce this and see if it's a bug.
> Meanwhile could you please answer the following queries:
> 1. How many nodes were in the cluster.
>

There are 4 nodes in a (2+1)x2 setup.
s0 replicates to s1, with an arbiter on s2, and s2 replicates to s3, with
an arbiter on s0.

> 2. How many bricks does the snapshot data-bck_GMT-2017.02.09-14.15.43 have?
>

6 bricks, including the 2 arbiters.
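
For reference, the per-brick list and running state of the snapshot can
be confirmed with:

gluster snapshot status data-bck_GMT-2017.02.09-14.15.43

That's the same check (run without a snapshot name, in my case) that
first showed me it wasn't activated.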


> 3. Was the snapshot clone command issued from a node which did not have
> any bricks for the snapshot data-bck_GMT-2017.02.09-14.15.43?
>

All commands were issued from s0. All volumes have bricks on every node in
the cluster.


> 4. I see you tried to delete the new cloned volume. Did the new cloned
> volume land in this state after failure to create the clone or failure to
> delete the clone?
>

I noticed something was wrong as soon as I created the clone. The clone
command completed, but I was then unable to do anything with the new
volume because it didn't exist on s1-s3.
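
An easy way to see the inconsistency (a rough sketch, assuming
passwordless SSH from s0 to the peers):

# show which peers actually know about the volume
for h in s0 s1 s2 s3; do
    echo "== $h =="
    ssh $h gluster volume info data-teste
done

Only s0 reports the volume; s1-s3 answer "Volume data-teste does not
exist", matching the staging errors in the log below.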


>
> If you want to remove the half-baked volume from the cluster, please
> proceed with the following steps.
> 1. Bring down glusterd by running the following command on all nodes:
> $ systemctl stop glusterd
> Verify that glusterd is down by running the following command on all
> nodes:
> $ systemctl status glusterd
> 2. Delete the following directory from whichever nodes it exists on:
> /var/lib/glusterd/vols/data-teste
>

The directory only exists on s0, but stopping glusterd on s0 alone and
deleting it didn't work; the directory was restored as soon as glusterd
was restarted. I haven't yet tried stopping glusterd on *all* nodes
before deleting it, although I'll need to plan for that, as it'll take
the entire cluster off the air.
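
When I do get that maintenance window, the plan is roughly the following
(just a sketch, assuming passwordless SSH from s0 to every node; the
NODES list matches our hostnames, everything else is illustrative):

NODES="s0 s1 s2 s3"
# 1. Stop the management daemon on every node
for h in $NODES; do ssh $h systemctl stop glusterd; done
# 2. Confirm glusterd is down everywhere
for h in $NODES; do ssh $h systemctl is-active glusterd; done
# 3. Remove the stale volume definition from whichever nodes have it
for h in $NODES; do ssh $h rm -rf /var/lib/glusterd/vols/data-teste; done
# 4. Bring glusterd back up
for h in $NODES; do ssh $h systemctl start glusterd; done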

Thanks for the reply,
 Doug


> Regards,
> Avra
>
>
> On 02/16/2017 08:01 PM, Gambit15 wrote:
>
> Hey guys,
>  I tried to create a new volume from a cloned snapshot yesterday, but
> something went wrong during the process, and I'm now stuck with the new
> volume existing on the server I ran the commands on (s0) but not on the
> rest of the peers. I'm unable to delete this new volume from that server,
> as it doesn't exist on the peers.
>
> What do I do?
> Any insights into what may have gone wrong?
>
> CentOS 7.3.1611
> Gluster 3.8.8
>
> The command history & extract from etc-glusterfs-glusterd.vol.log are
> included below.
>
> gluster volume list
> gluster snapshot list
> gluster snapshot clone data-teste data-bck_GMT-2017.02.09-14.15.43
> gluster volume status data-teste
> gluster volume delete data-teste
> gluster snapshot create teste data
> gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04
> gluster snapshot status
> gluster snapshot activate teste_GMT-2017.02.15-12.44.04
> gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04
>
>
> [2017-02-15 12:43:21.667403] I [MSGID: 106499] [glusterd-handler.c:4349:__glusterd_handle_status_volume]
> 0-management: Received status volume req for volume data-teste
> [2017-02-15 12:43:21.682530] E [MSGID: 106301] [glusterd-syncop.c:1297:gd_stage_op_phase]
> 0-management: Staging of operation 'Volume Status' failed on localhost :
> Volume data-teste is not started
> [2017-02-15 12:43:43.633031] I [MSGID: 106495] [glusterd-handler.c:3128:__glusterd_handle_getwd]
> 0-glusterd: Received getwd req
> [2017-02-15 12:43:43.640597] I [run.c:191:runner_log]
> (-->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcc4b2)
> [0x7ffb396a14b2] -->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcbf65)
> [0x7ffb396a0f65] -->/lib64/libglusterfs.so.0(runner_log+0x115)
> [0x7ffb44ec31c5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/
> delete/post/S57glusterfind-delete-post --volname=data-teste
> [2017-02-15 13:05:20.103423] E [MSGID: 106122] [glusterd-snapshot.c:2397:
> glusterd_snapshot_clone_prevalidate] 0-management: Failed to pre validate
> [2017-02-15 13:05:20.103464] E [MSGID: 106443] [glusterd-snapshot.c:2413:
> glusterd_snapshot_clone_prevalidate] 0-management: One or more bricks are
> not running. Please run snapshot status command to see brick status.
> Please start the stopped brick and then issue snapshot clone command
> [2017-02-15 13:05:20.103481] W [MSGID: 106443] [glusterd-snapshot.c:8563:glusterd_snapshot_prevalidate]
> 0-management: Snapshot clone pre-validation failed
> [2017-02-15 13:05:20.103492] W [MSGID: 106122]
> [glusterd-mgmt.c:167:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot
> Prevalidate Failed
> [2017-02-15 13:05:20.103503] E [MSGID: 106122]
> [glusterd-mgmt.c:884:glusterd_mgmt_v3_pre_validate] 0-management: Pre
> Validation failed for operation Snapshot on local node
> [2017-02-15 13:05:20.103514] E [MSGID: 106122] [glusterd-mgmt.c:2243:
> glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed
> [2017-02-15 13:05:20.103531] E [MSGID: 106027] [glusterd-snapshot.c:8118:
> glusterd_snapshot_clone_postvalidate] 0-management: unable to find clone
> data-teste volinfo
> [2017-02-15 13:05:20.103542] W [MSGID: 106444] [glusterd-snapshot.c:9063:
> glusterd_snapshot_postvalidate] 0-management: Snapshot create
> post-validation failed
> [2017-02-15 13:05:20.103561] W [MSGID: 106121]
> [glusterd-mgmt.c:351:gd_mgmt_v3_post_validate_fn] 0-management:
> postvalidate operation failed
> [2017-02-15 13:05:20.103572] E [MSGID: 106121] [glusterd-mgmt.c:1660:
> glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for
> operation Snapshot on local node
> [2017-02-15 13:05:20.103582] E [MSGID: 106122] [glusterd-mgmt.c:2363:
> glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation
> Failed
> [2017-02-15 13:11:15.862858] W [MSGID: 106057] [glusterd-snapshot-utils.c:
> 410:glusterd_snap_volinfo_find] 0-management: Snap volume
> c3ceae3889484e96ab8bed69593cf6d3.s0.run-gluster-snaps-
> c3ceae3889484e96ab8bed69593cf6d3-brick1-data-brick not found [Invalid
> argument]
> [2017-02-15 13:11:16.314759] I [MSGID: 106143] [glusterd-pmap.c:250:pmap_registry_bind]
> 0-pmap: adding brick /run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick
> on port 49452
> [2017-02-15 13:11:16.316090] I [rpc-clnt.c:1046:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2017-02-15 13:11:16.348867] W [MSGID: 106057] [glusterd-snapshot-utils.c:
> 410:glusterd_snap_volinfo_find] 0-management: Snap volume
> c3ceae3889484e96ab8bed69593cf6d3.s0.run-gluster-snaps-
> c3ceae3889484e96ab8bed69593cf6d3-brick6-data-arbiter not found [Invalid
> argument]
> [2017-02-15 13:11:16.558878] I [MSGID: 106143] [glusterd-pmap.c:250:pmap_registry_bind]
> 0-pmap: adding brick /run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter
> on port 49453
> [2017-02-15 13:11:16.559883] I [rpc-clnt.c:1046:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2017-02-15 13:11:23.279721] E [MSGID: 106030] [glusterd-snapshot.c:4736:glusterd_take_lvm_snapshot]
> 0-management: taking snapshot of the brick (/run/gluster/snaps/
> c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick) of device
> /dev/mapper/v0.dc0.cte--g0-c3ceae3889484e96ab8bed69593cf6d3_0 failed
> [2017-02-15 13:11:23.279790] E [MSGID: 106030] [glusterd-snapshot.c:5135:glusterd_take_brick_snapshot]
> 0-management: Failed to take snapshot of brick s0:/run/gluster/snaps/
> c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick
> [2017-02-15 13:11:23.279806] E [MSGID: 106030] [glusterd-snapshot.c:6484:
> glusterd_take_brick_snapshot_task] 0-management: Failed to take backend
> snapshot for brick s0:/run/gluster/snaps/data-teste/brick1/data/brick
> volume(data-teste)
> [2017-02-15 13:11:23.286678] E [MSGID: 106030] [glusterd-snapshot.c:4736:glusterd_take_lvm_snapshot]
> 0-management: taking snapshot of the brick (/run/gluster/snaps/
> c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter) of device
> /dev/mapper/v0.dc0.cte--g0-c3ceae3889484e96ab8bed69593cf6d3_1 failed
> [2017-02-15 13:11:23.286735] E [MSGID: 106030] [glusterd-snapshot.c:5135:glusterd_take_brick_snapshot]
> 0-management: Failed to take snapshot of brick s0:/run/gluster/snaps/
> c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter
> [2017-02-15 13:11:23.286749] E [MSGID: 106030] [glusterd-snapshot.c:6484:
> glusterd_take_brick_snapshot_task] 0-management: Failed to take backend
> snapshot for brick s0:/run/gluster/snaps/data-teste/brick6/data/arbiter
> volume(data-teste)
> [2017-02-15 13:11:23.286793] E [MSGID: 106030] [glusterd-snapshot.c:6626:
> glusterd_schedule_brick_snapshot] 0-management: Failed to create snapshot
> [2017-02-15 13:11:23.286813] E [MSGID: 106441] [glusterd-snapshot.c:6796:
> glusterd_snapshot_clone_commit] 0-management: Failed to take backend
> snapshot data-teste
> [2017-02-15 13:11:25.530666] E [MSGID: 106442] [glusterd-snapshot.c:8308:glusterd_snapshot]
> 0-management: Failed to clone snapshot
> [2017-02-15 13:11:25.530721] W [MSGID: 106123]
> [glusterd-mgmt.c:272:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit
> Failed
> [2017-02-15 13:11:25.530735] E [MSGID: 106123] [glusterd-mgmt.c:1427:glusterd_mgmt_v3_commit]
> 0-management: Commit failed for operation Snapshot on local node
> [2017-02-15 13:11:25.530749] E [MSGID: 106123] [glusterd-mgmt.c:2304:
> glusterd_mgmt_v3_initiate_snap_phases] 0-management: Commit Op Failed
> [2017-02-15 13:11:25.532312] E [MSGID: 106027] [glusterd-snapshot.c:8118:
> glusterd_snapshot_clone_postvalidate] 0-management: unable to find clone
> data-teste volinfo
> [2017-02-15 13:11:25.532339] W [MSGID: 106444] [glusterd-snapshot.c:9063:
> glusterd_snapshot_postvalidate] 0-management: Snapshot create
> post-validation failed
> [2017-02-15 13:11:25.532353] W [MSGID: 106121]
> [glusterd-mgmt.c:351:gd_mgmt_v3_post_validate_fn] 0-management:
> postvalidate operation failed
> [2017-02-15 13:11:25.532367] E [MSGID: 106121] [glusterd-mgmt.c:1660:
> glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for
> operation Snapshot on local node
> [2017-02-15 13:11:25.532381] E [MSGID: 106122] [glusterd-mgmt.c:2363:
> glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation
> Failed
> [2017-02-15 13:29:53.779020] E [MSGID: 106062] [glusterd-snapshot-utils.c:
> 2391:glusterd_snap_create_use_rsp_dict] 0-management: failed to get snap
> UUID
> [2017-02-15 13:29:53.779073] E [MSGID: 106099] [glusterd-snapshot-utils.c:
> 2507:glusterd_snap_use_rsp_dict] 0-glusterd: Unable to use rsp dict
> [2017-02-15 13:29:53.779096] E [MSGID: 106108]
> [glusterd-mgmt.c:1305:gd_mgmt_v3_commit_cbk_fn] 0-management: Failed to
> aggregate response from  node/brick
> [2017-02-15 13:29:53.779136] E [MSGID: 106116]
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit
> failed on s3. Please check log file for details.
> [2017-02-15 13:29:54.136196] E [MSGID: 106116]
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit
> failed on s1. Please check log file for details.
> The message "E [MSGID: 106108] [glusterd-mgmt.c:1305:gd_mgmt_v3_commit_cbk_fn]
> 0-management: Failed to aggregate response from  node/brick" repeated 2
> times between [2017-02-15 13:29:53.779096] and [2017-02-15 13:29:54.535080]
> [2017-02-15 13:29:54.535098] E [MSGID: 106116]
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit
> failed on s2. Please check log file for details.
> [2017-02-15 13:29:54.535320] E [MSGID: 106123] [glusterd-mgmt.c:1490:glusterd_mgmt_v3_commit]
> 0-management: Commit failed on peers
> [2017-02-15 13:29:54.535370] E [MSGID: 106123] [glusterd-mgmt.c:2304:
> glusterd_mgmt_v3_initiate_snap_phases] 0-management: Commit Op Failed
> [2017-02-15 13:29:54.539708] E [MSGID: 106116]
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Post
> Validation failed on s1. Please check log file for details.
> [2017-02-15 13:29:54.539797] E [MSGID: 106116]
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Post
> Validation failed on s3. Please check log file for details.
> [2017-02-15 13:29:54.539856] E [MSGID: 106116]
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Post
> Validation failed on s2. Please check log file for details.
> [2017-02-15 13:29:54.540224] E [MSGID: 106121] [glusterd-mgmt.c:1713:
> glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed on
> peers
> [2017-02-15 13:29:54.540256] E [MSGID: 106122] [glusterd-mgmt.c:2363:
> glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation
> Failed
> The message "E [MSGID: 106062] [glusterd-snapshot-utils.c:
> 2391:glusterd_snap_create_use_rsp_dict] 0-management: failed to get snap
> UUID" repeated 2 times between [2017-02-15 13:29:53.779020] and [2017-02-15
> 13:29:54.535075]
> The message "E [MSGID: 106099] [glusterd-snapshot-utils.c:
> 2507:glusterd_snap_use_rsp_dict] 0-glusterd: Unable to use rsp dict"
> repeated 2 times between [2017-02-15 13:29:53.779073] and [2017-02-15
> 13:29:54.535078]
> [2017-02-15 13:31:14.285666] I [MSGID: 106488] [glusterd-handler.c:1537:__
> glusterd_handle_cli_get_volume] 0-management: Received get vol req
> [2017-02-15 13:32:17.827422] E [MSGID: 106027] [glusterd-handler.c:4670:glusterd_get_volume_opts]
> 0-management: Volume cluster.locking-scheme does not exist
> [2017-02-15 13:34:02.635762] E [MSGID: 106116]
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre
> Validation failed on s1. Volume data-teste does not exist
> [2017-02-15 13:34:02.635838] E [MSGID: 106116]
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre
> Validation failed on s2. Volume data-teste does not exist
> [2017-02-15 13:34:02.635889] E [MSGID: 106116]
> [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre
> Validation failed on s3. Volume data-teste does not exist
> [2017-02-15 13:34:02.636092] E [MSGID: 106122]
> [glusterd-mgmt.c:947:glusterd_mgmt_v3_pre_validate] 0-management: Pre
> Validation failed on peers
> [2017-02-15 13:34:02.636132] E [MSGID: 106122] [glusterd-mgmt.c:2009:
> glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed
> [2017-02-15 13:34:20.313228] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors]
> 0-glusterd: Staging failed on s2. Error: Volume data-teste does not exist
> [2017-02-15 13:34:20.313320] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors]
> 0-glusterd: Staging failed on s1. Error: Volume data-teste does not exist
> [2017-02-15 13:34:20.313377] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors]
> 0-glusterd: Staging failed on s3. Error: Volume data-teste does not exist
> [2017-02-15 13:34:36.796455] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors]
> 0-glusterd: Staging failed on s1. Error: Volume data-teste does not exist
> [2017-02-15 13:34:36.796830] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors]
> 0-glusterd: Staging failed on s3. Error: Volume data-teste does not exist
> [2017-02-15 13:34:36.796896] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors]
> 0-glusterd: Staging failed on s2. Error: Volume data-teste does not exist
>
> Many thanks!
>  D
>
>