[Bugs] [Bug 1209484] New: Unable to stop/start a volume

bugzilla@redhat.com
Tue Apr 7 13:10:04 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1209484

            Bug ID: 1209484
           Summary: Unable to stop/start a volume
           Product: GlusterFS
           Version: mainline
         Component: glusterd
          Assignee: bugs@gluster.org
          Reporter: sanandpa@redhat.com
                CC: bugs@gluster.org, gluster-bugs@redhat.com



Description of problem:

Had a 2x2 setup with 3 volumes: one tiered, one with snapshots enabled, and
one being used for backup testing.

The volume in question, 'pluto', had snapshots enabled and had undergone
snapshot restores a couple of times; the build had also been updated to one
of the newer 3.7 nightlies. Volume status shows the volume as stopped, but I
have been unable to delete it: the delete errors out saying the volume is
still running.
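
For triage it may help to compare what each peer has recorded on disk for
the volume. A minimal check, assuming the default glusterd working directory
/var/lib/glusterd, is to look at the 'status=' line of the volume's info
file on both nodes (1 = started, 0 = stopped); the two values should match:

# Run on each peer (10.70.43.48 and 10.70.42.147) and compare the output.
grep '^status=' /var/lib/glusterd/vols/pluto/info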

Version-Release number of selected component (if applicable):

GlusterFS 3.7 nightly: glusterfs-3.7dev-0.910.git17827de.el6.x86_64

How reproducible: 1:1

Additional info:

This is what is seen in the logs: 

[2015-04-07 10:21:00.619911] I [run.c:190:runner_log] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f4360b3b5e0] (-->
/usr/lib64/libglusterfs.so.0(runner_log+0x105)[0x7f4360b8ad05] (-->
/usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x5a0)[0x7f43568cd770]
(-->
/usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(+0x562f5)[0x7f43568532f5]
(-->
/usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_op_commit_perform+0x5a)[0x7f435685643a]
))))) 0-management: Ran script:
/var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh --volname=pluto
--last=no
[2015-04-07 10:21:00.631633] E [run.c:190:runner_log] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f4360b3b5e0] (-->
/usr/lib64/libglusterfs.so.0(runner_log+0x105)[0x7f4360b8ad05] (-->
/usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x444)[0x7f43568cd614]
(-->
/usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(+0x562f5)[0x7f43568532f5]
(-->
/usr/lib64/glusterfs/3.7dev/xlator/mgmt/glusterd.so(glusterd_op_commit_perform+0x5a)[0x7f435685643a]
))))) 0-management: Failed to execute script:
/var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh --volname=pluto --last=no
[2015-04-07 10:21:00.631827] I [glusterd-utils.c:1367:glusterd_service_stop]
0-management: brick already stopped
[2015-04-07 10:21:00.632307] I [glusterd-utils.c:1367:glusterd_service_stop]
0-management: brick already stopped
[2015-04-07 10:21:00.657141] E
[glusterd-volume-ops.c:2398:glusterd_stop_volume] 0-management: Failed to
notify graph change for pluto volume
[2015-04-07 10:21:00.657546] E
[glusterd-volume-ops.c:2433:glusterd_op_stop_volume] 0-management: Failed to
stop pluto volume
[2015-04-07 10:21:00.657572] E [glusterd-syncop.c:1355:gd_commit_op_phase]
0-management: Commit of operation 'Volume Stop' failed on localhost
[2015-04-07 10:21:00.657819] I [glusterd-pmap.c:271:pmap_registry_remove]
0-pmap: removing brick
/var/run/gluster/snaps/9a209867fb0b4b3f86f49494a6cfc191/brick2/pluto/dd on port
49169
[2015-04-07 10:21:00.660557] I [glusterd-pmap.c:271:pmap_registry_remove]
0-pmap: removing brick
/var/run/gluster/snaps/9a209867fb0b4b3f86f49494a6cfc191/brick1/pluto/dd on port
49168
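
The E-level runner_log entry above shows the stop pre-hook S30samba-stop.sh
returning a failure. One way to narrow that down, assuming the hook script
is still in place, is to re-run it by hand with the exact arguments glusterd
used (taken verbatim from the log) and check its exit status:

# Trace the failing stop pre-hook with the same arguments as in the log.
sh -x /var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh --volname=pluto --last=no
echo "exit status: $?"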



[root@dhcp43-48 ~]# gluster v i

Volume Name: nash
Type: Distributed-Replicate
Volume ID: cd66179e-6fda-49cf-b40f-be930bc01f6f
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.43.48:/rhs/brick1/dd
Brick2: 10.70.42.147:/rhs/brick1/dd
Brick3: 10.70.43.48:/rhs/brick2/dd
Brick4: 10.70.42.147:/rhs/brick2/dd
Options Reconfigured:
changelog.changelog: on
storage.build-pgfid: on

Volume Name: ozone
Type: Tier
Volume ID: 4611c8ba-4f32-409c-8858-81d55d2acc75
Status: Started
Number of Bricks: 6 x 1 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.42.147:/rhs/thinbrick1/ozone/hdd
Brick2: 10.70.43.48:/rhs/thinbrick1/ozone/hdd
Brick3: 10.70.43.48:/rhs/thinbrick1/ozone/dd
Brick4: 10.70.43.48:/rhs/thinbrick2/ozone/dd
Brick5: 10.70.42.147:/rhs/thinbrick1/ozone/dd
Brick6: 10.70.42.147:/rhs/thinbrick2/ozone/dd
Options Reconfigured:
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
storage.build-pgfid: on

Volume Name: pluto
Type: Distribute
Volume ID: 5656ab65-c1da-44a8-9ff0-46d08c9a8c61
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1:
10.70.43.48:/var/run/gluster/snaps/9a209867fb0b4b3f86f49494a6cfc191/brick1/pluto/dd
Brick2:
10.70.43.48:/var/run/gluster/snaps/9a209867fb0b4b3f86f49494a6cfc191/brick2/pluto/dd
Brick3:
10.70.42.147:/var/run/gluster/snaps/9a209867fb0b4b3f86f49494a6cfc191/brick3/pluto/dd
Brick4:
10.70.42.147:/var/run/gluster/snaps/9a209867fb0b4b3f86f49494a6cfc191/brick4/pluto/dd
[root@dhcp43-48 ~]# 
[root@dhcp43-48 ~]# 
[root@dhcp43-48 ~]# cd /rhs/
brick1/     brick2/     ozone/      thinbrick1/ thinbrick2/ 
[root@dhcp43-48 ~]#
[root@dhcp43-48 ~]# gluster snapshot list
snap_5
[root@dhcp43-48 ~]#
[root@dhcp43-48 ~]# gluster snapshot delete pluto
Deleting snap will erase all the information about the snap. Do you still want
to continue? (y/n) y
snapshot delete: failed: Snapshot (pluto) does not exist
Snapshot command failed
[root@dhcp43-48 ~]# gluster snapshot delete volume  pluto
Volume (pluto) contains 1 snapshot(s).
Do you still want to continue and delete them?  (y/n) y
snapshot delete: snap_5: snap removed successfully
[root@dhcp43-48 ~]# 
[root@dhcp43-48 ~]# gluster v stop pluto
Stopping volume will make its data inaccessible. Do you want to continue? (y/n)
y
volume stop: pluto: failed: Volume pluto is not in the started state
[root@dhcp43-48 ~]# gluster v delete pluto
Deleting volume will erase all information about the volume. Do you want to
continue? (y/n) y
volume delete: pluto: failed: Staging failed on 10.70.42.147. Error: Volume
pluto has been started.Volume needs to be stopped before deletion.
[root@dhcp43-48 ~]# 
[root@dhcp43-48 ~]# gluster v i

Volume Name: nash
Type: Distributed-Replicate
Volume ID: cd66179e-6fda-49cf-b40f-be930bc01f6f
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.43.48:/rhs/brick1/dd
Brick2: 10.70.42.147:/rhs/brick1/dd
Brick3: 10.70.43.48:/rhs/brick2/dd
Brick4: 10.70.42.147:/rhs/brick2/dd
Options Reconfigured:
changelog.changelog: on
storage.build-pgfid: on

Volume Name: ozone
Type: Tier
Volume ID: 4611c8ba-4f32-409c-8858-81d55d2acc75
Status: Started
Number of Bricks: 6 x 1 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.42.147:/rhs/thinbrick1/ozone/hdd
Brick2: 10.70.43.48:/rhs/thinbrick1/ozone/hdd
Brick3: 10.70.43.48:/rhs/thinbrick1/ozone/dd
Brick4: 10.70.43.48:/rhs/thinbrick2/ozone/dd
Brick5: 10.70.42.147:/rhs/thinbrick1/ozone/dd
Brick6: 10.70.42.147:/rhs/thinbrick2/ozone/dd
Options Reconfigured:
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
storage.build-pgfid: on

Volume Name: pluto
Type: Distribute
Volume ID: 5656ab65-c1da-44a8-9ff0-46d08c9a8c61
Status: Stopped
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1:
10.70.43.48:/var/run/gluster/snaps/9a209867fb0b4b3f86f49494a6cfc191/brick1/pluto/dd
Brick2:
10.70.43.48:/var/run/gluster/snaps/9a209867fb0b4b3f86f49494a6cfc191/brick2/pluto/dd
Brick3:
10.70.42.147:/var/run/gluster/snaps/9a209867fb0b4b3f86f49494a6cfc191/brick3/pluto/dd
Brick4:
10.70.42.147:/var/run/gluster/snaps/9a209867fb0b4b3f86f49494a6cfc191/brick4/pluto/dd
[root@dhcp43-48 ~]#
[root@dhcp43-48 ~]# 
[root@dhcp43-48 ~]# gluster v stop pluto
Stopping volume will make its data inaccessible. Do you want to continue? (y/n)
y
volume stop: pluto: failed: Volume pluto is not in the started state
[root@dhcp43-48 ~]# 
[root@dhcp43-48 ~]# gluster v start pluto
volume start: pluto: failed: Staging failed on 10.70.42.147. Error: Volume
pluto already started
[root@dhcp43-48 ~]# 
[root@dhcp43-48 ~]# gluster v stop pluto
Stopping volume will make its data inaccessible. Do you want to continue? (y/n)
y
volume stop: pluto: failed: Volume pluto is not in the started state
[root@dhcp43-48 ~]# 
[root@dhcp43-48 ~]# gluster v status pluto
Volume pluto is not started
[root@dhcp43-48 ~]#
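
At this point the two nodes appear to disagree: the local node records pluto
as stopped, while 10.70.42.147 still records it as started, so stop, start
and delete each fail against one side's view of the state. A possible manual
recovery, sketched here on the assumption that the on-disk 'status=' fields
really do diverge and that root ssh between the nodes works, is to align the
peer's record and restart glusterd so it rereads the volinfo. This touches
glusterd's private store, so back up /var/lib/glusterd first:

# Confirm the divergence from the local node (10.70.43.48).
grep '^status=' /var/lib/glusterd/vols/pluto/info
ssh 10.70.42.147 "grep '^status=' /var/lib/glusterd/vols/pluto/info"

# On the peer that still shows status=1 (started): stop glusterd, flip the
# recorded state to 0 (stopped), then start glusterd again.
ssh 10.70.42.147 "service glusterd stop && \
  sed -i 's/^status=1/status=0/' /var/lib/glusterd/vols/pluto/info && \
  service glusterd start"

After this, 'gluster v delete pluto' should at least pass staging on both
nodes; whether the underlying graph-change failure recurs is a separate
question.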
