[Bugs] [Bug 1636088] New: ocf:glusterfs:volume resource agent for pacemaker fails to stop gluster volume processes glusterfsd
bugzilla at redhat.com
Thu Oct 4 12:45:40 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1636088
Bug ID: 1636088
Summary: ocf:glusterfs:volume resource agent for pacemaker
fails to stop gluster volume processes glusterfsd
Product: GlusterFS
Version: 3.12
Component: common-ha
Assignee: bugs at gluster.org
Reporter: erik.dobak at gmail.com
CC: bugs at gluster.org
Description of problem:
I am using pacemaker to run glusterfs. After setting it up I tested it with
'crm node standby node01' but got a timeout from the volume resource agent:
crmd: error: process_lrm_event: Result of stop operation for
p_volume_gluster on node02: Timed Out | call=559 key=p_volume_gluster_stop_0
timeout=20000ms
When checking the processes with ps -ef I could still see gluster processes
running on the node.
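For reference, the leftover brick processes can be confirmed on the standby node
with a check along these lines (the brackets just keep grep from matching its own
command line):

ps -ef | grep '[g]lusterfsd'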
Version-Release number of selected component (if applicable):
Name : glusterfs-resource-agents
Arch : noarch
Version : 3.12.14
Release : 1.el6
Size : 13 k
Repo : installed
From repo : centos-gluster312
How reproducible:
Configure gluster in pacemaker (2 nodes):
primitive glusterd ocf:glusterfs:glusterd \
        op monitor interval=10 timeout=120s \
        op start timeout=120s interval=0 \
        op stop timeout=120s interval=0
primitive p_volume_gluster ocf:glusterfs:volume \
        params volname=gv0 \
        op stop interval=0 trace_ra=1 \
        op monitor interval=0 timeout=120s \
        op start timeout=120s interval=0
clone cl_glusterd glusterd \
        meta interleave=true clone-max=2 clone-node-max=1 target-role=Started
clone cl_glustervol p_volume_gluster \
        meta interleave=true clone-max=2 clone-node-max=1
Run gluster in the cluster, then put a node on standby.
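Note that the stop operation on p_volume_gluster above has no explicit timeout, so
it falls back to the cluster's 20s default action timeout (the timeout=20000ms in
the error). As an interim adjustment only (a sketch in the same crm syntax with the
same resource names; it widens the timeout but does not fix the agent), the stop
operation could be given the same 120s as the other operations:

primitive p_volume_gluster ocf:glusterfs:volume \
        params volname=gv0 \
        op stop timeout=120s interval=0 trace_ra=1 \
        op monitor interval=0 timeout=120s \
        op start timeout=120s interval=0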
Steps to Reproduce:
1. start gluster in pacemaker
2. put a node on standby: crm node standby node01
3. wait for the error messages
Actual results:
The stop operation for the volume primitive times out, and the glusterfsd
processes (/usr/sbin/glusterfsd) are still running.
Expected results:
Gluster should shut down and no errors should appear in corosync.log.
Additional info:
I debugged the volume resource agent
(/usr/lib/ocf/resource.d/glusterfs/volume) and found two issues that prevented
the agent from stopping the processes.
1. SHORTHOSTNAME=`hostname -s`
On my system only the full hostname was used, so I had to change this line to:
SHORTHOSTNAME=`hostname`
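If the correct form differs from system to system, a more tolerant approach might
be to keep the short name only when gluster's brick list actually uses it (a
sketch only; matching against gluster volume info output is my assumption, not the
agent's existing logic):

SHORTHOSTNAME=`hostname -s`
# fall back to the full hostname if no brick is registered under the short name
if ! gluster volume info "${OCF_RESKEY_volname}" | grep -q "${SHORTHOSTNAME}:"; then
        SHORTHOSTNAME=`hostname`
fi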
2. The function volume_getdir() had the wrong path hardcoded:
volume_getdir() {
        local voldir
        voldir="/etc/glusterd/vols/${OCF_RESKEY_volname}"
        [ -d ${voldir} ] || return 1
        echo "${voldir}"
        return 0
}
I had to change /etc/glusterd to /var/lib/glusterd:
volume_getdir() {
        local voldir
        voldir="/var/lib/glusterd/vols/${OCF_RESKEY_volname}"
        [ -d ${voldir} ] || return 1
        echo "${voldir}"
        return 0
}
I am not sure whether this is because I am running CentOS 6; maybe the paths and
hostnames differ on CentOS 7.
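If the location really does differ between versions, a more defensive variant of
volume_getdir() could probe both candidate directories instead of hard-coding one
(a sketch only, not the shipped agent code):

volume_getdir() {
        local voldir
        # prefer the newer location, then fall back to the legacy one
        for voldir in "/var/lib/glusterd/vols/${OCF_RESKEY_volname}" \
                      "/etc/glusterd/vols/${OCF_RESKEY_volname}"; do
                if [ -d "${voldir}" ]; then
                        echo "${voldir}"
                        return 0
                fi
        done
        return 1
}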