[Gluster-users] Fwd: nfs-ganesha HA with arbiter volume
Tiemen Ruiten
t.ruiten at rdmedia.com
Tue Sep 22 09:05:37 UTC 2015
I had missed setting up passwordless SSH authentication for the root user.
However, fixing that did not make a difference.
After verifying the prerequisites, I issued gluster nfs-ganesha enable on node
cobalt:
Sep 22 10:19:56 cobalt systemd: Starting Preprocess NFS configuration...
Sep 22 10:19:56 cobalt systemd: Starting RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Reached target RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Starting Host and Network Name Lookups.
Sep 22 10:19:56 cobalt systemd: Reached target Host and Network Name
Lookups.
Sep 22 10:19:56 cobalt systemd: Starting RPC bind service...
Sep 22 10:19:56 cobalt systemd: Started Preprocess NFS configuration.
Sep 22 10:19:56 cobalt systemd: Started RPC bind service.
Sep 22 10:19:56 cobalt systemd: Starting NFS status monitor for NFSv2/3
locking....
Sep 22 10:19:56 cobalt rpc.statd[2666]: Version 1.3.0 starting
Sep 22 10:19:56 cobalt rpc.statd[2666]: Flags: TI-RPC
Sep 22 10:19:56 cobalt systemd: Started NFS status monitor for NFSv2/3
locking..
Sep 22 10:19:56 cobalt systemd: Starting NFS-Ganesha file server...
Sep 22 10:19:56 cobalt systemd: Started NFS-Ganesha file server.
Sep 22 10:19:56 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit
capabilities (legacy support in use)
Sep 22 10:19:56 cobalt rpc.statd[2666]: Received SM_UNMON_ALL request from
cobalt.int.rdmedia.com while not monitoring any hosts
Sep 22 10:19:56 cobalt logger: setting up rd-ganesha-ha
Sep 22 10:19:56 cobalt logger: setting up cluster rd-ganesha-ha with the
following cobalt iron
Sep 22 10:19:57 cobalt systemd: Stopped Pacemaker High Availability Cluster
Manager.
Sep 22 10:19:57 cobalt systemd: Stopped Corosync Cluster Engine.
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop'
in section 'Socket'
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop'
in section 'Socket'
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Starting Corosync Cluster Engine...
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN ] Corosync Cluster Engine
('2.3.4'): started and ready to provide service.
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN ] Corosync built-in features:
dbus systemd xmlconf snmp pie relro bindnow
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing transport
(UDP/IP Unicast).
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing
transmit/receive security (NSS) crypto: none hash: none
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] The network interface
[10.100.30.37] is now up.
Sep 22 10:19:58 cobalt corosync[2816]: [SERV ] Service engine loaded:
corosync configuration map access [0]
Sep 22 10:19:58 cobalt corosync[2816]: [QB ] server name: cmap
Sep 22 10:19:58 cobalt corosync[2816]: [SERV ] Service engine loaded:
corosync configuration service [1]
Sep 22 10:19:58 cobalt corosync[2816]: [QB ] server name: cfg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV ] Service engine loaded:
corosync cluster closed process group service v1.01 [2]
Sep 22 10:19:58 cobalt corosync[2816]: [QB ] server name: cpg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV ] Service engine loaded:
corosync profile loading service [4]
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Using quorum provider
corosync_votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster
members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [SERV ] Service engine loaded:
corosync vote quorum service v1.0 [5]
Sep 22 10:19:58 cobalt corosync[2816]: [QB ] server name: votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [SERV ] Service engine loaded:
corosync cluster quorum service v0.1 [3]
Sep 22 10:19:58 cobalt corosync[2816]: [QB ] server name: quorum
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member
{10.100.30.37}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member
{10.100.30.38}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership (
10.100.30.37:140) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership (
10.100.30.37:148) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster
members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster
members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Members[0]:
Sep 22 10:19:58 cobalt corosync[2816]: [MAIN ] Completed service
synchronization, ready to provide service.
*Sep 22 10:21:27 cobalt systemd: corosync.service operation timed out.
Terminating.*
*Sep 22 10:21:27 cobalt corosync: Starting Corosync Cluster Engine
(corosync):*
*Sep 22 10:21:27 cobalt systemd: Failed to start Corosync Cluster Engine.*
*Sep 22 10:21:27 cobalt systemd: Unit corosync.service entered failed
state.*
Sep 22 10:21:32 cobalt logger: warning: pcs property set
no-quorum-policy=ignore failed
Sep 22 10:21:32 cobalt logger: warning: pcs property set
stonith-enabled=false failed
Sep 22 10:21:32 cobalt logger: warning: pcs resource create nfs_start
ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource delete nfs_start-clone
failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-mon
ganesha_mon --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-grace
ganesha_grace --clone failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create
cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.101 cidr_netmask=32
op monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create
cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add
cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order
cobalt-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order
nfs-grace-clone then cobalt-cluster_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create
iron-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.102 cidr_netmask=32 op
monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create
iron-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add
iron-cluster_ip-1 with iron-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order
iron-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint order
nfs-grace-clone then iron-cluster_ip-1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
cobalt-cluster_ip-1 prefers iron=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
cobalt-cluster_ip-1 prefers cobalt=2000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
iron-cluster_ip-1 prefers cobalt=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
iron-cluster_ip-1 prefers iron=2000 failed
Sep 22 10:21:35 cobalt logger: warning pcs cluster cib-push
/tmp/tmp.yqLT4m75WG failed
Notice the failed corosync service (the lines marked with asterisks above).
I can't find any logs pointing to a reason. Starting it manually is not a
problem:
Sep 22 10:35:06 cobalt corosync: Starting Corosync Cluster Engine
(corosync): [ OK ]
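
For what it's worth, the gap between the start and the timeout can be read off
the timestamps. A minimal sketch in Python, assuming the syslog entries have
been rejoined to one per line and that the year is 2015 (syslog does not
record it); the helper names are made up for illustration:

```python
import re
from datetime import datetime

# Sample entries copied from the log above, with wrapped lines rejoined.
LOG = """\
Sep 22 10:19:57 cobalt systemd: Starting Corosync Cluster Engine...
Sep 22 10:21:27 cobalt systemd: corosync.service operation timed out. Terminating.
Sep 22 10:21:32 cobalt logger: warning: pcs property set no-quorum-policy=ignore failed
Sep 22 10:21:32 cobalt logger: warning: pcs property set stonith-enabled=false failed
Sep 22 10:21:35 cobalt logger: warning pcs cluster cib-push /tmp/tmp.yqLT4m75WG failed
"""

# "Mon DD HH:MM:SS host tag: message"; the tag may carry a [pid] suffix.
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) ([^:]+): (.*)$")
# The setup script logs some warnings with and some without a colon.
FAILED_RE = re.compile(r"^warning:? (pcs .+) failed$")


def parse(line, year=2015):
    """Split one syslog line; the year is an assumption, syslog omits it."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp, host, tag, msg = m.groups()
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, host, tag, msg


def start_timeout_seconds(lines):
    """Seconds between systemd starting corosync and giving up on it."""
    started = None
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        when, _host, tag, msg = rec
        if tag != "systemd":
            continue
        if msg.startswith("Starting Corosync Cluster Engine"):
            started = when
        elif msg.startswith("corosync.service operation timed out") and started:
            return (when - started).total_seconds()
    return None


def failed_pcs_commands(lines):
    """Collect the pcs commands that the setup script reported as failed."""
    failed = []
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        _when, _host, tag, msg = rec
        m = FAILED_RE.match(msg)
        if tag == "logger" and m:
            failed.append(m.group(1))
    return failed


if __name__ == "__main__":
    lines = LOG.splitlines()
    print(start_timeout_seconds(lines))
    for cmd in failed_pcs_commands(lines):
        print(cmd)
```

On the lines above this gives 90 seconds, which is also systemd's default
start timeout, so it may be that corosync came up but systemd never saw the
start job complete. That is only a guess, since the unit file could set its
own timeout.
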
Then I noticed pacemaker was not running on either node. I started it manually
and saw the following in /var/log/messages on the other node:
Sep 22 10:36:43 iron cibadmin[4654]: notice: Invoked: /usr/sbin/cibadmin
--replace -o configuration -V --xml-pipe
Sep 22 10:36:43 iron crmd[4617]: notice: State transition S_IDLE ->
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Sep 22 10:36:44 iron pengine[4616]: notice: On loss of CCM Quorum: Ignore
Sep 22 10:36:44 iron pengine[4616]: error: Resource start-up disabled since
no STONITH resources have been defined
Sep 22 10:36:44 iron pengine[4616]: error: Either configure some or disable
STONITH with the stonith-enabled option
Sep 22 10:36:44 iron pengine[4616]: error: NOTE: Clusters with shared data
need STONITH to ensure data integrity
Sep 22 10:36:44 iron pengine[4616]: notice: Delaying fencing operations
until there are resources to manage
Sep 22 10:36:44 iron pengine[4616]: warning: Node iron is unclean!
Sep 22 10:36:44 iron pengine[4616]: notice: Cannot fence unclean nodes
until quorum is attained (or no-quorum-policy is set to ignore)
Sep 22 10:36:44 iron pengine[4616]: warning: Calculated Transition 2:
/var/lib/pacemaker/pengine/pe-warn-20.bz2
Sep 22 10:36:44 iron pengine[4616]: notice: Configuration ERRORs found
during PE processing. Please run "crm_verify -L" to identify issues.
Sep 22 10:36:44 iron crmd[4617]: notice: Transition 2 (Complete=0,
Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-20.bz2): Complete
Sep 22 10:36:44 iron crmd[4617]: notice: State transition
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL
origin=notify_crmd ]
I'm starting to think there is some leftover config somewhere from all
these attempts. Is there a way to completely reset all config related to
NFS-Ganesha and start over?
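
One thing worth ruling out before a clean retry: in the first attempt quoted
below, the IPaddr resources were created with an empty ip=, and the
ganesha-ha.conf quoted further down defines VIP_server1 and VIP_server2 while
the nodes are named cobalt and iron. A minimal sketch of a check for that kind
of mismatch, in Python, assuming the VIP_<hostname> convention with dots
replaced by underscores that the commented examples in that file suggest;
parse_conf, vip_key, missing_vips and unused_vips are hypothetical helper
names, not part of ganesha-ha.sh:

```python
import re

# Settings copied from the ganesha-ha.conf quoted later in this thread.
CONF = '''\
HA_NAME="rd-ganesha-ha"
HA_VOL_SERVER="iron"
HA_CLUSTER_NODES="cobalt,iron"
VIP_server1="10.100.30.101"
VIP_server2="10.100.30.102"
'''

# NAME="value" or NAME=value, one assignment per line.
ASSIGN_RE = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)="?([^"\n]*)"?\s*$')


def parse_conf(text):
    """Return the non-comment NAME=value pairs of a shell-style config."""
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        m = ASSIGN_RE.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings


def vip_key(node):
    """Assumed naming rule: VIP_ plus the hostname with dots as underscores."""
    return "VIP_" + node.replace(".", "_")


def cluster_nodes(settings):
    raw = settings.get("HA_CLUSTER_NODES", "")
    return [n.strip() for n in raw.split(",") if n.strip()]


def missing_vips(settings):
    """Nodes listed in HA_CLUSTER_NODES that have no matching VIP_ entry."""
    return [n for n in cluster_nodes(settings) if vip_key(n) not in settings]


def unused_vips(settings):
    """VIP_ entries that do not correspond to any listed node."""
    expected = {vip_key(n) for n in cluster_nodes(settings)}
    return sorted(k for k in settings if k.startswith("VIP_") and k not in expected)


if __name__ == "__main__":
    settings = parse_conf(CONF)
    print("nodes without a VIP entry:", missing_vips(settings))
    print("VIP entries without a node:", unused_vips(settings))
```

For the quoted file this reports both cobalt and iron as lacking a VIP entry,
and VIP_server1 and VIP_server2 as matching no node.
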
On 22 September 2015 at 09:04, Soumya Koduri <skoduri at redhat.com> wrote:
> Hi Tiemen,
>
> Have added the steps to configure HA NFS in the below doc. Please verify
> if you have all the pre-requisites done & steps performed right.
>
>
> https://github.com/soumyakoduri/glusterdocs/blob/ha_guide/Administrator%20Guide/Configuring%20HA%20NFS%20Server.md
>
> Thanks,
> Soumya
>
> On 09/21/2015 09:21 PM, Tiemen Ruiten wrote:
>
>> Whoops, replied off-list.
>>
>> Additionally I noticed that the generated corosync config is not valid,
>> as there is no interface section:
>>
>> /etc/corosync/corosync.conf
>>
>> totem {
>> version: 2
>> secauth: off
>> cluster_name: rd-ganesha-ha
>> transport: udpu
>> }
>>
>> nodelist {
>>   node {
>>     ring0_addr: cobalt
>>     nodeid: 1
>>   }
>>   node {
>>     ring0_addr: iron
>>     nodeid: 2
>>   }
>> }
>>
>> quorum {
>> provider: corosync_votequorum
>> two_node: 1
>> }
>>
>> logging {
>> to_syslog: yes
>> }
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Tiemen Ruiten <t.ruiten at rdmedia.com>
>> Date: 21 September 2015 at 17:16
>> Subject: Re: [Gluster-users] nfs-ganesha HA with arbiter volume
>> To: Jiffin Tony Thottan <jthottan at redhat.com>
>>
>>
>> Could you point me to the latest documentation? I've been struggling to
>> find something up-to-date. I believe I have all the prerequisites:
>>
>> - shared storage volume exists and is mounted
>> - all nodes in hosts files
>> - Gluster-NFS disabled
>> - corosync, pacemaker and nfs-ganesha rpm's installed
>>
>> Anything I missed?
>>
>> Everything has been installed by RPM so is in the default locations:
>> /usr/libexec/ganesha/ganesha-ha.sh
>> /etc/ganesha/ganesha.conf (empty)
>> /etc/ganesha/ganesha-ha.conf
>>
>> After I started the pcsd service manually, nfs-ganesha could be enabled
>> successfully, but there was no virtual IP present on the interfaces and
>> looking at the system log, I noticed corosync failed to start:
>>
>> - on the host where I issued the gluster nfs-ganesha enable command:
>>
>> Sep 21 17:07:18 iron systemd: Starting NFS-Ganesha file server...
>> Sep 21 17:07:19 iron systemd: Started NFS-Ganesha file server.
>> Sep 21 17:07:19 iron rpc.statd[2409]: Received SM_UNMON_ALL request from
>> iron.int.rdmedia.com while not monitoring any hosts
>> Sep 21 17:07:20 iron systemd: Starting Corosync Cluster Engine...
>> Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync Cluster Engine
>> ('2.3.4'): started and ready to provide service.
>> Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync built-in
>> features: dbus systemd xmlconf snmp pie relro bindnow
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing transport
>> (UDP/IP Unicast).
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing
>> transmit/receive security (NSS) crypto: none hash: none
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] The network interface
>> [10.100.30.38] is now up.
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync configuration map access [0]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cmap
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync configuration service [1]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cfg
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync cluster closed process group service v1.01 [2]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cpg
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync profile loading service [4]
>> Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Using quorum provider
>> corosync_votequorum
>> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync vote quorum service v1.0 [5]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: votequorum
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync cluster quorum service v0.1 [3]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: quorum
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member
>> {10.100.30.38}
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member
>> {10.100.30.37}
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership
>> (10.100.30.38:104) was formed. Members joined: 1
>> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Members[1]: 1
>> Sep 21 17:07:20 iron corosync[3427]: [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership
>> (10.100.30.37:108) was formed. Members joined: 1
>>
>> Sep 21 17:08:21 iron corosync: Starting Corosync Cluster Engine
>> (corosync): [FAILED]
>> Sep 21 17:08:21 iron systemd: corosync.service: control process exited,
>> code=exited status=1
>> Sep 21 17:08:21 iron systemd: Failed to start Corosync Cluster Engine.
>> Sep 21 17:08:21 iron systemd: Unit corosync.service entered failed state.
>>
>>
>> - on the other host:
>>
>> Sep 21 17:07:19 cobalt systemd: Starting Preprocess NFS configuration...
>> Sep 21 17:07:19 cobalt systemd: Starting RPC Port Mapper.
>> Sep 21 17:07:19 cobalt systemd: Reached target RPC Port Mapper.
>> Sep 21 17:07:19 cobalt systemd: Starting Host and Network Name Lookups.
>> Sep 21 17:07:19 cobalt systemd: Reached target Host and Network Name
>> Lookups.
>> Sep 21 17:07:19 cobalt systemd: Starting RPC bind service...
>> Sep 21 17:07:19 cobalt systemd: Started Preprocess NFS configuration.
>> Sep 21 17:07:19 cobalt systemd: Started RPC bind service.
>> Sep 21 17:07:19 cobalt systemd: Starting NFS status monitor for NFSv2/3
>> locking....
>> Sep 21 17:07:19 cobalt rpc.statd[2662]: Version 1.3.0 starting
>> Sep 21 17:07:19 cobalt rpc.statd[2662]: Flags: TI-RPC
>> Sep 21 17:07:19 cobalt systemd: Started NFS status monitor for NFSv2/3
>> locking..
>> Sep 21 17:07:19 cobalt systemd: Starting NFS-Ganesha file server...
>> Sep 21 17:07:19 cobalt systemd: Started NFS-Ganesha file server.
>> Sep 21 17:07:19 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit
>> capabilities (legacy support in use)
>> Sep 21 17:07:19 cobalt logger: setting up rd-ganesha-ha
>> Sep 21 17:07:19 cobalt rpc.statd[2662]: Received SM_UNMON_ALL request
>> from cobalt.int.rdmedia.com while not monitoring any hosts
>> Sep 21 17:07:19 cobalt logger: setting up cluster rd-ganesha-ha with the
>> following cobalt iron
>> Sep 21 17:07:20 cobalt systemd: Stopped Pacemaker High Availability
>> Cluster Manager.
>> Sep 21 17:07:20 cobalt systemd: Stopped Corosync Cluster Engine.
>> Sep 21 17:07:20 cobalt systemd: Reloading.
>> Sep 21 17:07:20 cobalt systemd:
>> [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
>> 'RemoveOnStop' in section 'Socket'
>> Sep 21 17:07:20 cobalt systemd:
>> [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
>> 'RemoveOnStop' in section 'Socket'
>> Sep 21 17:07:20 cobalt systemd: Reloading.
>> Sep 21 17:07:20 cobalt systemd:
>> [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
>> 'RemoveOnStop' in section 'Socket'
>> Sep 21 17:07:20 cobalt systemd:
>> [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
>> 'RemoveOnStop' in section 'Socket'
>> Sep 21 17:07:20 cobalt systemd: Starting Corosync Cluster Engine...
>> Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync Cluster Engine
>> ('2.3.4'): started and ready to provide service.
>> Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync built-in
>> features: dbus systemd xmlconf snmp pie relro bindnow
>> Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing transport
>> (UDP/IP Unicast).
>> Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing
>> transmit/receive security (NSS) crypto: none hash: none
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] The network interface
>> [10.100.30.37] is now up.
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync configuration map access [0]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cmap
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync configuration service [1]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cfg
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync cluster closed process group service v1.01 [2]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cpg
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync profile loading service [4]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Using quorum provider
>> corosync_votequorum
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync vote quorum service v1.0 [5]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: votequorum
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync cluster quorum service v0.1 [3]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: quorum
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member
>> {10.100.30.37}
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member
>> {10.100.30.38}
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership
>> (10.100.30.37:100) was formed. Members joined: 1
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
>> Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership
>> (10.100.30.37:108) was formed. Members joined: 1
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
>> Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> Sep 21 17:08:50 cobalt systemd: corosync.service operation timed out.
>> Terminating.
>> Sep 21 17:08:50 cobalt corosync: Starting Corosync Cluster Engine
>> (corosync):
>> Sep 21 17:08:50 cobalt systemd: Failed to start Corosync Cluster Engine.
>> Sep 21 17:08:50 cobalt systemd: Unit corosync.service entered failed
>> state.
>> Sep 21 17:08:55 cobalt logger: warning: pcs property set
>> no-quorum-policy=ignore failed
>> Sep 21 17:08:55 cobalt logger: warning: pcs property set
>> stonith-enabled=false failed
>> Sep 21 17:08:55 cobalt logger: warning: pcs resource create nfs_start
>> ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
>> Sep 21 17:08:56 cobalt logger: warning: pcs resource delete
>> nfs_start-clone failed
>> Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-mon
>> ganesha_mon --clone failed
>> Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-grace
>> ganesha_grace --clone failed
>> Sep 21 17:08:57 cobalt logger: warning pcs resource create
>> cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor
>> interval=15s failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs resource create
>> cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add
>> cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
>> cobalt-trigger_ip-1 then nfs-grace-clone failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
>> nfs-grace-clone then cobalt-cluster_ip-1 failed
>> Sep 21 17:08:57 cobalt logger: warning pcs resource create
>> iron-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor
>> interval=15s failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs resource create
>> iron-trigger_ip-1 ocf:heartbeat:Dummy failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add
>> iron-cluster_ip-1 with iron-trigger_ip-1 failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
>> iron-trigger_ip-1 then nfs-grace-clone failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint order
>> nfs-grace-clone then iron-cluster_ip-1 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> cobalt-cluster_ip-1 prefers iron=1000 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> cobalt-cluster_ip-1 prefers cobalt=2000 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> iron-cluster_ip-1 prefers cobalt=1000 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> iron-cluster_ip-1 prefers iron=2000 failed
>> Sep 21 17:08:58 cobalt logger: warning pcs cluster cib-push
>> /tmp/tmp.nXTfyA1GMR failed
>> Sep 21 17:08:58 cobalt logger: warning: scp ganesha-ha.conf to cobalt
>> failed
>>
>> BTW, I'm using CentOS 7. There are multiple network interfaces on the
>> servers, could that be a problem?
>>
>>
>>
>>
>> On 21 September 2015 at 11:48, Jiffin Tony Thottan <jthottan at redhat.com>
>> wrote:
>>
>>
>>
>> On 21/09/15 13:56, Tiemen Ruiten wrote:
>>
>>> Hello Soumya, Kaleb, list,
>>>
>>> This Friday I created the gluster_shared_storage volume manually. I
>>> just tried it with the command you supplied, but both give the
>>> same result:
>>>
>>> from etc-glusterfs-glusterd.vol.log on the node where I issued the
>>> command:
>>>
>>> [2015-09-21 07:59:47.756845] I [MSGID: 106474]
>>> [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha
>>> host found Hostname is cobalt
>>> [2015-09-21 07:59:48.071755] I [MSGID: 106474]
>>> [glusterd-ganesha.c:349:is_ganesha_host] 0-management: ganesha
>>> host found Hostname is cobalt
>>> [2015-09-21 07:59:48.653879] E [MSGID: 106470]
>>> [glusterd-ganesha.c:264:glusterd_op_set_ganesha] 0-management:
>>> Initial NFS-Ganesha set up failed
>>>
>>
>> As far as I understand from the logs, it called setup_cluster() [which
>> calls the `ganesha-ha.sh` script], but the script failed.
>> Can you please provide the following details:
>> - Location of the ganesha-ha.sh file?
>> - Location of the ganesha-ha.conf and ganesha.conf files?
>>
>>
>> And can you also cross-check whether all the prerequisites for the HA
>> setup are satisfied?
>>
>> --
>> With Regards,
>> Jiffin
>>
>>
>>> [2015-09-21 07:59:48.653912] E [MSGID: 106123]
>>> [glusterd-syncop.c:1404:gd_commit_op_phase] 0-management: Commit
>>> of operation 'Volume (null)' failed on localhost : Failed to set
>>> up HA config for NFS-Ganesha. Please check the log file for details
>>> [2015-09-21 07:59:45.402458] I [MSGID: 106006]
>>> [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify]
>>> 0-management: nfs has disconnected from glusterd.
>>> [2015-09-21 07:59:48.071578] I [MSGID: 106474]
>>> [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha
>>> host found Hostname is cobalt
>>>
>>> from etc-glusterfs-glusterd.vol.log on the other node:
>>>
>>> [2015-09-21 08:12:50.111877] E [MSGID: 106062]
>>> [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable
>>> to acquire volname
>>> [2015-09-21 08:14:50.548087] E [MSGID: 106062]
>>> [glusterd-op-sm.c:3635:glusterd_op_ac_lock] 0-management: Unable
>>> to acquire volname
>>> [2015-09-21 08:14:50.654746] I [MSGID: 106132]
>>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs
>>> already stopped
>>> [2015-09-21 08:14:50.655095] I [MSGID: 106474]
>>> [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha
>>> host found Hostname is cobalt
>>> [2015-09-21 08:14:51.287156] E [MSGID: 106062]
>>> [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable
>>> to acquire volname
>>>
>>>
>>> from etc-glusterfs-glusterd.vol.log on the arbiter node:
>>>
>>> [2015-09-21 08:18:50.934713] E [MSGID: 101075]
>>> [common-utils.c:3127:gf_is_local_addr] 0-management: error in
>>> getaddrinfo: Name or service not known
>>> [2015-09-21 08:18:51.504694] E [MSGID: 106062]
>>> [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable
>>> to acquire volname
>>>
>>> I have put the hostnames of all servers in my /etc/hosts file,
>>> including the arbiter node.
>>>
>>>
>>> On 18 September 2015 at 16:52, Soumya Koduri <skoduri at redhat.com>
>>> wrote:
>>>
>>> Hi Tiemen,
>>>
>>> One of the pre-requisites before setting up nfs-ganesha HA is
>>> to create and mount shared_storage volume. Use below CLI for that
>>>
>>> "gluster volume set all cluster.enable-shared-storage enable"
>>>
>>> It shall create the volume and mount in all the nodes
>>> (including the arbiter node). Note this volume shall be
>>> mounted on all the nodes of the gluster storage pool (though
>>> in this case it may not be part of nfs-ganesha cluster).
>>>
>>> So instead of manually creating those directory paths, please
>>> use above CLI and try re-configuring the setup.
>>>
>>> Thanks,
>>> Soumya
>>>
>>> On 09/18/2015 07:29 PM, Tiemen Ruiten wrote:
>>>
>>> Hello Kaleb,
>>>
>>> I don't:
>>>
>>> # Name of the HA cluster created.
>>> # must be unique within the subnet
>>> HA_NAME="rd-ganesha-ha"
>>> #
>>> # The gluster server from which to mount the shared data volume.
>>> HA_VOL_SERVER="iron"
>>> #
>>> # N.B. you may use short names or long names; you may not use IP addrs.
>>> # Once you select one, stay with it as it will be mildly unpleasant to
>>> # clean up if you switch later on. Ensure that all names - short and/or
>>> # long - are in DNS or /etc/hosts on all machines in the cluster.
>>> #
>>> # The subset of nodes of the Gluster Trusted Pool that form the ganesha
>>> # HA cluster. Hostname is specified.
>>> HA_CLUSTER_NODES="cobalt,iron"
>>> #HA_CLUSTER_NODES="server1.lab.redhat.com,server2.lab.redhat.com,..."
>>> #
>>> # Virtual IPs for each of the nodes specified above.
>>> VIP_server1="10.100.30.101"
>>> VIP_server2="10.100.30.102"
>>> #VIP_server1_lab_redhat_com="10.0.2.1"
>>> #VIP_server2_lab_redhat_com="10.0.2.2"
>>>
>>> hosts cobalt & iron are the data nodes, the arbiter
>>> ip/hostname (neon)
>>> isn't mentioned anywhere in this config file.
>>>
>>>
>>> On 18 September 2015 at 15:56, Kaleb S. KEITHLEY
>>> <kkeithle at redhat.com> wrote:
>>>
>>>    On 09/18/2015 09:46 AM, Tiemen Ruiten wrote:
>>>    > Hello,
>>>    >
>>>    > I have a Gluster cluster with a single replica 3, arbiter 1
>>>    > volume (so two nodes with actual data, one arbiter node). I
>>>    > would like to set up NFS-Ganesha HA for this volume but I'm
>>>    > having some difficulties.
>>>    >
>>>    > - I needed to create a directory /var/run/gluster/shared_storage
>>>    > manually on all nodes, or the command 'gluster nfs-ganesha
>>>    > enable' would fail with the following error:
>>>    > [2015-09-18 13:13:34.690416] E [MSGID: 106032]
>>>    > [glusterd-ganesha.c:708:pre_setup] 0-THIS->name: mkdir() failed
>>>    > on path /var/run/gluster/shared_storage/nfs-ganesha, [No such
>>>    > file or directory]
>>>    >
>>>    > - Then I found out that the command connects to the arbiter node
>>>    > as well, but obviously I don't want to set up NFS-Ganesha there.
>>>    > Is it actually possible to set up NFS-Ganesha HA with an arbiter
>>>    > node? If it's possible, is there any documentation on how to do
>>>    > that?
>>>    >
>>>
>>>    Please send the /etc/ganesha/ganesha-ha.conf file you're using.
>>>
>>>    Probably you have included the arbiter in your HA config; that
>>>    would be a mistake.
>>>
>>>    --
>>>
>>>    Kaleb
>>>
>>>
>>>
>>>
>>> --
>>> Tiemen Ruiten
>>> Systems Engineer
>>> R&D Media
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
--
Tiemen Ruiten
Systems Engineer
R&D Media