[Gluster-users] Fwd: nfs-ganesha HA with arbiter volume

Tiemen Ruiten t.ruiten at rdmedia.com
Tue Sep 22 09:05:37 UTC 2015


I had missed setting up passwordless SSH authentication for the root user.
However, adding it did not make a difference:

After verifying the prerequisites, I issued 'gluster nfs-ganesha enable' on
node cobalt:

Sep 22 10:19:56 cobalt systemd: Starting Preprocess NFS configuration...
Sep 22 10:19:56 cobalt systemd: Starting RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Reached target RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Starting Host and Network Name Lookups.
Sep 22 10:19:56 cobalt systemd: Reached target Host and Network Name
Lookups.
Sep 22 10:19:56 cobalt systemd: Starting RPC bind service...
Sep 22 10:19:56 cobalt systemd: Started Preprocess NFS configuration.
Sep 22 10:19:56 cobalt systemd: Started RPC bind service.
Sep 22 10:19:56 cobalt systemd: Starting NFS status monitor for NFSv2/3
locking....
Sep 22 10:19:56 cobalt rpc.statd[2666]: Version 1.3.0 starting
Sep 22 10:19:56 cobalt rpc.statd[2666]: Flags: TI-RPC
Sep 22 10:19:56 cobalt systemd: Started NFS status monitor for NFSv2/3
locking..
Sep 22 10:19:56 cobalt systemd: Starting NFS-Ganesha file server...
Sep 22 10:19:56 cobalt systemd: Started NFS-Ganesha file server.
Sep 22 10:19:56 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit
capabilities (legacy support in use)
Sep 22 10:19:56 cobalt rpc.statd[2666]: Received SM_UNMON_ALL request from
cobalt.int.rdmedia.com while not monitoring any hosts
Sep 22 10:19:56 cobalt logger: setting up rd-ganesha-ha
Sep 22 10:19:56 cobalt logger: setting up cluster rd-ganesha-ha with the
following cobalt iron
Sep 22 10:19:57 cobalt systemd: Stopped Pacemaker High Availability Cluster
Manager.
Sep 22 10:19:57 cobalt systemd: Stopped Corosync Cluster Engine.
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop'
in section 'Socket'
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop'
in section 'Socket'
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Starting Corosync Cluster Engine...
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN  ] Corosync Cluster Engine
('2.3.4'): started and ready to provide service.
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN  ] Corosync built-in features:
dbus systemd xmlconf snmp pie relro bindnow
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing transport
(UDP/IP Unicast).
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing
transmit/receive security (NSS) crypto: none hash: none
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] The network interface
[10.100.30.37] is now up.
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync configuration map access [0]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cmap
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync configuration service [1]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cfg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync cluster closed process group service v1.01 [2]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cpg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync profile loading service [4]
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Using quorum provider
corosync_votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster
members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync vote quorum service v1.0 [5]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1 [3]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: quorum
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member
{10.100.30.37}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member
{10.100.30.38}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership (
10.100.30.37:140) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership (
10.100.30.37:148) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster
members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster
members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Members[0]:
Sep 22 10:19:58 cobalt corosync[2816]: [MAIN  ] Completed service
synchronization, ready to provide service.
*Sep 22 10:21:27 cobalt systemd: corosync.service operation timed out.
Terminating.*
*Sep 22 10:21:27 cobalt corosync: Starting Corosync Cluster Engine
(corosync):*
*Sep 22 10:21:27 cobalt systemd: Failed to start Corosync Cluster Engine.*
*Sep 22 10:21:27 cobalt systemd: Unit corosync.service entered failed
state.*
Sep 22 10:21:32 cobalt logger: warning: pcs property set
no-quorum-policy=ignore failed
Sep 22 10:21:32 cobalt logger: warning: pcs property set
stonith-enabled=false failed
Sep 22 10:21:32 cobalt logger: warning: pcs resource create nfs_start
ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource delete nfs_start-clone
failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-mon
ganesha_mon --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-grace
ganesha_grace --clone failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create
cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.101 cidr_netmask=32
op monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create
cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add
cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order
cobalt-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order
nfs-grace-clone then cobalt-cluster_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create
iron-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.102 cidr_netmask=32 op
monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create
iron-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add
iron-cluster_ip-1 with iron-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order
iron-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint order
nfs-grace-clone then iron-cluster_ip-1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
cobalt-cluster_ip-1 prefers iron=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
cobalt-cluster_ip-1 prefers cobalt=2000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
iron-cluster_ip-1 prefers cobalt=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
iron-cluster_ip-1 prefers iron=2000 failed
Sep 22 10:21:35 cobalt logger: warning pcs cluster cib-push
/tmp/tmp.yqLT4m75WG failed

Notice the failed corosync service in bold. I can't find any logs pointing
to a reason (the places I checked are listed after the log excerpt below).
Starting it manually afterwards is not a problem:

Sep 22 10:35:06 cobalt corosync: Starting Corosync Cluster Engine
(corosync): [  OK  ]
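
For reference, these are roughly the places I have been checking for a
reason (nothing beyond the messages shown above turned up; the
/var/log/cluster path is only a guess on my part, since corosync.conf has
to_syslog but no to_logfile):

systemctl status corosync -l
journalctl -u corosync --since "2015-09-22 10:19"
grep -E 'corosync|pacemaker|TOTEM|QUORUM' /var/log/messages | tail -n 100
# only if to_logfile were enabled in corosync.conf:
# cat /var/log/cluster/corosync.log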

Then I noticed that Pacemaker was not running on both nodes. I started it
manually and saw the following in /var/log/messages on the other node (iron):

Sep 22 10:36:43 iron cibadmin[4654]: notice: Invoked: /usr/sbin/cibadmin
--replace -o configuration -V --xml-pipe
Sep 22 10:36:43 iron crmd[4617]: notice: State transition S_IDLE ->
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Sep 22 10:36:44 iron pengine[4616]: notice: On loss of CCM Quorum: Ignore
Sep 22 10:36:44 iron pengine[4616]: error: Resource start-up disabled since
no STONITH resources have been defined
Sep 22 10:36:44 iron pengine[4616]: error: Either configure some or disable
STONITH with the stonith-enabled option
Sep 22 10:36:44 iron pengine[4616]: error: NOTE: Clusters with shared data
need STONITH to ensure data integrity
Sep 22 10:36:44 iron pengine[4616]: notice: Delaying fencing operations
until there are resources to manage
Sep 22 10:36:44 iron pengine[4616]: warning: Node iron is unclean!
Sep 22 10:36:44 iron pengine[4616]: notice: Cannot fence unclean nodes
until quorum is attained (or no-quorum-policy is set to ignore)
Sep 22 10:36:44 iron pengine[4616]: warning: Calculated Transition 2:
/var/lib/pacemaker/pengine/pe-warn-20.bz2
Sep 22 10:36:44 iron pengine[4616]: notice: Configuration ERRORs found
during PE processing.  Please run "crm_verify -L" to identify issues.
Sep 22 10:36:44 iron crmd[4617]: notice: Transition 2 (Complete=0,
Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-20.bz2): Complete
Sep 22 10:36:44 iron crmd[4617]: notice: State transition
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL
origin=notify_crmd ]
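
The STONITH and quorum errors above line up with the 'pcs property set'
warnings from the setup script, so presumably those properties were never
applied. If I understand it correctly, they could be set by hand on the
running cluster with the same commands the script tried (taken from the
warnings logged above; I haven't confirmed this is the supported way to
repair a ganesha HA setup):

pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs property list
crm_verify -L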

I'm starting to think there is some leftover config somewhere from all
these attempts. Is there a way to completely reset all config related to
NFS-Ganesha and start over?
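
In case it helps to show what I mean, this is the kind of manual cleanup I
had in mind (just my own guess at what would wipe the HA state on both
nodes, not taken from any documentation, so please correct me if it is
wrong or incomplete):

# once, from one node (if glusterd still thinks HA is enabled)
gluster nfs-ganesha disable

# on each of cobalt and iron
systemctl stop nfs-ganesha pacemaker corosync
pcs cluster destroy                # removes corosync.conf and the local CIB
rm -rf /var/lib/pacemaker/cib/*    # in case anything is left behind

# then retry from one node
gluster nfs-ganesha enable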



On 22 September 2015 at 09:04, Soumya Koduri <skoduri at redhat.com> wrote:

> Hi Tiemen,
>
> I have added the steps to configure HA NFS in the doc below. Please verify
> that you have all the prerequisites done and the steps performed right.
>
>
> https://github.com/soumyakoduri/glusterdocs/blob/ha_guide/Administrator%20Guide/Configuring%20HA%20NFS%20Server.md
>
> Thanks,
> Soumya
>
> On 09/21/2015 09:21 PM, Tiemen Ruiten wrote:
>
>> Whoops, replied off-list.
>>
>> Additionally I noticed that the generated corosync config is not valid,
>> as there is no interface section:
>>
>> /etc/corosync/corosync.conf
>>
>> totem {
>> version: 2
>> secauth: off
>> cluster_name: rd-ganesha-ha
>> transport: udpu
>> }
>>
>> nodelist {
>>   node {
>>         ring0_addr: cobalt
>>         nodeid: 1
>>   }
>>   node {
>>         ring0_addr: iron
>>         nodeid: 2
>>   }
>> }
>>
>> quorum {
>> provider: corosync_votequorum
>> two_node: 1
>> }
>>
>> logging {
>> to_syslog: yes
>> }
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Tiemen Ruiten <t.ruiten at rdmedia.com>
>> Date: 21 September 2015 at 17:16
>> Subject: Re: [Gluster-users] nfs-ganesha HA with arbiter volume
>> To: Jiffin Tony Thottan <jthottan at redhat.com>
>>
>>
>> Could you point me to the latest documentation? I've been struggling to
>> find something up-to-date. I believe I have all the prerequisites:
>>
>> - shared storage volume exists and is mounted
>> - all nodes in hosts files
>> - Gluster-NFS disabled
>> - corosync, pacemaker and nfs-ganesha RPMs installed
>>
>> Anything I missed?
>>
>> Everything has been installed by RPM so is in the default locations:
>> /usr/libexec/ganesha/ganesha-ha.sh
>> /etc/ganesha/ganesha.conf (empty)
>> /etc/ganesha/ganesha-ha.conf
>>
>> After I started the pcsd service manually, nfs-ganesha could be enabled
>> successfully, but there was no virtual IP present on the interfaces, and
>> looking at the system log, I noticed that corosync had failed to start:
>>
>> - on the host where I issued the gluster nfs-ganesha enable command:
>>
>> Sep 21 17:07:18 iron systemd: Starting NFS-Ganesha file server...
>> Sep 21 17:07:19 iron systemd: Started NFS-Ganesha file server.
>> Sep 21 17:07:19 iron rpc.statd[2409]: Received SM_UNMON_ALL request from
>> iron.int.rdmedia.com while not monitoring any hosts
>> Sep 21 17:07:20 iron systemd: Starting Corosync Cluster Engine...
>> Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync Cluster Engine
>> ('2.3.4'): started and ready to provide service.
>> Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync built-in
>> features: dbus systemd xmlconf snmp pie relro bindnow
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing transport
>> (UDP/IP Unicast).
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing
>> transmit/receive security (NSS) crypto: none hash: none
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] The network interface
>> [10.100.30.38] is now up.
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync configuration map access [0]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cmap
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync configuration service [1]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cfg
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync cluster closed process group service v1.01 [2]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cpg
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync profile loading service [4]
>> Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Using quorum provider
>> corosync_votequorum
>> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync vote quorum service v1.0 [5]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: votequorum
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync cluster quorum service v0.1 [3]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: quorum
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member
>> {10.100.30.38}
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member
>> {10.100.30.37}
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership
>> (10.100.30.38:104) was formed. Members joined: 1
>> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Members[1]: 1
>> Sep 21 17:07:20 iron corosync[3427]: [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership
>> (10.100.30.37:108) was formed. Members joined: 1
>>
>> Sep 21 17:08:21 iron corosync: Starting Corosync Cluster Engine
>> (corosync): [FAILED]
>> Sep 21 17:08:21 iron systemd: corosync.service: control process exited,
>> code=exited status=1
>> Sep 21 17:08:21 iron systemd: Failed to start Corosync Cluster Engine.
>> Sep 21 17:08:21 iron systemd: Unit corosync.service entered failed state.
>>
>>
>> - on the other host:
>>
>> Sep 21 17:07:19 cobalt systemd: Starting Preprocess NFS configuration...
>> Sep 21 17:07:19 cobalt systemd: Starting RPC Port Mapper.
>> Sep 21 17:07:19 cobalt systemd: Reached target RPC Port Mapper.
>> Sep 21 17:07:19 cobalt systemd: Starting Host and Network Name Lookups.
>> Sep 21 17:07:19 cobalt systemd: Reached target Host and Network Name
>> Lookups.
>> Sep 21 17:07:19 cobalt systemd: Starting RPC bind service...
>> Sep 21 17:07:19 cobalt systemd: Started Preprocess NFS configuration.
>> Sep 21 17:07:19 cobalt systemd: Started RPC bind service.
>> Sep 21 17:07:19 cobalt systemd: Starting NFS status monitor for NFSv2/3
>> locking....
>> Sep 21 17:07:19 cobalt rpc.statd[2662]: Version 1.3.0 starting
>> Sep 21 17:07:19 cobalt rpc.statd[2662]: Flags: TI-RPC
>> Sep 21 17:07:19 cobalt systemd: Started NFS status monitor for NFSv2/3
>> locking..
>> Sep 21 17:07:19 cobalt systemd: Starting NFS-Ganesha file server...
>> Sep 21 17:07:19 cobalt systemd: Started NFS-Ganesha file server.
>> Sep 21 17:07:19 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit
>> capabilities (legacy support in use)
>> Sep 21 17:07:19 cobalt logger: setting up rd-ganesha-ha
>> Sep 21 17:07:19 cobalt rpc.statd[2662]: Received SM_UNMON_ALL request
>> from cobalt.int.rdmedia.com while not monitoring any hosts
>> Sep 21 17:07:19 cobalt logger: setting up cluster rd-ganesha-ha with the
>> following cobalt iron
>> Sep 21 17:07:20 cobalt systemd: Stopped Pacemaker High Availability
>> Cluster Manager.
>> Sep 21 17:07:20 cobalt systemd: Stopped Corosync Cluster Engine.
>> Sep 21 17:07:20 cobalt systemd: Reloading.
>> Sep 21 17:07:20 cobalt systemd:
>> [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
>> 'RemoveOnStop' in section 'Socket'
>> Sep 21 17:07:20 cobalt systemd:
>> [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
>> 'RemoveOnStop' in section 'Socket'
>> Sep 21 17:07:20 cobalt systemd: Reloading.
>> Sep 21 17:07:20 cobalt systemd:
>> [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
>> 'RemoveOnStop' in section 'Socket'
>> Sep 21 17:07:20 cobalt systemd:
>> [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
>> 'RemoveOnStop' in section 'Socket'
>> Sep 21 17:07:20 cobalt systemd: Starting Corosync Cluster Engine...
>> Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync Cluster Engine
>> ('2.3.4'): started and ready to provide service.
>> Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync built-in
>> features: dbus systemd xmlconf snmp pie relro bindnow
>> Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing transport
>> (UDP/IP Unicast).
>> Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing
>> transmit/receive security (NSS) crypto: none hash: none
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] The network interface
>> [10.100.30.37] is now up.
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync configuration map access [0]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cmap
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync configuration service [1]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cfg
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync cluster closed process group service v1.01 [2]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cpg
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync profile loading service [4]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Using quorum provider
>> corosync_votequorum
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync vote quorum service v1.0 [5]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: votequorum
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync cluster quorum service v0.1 [3]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: quorum
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member
>> {10.100.30.37}
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member
>> {10.100.30.38}
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership
>> (10.100.30.37:100) was formed. Members joined: 1
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
>> Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership
>> (10.100.30.37:108) was formed. Members joined: 1
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
>> Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> Sep 21 17:08:50 cobalt systemd: corosync.service operation timed out.
>> Terminating.
>> Sep 21 17:08:50 cobalt corosync: Starting Corosync Cluster Engine
>> (corosync):
>> Sep 21 17:08:50 cobalt systemd: Failed to start Corosync Cluster Engine.
>> Sep 21 17:08:50 cobalt systemd: Unit corosync.service entered failed
>> state.
>> Sep 21 17:08:55 cobalt logger: warning: pcs property set
>> no-quorum-policy=ignore failed
>> Sep 21 17:08:55 cobalt logger: warning: pcs property set
>> stonith-enabled=false failed
>> Sep 21 17:08:55 cobalt logger: warning: pcs resource create nfs_start
>> ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
>> Sep 21 17:08:56 cobalt logger: warning: pcs resource delete
>> nfs_start-clone failed
>> Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-mon
>> ganesha_mon --clone failed
>> Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-grace
>> ganesha_grace --clone failed
>> Sep 21 17:08:57 cobalt logger: warning pcs resource create
>> cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor
>> interval=15s failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs resource create
>> cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add
>> cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
>> cobalt-trigger_ip-1 then nfs-grace-clone failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
>> nfs-grace-clone then cobalt-cluster_ip-1 failed
>> Sep 21 17:08:57 cobalt logger: warning pcs resource create
>> iron-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor
>> interval=15s failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs resource create
>> iron-trigger_ip-1 ocf:heartbeat:Dummy failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add
>> iron-cluster_ip-1 with iron-trigger_ip-1 failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
>> iron-trigger_ip-1 then nfs-grace-clone failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint order
>> nfs-grace-clone then iron-cluster_ip-1 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> cobalt-cluster_ip-1 prefers iron=1000 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> cobalt-cluster_ip-1 prefers cobalt=2000 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> iron-cluster_ip-1 prefers cobalt=1000 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> iron-cluster_ip-1 prefers iron=2000 failed
>> Sep 21 17:08:58 cobalt logger: warning pcs cluster cib-push
>> /tmp/tmp.nXTfyA1GMR failed
>> Sep 21 17:08:58 cobalt logger: warning: scp ganesha-ha.conf to cobalt
>> failed
>>
>> BTW, I'm using CentOS 7. There are multiple network interfaces on the
>> servers; could that be a problem?
>>
>>
>>
>>
>> On 21 September 2015 at 11:48, Jiffin Tony Thottan
>> <jthottan at redhat.com> wrote:
>>
>>
>>
>>     On 21/09/15 13:56, Tiemen Ruiten wrote:
>>
>>>     Hello Soumya, Kaleb, list,
>>>
>>>     On Friday I created the gluster_shared_storage volume manually;
>>>     I just tried it with the command you supplied, but both have the
>>>     same result:
>>>
>>>     from etc-glusterfs-glusterd.vol.log on the node where I issued the
>>>     command:
>>>
>>>     [2015-09-21 07:59:47.756845] I [MSGID: 106474]
>>>     [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha
>>>     host found Hostname is cobalt
>>>     [2015-09-21 07:59:48.071755] I [MSGID: 106474]
>>>     [glusterd-ganesha.c:349:is_ganesha_host] 0-management: ganesha
>>>     host found Hostname is cobalt
>>>     [2015-09-21 07:59:48.653879] E [MSGID: 106470]
>>>     [glusterd-ganesha.c:264:glusterd_op_set_ganesha] 0-management:
>>>     Initial NFS-Ganesha set up failed
>>>
>>
>>     As far as I understand from the logs, it called setup_cluster()
>>     [which calls the ganesha-ha.sh script], but the script failed.
>>     Can you please provide the following details:
>>     - location of the ganesha-ha.sh file?
>>     - location of the ganesha-ha.conf and ganesha.conf files?
>>
>>     And can you also cross-check whether all the prerequisites for the
>>     HA setup are satisfied?
>>
>>     --
>>     With Regards,
>>     Jiffin
>>
>>
>>>     [2015-09-21 07:59:48.653912] E [MSGID: 106123]
>>>     [glusterd-syncop.c:1404:gd_commit_op_phase] 0-management: Commit
>>>     of operation 'Volume (null)' failed on localhost : Failed to set
>>>     up HA config for NFS-Ganesha. Please check the log file for details
>>>     [2015-09-21 07:59:45.402458] I [MSGID: 106006]
>>>     [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify]
>>>     0-management: nfs has disconnected from glusterd.
>>>     [2015-09-21 07:59:48.071578] I [MSGID: 106474]
>>>     [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha
>>>     host found Hostname is cobalt
>>>
>>>     from etc-glusterfs-glusterd.vol.log on the other node:
>>>
>>>     [2015-09-21 08:12:50.111877] E [MSGID: 106062]
>>>     [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable
>>>     to acquire volname
>>>     [2015-09-21 08:14:50.548087] E [MSGID: 106062]
>>>     [glusterd-op-sm.c:3635:glusterd_op_ac_lock] 0-management: Unable
>>>     to acquire volname
>>>     [2015-09-21 08:14:50.654746] I [MSGID: 106132]
>>>     [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs
>>>     already stopped
>>>     [2015-09-21 08:14:50.655095] I [MSGID: 106474]
>>>     [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha
>>>     host found Hostname is cobalt
>>>     [2015-09-21 08:14:51.287156] E [MSGID: 106062]
>>>     [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable
>>>     to acquire volname
>>>
>>>
>>>     from etc-glusterfs-glusterd.vol.log on the arbiter node:
>>>
>>>     [2015-09-21 08:18:50.934713] E [MSGID: 101075]
>>>     [common-utils.c:3127:gf_is_local_addr] 0-management: error in
>>>     getaddrinfo: Name or service not known
>>>     [2015-09-21 08:18:51.504694] E [MSGID: 106062]
>>>     [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable
>>>     to acquire volname
>>>
>>>     I have put the hostnames of all servers in my /etc/hosts file,
>>>     including the arbiter node.
>>>
>>>
>>>     On 18 September 2015 at 16:52, Soumya Koduri
>>>     <skoduri at redhat.com> wrote:
>>>
>>>         Hi Tiemen,
>>>
>>>         One of the prerequisites before setting up nfs-ganesha HA is
>>>         to create and mount the shared_storage volume. Use the CLI
>>>         below for that:
>>>
>>>         "gluster volume set all cluster.enable-shared-storage enable"
>>>
>>>         It will create the volume and mount it on all the nodes
>>>         (including the arbiter node). Note that this volume is
>>>         mounted on all the nodes of the gluster storage pool (even
>>>         though, in this case, the arbiter may not be part of the
>>>         nfs-ganesha cluster).
>>>
>>>         So instead of manually creating those directory paths, please
>>>         use the CLI above and try re-configuring the setup.
>>>
>>>         Thanks,
>>>         Soumya
>>>
>>>         On 09/18/2015 07:29 PM, Tiemen Ruiten wrote:
>>>
>>>             Hello Kaleb,
>>>
>>>             I don't:
>>>
>>>             # Name of the HA cluster created.
>>>             # must be unique within the subnet
>>>             HA_NAME="rd-ganesha-ha"
>>>             #
>>>             # The gluster server from which to mount the shared data volume.
>>>             HA_VOL_SERVER="iron"
>>>             #
>>>             # N.B. you may use short names or long names; you may not use IP addrs.
>>>             # Once you select one, stay with it as it will be mildly unpleasant to
>>>             # clean up if you switch later on. Ensure that all names - short and/or
>>>             # long - are in DNS or /etc/hosts on all machines in the cluster.
>>>             #
>>>             # The subset of nodes of the Gluster Trusted Pool that form the ganesha
>>>             # HA cluster. Hostname is specified.
>>>             HA_CLUSTER_NODES="cobalt,iron"
>>>             #HA_CLUSTER_NODES="server1.lab.redhat.com,server2.lab.redhat.com,..."
>>>             #
>>>             # Virtual IPs for each of the nodes specified above.
>>>             VIP_server1="10.100.30.101"
>>>             VIP_server2="10.100.30.102"
>>>             #VIP_server1_lab_redhat_com="10.0.2.1"
>>>             #VIP_server2_lab_redhat_com="10.0.2.2"
>>>
>>>             Hosts cobalt and iron are the data nodes; the arbiter
>>>             IP/hostname (neon) isn't mentioned anywhere in this config
>>>             file.
>>>
>>>
>>>             On 18 September 2015 at 15:56, Kaleb S. KEITHLEY
>>>             <kkeithle at redhat.com> wrote:
>>>
>>>             On 09/18/2015 09:46 AM, Tiemen Ruiten wrote:
>>>             > Hello,
>>>             >
>>>             > I have a Gluster cluster with a single replica 3, arbiter 1
>>>             > volume (so two nodes with actual data, one arbiter node). I
>>>             > would like to set up NFS-Ganesha HA for this volume but I'm
>>>             > having some difficulties.
>>>             >
>>>             > - I needed to create a directory /var/run/gluster/shared_storage
>>>             > manually on all nodes, or the command 'gluster nfs-ganesha enable'
>>>             > would fail with the following error:
>>>             > [2015-09-18 13:13:34.690416] E [MSGID: 106032]
>>>             > [glusterd-ganesha.c:708:pre_setup] 0-THIS->name: mkdir() failed
>>>             > on path /var/run/gluster/shared_storage/nfs-ganesha, [No such
>>>             > file or directory]
>>>             >
>>>             > - Then I found out that the command connects to the arbiter node
>>>             > as well, but obviously I don't want to set up NFS-Ganesha there.
>>>             > Is it actually possible to set up NFS-Ganesha HA with an arbiter
>>>             > node? If it's possible, is there any documentation on how to do
>>>             > that?
>>>             >
>>>
>>>             Please send the /etc/ganesha/ganesha-ha.conf file you're
>>>             using.
>>>
>>>             Probably you have included the arbiter in your HA config;
>>>             that would be a mistake.
>>>
>>>             --
>>>
>>>             Kaleb
>>>
>>>
>>>
>>>
>>>             --
>>>             Tiemen Ruiten
>>>             Systems Engineer
>>>             R&D Media
>>>
>>>
>>>             _______________________________________________
>>>             Gluster-users mailing list
>>>             Gluster-users at gluster.org
>>>             http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>>
>>>
>>>     --
>>>     Tiemen Ruiten
>>>     Systems Engineer
>>>     R&D Media
>>>
>>>
>>>     _______________________________________________
>>>     Gluster-users mailing list
>>>     Gluster-users at gluster.org
>>>     http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>     _______________________________________________
>>     Gluster-users mailing list
>>     Gluster-users at gluster.org
>>     http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>> --
>> Tiemen Ruiten
>> Systems Engineer
>> R&D Media
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>


-- 
Tiemen Ruiten
Systems Engineer
R&D Media