[Gluster-users] 3 node NFS-Ganesha Cluster

ml ml at nocloud.ch
Fri Nov 27 12:43:59 UTC 2015


Dear Soumya,

First of all, thank you for your answer.

On Fri, 2015-11-27 at 14:27 +0530, Soumya Koduri wrote:
> Hi,
> 
> On 11/27/2015 01:58 PM, ml wrote:
> > Dear All,
> > 
> > I am trying to get an NFS-Ganesha HA cluster running with 3 CentOS
> > Linux release 7.1.1503 nodes. I use the package
> > glusterfs-ganesha-3.7.6-1.el7.x86_64 to get the HA scripts. So far it
> > works fine: when I stop the nfs-ganesha service on one of the nodes,
> > it moves the virtual IP to one of the other nodes, and the
> > altai-dead_ip-1 resource is created properly:
> > 
> > 
> > 	root@rnas2 ~# pcs status
> > 	Cluster name: ganesha-cluster-dmath
> > 	Last updated: Thu Nov 26 10:41:07 2015		Last change: Thu Nov 26 10:40:06 2015 by root via cibadmin on altai
> > 	Stack: corosync
> > 	Current DC: rnas2 (version 1.1.13-a14efad) - partition with quorum
> > 	3 nodes and 13 resources configured
> > 
> > 	Online: [ altai kaukasus rnas2 ]
> > 
> > 	Full list of resources:
> > 
> > 	 Clone Set: nfs-mon-clone [nfs-mon]
> > 	     Started: [ altai kaukasus rnas2 ]
> > 	 Clone Set: nfs-grace-clone [nfs-grace]
> > 	     Started: [ altai kaukasus rnas2 ]
> > 	 kaukasus-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started kaukasus
> > 	 kaukasus-trigger_ip-1	(ocf::heartbeat:Dummy):	Started kaukasus
> > 	 altai-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started kaukasus
> > 	 altai-trigger_ip-1	(ocf::heartbeat:Dummy):	Started kaukasus
> > 	 rnas2-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started rnas2
> > 	 rnas2-trigger_ip-1	(ocf::heartbeat:Dummy):	Started rnas2
> > 	 altai-dead_ip-1	(ocf::heartbeat:Dummy):	Started altai
> > 
> > 	PCSD Status:
> > 	  kaukasus: Online
> > 	  altai: Online
> > 	  rnas2: Online
> > 
> > 	Daemon Status:
> > 	  corosync: active/enabled
> > 	  pacemaker: active/enabled
> > 	  pcsd: active/enabled
> > 
> > 
> > But when I just disconnect the network on one of the nodes, in this
> > case altai (or power it off),
> > 
> > 
> > 	root@altai ~# ifdown bond0
> > 
> > 
> > it takes down the whole cluster. I found the following message in
> > the logs:
> > 
> > 
> > 	Nov 26 10:45:05 rnas2 crmd[17255]: error: Operation nfs-grace_start_0: Timed Out (node=rnas2, call=85, timeout=40000ms)
> > 
> > 
> > I wonder if I just misconfigured something or if this is not
> > supported yet?
> > 
> 
> Since it's a 3-node cluster, quorum will be enabled. When any of
> those machines or its IP is down, quorum will be lost, resulting in
> Pacemaker shutting down the entire cluster. If possible, could you
> check the same scenario with a 4-node setup?


Sorry in advance, I might not understand correctly, as I am not a native
English speaker. Are you telling me that in a 3-node cluster, quorum is
lost when the IP of one of the nodes is down?
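
To make the question concrete, here is a minimal sketch of the arithmetic
I have in mind. It assumes a plain majority rule with one vote per node
and no special quorum options; the function names are made up for
illustration and are not part of corosync or pacemaker.

```python
def quorum_threshold(total_votes: int) -> int:
    """Votes needed for a strict majority (assumed rule: floor(n/2) + 1)."""
    return total_votes // 2 + 1


def is_quorate(total_nodes: int, nodes_down: int) -> bool:
    """True if the surviving nodes still hold a strict majority of votes."""
    surviving = total_nodes - nodes_down
    return surviving >= quorum_threshold(total_nodes)


if __name__ == "__main__":
    for nodes in (3, 4):
        print(
            f"{nodes} nodes: need {quorum_threshold(nodes)} votes, "
            f"quorate with 1 down: {is_quorate(nodes, 1)}, "
            f"with 2 down: {is_quorate(nodes, 2)}"
        )
```

Under that assumption, a 3-node cluster with one node down would still
have 2 of 3 votes, which seems consistent with the "[QUORUM] Members[2]:
1 3" line in the log below.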

In any case, I am setting up an additional node to test a 4-node setup.
But even then, if I take down one node and nfs-grace_start
(/usr/lib/ocf/resource.d/heartbeat/ganesha_grace) does not run properly
on the other nodes, could the whole cluster go down because quorum is
lost again?
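
To check whether the start timeout shows up on every surviving node, the
failed operations can be pulled out of the logs. This is only a rough
sketch: it assumes each log entry is on a single line (not wrapped as in
this mail) and matches only the "Timed Out" format of the crmd message
quoted below. The names TIMEOUT_RE and find_timeouts are made up.

```python
import re

# Matches crmd messages of the form seen in this thread:
#   Operation <op>: Timed Out (node=<node>, call=<n>, timeout=<n>ms)
TIMEOUT_RE = re.compile(
    r"Operation (?P<op>\S+): Timed Out "
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+), "
    r"timeout=(?P<timeout_ms>\d+)ms\)"
)


def find_timeouts(log_text: str) -> list:
    """Return one dict per timed-out operation found in log_text."""
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        results.append(
            {
                "op": match["op"],
                "node": match["node"],
                "call": int(match["call"]),
                "timeout_ms": int(match["timeout_ms"]),
            }
        )
    return results


if __name__ == "__main__":
    sample = (
        "Nov 26 10:45:05 rnas2 crmd[17255]: error: Operation "
        "nfs-grace_start_0: Timed Out (node=rnas2, call=85, timeout=40000ms)"
    )
    for entry in find_timeouts(sample):
        print(entry)
```

Running the same function over the logs of each node should show
whether nfs-grace_start_0 hits the 40000ms limit everywhere or only on
some nodes.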

Yours,
Rigi


> 
> Thanks,
> Soumya
> > 
> > below the log during the take down:
> > 
> > 	Nov 26 10:44:24 rnas2 corosync[8848]: [TOTEM ] A new membership
> > (129.132.145.5:1048) was formed. Members left: 2
> > 	Nov 26 10:44:24 rnas2 attrd[17253]: notice:
> > crm_update_peer_proc: Node altai[2] - state is now lost (was
> > member)
> > 	Nov 26 10:44:24 rnas2 attrd[17253]: notice: Removing all altai
> > attributes for attrd_peer_change_cb
> > 	Nov 26 10:44:25 rnas2 corosync[8848]: [QUORUM] Members[2]: 1 3
> > 	Nov 26 10:44:25 rnas2 corosync[8848]: [MAIN  ] Completed
> > service synchronization, ready to provide service.
> > 	Nov 26 10:44:25 rnas2 cib[17250]: notice: crm_update_peer_proc:
> > Node altai[2] - state is now lost (was member)
> > 	Nov 26 10:44:25 rnas2 cib[17250]: notice: Removing altai/2 from
> > the membership list
> > 	Nov 26 10:44:25 rnas2 cib[17250]: notice: Purged 1 peers with
> > id=2 and/or uname=altai from the membership cache
> > 	Nov 26 10:44:25 rnas2 pacemakerd[17249]: notice: Node altai[2]
> > - state is now lost (was member)
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: Node altai[2] -
> > state is now lost (was member)
> > 	Nov 26 10:44:25 rnas2 stonith-ng[17251]: notice:
> > crm_update_peer_proc: Node altai[2] - state is now lost (was
> > member)
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: warning: No match for
> > shutdown action on 2
> > 	Nov 26 10:44:25 rnas2 attrd[17253]: notice: Removing altai/2
> > from the membership list
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: Stonith/shutdown of
> > altai not matched
> > 	Nov 26 10:44:25 rnas2 stonith-ng[17251]: notice: Removing
> > altai/2 from the membership list
> > 	Nov 26 10:44:25 rnas2 attrd[17253]: notice: Purged 1 peers with
> > id=2 and/or uname=altai from the membership cache
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: State transition
> > S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
> > origin=abort_transition_graph ]
> > 	Nov 26 10:44:25 rnas2 stonith-ng[17251]: notice: Purged 1 peers
> > with id=2 and/or uname=altai from the membership cache
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: warning: No match for
> > shutdown action on 2
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: Stonith/shutdown of
> > altai not matched
> > 	Nov 26 10:44:25 rnas2 pengine[17254]: notice: Restart nfs
> > -grace:0        (Started kaukasus)
> > 	Nov 26 10:44:25 rnas2 pengine[17254]: notice: Restart nfs
> > -grace:1        (Started rnas2)
> > 	Nov 26 10:44:25 rnas2 pengine[17254]: notice: Restart kaukasus
> > -cluster_ip-1        (Started kaukasus)
> > 	Nov 26 10:44:25 rnas2 pengine[17254]: notice: Start   altai
> > -cluster_ip-1        (kaukasus)
> > 	Nov 26 10:44:25 rnas2 pengine[17254]: notice: Start   altai
> > -trigger_ip-1        (kaukasus)
> > 	Nov 26 10:44:25 rnas2 pengine[17254]: notice: Restart rnas2
> > -cluster_ip-1        (Started rnas2)
> > 	Nov 26 10:44:25 rnas2 pengine[17254]: notice: Calculated
> > Transition 85: /var/lib/pacemaker/pengine/pe-input-86.bz2
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action
> > 29: stop kaukasus-cluster_ip-1_stop_0 on kaukasus
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action
> > 35: start altai-trigger_ip-1_start_0 on kaukasus
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action
> > 37: stop rnas2-cluster_ip-1_stop_0 on rnas2 (local)
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action
> > 36: monitor altai-trigger_ip-1_monitor_10000 on kaukasus
> > 	Nov 26 10:44:25 rnas2 IPaddr(rnas2-cluster_ip-1)[30797]: INFO:
> > IP status = ok, IP_CIP=
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: Operation rnas2
> > -cluster_ip-1_stop_0: ok (node=rnas2, call=82, rc=0, cib
> > -update=210,
> > confirmed=true)
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action
> > 21: stop nfs-grace_stop_0 on kaukasus
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action
> > 23: stop nfs-grace_stop_0 on rnas2 (local)
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: Operation nfs
> > -grace_stop_0: ok (node=rnas2, call=84, rc=0, cib-update=211,
> > confirmed=true)
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action
> > 22: start nfs-grace_start_0 on kaukasus
> > 	Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action
> > 24: start nfs-grace_start_0 on rnas2 (local)
> > 	Nov 26 10:44:26 rnas2 ntpd[1700]: Deleting interface #27 bond0,
> > 129.132.145.23#123, interface stats: received=0, sent=0, dropped=0,
> > active_time=69258 secs
> > 	Nov 26 10:45:05 rnas2 lrmd[17252]: warning: nfs-grace_start_0
> > process (PID 30810) timed out
> > 	Nov 26 10:45:05 rnas2 lrmd[17252]: warning: nfs
> > -grace_start_0:30810 - timed out after 40000ms
> > 	Nov 26 10:45:05 rnas2 crmd[17255]: error: Operation nfs
> > -grace_start_0: Timed Out (node=rnas2, call=85, timeout=40000ms)
> > 	Nov 26 10:45:05 rnas2 crmd[17255]: warning: Action 24 (nfs
> > -grace_start_0) on rnas2 failed (target: 0 vs. rc: 1): Error
> > 	Nov 26 10:45:05 rnas2 crmd[17255]: notice: Transition aborted
> > by nfs-grace_start_0 'modify' on rnas2: Event failed
> > (magic=2:1;24:85:0:836713e1-c9d3-43f8-bffd
> > -756e023eee8a,...event:381,
> > 0)
> > 	Nov 26 10:45:05 rnas2 crmd[17255]: warning: Action 24 (nfs
> > -grace_start_0) on rnas2 failed (target: 0 vs. rc: 1): Error
> > 	Nov 26 10:45:05 rnas2 crmd[17255]: warning: Action 22 (nfs
> > -grace_start_0) on kaukasus failed (target: 0 vs. rc: 1): Error
> > 	Nov 26 10:45:05 rnas2 crmd[17255]: warning: Action 22 (nfs
> > -grace_start_0) on kaukasus failed (target: 0 vs. rc: 1): Error
> > 	Nov 26 10:45:05 rnas2 crmd[17255]: notice: Transition 85
> > (Complete=13, Pending=0, Fired=0, Skipped=3, Incomplete=8,
> > Source=/var/lib/pacemaker/pengine/pe-input-86.bz2): Stopped
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing
> > failed op start for nfs-grace:0 on kaukasus: unknown error (1)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing
> > failed op start for nfs-grace:0 on kaukasus: unknown error (1)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing
> > failed op start for nfs-grace:1 on rnas2: unknown error (1)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing
> > failed op start for nfs-grace:1 on rnas2: unknown error (1)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs
> > -grace-clone away from rnas2 after 1000000 failures (max=1000000)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs
> > -grace-clone away from rnas2 after 1000000 failures (max=1000000)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs
> > -grace-clone away from rnas2 after 1000000 failures (max=1000000)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: notice: Recover nfs
> > -grace:0        (Started kaukasus)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: notice: Stop    nfs
> > -grace:1        (rnas2)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start   kaukasus
> > -cluster_ip-1        (kaukasus)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start   altai
> > -cluster_ip-1        (kaukasus)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start   rnas2
> > -cluster_ip-1        (rnas2)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: notice: Calculated
> > Transition 86: /var/lib/pacemaker/pengine/pe-input-87.bz2
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing
> > failed op start for nfs-grace:0 on kaukasus: unknown error (1)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing
> > failed op start for nfs-grace:0 on kaukasus: unknown error (1)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing
> > failed op start for nfs-grace:1 on rnas2: unknown error (1)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing
> > failed op start for nfs-grace:1 on rnas2: unknown error (1)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs
> > -grace-clone away from kaukasus after 1000000 failures
> > (max=1000000)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs
> > -grace-clone away from kaukasus after 1000000 failures
> > (max=1000000)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs
> > -grace-clone away from kaukasus after 1000000 failures
> > (max=1000000)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs
> > -grace-clone away from rnas2 after 1000000 failures (max=1000000)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs
> > -grace-clone away from rnas2 after 1000000 failures (max=1000000)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs
> > -grace-clone away from rnas2 after 1000000 failures (max=1000000)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: notice: Stop    nfs
> > -grace:0        (kaukasus)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: notice: Stop    nfs
> > -grace:1        (rnas2)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start   kaukasus
> > -cluster_ip-1        (kaukasus - blocked)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start   altai
> > -cluster_ip-1        (kaukasus - blocked)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start   rnas2
> > -cluster_ip-1        (rnas2 - blocked)
> > 	Nov 26 10:45:05 rnas2 pengine[17254]: notice: Calculated
> > Transition 87: /var/lib/pacemaker/pengine/pe-input-88.bz2
> > 	Nov 26 10:45:05 rnas2 crmd[17255]: notice: Initiating action 2:
> > stop nfs-grace_stop_0 on kaukasus
> > 	Nov 26 10:45:05 rnas2 crmd[17255]: notice: Initiating action 6:
> > stop nfs-grace_stop_0 on rnas2 (local)
> > 	Nov 26 10:45:05 rnas2 crmd[17255]: notice: Operation nfs
> > -grace_stop_0: ok (node=rnas2, call=86, rc=0, cib-update=218,
> > confirmed=true)
> > 	Nov 26 10:45:05 rnas2 crmd[17255]: notice: Transition 87
> > (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> > Source=/var/lib/pacemaker/pengine/pe-input-88.bz2): Complete
> > 	Nov 26 10:45:05 rnas2 crmd[17255]: notice: State transition
> > S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> > cause=C_FSA_INTERNAL
> > origin=notify_crmd ]
> > 
> > Yours,
> > Rigi
> > 

