[Bugs] [Bug 1400546] New: After ganesha node reboot/shutdown, portblock process goes to FAILED state
bugzilla at redhat.com
Thu Dec 1 13:20:22 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1400546
Bug ID: 1400546
Summary: After ganesha node reboot/shutdown, portblock process
goes to FAILED state
Product: GlusterFS
Version: 3.9
Component: common-ha
Keywords: Triaged
Severity: high
Assignee: bugs at gluster.org
Reporter: skoduri at redhat.com
CC: aloganat at redhat.com, amukherj at redhat.com,
bugs at gluster.org, jthottan at redhat.com,
kkeithle at redhat.com, oalbrigt at redhat.com,
rhs-bugs at redhat.com, skoduri at redhat.com,
storage-qa-internal at redhat.com
Depends On: 1399154
Blocks: 1398261
+++ This bug was initially created as a clone of Bug #1399154 +++
Description of problem:
After a ganesha node is rebooted, a portblock resource goes to FAILED state.
In a four-node cluster, if one of the nodes is rebooted or shut down, the
portblock resource of any of the nodes (not one particular node) can go into
FAILED state. Even after the rebooted/shut-down node is brought back up,
failback does not happen while the portblock resource is in FAILED state.
Version-Release number of selected component (if applicable):
nfs-ganesha-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
How reproducible:
Consistent
Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Reboot one of the nodes.
3. Check pcs status.
Actual results:
A portblock resource goes to FAILED state in pcs status.
Expected results:
All the resources should be up and running.
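The expected result can be checked mechanically. A minimal Python sketch, assuming the `pcs status` text has already been captured to a string (for example from `pcs status > status.txt`): it collapses whitespace, so entries wrapped by a mail client still match, and reports every primitive resource whose state is not `Started`. The regular expression is fitted to the output format in this report; it is an assumption, not a documented pcs output grammar, and `unhealthy_resources` is a hypothetical helper name.

```python
import re

# One primitive resource entry as printed by `pcs status`, e.g.
#   <name> (ocf::heartbeat:portblock): FAILED <node> (blocked)
# Pattern derived from the output in this report, not from a pcs spec.
RESOURCE_RE = re.compile(r"(\S+)\s+\((ocf::\w+:\w+)\):\s+(Started|Stopped|FAILED)\b")


def unhealthy_resources(status_text):
    """Return (resource, agent, state) for every primitive that is not Started.

    Clone summary lines such as "Stopped: [ node ]" are not matched, because
    the clone instances of an offline node are expected to be stopped.
    """
    flat = " ".join(status_text.split())  # undo mail-client line wrapping
    return [
        (name, agent, state)
        for name, agent, state in RESOURCE_RE.findall(flat)
        if state != "Started"
    ]
```

A test harness could call `unhealthy_resources(open("status.txt").read())` after the reboot in step 2 and treat a non-empty result as a reproduction of this bug.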
Additional info:
[root@dhcp46-139 ~]# pcs status
Cluster name: ganesha-ha-360
Stack: corosync
Current DC: dhcp46-124.lab.eng.blr.redhat.com (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Thu Nov 24 13:12:49 2016    Last change: Thu Nov 24 12:32:19 2016 by root via cibadmin on dhcp46-111.lab.eng.blr.redhat.com

4 nodes and 24 resources configured

Online: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
OFFLINE: [ dhcp46-111.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp46-115.lab.eng.blr.redhat.com dhcp46-124.lab.eng.blr.redhat.com dhcp46-139.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp46-111.lab.eng.blr.redhat.com ]
 Resource Group: dhcp46-111.lab.eng.blr.redhat.com-group
     dhcp46-111.lab.eng.blr.redhat.com-nfs_block      (ocf::heartbeat:portblock):  Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-111.lab.eng.blr.redhat.com-cluster_ip-1   (ocf::heartbeat:IPaddr):     Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock    (ocf::heartbeat:portblock):  FAILED dhcp46-124.lab.eng.blr.redhat.com (blocked)
 Resource Group: dhcp46-115.lab.eng.blr.redhat.com-group
     dhcp46-115.lab.eng.blr.redhat.com-nfs_block      (ocf::heartbeat:portblock):  Started dhcp46-115.lab.eng.blr.redhat.com
     dhcp46-115.lab.eng.blr.redhat.com-cluster_ip-1   (ocf::heartbeat:IPaddr):     Started dhcp46-115.lab.eng.blr.redhat.com
     dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock    (ocf::heartbeat:portblock):  Started dhcp46-115.lab.eng.blr.redhat.com
 Resource Group: dhcp46-139.lab.eng.blr.redhat.com-group
     dhcp46-139.lab.eng.blr.redhat.com-nfs_block      (ocf::heartbeat:portblock):  Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-139.lab.eng.blr.redhat.com-cluster_ip-1   (ocf::heartbeat:IPaddr):     Started dhcp46-139.lab.eng.blr.redhat.com
     dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock    (ocf::heartbeat:portblock):  Started dhcp46-139.lab.eng.blr.redhat.com
 Resource Group: dhcp46-124.lab.eng.blr.redhat.com-group
     dhcp46-124.lab.eng.blr.redhat.com-nfs_block      (ocf::heartbeat:portblock):  Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-124.lab.eng.blr.redhat.com-cluster_ip-1   (ocf::heartbeat:IPaddr):     Started dhcp46-124.lab.eng.blr.redhat.com
     dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock    (ocf::heartbeat:portblock):  Started dhcp46-124.lab.eng.blr.redhat.com

Failed Actions:
* dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock_stop_0 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=83, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=20004ms
* dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-124.lab.eng.blr.redhat.com 'unknown error' (1): call=73, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=0ms
* dhcp46-115.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-115.lab.eng.blr.redhat.com 'unknown error' (1): call=73, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:40 2016', queued=0ms, exec=0ms
* dhcp46-139.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000 on dhcp46-139.lab.eng.blr.redhat.com 'unknown error' (1): call=71, status=Timed Out, exitreason='none',
    last-rc-change='Thu Nov 24 13:09:41 2016', queued=0ms, exec=0ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@dhcp46-139 ~]#
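The Failed Actions section carries the key detail: the `stop` of `dhcp46-111.lab.eng.blr.redhat.com-nfs_unblock` on dhcp46-124 ran for 20004 ms before being reported `Timed Out` (consistent with an operation timeout of about 20 seconds), and the 10-second `monitor` of the local `nfs_unblock` resource timed out on every surviving node. A minimal Python sketch for pulling those fields out of captured output; the regular expression is fitted to the entries in this report and is an assumption, not a documented pcs format, and `FailedAction` / `parse_failed_actions` are hypothetical names.

```python
import re
from dataclasses import dataclass

# One "Failed Actions" entry, in the shape printed by pcs in this report.
FAILED_ACTION_RE = re.compile(
    r"\* (?P<resource>\S+)_(?P<op>start|stop|monitor)_(?P<interval>\d+)"
    r" on (?P<node>\S+) '(?P<error>[^']*)' \((?P<rc>\d+)\):"
    r" call=(?P<call>\d+), status=(?P<status>[^,]+),"
    r" exitreason='(?P<reason>[^']*)', last-rc-change='(?P<changed>[^']*)',"
    r" queued=(?P<queued>\d+)ms, exec=(?P<exec_ms>\d+)ms"
)


@dataclass
class FailedAction:
    resource: str
    op: str
    interval_ms: int  # 0 for start/stop, the monitor interval otherwise
    node: str
    status: str
    exec_ms: int


def parse_failed_actions(text):
    """Extract FailedAction records from (possibly line-wrapped) pcs status text."""
    flat = " ".join(text.split())  # entries may be wrapped across lines
    return [
        FailedAction(
            resource=m["resource"],
            op=m["op"],
            interval_ms=int(m["interval"]),
            node=m["node"],
            status=m["status"],
            exec_ms=int(m["exec_ms"]),
        )
        for m in FAILED_ACTION_RE.finditer(flat)
    ]


def timed_out(actions):
    """Keep only the actions that failed because they exceeded their timeout."""
    return [a for a in actions if a.status == "Timed Out"]
```

Grouping the parsed records by `op` separates the single stop timeout (the one that left the resource FAILED and blocked) from the monitor timeouts on the other nodes.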
Log snippet (lrmd on dhcp46-124):
---------------------------------
Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0+0 records in ]
Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0+0 records out ]
Nov 24 13:11:13 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15272:stderr [ 0 bytes (0 B) copied, 0.0739975 s, 0.0 kB/s ]
Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0+0 records in ]
Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0+0 records out ]
Nov 24 13:11:23 dhcp46-124 lrmd[25436]: notice: dhcp46-124.lab.eng.blr.redhat.com-nfs_unblock_monitor_10000:15428:stderr [ 0 bytes (0 B) copied, 0.0539065 s, 0.0 kB/s ]
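The `0+0 records in` / `0+0 records out` / `0 bytes (0 B) copied` lines are the summary that `dd` prints on stderr, captured by lrmd each time the `nfs_unblock` monitor runs (note the 10-second spacing, matching `monitor_10000`). Together with the commit message below, this suggests the unblock monitor writes the current connection list to a state file under `tickle_dir` on shared storage; if that mount stalls, the write blocks until the operation times out. The Python sketch imitates that pattern under stated assumptions: the `<file>.new` then rename scheme, the `local:port<TAB>remote:port` line format, the 192.0.2.x addresses, and the helper names are all hypothetical and are not the portblock RA's actual code.

```python
import os
import tempfile
import threading
import time


def save_connections(statefile, connections):
    """Write one 'local:port<TAB>remote:port' line per connection, then rename.

    Hypothetical stand-in for the RA's state-file write. fsync() forces the
    data to the backing storage, which is where a stalled shared-storage
    mount would make this call block.
    """
    tmp = statefile + ".new"
    with open(tmp, "w") as f:
        for local, remote in connections:
            f.write(f"{local}\t{remote}\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, statefile)  # atomic rename, like `mv file.new file`


def run_with_timeout(func, timeout_s, *args):
    """Run func(*args) in a worker thread; return 'ok', 'error' or 'timed out'.

    Loosely analogous to an operation timeout: the caller gives up after
    timeout_s even though the blocked call has not returned an error.
    """
    outcome = {}

    def target():
        try:
            func(*args)
            outcome["result"] = "ok"
        except OSError:
            outcome["result"] = "error"

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return "timed out"
    return outcome.get("result", "error")


# Usage: a healthy write to a local temporary directory.
with tempfile.TemporaryDirectory() as tickle_dir:
    statefile = os.path.join(tickle_dir, "192.0.2.10")  # hypothetical VIP
    print(run_with_timeout(save_connections, 5, statefile,
                           [("192.0.2.10:2049", "192.0.2.50:901")]))  # ok

# A call that blocks longer than the timeout, standing in for a stalled mount.
print(run_with_timeout(time.sleep, 0.5, 2))  # timed out
```

Raising the timeout, as the fix does, only helps if the storage becomes available again within the new limit; the sketch shows the same trade-off, since any `timeout_s` shorter than the stall still reports `timed out`.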
--- Additional comment from Worker Ant on 2016-11-28 07:40:45 EST ---
REVIEW: http://review.gluster.org/15947 (common-HA: Increase timeout for
portblock RA of action=unblock) posted (#1) for review on master by soumya k
(skoduri at redhat.com)
--- Additional comment from Worker Ant on 2016-11-28 07:53:30 EST ---
REVIEW: http://review.gluster.org/15947 (common-HA: Increase timeout for
portblock RA of action=unblock) posted (#2) for review on master by soumya k
(skoduri at redhat.com)
--- Additional comment from Worker Ant on 2016-11-29 11:46:23 EST ---
REVIEW: http://review.gluster.org/15947 (common-HA: Increase timeout for
portblock RA of action=unblock) posted (#3) for review on master by soumya k
(skoduri at redhat.com)
--- Additional comment from Worker Ant on 2016-12-01 05:46:37 EST ---
COMMIT: http://review.gluster.org/15947 committed in master by Kaleb KEITHLEY
(kkeithle at redhat.com)
------
commit 1b2b5be970f78cc32069516fa347d9943dc17d3e
Author: Soumya Koduri <skoduri at redhat.com>
Date: Mon Nov 28 17:56:35 2016 +0530
common-HA: Increase timeout for portblock RA of action=unblock
The portblock RA with action=unblock stores information about the
client/server IP connections in a tickle_dir folder created on the
shared storage. During a node shutdown/reboot, the shared storage
may become unavailable for some time. Hence increase the timeout to
keep that resource agent from going into FAILED state.
Change-Id: I4f98f819895cb164c3a82ba8084c7c11610f35ff
BUG: 1399154
Signed-off-by: Soumya Koduri <skoduri at redhat.com>
Reviewed-on: http://review.gluster.org/15947
Smoke: Gluster Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Kaleb KEITHLEY <kkeithle at redhat.com>
Reviewed-by: Niels de Vos <ndevos at redhat.com>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
Reviewed-by: jiffin tony Thottan <jthottan at redhat.com>
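The commit message says only that the timeout for the unblock portblock RA was increased; it shows neither the command nor the new value. As an illustration, a Python sketch that assembles a `pcs resource create` command for a per-node `nfs_unblock` resource with explicit `start`, `stop` and `monitor` timeouts. `protocol`, `portno`, `action`, `ip` and `tickle_dir` are portblock RA parameters, and `op` / `--group` are pcs syntax; the resource and group names follow the pcs status output in this report and `interval=10s` matches `monitor_10000`. The exact command used by the gluster HA scripts is not shown here, and the VIP, the tickle_dir path, the `60s` value and the function name are hypothetical.

```python
import shlex

# Hypothetical value: the commit says the timeout was increased, but the
# number is not given in this report.
UNBLOCK_TIMEOUT = "60s"


def nfs_unblock_create_cmd(node, vip, tickle_dir, timeout=UNBLOCK_TIMEOUT):
    """Return argv for creating <node>-nfs_unblock inside <node>-group.

    vip and tickle_dir are caller-supplied placeholders.
    """
    return [
        "pcs", "resource", "create", f"{node}-nfs_unblock", "ocf:heartbeat:portblock",
        "protocol=tcp", "portno=2049", "action=unblock",
        f"ip={vip}", f"tickle_dir={tickle_dir}",
        # One `op` keyword followed by several "<action> <options>" pairs.
        "op",
        "start", f"timeout={timeout}",
        "stop", f"timeout={timeout}",
        "monitor", "interval=10s", f"timeout={timeout}",
        "--group", f"{node}-group",
    ]


cmd = nfs_unblock_create_cmd(
    "dhcp46-111.lab.eng.blr.redhat.com",
    "192.0.2.10",  # hypothetical VIP
    "/run/gluster/shared_storage/nfs-ganesha/tickle_dir",  # hypothetical path
)
print(shlex.join(cmd))
```

Building the argument list rather than a single string keeps each parameter intact if it is later passed to `subprocess.run(cmd, check=True)` on a cluster node.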
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1398261
[Bug 1398261] After ganesha node reboot/shutdown, portblock process goes to
FAILED state
https://bugzilla.redhat.com/show_bug.cgi?id=1399154
[Bug 1399154] After ganesha node reboot/shutdown, portblock process goes to
FAILED state