[Bugs] [Bug 1399482] New: [Eventing]: Events not seen when command is triggered from one of the peer nodes
bugzilla at redhat.com
Tue Nov 29 07:06:12 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1399482
Bug ID: 1399482
Summary: [Eventing]: Events not seen when command is triggered
from one of the peer nodes
Product: GlusterFS
Version: 3.9
Component: eventsapi
Severity: high
Assignee: bugs at gluster.org
Reporter: avishwan at redhat.com
CC: amukherj at redhat.com, avishwan at redhat.com,
sanandpa at redhat.com, storage-qa-internal at redhat.com,
vbellur at redhat.com
Depends On: 1384316, 1388862
+++ This bug was initially created as a clone of Bug #1388862 +++
+++ This bug was initially created as a clone of Bug #1384316 +++
Description of problem:
=======================
Have a 4-node cluster with eventing enabled. Log in to N3's console/terminal
and create a distribute volume with 2 bricks residing on N1 and N2, and another
1x2 distributed-replicate volume, again residing on N1 and N2. Execute
bitrot-related commands and monitor the events that are generated. The bitrot
commands successfully generate an event when triggered from N1, N2 or N3;
however, any command executed on N4 results in no events.
How reproducible:
================
Seen consistently, across both volumes in the present setup.
Steps to Reproduce:
===================
1. Have a 4-node cluster and enable eventing.
2. Log in to N3 and create volume 'dist' with B1 of N1 and B2 of N2. Create
another 1x2 volume 'distrep', with B2 of N1 and B2 of N2.
3. Enable bitrot and exercise the scrub options from the console of
N1/N2/N3.
4. Log in to N4 and execute the same commands as in step 3 on either of the
volumes 'dist' or 'distrep'.
Actual results:
==============
Events are seen as expected in step 3. No events are seen in step 4.
Expected results:
=================
Events should be seen irrespective of the peer from which the command is
executed.
Additional info:
================
----
N2
----
[root at dhcp35-100 ~]# gluster v bitrot distrep scrub-throttle normal
volume bitrot: success
[root at dhcp35-100 ~]# gluster v bitrot dist scrub-frequency weekly
volume bitrot: success
[root at dhcp35-100 ~]#
{u'message': {u'name': u'distrep', u'value': u'normal'}, u'event':
u'BITROT_SCRUB_THROTTLE', u'ts': 1476336724, u'nodeid':
u'fcfacf2e-57fb-45ba-b1e1-e4ba640a4de5'}
{u'message': {u'name': u'dist', u'value': u'weekly'}, u'event':
u'BITROT_SCRUB_FREQ', u'ts': 1476336747, u'nodeid':
u'fcfacf2e-57fb-45ba-b1e1-e4ba640a4de5'}
======================================================================================================================
-----
N1
-----
[root at dhcp35-115 ~]# gluster v bitrot dist scrub pause
volume bitrot: success
[root at dhcp35-115 ~]#
{u'message': {u'name': u'dist', u'value': u'pause'}, u'event':
u'BITROT_SCRUB_OPTION', u'ts': 1476336842, u'nodeid':
u'6ac165c0-317f-42ad-8262-953995171dbb'}
======================================================================================================================
-----
N3
-----
[root at dhcp35-101 ~]# gluster v bitrot dist scrub resume
volume bitrot: success
[root at dhcp35-101 ~]#
{u'message': {u'name': u'dist', u'value': u'resume'}, u'event':
u'BITROT_SCRUB_OPTION', u'ts': 1476336858, u'nodeid':
u'a3bd23b9-f70a-47f5-9c95-7a271f5f1e18'}
======================================================================================================================
----
N4
----
[root at dhcp35-104 ~]# gluster v bitrot dist scrub pause
volume bitrot: success
[root at dhcp35-104 ~]# gluster v bitrot distrep scrub pause
volume bitrot: success
[root at dhcp35-104 ~]#
[root at dhcp35-104 ~]# gluster v bitrot dist scrub resume
volume bitrot: success
[root at dhcp35-104 ~]#
[root at dhcp35-104 ~]# gluster v bitrot dist scrub status
Volume name : dist
State of scrub: Active (Idle)
Scrub impact: aggressive
Scrub frequency: weekly
Bitrot error log location: /var/log/glusterfs/bitd.log
Scrubber error log location: /var/log/glusterfs/scrub.log
=========================================================
Node: dhcp35-115.lab.eng.blr.redhat.com
Number of Scrubbed files: 0
Number of Skipped files: 0
Last completed scrub time: 2016-10-13 05:32:47
Duration of last scrub (D:M:H:M:S): 0:0:0:0
Error count: 0
=========================================================
Node: 10.70.35.100
Number of Scrubbed files: 0
Number of Skipped files: 0
Last completed scrub time: 2016-10-13 05:32:45
Duration of last scrub (D:M:H:M:S): 0:0:0:0
Error count: 0
=========================================================
[root at dhcp35-104 ~]#
<<<<<<<<< No events seen >>>>>>>>
[root at dhcp35-104 ~]# gluster-eventsapi status
Webhooks:
http://10.70.35.109:9000/listen
+-----------------------------------+-------------+-----------------------+
| NODE | NODE STATUS | GLUSTEREVENTSD STATUS |
+-----------------------------------+-------------+-----------------------+
| dhcp35-115.lab.eng.blr.redhat.com | UP | UP |
| dhcp35-101.lab.eng.blr.redhat.com | UP | UP |
| 10.70.35.100 | UP | UP |
| localhost | UP | UP |
+-----------------------------------+-------------+-----------------------+
[root at dhcp35-104 ~]# gluster peer tsatus
unrecognized word: tsatus (position 1)
[root at dhcp35-104 ~]# gluster peer status
Number of Peers: 3
Hostname: dhcp35-115.lab.eng.blr.redhat.com
Uuid: 6ac165c0-317f-42ad-8262-953995171dbb
State: Peer in Cluster (Connected)
Hostname: dhcp35-101.lab.eng.blr.redhat.com
Uuid: a3bd23b9-f70a-47f5-9c95-7a271f5f1e18
State: Peer in Cluster (Connected)
Hostname: 10.70.35.100
Uuid: fcfacf2e-57fb-45ba-b1e1-e4ba640a4de5
State: Peer in Cluster (Connected)
[root at dhcp35-104 ~]#
[root at dhcp35-104 ~]#
[root at dhcp35-104 ~]#
--- Additional comment from Sweta Anandpara on 2016-10-13 02:55:44 EDT ---
Added the debuginfo package, and Atin figured out that the event IS actually
being sent.
Did a glustereventsd reload on the affected node N4 and started receiving
events. node-reload is one of the programs invoked when we do a webhook-add,
which in turn does a glustereventsd reload. For some reason, when I did the
webhook-add on this setup, the glustereventsd reload would have failed. Just a
hypothesis as of now.
Will create a new webhook and add it on this same setup, observe the
success/failure/errors seen while doing so, and update.
Until then, anyone seeing a similar issue can work around it by running
'service glustereventsd reload' on the impacted node; the cluster and its
events should then work as expected (see the sketch below).
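A minimal sketch of that workaround, assuming it is run directly on the
impacted node; the 'service glustereventsd reload' and 'gluster-eventsapi
status' commands are the ones quoted in this report, while the wrapper script
itself is purely illustrative:

#!/usr/bin/env python
# Illustrative workaround helper (not part of glusterfs): reload
# glustereventsd on this node so it re-reads the webhook configuration,
# then confirm with the eventsapi CLI used elsewhere in this report.
import subprocess
import sys

def reload_glustereventsd():
    # Same as running 'service glustereventsd reload' by hand.
    rc = subprocess.call(["service", "glustereventsd", "reload"])
    if rc != 0:
        sys.exit("glustereventsd reload failed with exit code %d" % rc)

def verify():
    # Check that glustereventsd is reported UP on every node.
    subprocess.call(["gluster-eventsapi", "status"])

if __name__ == "__main__":
    reload_glustereventsd()
    verify()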
--- Additional comment from Sweta Anandpara on 2016-10-13 03:38:34 EDT ---
Deleted the said webhook and tried to add the same webhook again to the
cluster. That did throw an exception, where it failed to run 'gluster system::
execute eventsapi.py node-reload'.
It fails on the same node N4 every time, and I am unable to figure out why.
It works on all the other nodes of the cluster.
[root at dhcp35-101 yum.repos.d]# gluster-eventsapi webhook-del
http://10.70.35.109:9000/listen
Traceback (most recent call last):
File "/usr/sbin/gluster-eventsapi", line 459, in <module>
runcli()
File "/usr/lib/python2.6/site-packages/gluster/cliutils/cliutils.py", line
212, in runcli
cls.run(args)
File "/usr/sbin/gluster-eventsapi", line 274, in run
sync_to_peers()
File "/usr/sbin/gluster-eventsapi", line 129, in sync_to_peers
out = execute_in_peers("node-reload")
File "/usr/lib/python2.6/site-packages/gluster/cliutils/cliutils.py", line
125, in execute_in_peers
raise GlusterCmdException((rc, out, err, " ".join(cmd)))
gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Commit failed on
10.70.35.104. Error: Unable to end. Error : Success\n', 'gluster system::
execute eventsapi.py node-reload')
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]# gluster-eventsapi status
Webhooks: None
+-----------------------------------+-------------+-----------------------+
| NODE | NODE STATUS | GLUSTEREVENTSD STATUS |
+-----------------------------------+-------------+-----------------------+
| 10.70.35.100 | UP | UP |
| 10.70.35.104 | UP | UP |
| dhcp35-115.lab.eng.blr.redhat.com | UP | UP |
| localhost | UP | UP |
+-----------------------------------+-------------+-----------------------+
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]# gluster-eventsapi webhook-test
http://10.70.35.109:9000/listen
+-----------------------------------+-------------+----------------+
| NODE | NODE STATUS | WEBHOOK STATUS |
+-----------------------------------+-------------+----------------+
| 10.70.35.100 | UP | OK |
| 10.70.35.104 | UP | OK |
| dhcp35-115.lab.eng.blr.redhat.com | UP | OK |
| localhost | UP | OK |
+-----------------------------------+-------------+----------------+
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]# gluster-eventsapi webhook-add
http://10.70.35.109:9000/listen
Traceback (most recent call last):
File "/usr/sbin/gluster-eventsapi", line 459, in <module>
runcli()
File "/usr/lib/python2.6/site-packages/gluster/cliutils/cliutils.py", line
212, in runcli
cls.run(args)
File "/usr/sbin/gluster-eventsapi", line 232, in run
sync_to_peers()
File "/usr/sbin/gluster-eventsapi", line 129, in sync_to_peers
out = execute_in_peers("node-reload")
File "/usr/lib/python2.6/site-packages/gluster/cliutils/cliutils.py", line
125, in execute_in_peers
raise GlusterCmdException((rc, out, err, " ".join(cmd)))
gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Commit failed on
10.70.35.104. Error: Unable to end. Error : Success\n', 'gluster system::
execute eventsapi.py node-reload')
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]# gluster-eventsapi status
Webhooks:
http://10.70.35.109:9000/listen
+-----------------------------------+-------------+-----------------------+
| NODE | NODE STATUS | GLUSTEREVENTSD STATUS |
+-----------------------------------+-------------+-----------------------+
| 10.70.35.100 | UP | UP |
| 10.70.35.104 | UP | UP |
| dhcp35-115.lab.eng.blr.redhat.com | UP | UP |
| localhost | UP | UP |
+-----------------------------------+-------------+-----------------------+
[root at dhcp35-101 yum.repos.d]#
[root at dhcp35-101 yum.repos.d]#
--- Additional comment from Aravinda VK on 2016-10-17 08:36:14 EDT ---
This is similar to BZ 1379963. `glustereventsd` on one node is not reloaded,
so it does not know about the newly added Webhook (illustrated in the sketch
below).
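To make that failure mode concrete, a tiny illustrative sketch (not the real
glustereventsd source; all names here are assumptions): the daemon keeps the
webhook list in memory and refreshes it only when an explicit reload reaches
it, so a node that missed the reload publishes against a stale, possibly
empty, list.

# Illustrative sketch only -- not the actual glustereventsd code.
webhooks = {}   # filled from the webhook config at startup / on reload

def post_to_webhook(url, event):
    # Stand-in for the real HTTP POST of the event payload.
    print("POST %s -> %s" % (url, event))

def publish(event):
    # On the node that missed the reload, `webhooks` is still the old
    # (possibly empty) dict, so the event is silently dropped.
    for url in webhooks:
        post_to_webhook(url, event)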
--- Additional comment from Worker Ant on 2016-10-26 06:29:03 EDT ---
REVIEW: http://review.gluster.org/15731 (eventsapi: Auto reload Webhooks data
when modified) posted (#1) for review on master by Aravinda VK
(avishwan at redhat.com)
--- Additional comment from Worker Ant on 2016-11-17 01:28:21 EST ---
REVIEW: http://review.gluster.org/15731 (eventsapi: Auto reload Webhooks data
when modified) posted (#2) for review on master by Aravinda VK
(avishwan at redhat.com)
--- Additional comment from Worker Ant on 2016-11-17 06:10:31 EST ---
COMMIT: http://review.gluster.org/15731 committed in master by Aravinda VK
(avishwan at redhat.com)
------
commit b7ebffbda9ba784ccfae6d1a90766d5310cdaa15
Author: Aravinda VK <avishwan at redhat.com>
Date: Wed Oct 26 15:51:17 2016 +0530
eventsapi: Auto reload Webhooks data when modified
glustereventsd depends on a reload signal to reload the
Webhooks configuration. But if the reload signal is missed, no
events will be sent to the newly added Webhook.
Added auto reload based on the webhooks file mtime. Before pushing
events to Webhooks, the webhooks configuration is reloaded if the
previously recorded mtime differs from the current mtime.
BUG: 1388862
Change-Id: I83a41d6a52d8fa1d70e88294298f4a5c396d4158
Signed-off-by: Aravinda VK <avishwan at redhat.com>
Reviewed-on: http://review.gluster.org/15731
Reviewed-by: Prashanth Pai <ppai at redhat.com>
Smoke: Gluster Build System <jenkins at build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
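A minimal sketch of the mtime-based auto reload described in the commit
message above (the real change is the patch on review.gluster.org; the config
path and function names below are assumptions): before pushing events, compare
the webhooks file's mtime with the last recorded value and reload only when it
has changed.

# Minimal sketch of the approach from the commit message; names and the
# config path are assumptions, not the actual patch.
import json
import os

WEBHOOKS_FILE = "/var/lib/glusterd/events/webhooks.json"  # assumed location
_webhooks = {}
_last_mtime = 0.0

def _load():
    global _webhooks
    try:
        with open(WEBHOOKS_FILE) as f:
            _webhooks = json.load(f)
    except (IOError, ValueError):
        _webhooks = {}

def get_webhooks():
    # Reload the webhook configuration whenever the file's mtime differs
    # from the previously recorded one, then remember the new mtime.
    global _last_mtime
    try:
        mtime = os.path.getmtime(WEBHOOKS_FILE)
    except OSError:
        return {}
    if mtime != _last_mtime:
        _load()
        _last_mtime = mtime
    return _webhooks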
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1384316
[Bug 1384316] [Eventing]: Events not seen when command is triggered from
one of the peer nodes
https://bugzilla.redhat.com/show_bug.cgi?id=1388862
[Bug 1388862] [Eventing]: Events not seen when command is triggered from
one of the peer nodes