[Bugs] [Bug 1399482] New: [Eventing]: Events not seen when command is triggered from one of the peer nodes

Tue Nov 29 07:06:12 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1399482

            Bug ID: 1399482
           Summary: [Eventing]: Events not seen when command is triggered
                    from one of the peer nodes
           Product: GlusterFS
           Version: 3.9
         Component: eventsapi
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: avishwan at redhat.com
                CC: amukherj at redhat.com, avishwan at redhat.com,
                    sanandpa at redhat.com, storage-qa-internal at redhat.com,
                    vbellur at redhat.com
        Depends On: 1384316, 1388862

+++ This bug was initially created as a clone of Bug #1388862 +++

+++ This bug was initially created as a clone of Bug #1384316 +++

Description of problem:
=======================
Have a 4-node cluster with eventing enabled. Log in to N3's console/terminal
and create a distribute volume of 2 bricks residing on N1 and N2, and another
1*2 distribute-replicate volume, again residing on N1 and N2. Execute
bitrot-related commands and monitor the events that are seen. The bitrot
commands successfully generate an event when triggered from N1, N2 or N3;
however, any command executed on N4 results in no events.

How reproducible:
================
Seen on both volumes in the present setup.

Steps to Reproduce:
===================
1. Have a 4-node cluster and enable eventing.
2. Log in to N3 and create 'dist' with B1 of N1 and B2 of N2. Create another
volume 'distrep' (1*2) with B2 of N1 and B2 of N2.
3. Enable bitrot and play around with the scrub options from the console of
N1/N2/N3.
4. Log in to N4 and execute the same commands as in step 3 on either of the
volumes 'dist' or 'distrep'.

Actual results:
==============
Events are seen as expected in step 3. NO events are seen in step 4.

Expected results:
=================
Events should be seen irrespective of the peer from which the command is
executed.
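One way to check this expectation against a captured set of events is to compare each event's `nodeid` with the peer UUIDs reported by `gluster peer status`; in the additional info below, the nodeids of the events from N1, N2 and N3 match those UUIDs. A minimal Python sketch, assuming the plain-text peer-status format shown in this report and events already decoded into dicts; `parse_peer_status` and `peers_without_events` are hypothetical helpers, not part of the Gluster tooling:

```python
import re


def parse_peer_status(text):
    """Map hostname -> UUID from `gluster peer status` text output.

    Relies only on the "Hostname:" / "Uuid:" lines shown in this report.
    """
    peers = {}
    hostname = None
    for raw in text.splitlines():
        line = raw.strip()
        host_match = re.match(r"Hostname:\s*(\S+)", line)
        if host_match:
            hostname = host_match.group(1)
            continue
        uuid_match = re.match(r"Uuid:\s*(\S+)", line)
        if uuid_match and hostname is not None:
            peers[hostname] = uuid_match.group(1)
            hostname = None
    return peers


def peers_without_events(peers, events):
    """Return the hostnames whose UUID never appears as an event's nodeid."""
    seen = {event.get("nodeid") for event in events}
    return sorted(host for host, uuid in peers.items() if uuid not in seen)
```

Because `gluster peer status` does not list the node it is run on, the output should come from a peer other than the node under test (here, N4) for that node to be covered.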

Additional info:
================

----
N2 
----

[root@dhcp35-100 ~]# gluster v bitrot distrep scrub-throttle normal
volume bitrot: success
[root@dhcp35-100 ~]# gluster v bitrot dist scrub-frequency weekly
volume bitrot: success
[root@dhcp35-100 ~]#

{u'message': {u'name': u'distrep', u'value': u'normal'}, u'event':
u'BITROT_SCRUB_THROTTLE', u'ts': 1476336724, u'nodeid':
u'fcfacf2e-57fb-45ba-b1e1-e4ba640a4de5'}

{u'message': {u'name': u'dist', u'value': u'weekly'}, u'event':
u'BITROT_SCRUB_FREQ', u'ts': 1476336747, u'nodeid':
u'fcfacf2e-57fb-45ba-b1e1-e4ba640a4de5'}

======================================================================================================================

-----
N1
-----
[root@dhcp35-115 ~]# gluster v bitrot dist scrub pause
volume bitrot: success
[root@dhcp35-115 ~]#

{u'message': {u'name': u'dist', u'value': u'pause'}, u'event':
u'BITROT_SCRUB_OPTION', u'ts': 1476336842, u'nodeid':
u'6ac165c0-317f-42ad-8262-953995171dbb'}

======================================================================================================================

-----
N3
-----
[root@dhcp35-101 ~]# gluster v bitrot dist scrub resume
volume bitrot: success
[root@dhcp35-101 ~]#

{u'message': {u'name': u'dist', u'value': u'resume'}, u'event':
u'BITROT_SCRUB_OPTION', u'ts': 1476336858, u'nodeid':
u'a3bd23b9-f70a-47f5-9c95-7a271f5f1e18'}
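The dictionaries above are the event payloads as printed by the receiver registered at http://10.70.35.109:9000/listen. Assuming glustereventsd delivers each event as a JSON body in an HTTP POST to the registered webhook URL (as the Gluster Events API documentation describes), a minimal standard-library receiver could look like the following Python 3 sketch; `parse_event`, `EventsListener` and `serve` are illustrative names:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_event(body):
    """Decode one event payload (bytes) into the fields seen in this report."""
    event = json.loads(body.decode("utf-8"))
    return {
        "event": event.get("event"),
        "ts": event.get("ts"),
        "nodeid": event.get("nodeid"),
        "message": event.get("message", {}),
    }


class EventsListener(BaseHTTPRequestHandler):
    """Accept POSTs on /listen and print each decoded event."""

    def do_POST(self):
        if self.path != "/listen":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        if body:  # tolerate an empty body rather than failing the request
            print(parse_event(body))
        self.send_response(200)
        self.end_headers()


def serve(host="0.0.0.0", port=9000):
    # Blocks forever; host/port should match the URL passed to webhook-add.
    HTTPServer((host, port), EventsListener).serve_forever()
```

Running `serve()` on 10.70.35.109 would print one line per event received from any node, which makes a silent node such as N4 easy to spot.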

======================================================================================================================

----
N4
----
[root@dhcp35-104 ~]# gluster v bitrot dist scrub pause
volume bitrot: success
[root@dhcp35-104 ~]# gluster v bitrot distrep scrub pause
volume bitrot: success
[root@dhcp35-104 ~]#
[root@dhcp35-104 ~]# gluster v bitrot dist scrub resume
volume bitrot: success
[root@dhcp35-104 ~]#
[root@dhcp35-104 ~]# gluster v bitrot dist scrub status

Volume name : dist

State of scrub: Active (Idle)

Scrub impact: aggressive

Scrub frequency: weekly

Bitrot error log location: /var/log/glusterfs/bitd.log

Scrubber error log location: /var/log/glusterfs/scrub.log

=========================================================

Node: dhcp35-115.lab.eng.blr.redhat.com

Number of Scrubbed files: 0

Number of Skipped files: 0

Last completed scrub time: 2016-10-13 05:32:47

Duration of last scrub (D:M:H:M:S): 0:0:0:0

Error count: 0

=========================================================

Node: 10.70.35.100

Number of Scrubbed files: 0

Number of Skipped files: 0

Last completed scrub time: 2016-10-13 05:32:45

Duration of last scrub (D:M:H:M:S): 0:0:0:0

Error count: 0

=========================================================

[root@dhcp35-104 ~]#

<<<<<<<<<          No events seen         >>>>>>>>

[root@dhcp35-104 ~]# gluster-eventsapi status
Webhooks: 
http://10.70.35.109:9000/listen

+-----------------------------------+-------------+-----------------------+
|                NODE               | NODE STATUS | GLUSTEREVENTSD STATUS |
+-----------------------------------+-------------+-----------------------+
| dhcp35-115.lab.eng.blr.redhat.com |          UP |                    UP |
| dhcp35-101.lab.eng.blr.redhat.com |          UP |                    UP |
|            10.70.35.100           |          UP |                    UP |
|             localhost             |          UP |                    UP |
+-----------------------------------+-------------+-----------------------+
[root@dhcp35-104 ~]# gluster peer tsatus
unrecognized word: tsatus (position 1)
[root@dhcp35-104 ~]# gluster peer status
Number of Peers: 3

Hostname: dhcp35-115.lab.eng.blr.redhat.com
Uuid: 6ac165c0-317f-42ad-8262-953995171dbb
State: Peer in Cluster (Connected)

Hostname: dhcp35-101.lab.eng.blr.redhat.com
Uuid: a3bd23b9-f70a-47f5-9c95-7a271f5f1e18
State: Peer in Cluster (Connected)

Hostname: 10.70.35.100
Uuid: fcfacf2e-57fb-45ba-b1e1-e4ba640a4de5
State: Peer in Cluster (Connected)
[root@dhcp35-104 ~]#

--- Additional comment from Sweta Anandpara on 2016-10-13 02:55:44 EDT ---

Added the debuginfo package, and Atin figured out that the event IS actually
being sent.

Did a glustereventsd reload on the affected node N4, and started receiving
events. node-reload is one of the programs called when we do a webhook-add,
and it in turn does a glustereventsd reload. For some reason, when I did a
webhook-add in this setup, the glustereventsd reload may have failed. This is
just a hypothesis for now.

Will create a new webhook and add it in this same setup, observe any
success/failure/errors seen while doing so, and update.

Until then, anyone seeing a similar issue can work around it by running
'service glustereventsd reload' on the impacted node, after which the cluster
and its events should work as expected.
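The workaround helps because glustereventsd relies on a reload signal to re-read its Webhooks configuration; if that signal is missed, the daemon keeps its old list. A minimal Python sketch of that reload-on-signal pattern, assuming the webhooks are kept as a JSON object in a file and that the reload signal is SIGUSR2; the file format, the signal and the `SignalReloadedWebhooks` class are all assumptions, not glustereventsd's actual code:

```python
import json
import signal


class SignalReloadedWebhooks:
    """Hypothetical daemon state: the webhook list is read at start-up and
    re-read only when a reload signal arrives."""

    def __init__(self, path):
        self.path = path
        self.webhooks = {}
        self.load()

    def load(self):
        with open(self.path) as f:
            self.webhooks = json.load(f)

    def install_reload_handler(self, signum=signal.SIGUSR2):
        # SIGUSR2 is an assumption for this sketch, not necessarily what
        # `service glustereventsd reload` sends. Must run in the main thread.
        signal.signal(signum, lambda _signum, _frame: self.load())
```

If the signal never reaches one node, as appears to have happened on N4, that node keeps using the old in-memory list even though the file on disk is up to date.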

--- Additional comment from Sweta Anandpara on 2016-10-13 03:38:34 EDT ---

Deleted the said webhook and tried to add the same webhook to the cluster
again. That raised an exception: it failed to run 'gluster system:: execute
eventsapi.py node-reload'.

It fails on the same node, N4, every time, and I am unable to figure out why.
It works on all the other nodes of the cluster.
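The tracebacks in this comment show the failure path: `sync_to_peers()` calls `execute_in_peers("node-reload")`, which runs `gluster system:: execute eventsapi.py node-reload` and raises `GlusterCmdException((rc, out, err, " ".join(cmd)))` when that command exits non-zero. A rough Python sketch of that check using `subprocess`; `run_checked` and `node_reload_on_peers` are hypothetical names, and this is not the actual cliutils code:

```python
import subprocess


class GlusterCmdException(Exception):
    """Carries (rc, out, err, cmd), mirroring the tuple in the traceback."""


def run_checked(cmd):
    """Run cmd; return stdout, or raise GlusterCmdException on non-zero exit."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise GlusterCmdException((proc.returncode, out, err, " ".join(cmd)))
    return out


def node_reload_on_peers():
    # A commit failure on a single peer (10.70.35.104 here) makes the whole
    # command exit non-zero, so the CLI reports an error even though the
    # other peers may have reloaded.
    return run_checked(["gluster", "system::", "execute",
                        "eventsapi.py", "node-reload"])
```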

[root@dhcp35-101 yum.repos.d]# gluster-eventsapi webhook-del http://10.70.35.109:9000/listen
Traceback (most recent call last):
  File "/usr/sbin/gluster-eventsapi", line 459, in <module>
    runcli()
  File "/usr/lib/python2.6/site-packages/gluster/cliutils/cliutils.py", line 212, in runcli
    cls.run(args)
  File "/usr/sbin/gluster-eventsapi", line 274, in run
    sync_to_peers()
  File "/usr/sbin/gluster-eventsapi", line 129, in sync_to_peers
    out = execute_in_peers("node-reload")
  File "/usr/lib/python2.6/site-packages/gluster/cliutils/cliutils.py", line 125, in execute_in_peers
    raise GlusterCmdException((rc, out, err, " ".join(cmd)))
gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Commit failed on 10.70.35.104. Error: Unable to end. Error : Success\n', 'gluster system:: execute eventsapi.py node-reload')
[root@dhcp35-101 yum.repos.d]#
[root@dhcp35-101 yum.repos.d]# gluster-eventsapi status
Webhooks: None

+-----------------------------------+-------------+-----------------------+
|                NODE               | NODE STATUS | GLUSTEREVENTSD STATUS |
+-----------------------------------+-------------+-----------------------+
|            10.70.35.100           |          UP |                    UP |
|            10.70.35.104           |          UP |                    UP |
| dhcp35-115.lab.eng.blr.redhat.com |          UP |                    UP |
|             localhost             |          UP |                    UP |
+-----------------------------------+-------------+-----------------------+
[root@dhcp35-101 yum.repos.d]#
[root@dhcp35-101 yum.repos.d]# gluster-eventsapi webhook-test http://10.70.35.109:9000/listen
+-----------------------------------+-------------+----------------+
|                NODE               | NODE STATUS | WEBHOOK STATUS |
+-----------------------------------+-------------+----------------+
|            10.70.35.100           |          UP |             OK |
|            10.70.35.104           |          UP |             OK |
| dhcp35-115.lab.eng.blr.redhat.com |          UP |             OK |
|             localhost             |          UP |             OK |
+-----------------------------------+-------------+----------------+
[root@dhcp35-101 yum.repos.d]#
[root@dhcp35-101 yum.repos.d]# gluster-eventsapi webhook-add http://10.70.35.109:9000/listen
Traceback (most recent call last):
  File "/usr/sbin/gluster-eventsapi", line 459, in <module>
    runcli()
  File "/usr/lib/python2.6/site-packages/gluster/cliutils/cliutils.py", line 212, in runcli
    cls.run(args)
  File "/usr/sbin/gluster-eventsapi", line 232, in run
    sync_to_peers()
  File "/usr/sbin/gluster-eventsapi", line 129, in sync_to_peers
    out = execute_in_peers("node-reload")
  File "/usr/lib/python2.6/site-packages/gluster/cliutils/cliutils.py", line 125, in execute_in_peers
    raise GlusterCmdException((rc, out, err, " ".join(cmd)))
gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Commit failed on 10.70.35.104. Error: Unable to end. Error : Success\n', 'gluster system:: execute eventsapi.py node-reload')
[root@dhcp35-101 yum.repos.d]#
[root@dhcp35-101 yum.repos.d]# gluster-eventsapi status
Webhooks: 
http://10.70.35.109:9000/listen

+-----------------------------------+-------------+-----------------------+
|                NODE               | NODE STATUS | GLUSTEREVENTSD STATUS |
+-----------------------------------+-------------+-----------------------+
|            10.70.35.100           |          UP |                    UP |
|            10.70.35.104           |          UP |                    UP |
| dhcp35-115.lab.eng.blr.redhat.com |          UP |                    UP |
|             localhost             |          UP |                    UP |
+-----------------------------------+-------------+-----------------------+
[root@dhcp35-101 yum.repos.d]#

--- Additional comment from Aravinda VK on 2016-10-17 08:36:14 EDT ---

This is similar to BZ 1379963. `glustereventsd` on one node was not reloaded,
so it does not know about the newly added Webhook.

--- Additional comment from Worker Ant on 2016-10-26 06:29:03 EDT ---

REVIEW: http://review.gluster.org/15731 (eventsapi: Auto reload Webhooks data
when modified) posted (#1) for review on master by Aravinda VK
(avishwan at redhat.com)

--- Additional comment from Worker Ant on 2016-11-17 01:28:21 EST ---

REVIEW: http://review.gluster.org/15731 (eventsapi: Auto reload Webhooks data
when modified) posted (#2) for review on master by Aravinda VK
(avishwan at redhat.com)

--- Additional comment from Worker Ant on 2016-11-17 06:10:31 EST ---

COMMIT: http://review.gluster.org/15731 committed in master by Aravinda VK
(avishwan at redhat.com) 
------
commit b7ebffbda9ba784ccfae6d1a90766d5310cdaa15
Author: Aravinda VK <avishwan at redhat.com>
Date:   Wed Oct 26 15:51:17 2016 +0530

    eventsapi: Auto reload Webhooks data when modified

    glustereventsd depends on reload signal to reload the
    Webhooks configurations. But if reload signal missed, no
    events will be sent to newly added Webhook.

    Added auto reload based on webhooks file mtime. Before pushing
    events to Webhooks, reloads webhooks configurations if previously
    recorded mtime is different than current mtime.

    BUG: 1388862
    Change-Id: I83a41d6a52d8fa1d70e88294298f4a5c396d4158
    Signed-off-by: Aravinda VK <avishwan at redhat.com>
    Reviewed-on: http://review.gluster.org/15731
    Reviewed-by: Prashanth Pai <ppai at redhat.com>
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
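The committed fix removes the dependence on the reload signal: before pushing events, the daemon compares the webhooks file's current mtime with the one it recorded at the last load and reloads on a mismatch. A minimal Python sketch of that check, assuming a JSON webhooks file; `WebhooksCache` and `webhooks_for_push` are illustrative names, not the code from review 15731:

```python
import json
import os


class WebhooksCache:
    """Re-read the webhooks file whenever its mtime differs from the mtime
    recorded at the previous load."""

    def __init__(self, path):
        self.path = path
        self.mtime = None  # nothing loaded yet, so the first call reloads
        self.webhooks = {}

    def _reload_if_changed(self):
        current = os.stat(self.path).st_mtime
        if current != self.mtime:
            with open(self.path) as f:
                self.webhooks = json.load(f)
            self.mtime = current

    def webhooks_for_push(self):
        # Called before pushing each event, so a missed reload signal no
        # longer leaves this node with a stale webhook list.
        self._reload_if_changed()
        return self.webhooks
```

The check costs one `stat` per push. Two writes that land within the filesystem's timestamp granularity would look unchanged, which is a general limitation of mtime-based change detection.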

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1384316
[Bug 1384316] [Eventing]: Events not seen when command is triggered from
one of the peer nodes
https://bugzilla.redhat.com/show_bug.cgi?id=1388862
[Bug 1388862] [Eventing]: Events not seen when command is triggered from
one of the peer nodes
-- 
You are receiving this mail because:
You are the assignee for the bug.