[Bugs] [Bug 1310999] New: [GlusterD]: GlusterD log is filled with error messages - " Failed to aggregate response from node/brick"

Tue Feb 23 07:22:53 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1310999

            Bug ID: 1310999
           Summary: [GlusterD]: GlusterD log is filled with error messages
                    - " Failed to aggregate response from  node/brick"
           Product: GlusterFS
           Version: 3.7.8
         Component: glusterd
          Keywords: Triaged
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: amukherj at redhat.com
                CC: amukherj at redhat.com, bsrirama at redhat.com,
                    bugs at gluster.org, nicolas at ecarnot.net,
                    nlevinki at redhat.com, rhs-bugs at redhat.com,
                    sasundar at redhat.com, storage-qa-internal at redhat.com,
                    vbellur at redhat.com
        Depends On: 1290653, 1290734

+++ This bug was initially created as a clone of Bug #1290734 +++

+++ This bug was initially created as a clone of Bug #1290653 +++

Description of problem:
=======================
Created a VM in a RHEVM environment using a gluster volume as storage and
observed the errors below in the glusterd logs.

<<<<<<<<<<<<<<<<<<<<GlusterD Log>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

[2015-12-11 04:41:40.000010] E [MSGID: 106108]
[glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to
aggregate response from  node/brick
[2015-12-11 04:41:40.004306] I [MSGID: 106499]
[glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume vol_brs
[2015-12-11 04:42:40.657420] E [MSGID: 106108]
[glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to
aggregate response from  node/brick
[2015-12-11 04:42:40.660675] I [MSGID: 106499]
[glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume vol_brs
[2015-12-11 04:43:37.847553] I [MSGID: 106499]
[glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume vol_brs
[2015-12-11 04:43:41.302213] E [MSGID: 106108]
[glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to
aggregate response from  node/brick
[2015-12-11 04:44:41.960021] E [MSGID: 106108]
[glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to
aggregate response from  node/brick
The message "I [MSGID: 106499]
[glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume vol_brs" repeated 3 times between
[2015-12-11 04:43:37.847553] and [2015-12-11 04:44:41.963542]
[2015-12-11 04:45:42.634852] E [MSGID: 106108]
[glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to
aggregate response from  node/brick
[2015-12-11 04:45:42.640099] I [MSGID: 106499]
[glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume vol_brs
[2015-12-11 04:46:43.297719] E [MSGID: 106108]
[glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to
aggregate response from  node/brick
[2015-12-11 04:46:43.301371] I [MSGID: 106499]
[glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume vol_brs
[2015-12-11 04:47:43.956339] E [MSGID: 106108]
[glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to
aggregate response from  node/brick
[2015-12-11 04:47:43.959903] I [MSGID: 106499]
[glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume vol_brs
[2015-12-11 04:51:46.542435] E [MSGID: 106108]
[glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to
aggregate response from  node/brick
[2015-12-11 04:51:46.546767] I [MSGID: 106499]
[glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume vol_brs

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<End>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Version-Release number of selected component (if applicable):
mainline

How reproducible:
=================
Every time

Steps to Reproduce:
===================
1. Set up a two-node cluster
2. Create a distributed-replicate volume and FUSE-mount it on a client
5. gluster volume status all tasks

Actual results:
===============
Errors in the glusterd logs:
[2015-12-11 04:51:46.542435] E [MSGID: 106108]
[glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to
aggregate response from  node/brick
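
To gauge how often the error recurs, the entries can be tallied by severity and MSGID. The following is a minimal sketch, not part of glusterd or its tooling: the function name `count_entries` and the regular expression are assumptions based only on the entry format visible in the log excerpt in this report. The "repeated N times" summary lines carry no leading timestamp, so they are not counted.

```python
import re
from collections import Counter

# Start of an entry as it appears in this report, for example:
#   [2015-12-11 04:41:40.000010] E [MSGID: 106108]
# The pattern is an assumption derived from the excerpt, not a glusterd spec.
ENTRY_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+([A-Z])\s+\[MSGID: (\d+)\]"
)


def count_entries(log_text):
    """Return a Counter keyed by (level, msgid) for each timestamped entry."""
    counts = Counter()
    for _timestamp, level, msgid in ENTRY_RE.findall(log_text):
        counts[(level, msgid)] += 1
    return counts


if __name__ == "__main__":
    # Two entries copied from the excerpt in this report.
    sample = (
        "[2015-12-11 04:41:40.000010] E [MSGID: 106108]\n"
        "[glusterd-syncop.c:1072:_gd_syncop_commit_op_cbk] 0-management: Failed to\n"
        "aggregate response from  node/brick\n"
        "[2015-12-11 04:41:40.004306] I [MSGID: 106499]\n"
        "[glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management:\n"
        "Received status volume req for volume vol_brs\n"
    )
    for (level, msgid), n in sorted(count_entries(sample).items()):
        print(level, msgid, n)
```

To run it against a real log, read the file contents into a string and pass that to `count_entries`; the log file location depends on the installation and is not stated in this report.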

Expected results:
=================
The above-mentioned error should not appear in the glusterd log.

Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-12-10
23:40:31 EST ---

This bug is automatically being proposed for the current z-stream release of
Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change
the proposed release flag.

--- Additional comment from Atin Mukherjee on 2015-12-11 00:34:31 EST ---

I don't think this has anything to do with the RHEVM setup. I remember seeing
this log in another setup, and on further analysis I found that the logging in
this path is inadequate to identify the actual cause of the failure. We need to
improve the logging here; expect a patch upstream soon. However, when I set up
a two-node cluster, create a volume, and run volume status, I don't see this
log.

Do you have a reproducer for this?

--- Additional comment from Vijay Bellur on 2015-12-11 04:46:33 EST ---

REVIEW: http://review.gluster.org/12950 (glusterd: correct ret code in
glusterd_volume_status_copy_to_op_ctx_dict) posted (#1) for review on master by
Atin Mukherjee (amukherj at redhat.com)

--- Additional comment from Vijay Bellur on 2015-12-14 00:53:49 EST ---

REVIEW: http://review.gluster.org/12950 (glusterd: correct ret code in
glusterd_volume_status_copy_to_op_ctx_dict) posted (#2) for review on master by
Atin Mukherjee (amukherj at redhat.com)

--- Additional comment from Vijay Bellur on 2015-12-25 00:04:43 EST ---

COMMIT: http://review.gluster.org/12950 committed in master by Atin Mukherjee
(amukherj at redhat.com) 
------
commit 88bf33555371ae01dd297aecf8666d7121309b80
Author: Atin Mukherjee <amukherj at redhat.com>
Date:   Fri Dec 11 15:15:53 2015 +0530

    glusterd: correct ret code in glusterd_volume_status_copy_to_op_ctx_dict

    This patch is to suppress the error log of "Failed to aggregate rsp_dict"
    where the above function returns a non-zero ret which is not required

    Change-Id: If331980291bd369690257215333cea175e2042ec
    BUG: 1290734
    Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
    Reviewed-on: http://review.gluster.org/12950
    Tested-by: NetBSD Build System <jenkins at build.gluster.org>
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Gaurav Kumar Garg <ggarg at redhat.com>
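
The patched code is not shown in this report, so the following is only an illustration of one common way a helper can return a non-zero code that "is not required": a lookup for an optional field overwrites the return value, and the caller then logs a failure. This is a hypothetical Python sketch, not glusterd code; the function names and the `count` and `tasks` keys are assumptions made for the example.

```python
def copy_status_buggy(rsp, ctx):
    """Hypothetical helper: copy fields from rsp into ctx, 0 on success."""
    if "count" not in rsp:
        return -1  # required field missing: a genuine failure
    ctx["count"] = rsp["count"]
    # The optional lookup sets ret, so a missing optional field
    # leaks out as -1 even though nothing actually went wrong.
    ret = 0 if "tasks" in rsp else -1
    if ret == 0:
        ctx["tasks"] = rsp["tasks"]
    return ret


def copy_status_fixed(rsp, ctx):
    """Same helper, but a missing optional field is not reported as an error."""
    if "count" not in rsp:
        return -1
    ctx["count"] = rsp["count"]
    if "tasks" in rsp:
        ctx["tasks"] = rsp["tasks"]
    return 0


def aggregate(copy_fn, rsp):
    """Hypothetical caller: return the error text it would log, or None."""
    ctx = {}
    if copy_fn(rsp, ctx) != 0:
        return "Failed to aggregate response from node/brick"
    return None


if __name__ == "__main__":
    rsp = {"count": 2}  # response without the optional field
    print(aggregate(copy_status_buggy, rsp))
    print(aggregate(copy_status_fixed, rsp))
```

With the buggy variant, every response that lacks the optional field produces an error line in the caller, which matches the symptom of a log filling up once per status request; the fixed variant reserves the non-zero return for real failures.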

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1290653
[Bug 1290653] [GlusterD]: GlusterD log is filled with error messages - "
Failed to aggregate response from  node/brick"
https://bugzilla.redhat.com/show_bug.cgi?id=1290734
[Bug 1290734] [GlusterD]: GlusterD log is filled with error messages - "
Failed to aggregate response from  node/brick"