[Bugs] [Bug 1728127] New: [In-service] Post upgrade glusterd is crashing with a backtrace on the upgraded node while issuing gluster volume status from non-upgraded nodes

bugzilla at redhat.com bugzilla at redhat.com
Tue Jul 9 05:25:05 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1728127

            Bug ID: 1728127
           Summary: [In-service] Post upgrade glusterd is crashing with a
                    backtrace on the upgraded node while issuing gluster
                    volume status from non-upgraded nodes
           Product: GlusterFS
           Version: 7
          Hardware: x86_64
                OS: Linux
            Status: NEW
         Component: glusterd
          Severity: high
          Priority: high
          Assignee: bugs at gluster.org
          Reporter: srakonde at redhat.com
                CC: amukherj at redhat.com, bmekala at redhat.com,
                    bugs at gluster.org, rhs-bugs at redhat.com,
                    sankarshan at redhat.com, srakonde at redhat.com,
                    storage-qa-internal at redhat.com, vbellur at redhat.com
        Depends On: 1723658
            Blocks: 1722131, 1728126
  Target Milestone: ---
    Classification: Community



+++ This bug was initially created as a clone of Bug #1723658 +++

+++ This bug was initially created as a clone of Bug #1722131 +++

Description of problem:
During an in-service upgrade, glusterd on the upgraded node crashed with a
backtrace when the 'gluster vol status' command was issued from non-upgraded nodes.
The upgrade scenario is glusterfs-5 or lower to glusterfs-6.


Version-Release number of selected component (if applicable):
glusterfs 5 to glusterfs 6 upgrade

How reproducible:
3/3

Steps to Reproduce:
1. On a three-node cluster (N1, N2, N3), create 2020 replicate (1x3) volumes
and start them (brick-mux enabled).
2. Mount 3 volumes and run continuous I/O from 3 different clients.
3. Upgrade node N1.
4. While heal is in progress on node N1, run 'gluster volume status' on node
N2, which is yet to be upgraded.


Actual results:
glusterd crashed with the following backtrace:

[2019-06-19 11:13:56.506826] I [MSGID: 106499]
[glusterd-handler.c:4497:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume testvol_-997
[2019-06-19 11:13:56.512662] I [MSGID: 106499]
[glusterd-handler.c:4497:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume testvol_-998
[2019-06-19 11:13:56.518409] I [MSGID: 106499]
[glusterd-handler.c:4497:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume testvol_-999
[2019-06-19 11:14:37.732442] E [MSGID: 101005]
[dict.c:2852:dict_serialized_length_lk] 0-dict: value->len (-1162167622) < 0
[Invalid argument]
[2019-06-19 11:14:37.732483] E [MSGID: 106130]
[glusterd-handler.c:2633:glusterd_op_commit_send_resp] 0-management: failed to
get serialized length of dict
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2019-06-19 11:14:37
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.0
/lib64/libglusterfs.so.0(+0x27240)[0x7f7b5c38a240]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f7b5c394c64]
/lib64/libc.so.6(+0x363f0)[0x7f7b5a9c63f0]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f7b5b1cad00]
/lib64/libglusterfs.so.0(__gf_free+0x12c)[0x7f7b5c3b64cc]
/lib64/libglusterfs.so.0(+0x1b889)[0x7f7b5c37e889]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x478f8)[0x7f7b504c58f8]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x44514)[0x7f7b504c2514]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x1d19e)[0x7f7b5049b19e]
/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x24dce)[0x7f7b504a2dce]
/lib64/libglusterfs.so.0(+0x66610)[0x7f7b5c3c9610]
/lib64/libc.so.6(+0x48180)[0x7f7b5a9d8180]
---------

Expected results:
glusterd should not crash

--- Additional comment from Worker Ant on 2019-06-25 11:22:12 IST ---

REVIEW: https://review.gluster.org/22939 (glusterd: conditionally clear
txn_opinfo in stage op) posted (#1) for review on master by Atin Mukherjee

--- Additional comment from Atin Mukherjee on 2019-06-25 11:24:18 IST ---

Root cause:

In a heterogeneous cluster, the above patch ends up clearing the txn_opinfo
during the staging phase, but since the originator node is still running an
older version it initiates a commit op, which then accesses the already freed
txn_opinfo.
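
To make the failure mode concrete, below is a minimal, self-contained C sketch
of the pattern, not actual glusterd code: the struct, function, and constant
names are hypothetical. It models a stage handler that clears the
per-transaction opinfo and a commit handler that later dereferences it, with
the clear guarded on the cluster op-version so that commits coming from
older-version originators still find the opinfo.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NEW_OP_VERSION 60000   /* hypothetical op-version of the upgraded release */

typedef struct {
    int   op;        /* e.g. the "volume status" op code */
    char *req_dict;  /* stands in for the serialized request dictionary */
} txn_opinfo_t;

static txn_opinfo_t *txn_opinfo = NULL;  /* stands in for the txn-id -> opinfo map */

static void handle_stage_op(int cluster_op_version) {
    /* ... staging work for the volume-status transaction ... */

    /* Fix (sketched): clear the opinfo here only when every peer already runs
     * the new version; an older-version originator will still send a commit op
     * that needs it.  The buggy behaviour was to free it unconditionally. */
    if (cluster_op_version >= NEW_OP_VERSION) {
        free(txn_opinfo->req_dict);
        free(txn_opinfo);
        txn_opinfo = NULL;
    }
}

static void handle_commit_op(void) {
    if (txn_opinfo == NULL) {
        fprintf(stderr, "commit: txn opinfo already cleared\n");
        return;
    }
    /* With the unconditional free this dereference would read freed memory,
     * and a length check on the serialized dict would see garbage
     * (cf. "value->len (-1162167622) < 0" in the log above). */
    printf("commit: op=%d dict=%s\n", txn_opinfo->op, txn_opinfo->req_dict);
}

int main(void) {
    txn_opinfo = calloc(1, sizeof(*txn_opinfo));
    txn_opinfo->op = 18;                               /* arbitrary op code */
    txn_opinfo->req_dict = strdup("volname=testvol_-999");

    int cluster_op_version = 50400;  /* heterogeneous: originator still on glusterfs-5 */

    handle_stage_op(cluster_op_version);  /* opinfo is preserved */
    handle_commit_op();                   /* commit still finds it */

    free(txn_opinfo->req_dict);
    free(txn_opinfo);
    return 0;
}

Run with the heterogeneous op-version as above, the guarded stage handler keeps
the opinfo alive and the commit succeeds; with the guard removed, the commit
dereferences freed memory, which is consistent with the dict serialization
seeing a garbage value->len as logged before the crash.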

--- Additional comment from Worker Ant on 2019-06-25 17:45:50 IST ---

REVIEW: https://review.gluster.org/22939 (glusterd: conditionally clear
txn_opinfo in stage op) merged (#2) on master by Atin Mukherjee


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1722131
[Bug 1722131] [In-service] Post upgrade glusterd is crashing with a backtrace
on the upgraded node while issuing gluster volume status from non-upgraded
nodes
https://bugzilla.redhat.com/show_bug.cgi?id=1723658
[Bug 1723658] [In-service] Post upgrade glusterd is crashing with a backtrace
on the upgraded node while issuing gluster volume status from non-upgraded
nodes
https://bugzilla.redhat.com/show_bug.cgi?id=1728126
[Bug 1728126] [In-service] Post upgrade glusterd is crashing with a backtrace
on the upgraded node while issuing gluster volume status from non-upgraded
nodes