[Bugs] [Bug 1583937] Brick process crashed after upgrade from RHGS-3.3.1 async( 7.4) to RHGS-3.4(7.5)

bugzilla at redhat.com bugzilla at redhat.com
Wed May 30 03:52:23 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1583937



--- Comment #1 from Raghavendra G <rgowdapp at redhat.com> ---
Description of problem:
======================

Brick process crashed after upgrade from RHGS-3.3.1 async (7.4) to
RHGS-3.4 (7.5)

Version-Release number of selected component (if applicable):
------------------------------------------------------------
RHGS version:
------------
from version glusterfs-3.8.4-54.el7 to glusterfs-3.12.2-4.el7

OS version:
----------
from RHEL 7.4 to RHEL7.5

How reproducible:
----------------

Tried once; only one node out of the 5 upgraded nodes in a 6-node cluster faced
this issue.

Steps to Reproduce:
------------------

1. Create 6 RHEL-7.4 machines.
2. Install RHGS-3.3.1 async build on RHEL-7.4 machines.
3. Add the firewall services (glusterfs, nfs, rpc-bind) on all the cluster
servers.
4. Peer probe the remaining 5 servers from one node.
5. Verify that the peer status of all servers is in the connected state.
6. Create around 50 volumes with different topologies, including two-way
distributed-replicate, three-way distributed-replicate, arbitrated-replicate,
and distributed-dispersed volumes.
7. Mount 5 volumes on a RHEL-7.4 client and 5 volumes on a RHEL-7.5 client.
8. Keep 5 volumes offline.
9. Copy the RHEL 7.5 repos and RHGS-3.4 repos into /etc/yum.repos.d.
10. Stop the glusterd, glusterfs, and glusterfsd services on the node being
upgraded.
11. Run yum update on that node.
12. After the upgrade, all bricks on the upgraded node went offline.
13. A core file named 'core.6282' was generated in the '/' directory.
14. Core details are below:

*************************************************************************
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from
/usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install
/usr/lib/debug/.build-id/66/a1ad12474aef1b8a3aac8363ef99e4c06ca5ab
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfsd -s 10.70.37.208 --volfile-id
arbtr_10.10.70.37.208.bricks-'.
Program terminated with signal 11, Segmentation fault.
#0  server_inode_new (itable=0x0, gfid=gfid at entry=0x7f1824022070 "") at
server-helpers.c:1314
1314                    return itable->root;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-18.el7.x86_64
libacl-2.2.51-14.el7.x86_64 libaio-0.3.109-13.el7.x86_64
libattr-2.4.46-13.el7.x86_64 libcom_err-1.42.9-11.el7.x86_64
libgcc-4.8.5-28.el7.x86_64 libselinux-2.5-12.el7.x86_64
libuuid-2.23.2-52.el7.x86_64 openssl-libs-1.0.2k-12.el7.x86_64
pcre-8.32-17.el7.x86_64 sqlite-3.7.17-8.el7.x86_64
sssd-client-1.16.0-16.el7.x86_64 zlib-1.2.7-17.el7.x86_64

********************************************************************************
15. bt details

********************************************************************************
#0  server_inode_new (itable=0x0, gfid=gfid at entry=0x7f1824022070 "") at
server-helpers.c:1314
#1  0x00007f184cd1c13d in resolve_gfid (frame=frame at entry=0x7f182401fa30) at
server-resolve.c:205
#2  0x00007f184cd1d038 in server_resolve_inode
(frame=frame at entry=0x7f182401fa30)
    at server-resolve.c:418
#3  0x00007f184cd1d2b0 in server_resolve (frame=0x7f182401fa30) at
server-resolve.c:559
#4  0x00007f184cd1c88e in server_resolve_all (frame=frame at entry=0x7f182401fa30)
    at server-resolve.c:611
#5  0x00007f184cd1d344 in resolve_and_resume (frame=frame at entry=0x7f182401fa30, 
    fn=fn at entry=0x7f184cd2a910 <server_getxattr_resume>) at
server-resolve.c:642
#6  0x00007f184cd3f638 in server3_3_getxattr (req=0x7f181c0132b0) at
server-rpc-fops.c:5121
#7  0x00007f1861c9a246 in rpcsvc_request_handler (arg=0x7f1850040c90) at
rpcsvc.c:1899
#8  0x00007f1860d37dd5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f1860600b3d in clone () from /lib64/libc.so.6

********************************************************************************

Note: Only one node out of the 5 upgraded nodes in the 6-node cluster faced
this issue. The first 4 nodes upgraded without problems; the issue appeared
during the 5th node's upgrade, and one more node is yet to be upgraded.


Actual results:

    All bricks on the upgraded node went offline, and a core file was found.

Expected results:

    All bricks should be online, and no cores should be found.
