[Bugs] [Bug 1710159] New: glusterd: While upgrading (3-node cluster) 'gluster v status' times out on node to be upgraded

bugzilla at redhat.com bugzilla at redhat.com
Wed May 15 02:47:53 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1710159

            Bug ID: 1710159
           Summary: glusterd: While upgrading (3-node cluster) 'gluster v
                    status' times out on node to be upgraded
           Product: GlusterFS
           Version: mainline
            Status: NEW
         Component: glusterd
          Keywords: Regression
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: srakonde at redhat.com
                CC: amukherj at redhat.com, bmekala at redhat.com,
                    bugs at gluster.org, rallan at redhat.com,
                    rhinduja at redhat.com, rhs-bugs at redhat.com,
                    sankarshan at redhat.com, storage-qa-internal at redhat.com,
                    vbellur at redhat.com
        Depends On: 1707246
            Blocks: 1696807
  Target Milestone: ---
    Classification: Community



Description of problem:
=======================
While performing an in-service upgrade (for geo-rep) of a 3-node cluster,
2 nodes (from the master) upgraded successfully.
On the third node, which was still on the previous build (3.4.4), 'gluster v
status' times out.

[root at dhcp41-155]# gluster v status
Error : Request timed out

Other gluster commands seem to be working on this node.
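
(The bug does not record which other commands were tried; read-mostly
operations such as the following are the usual quick checks, listed here only
as a sketch:)

    gluster peer status     # peer connectivity as seen by this node
    gluster volume info     # local view of the volume configuration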

The same command on the upgraded nodes works as expected, as shown below:
[root at dhcp42-173 glusterfs]# gluster v status
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.211:/var/lib/glusterd/ss_bri
ck                                          49152     0          Y       9661 
Brick 10.70.41.155:/var/lib/glusterd/ss_bri
ck                                          49155     0          Y       16101
Brick dhcp42-173.lab.eng.blr.redhat.com:/va
r/lib/glusterd/ss_brick                     49152     0          Y       4718 
Self-heal Daemon on localhost               N/A       N/A        Y       4809 
Self-heal Daemon on 10.70.41.155            N/A       N/A        Y       16524
Self-heal Daemon on 10.70.42.211            N/A       N/A        Y       9964 

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: master
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.173:/rhs/brick1/b1           49153     0          Y       4748 
Brick 10.70.42.211:/rhs/brick1/b2           49153     0          Y       9698 
Brick 10.70.41.155:/rhs/brick1/b3           49152     0          Y       15867
Brick 10.70.42.173:/rhs/brick2/b4           49154     0          Y       4757 
Brick 10.70.42.211:/rhs/brick2/b5           49154     0          Y       9707 
Brick 10.70.41.155:/rhs/brick2/b6           49153     0          Y       15888
Brick 10.70.42.173:/rhs/brick3/b7           49155     0          Y       4764 
Brick 10.70.42.211:/rhs/brick3/b8           49155     0          Y       9722 
Brick 10.70.41.155:/rhs/brick3/b9           49154     0          Y       15909
Self-heal Daemon on localhost               N/A       N/A        Y       4809 
Self-heal Daemon on 10.70.42.211            N/A       N/A        Y       9964 
Self-heal Daemon on 10.70.41.155            N/A       N/A        Y       16524

Task Status of Volume master




Error messages in glusterd log (on the 3.4.4 node) are as follows:
------------------------------------------------------------------
[2019-05-07 06:29:06.143535] E [rpc-clnt.c:185:call_bail] 0-management: bailing
out frame type(glusterd mgmt) op(--(4)) xid = 0x19 sent = 2019-05-07
06:18:58.538764. timeout = 600 for 10.70.42.173:24007
[2019-05-07 06:29:06.143630] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
dhcp42-173.lab.eng.blr.redhat.com. Please check log file for details.
[2019-05-07 06:29:06.144183] I [socket.c:3699:socket_submit_reply]
0-socket.management: not connected (priv->connected = -1)
[2019-05-07 06:29:06.144234] E [rpcsvc.c:1573:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli,
ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2019-05-07 06:29:06.144279] E [MSGID: 106430]
[glusterd-utils.c:560:glusterd_submit_reply] 0-glusterd: Reply submission
failed
[2019-05-07 06:42:46.327616] E [rpc-clnt.c:185:call_bail] 0-management: bailing
out frame type(glusterd mgmt) op(--(4)) xid = 0x1b sent = 2019-05-07
06:32:44.342818. timeout = 600 for 10.70.42.173:24007
[2019-05-07 06:42:46.327901] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
dhcp42-173.lab.eng.blr.redhat.com. Please check log file for details.
[2019-05-07 06:42:50.328686] E [rpc-clnt.c:185:call_bail] 0-management: bailing
out frame type(glusterd mgmt) op(--(4)) xid = 0x1a sent = 2019-05-07
06:32:44.342952. timeout = 600 for 10.70.42.211:24007
[2019-05-07 06:42:50.328839] E [MSGID: 106153]
[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
10.70.42.211. Please check log file for details.
[2019-05-07 06:42:50.329321] I [socket.c:3699:socket_submit_reply]
0-socket.management: not connected (priv->connected = -1)
[2019-05-07 06:42:50.329356] E [rpcsvc.c:1573:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli,
ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2019-05-07 06:42:50.329402] E [MSGID: 106430]
[glusterd-utils.c:560:glusterd_submit_reply] 0-glusterd: Reply submission
failed
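
The bail-out timestamps line up with the 600-second frame timeout: for example,
the frame sent at 06:18:58.538764 is bailed out at 06:29:06, roughly 10 minutes
later, i.e. glusterd on the 3.4.4 node never received a commit response from
the upgraded peer. A quick way to pull out these bail-outs and the resulting
commit failures (a sketch, assuming the default glusterd log location) is:

    grep -E 'call_bail|Commit failed' /var/log/glusterfs/glusterd.log | tail -n 20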


On the upgraded node: (glusterd log)
----------------------
[2019-05-07 06:18:58.535711] E [glusterd-op-sm.c:8193:glusterd_op_sm]
(-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x24c8e) [0x7f6ae684ec8e]
-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x1d05e) [0x7f6ae684705e]
-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x444ff) [0x7f6ae686e4ff]
) 0-management: Unable to get transaction opinfo for transaction ID
:bc0a5ca5-f3a1-4c27-b263-d2d34289cbe3
[2019-05-07 06:32:44.339866] E [glusterd-op-sm.c:8193:glusterd_op_sm]
(-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x24c8e) [0x7f6ae684ec8e]
-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x1d05e) [0x7f6ae684705e]
-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x444ff) [0x7f6ae686e4ff]
) 0-management: Unable to get transaction opinfo for transaction ID
:949c5d20-fb42-4d4d-8175-3e3dc158d310
[2019-05-07 06:43:02.694136] E [glusterd-op-sm.c:8193:glusterd_op_sm]
(-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x24c8e) [0x7f6ae684ec8e]
-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x1d05e) [0x7f6ae684705e]
-->/usr/lib64/glusterfs/6.0/xlator/mgmt/glusterd.so(+0x444ff) [0x7f6ae686e4ff]
) 0-management: Unable to get transaction opinfo for transaction ID
:d150e923-3749-439a-ae3a-82ce997c8608




Version-Release number of selected component (if applicable):
==============================================================

UPGRADED NODE ==> [root at dhcp42-173]# rpm -qa | grep gluster
python2-gluster-6.0-2.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.7.x86_64
glusterfs-cli-6.0-2.el7rhgs.x86_64
glusterfs-6.0-2.el7rhgs.x86_64
glusterfs-fuse-6.0-2.el7rhgs.x86_64
glusterfs-geo-replication-6.0-2.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-libs-6.0-2.el7rhgs.x86_64
glusterfs-api-6.0-2.el7rhgs.x86_64
glusterfs-events-6.0-2.el7rhgs.x86_64
tendrl-gluster-integration-1.6.3-12.el7rhgs.noarch
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-server-6.0-2.el7rhgs.x86_64
glusterfs-client-xlators-6.0-2.el7rhgs.x86_64
glusterfs-rdma-6.0-2.el7rhgs.x86_64


NODE TO BE UPGRADED ==> [root at dhcp41-155]# rpm -qa | grep gluster
glusterfs-events-3.12.2-47.el7rhgs.x86_64
glusterfs-rdma-3.12.2-47.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.7.x86_64
glusterfs-cli-3.12.2-47.el7rhgs.x86_64
glusterfs-fuse-3.12.2-47.el7rhgs.x86_64
glusterfs-server-3.12.2-47.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-libs-3.12.2-47.el7rhgs.x86_64
glusterfs-3.12.2-47.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-47.el7rhgs.x86_64
tendrl-gluster-integration-1.6.3-12.el7rhgs.noarch
glusterfs-client-xlators-3.12.2-47.el7rhgs.x86_64
glusterfs-api-3.12.2-47.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
python2-gluster-3.12.2-47.el7rhgs.x86_64


How reproducible:
=================
1/1


Steps to Reproduce:
====================
In-service upgrade (geo-rep)
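
The exact per-node steps were not captured in the bug; a minimal sketch of the
in-service upgrade flow being exercised (package and service names are the
usual ones and are assumptions here) is:

    # On one node at a time, while the remaining nodes keep serving I/O:
    systemctl stop glusterd              # stop the management daemon
    pkill glusterfsd; pkill glusterfs    # stop remaining gluster processes on this node
    yum update glusterfs-server          # upgrade the gluster packages
    systemctl start glusterd             # bring the upgraded daemon back
    gluster volume heal master info      # wait for pending heals before the next node

The failure in this bug shows up partway through this sequence, when
'gluster v status' is run on a node that has not been upgraded yet.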

Actual results:
===============
While performing an in-service upgrade on a 3-node cluster, 'gluster v status'
times out on the third node, which is still on the previous build (3.4.4).

Expected results:
=================
'gluster v status' should not time out

Additional info:
===============
The in-service upgrade was still in progress, so cluster.op-version was still at
the previous op-version; the cluster is essentially in the middle of an
in-service upgrade scenario.
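
For reference, the effective op-version can be read from any node with a
standard volume-option query (shown as a sketch):

    gluster volume get all cluster.op-version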

--- Additional comment from Rochelle on 2019-05-07 13:00:30 IST ---

I've sent a mail with the access and credentials to my systems.

--- Additional comment from Rochelle on 2019-05-07 14:25:32 IST ---

pstack <gluster pid> outputs from each node:

Node 1: (Upgraded node)
-----------------------

[root at dhcp42-173 ~]# pstack 4457
Thread 9 (Thread 0x7f6ae9d10700 (LWP 4458)):
#0  0x00007f6af1578e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f6af27421c6 in gf_timer_proc (data=0x563bd3637ca0) at timer.c:194
#2  0x00007f6af1571dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f6af0e38ead in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f6ae950f700 (LWP 4459)):
#0  0x00007f6af1579361 in sigwait () from /lib64/libpthread.so.0
#1  0x0000563bd286143b in glusterfs_sigwaiter (arg=<optimized out>) at
glusterfsd.c:2370
#2  0x00007f6af1571dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f6af0e38ead in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f6ae8d0e700 (LWP 4460)):
#0  0x00007f6af0dffe2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f6af0dffcc4 in sleep () from /lib64/libc.so.6
#2  0x00007f6af275f5bd in pool_sweeper (arg=<optimized out>) at mem-pool.c:473
#3  0x00007f6af1571dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f6af0e38ead in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f6ae850d700 (LWP 4461)):
#0  0x00007f6af1575d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f6af2774800 in syncenv_task (proc=proc at entry=0x563bd3638430) at
syncop.c:612
#2  0x00007f6af27756b0 in syncenv_processor (thdata=0x563bd3638430) at
syncop.c:679
#3  0x00007f6af1571dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f6af0e38ead in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f6ae7d0c700 (LWP 4462)):
#0  0x00007f6af1575d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f6af2774800 in syncenv_task (proc=proc at entry=0x563bd36387f0) at
syncop.c:612
#2  0x00007f6af27756b0 in syncenv_processor (thdata=0x563bd36387f0) at
syncop.c:679
#3  0x00007f6af1571dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f6af0e38ead in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f6ae750b700 (LWP 4463)):
#0  0x00007f6af0e2ff73 in select () from /lib64/libc.so.6
#1  0x00007f6af27b37f4 in runner (arg=0x563bd363c590) at
../../contrib/timer-wheel/timer-wheel.c:186
#2  0x00007f6af1571dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f6af0e38ead in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f6ae3b28700 (LWP 4654)):
#0  0x00007f6af1575965 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f6ae6917f2b in hooks_worker (args=<optimized out>) at
glusterd-hooks.c:527
#2  0x00007f6af1571dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f6af0e38ead in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f6ae3327700 (LWP 4655)):
#0  0x00007f6af0e39483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f6af2799050 in event_dispatch_epoll_worker (data=0x563bd3759870) at
event-epoll.c:751
#2  0x00007f6af1571dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f6af0e38ead in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f6af2c31780 (LWP 4457)):
#0  0x00007f6af1572f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f6af2798468 in event_dispatch_epoll (event_pool=0x563bd362f5b0) at
event-epoll.c:846
#2  0x0000563bd285d9b5 in main (argc=5, argv=<optimized out>) at
glusterfsd.c:2866



Node 2 : (upgraded node)
------------------------

[root at dhcp42-211 ~]# pstack 4493
Thread 9 (Thread 0x7f9f5c2c2700 (LWP 4495)):
#0  0x00007f9f63b2ae3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f9f64cf41c6 in gf_timer_proc (data=0x562895ff8ca0) at timer.c:194
#2  0x00007f9f63b23dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f9f633eaead in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f9f5bac1700 (LWP 4496)):
#0  0x00007f9f63b2b361 in sigwait () from /lib64/libpthread.so.0
#1  0x00005628944ba43b in glusterfs_sigwaiter (arg=<optimized out>) at
glusterfsd.c:2370
#2  0x00007f9f63b23dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f9f633eaead in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f9f5b2c0700 (LWP 4497)):
#0  0x00007f9f633b1e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f9f633b1cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f9f64d115bd in pool_sweeper (arg=<optimized out>) at mem-pool.c:473
#3  0x00007f9f63b23dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f9f633eaead in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f9f5aabf700 (LWP 4498)):
#0  0x00007f9f63b27d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f9f64d26800 in syncenv_task (proc=proc at entry=0x562895ff9430) at
syncop.c:612
#2  0x00007f9f64d276b0 in syncenv_processor (thdata=0x562895ff9430) at
syncop.c:679
#3  0x00007f9f63b23dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f9f633eaead in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f9f5a2be700 (LWP 4499)):
#0  0x00007f9f63b27d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f9f64d26800 in syncenv_task (proc=proc at entry=0x562895ff97f0) at
syncop.c:612
#2  0x00007f9f64d276b0 in syncenv_processor (thdata=0x562895ff97f0) at
syncop.c:679
#3  0x00007f9f63b23dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f9f633eaead in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f9f59abd700 (LWP 4500)):
#0  0x00007f9f633e1f73 in select () from /lib64/libc.so.6
#1  0x00007f9f64d657f4 in runner (arg=0x562895ffd590) at
../../contrib/timer-wheel/timer-wheel.c:186
#2  0x00007f9f63b23dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f9f633eaead in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f9f560da700 (LWP 4693)):
#0  0x00007f9f63b27965 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f9f58ec9f2b in hooks_worker (args=<optimized out>) at
glusterd-hooks.c:527
#2  0x00007f9f63b23dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f9f633eaead in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f9f558d9700 (LWP 4694)):
#0  0x00007f9f633eb483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f9f64d4b050 in event_dispatch_epoll_worker (data=0x562896119ae0) at
event-epoll.c:751
#2  0x00007f9f63b23dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f9f633eaead in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f9f651e3780 (LWP 4493)):
#0  0x00007f9f63b24f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f9f64d4a468 in event_dispatch_epoll (event_pool=0x562895ff05b0) at
event-epoll.c:846
#2  0x00005628944b69b5 in main (argc=5, argv=<optimized out>) at
glusterfsd.c:2866



Node 3 (Node to be upgraded)
-----------------------------
[root at dhcp41-155 yum.repos.d]# pstack 3942
Thread 8 (Thread 0x7f468c4a2700 (LWP 3943)):
#0  0x00007f4693d0ae3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f4694ed5f86 in gf_timer_proc (data=0x556b28add800) at timer.c:174
#2  0x00007f4693d03dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f46935caead in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f468bca1700 (LWP 3944)):
#0  0x00007f4693d0b361 in sigwait () from /lib64/libpthread.so.0
#1  0x0000556b26f90c7b in glusterfs_sigwaiter (arg=<optimized out>) at
glusterfsd.c:2242
#2  0x00007f4693d03dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f46935caead in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f468b4a0700 (LWP 3945)):
#0  0x00007f4693591e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f4693591cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f4694ef0b9d in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007f4693d03dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f46935caead in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f468ac9f700 (LWP 3946)):
#0  0x00007f4693d07d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f4694f03dd8 in syncenv_task (proc=proc at entry=0x556b28ade020) at
syncop.c:603
#2  0x00007f4694f04ca0 in syncenv_processor (thdata=0x556b28ade020) at
syncop.c:695
#3  0x00007f4693d03dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f46935caead in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f468a49e700 (LWP 3947)):
#0  0x00007f4693d07d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f4694f03dd8 in syncenv_task (proc=proc at entry=0x556b28ade3e0) at
syncop.c:603
#2  0x00007f4694f04ca0 in syncenv_processor (thdata=0x556b28ade3e0) at
syncop.c:695
#3  0x00007f4693d03dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f46935caead in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f46852f1700 (LWP 4320)):
#0  0x00007f4693d07965 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f46898b783b in hooks_worker (args=<optimized out>) at
glusterd-hooks.c:529
#2  0x00007f4693d03dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f46935caead in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f4684af0700 (LWP 4321)):
#0  0x00007f46935cb483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f4694f266e8 in event_dispatch_epoll_worker (data=0x556b28af8780) at
event-epoll.c:749
#2  0x00007f4693d03dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f46935caead in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f46953ad780 (LWP 3942)):
#0  0x00007f4693d04f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f4694f26f78 in event_dispatch_epoll (event_pool=0x556b28ad5a70) at
event-epoll.c:846
#2  0x0000556b26f8d538 in main (argc=5, argv=<optimized out>) at
glusterfsd.c:2692

--- Additional comment from Atin Mukherjee on 2019-05-07 14:42:05 IST ---

I don't see any threads being stuck. Unless this problem is reproducible,
nothing can be done here. Does the volume status still time out? If so, a gdb
session can be used to debug the problem.
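
A minimal sketch of such a session (assuming gdb and the matching debuginfo
packages are installed on the node where the command hangs):

    # Attach to the running glusterd and dump backtraces of all threads
    gdb -p $(pidof glusterd)
    (gdb) thread apply all bt
    (gdb) detach
    (gdb) quit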

--- Additional comment from Rochelle on 2019-05-08 10:04:12 IST ---

Yes Atin, the issue still persists. 
I have sent the credentials of the systems in an email.

--- Additional comment from Rochelle on 2019-05-08 11:16:09 IST ---

Hit this issue again, but in a different way: this time on RHGS 3.5 nodes.

1. Ran the gdeploy conf file to create a geo-rep session.
2. Once the run was successful, I ran 'gluster v geo-rep status' on the
   master node (to see if creation was successful) and this timed out as shown
   below:

[root at dhcp42-131 ~]# gluster v geo-rep status
Error : Request timed out
geo-replication command failed


3. When I ran the full (specific) geo-rep status, it worked as expected.

[root at dhcp42-131 ~]# gluster v geo-replication master 10.70.42.250::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.131    master        /rhs/brick1/b1    root          10.70.42.250::slave    10.70.42.250    Active     Changelog Crawl    2019-05-08 05:37:20
10.70.42.131    master        /rhs/brick2/b4    root          10.70.42.250::slave    10.70.42.250    Active     Changelog Crawl    2019-05-08 05:37:20
10.70.42.131    master        /rhs/brick2/b7    root          10.70.42.250::slave    10.70.42.250    Active     Changelog Crawl    2019-05-08 05:37:20
10.70.42.255    master        /rhs/brick1/b3    root          10.70.42.250::slave    10.70.43.245    Passive    N/A                N/A
10.70.42.255    master        /rhs/brick2/b6    root          10.70.42.250::slave    10.70.43.245    Passive    N/A                N/A
10.70.42.255    master        /rhs/brick2/b9    root          10.70.42.250::slave    10.70.43.245    Passive    N/A                N/A
10.70.42.14     master        /rhs/brick1/b2    root          10.70.42.250::slave    10.70.41.224    Passive    N/A                N/A
10.70.42.14     master        /rhs/brick2/b5    root          10.70.42.250::slave    10.70.41.224    Passive    N/A                N/A
10.70.42.14     master        /rhs/brick2/b8    root          10.70.42.250::slave    10.70.41.224    Passive    N/A                N/A

--- Additional comment from Bala Konda Reddy M on 2019-05-09 11:27:14 IST ---

Atin/Sanju,

While upgrading my cluster (6 nodes), I was able to hit the issue. I upgraded
the first node and checked that volume status and heal info work fine on the
upgraded node. On the 2nd node, which is still on 3.4.4, 'gluster volume status'
is in a hung state.

Regards,
Bala


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1707246
[Bug 1707246] [glusterd]: While upgrading (3-node cluster) 'gluster v status'
times out on node to be upgraded
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

