[Bugs] [Bug 1501154] New: Brick Multiplexing: Gluster volume start force complains with command "Error : Request timed out" when there are multiple volumes

bugzilla at redhat.com bugzilla at redhat.com
Thu Oct 12 07:23:54 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1501154

            Bug ID: 1501154
           Summary: Brick Multiplexing: Gluster volume start force
                    complains with command "Error : Request timed out"
                    when there are multiple volumes
           Product: GlusterFS
           Version: 3.12
         Component: glusterd
          Severity: medium
          Assignee: bugs at gluster.org
          Reporter: srakonde at redhat.com
                CC: amukherj at redhat.com, bmekala at redhat.com,
                    bugs at gluster.org, nchilaka at redhat.com,
                    rhs-bugs at redhat.com, sbairagy at redhat.com,
                    srakonde at redhat.com, storage-qa-internal at redhat.com,
                    vbellur at redhat.com



+++ This bug was initially created as a clone of Bug #1499509 +++

+++ This bug was initially created as a clone of Bug #1459895 +++

Description of problem:
=====================
On a setup hosting many similar volumes of the same type, after a brick is
brought down, running "vol start force" on a volume that comes later in the
volume list results in the command failing with "Error : Request timed out".



Version-Release number of selected component (if applicable):
=======
3.8.4-27

How reproducible:
=========
always

Steps to Reproduce:
==================
1. Enable brick multiplexing and create about 25 volumes, all 1x3, say
test3_1, test3_2 ... test3_25
2. Kill the brick PID of b1 using kill -9
3. Start a volume with force, say "gluster volume start test3_22 force" (it
must be later in the volume list)
4. Notice that the command fails with "Error : Request timed out"

The glusterd log also shows dead bricks being cleaned up while the vol start
force command runs, which could be the reason for the delay.


(note: my volumes are like test3_{31..61})


[2017-06-08 13:11:28.576259] I [MSGID: 106499]
[glusterd-handler.c:4369:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume test3_47
[2017-06-08 13:11:32.144875] I [MSGID: 106062]
[glusterd-volume-ops.c:2617:glusterd_op_start_volume] 0-management: Global dict
not present.
[2017-06-08 13:11:47.148857] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick32/test3_32
[2017-06-08 13:12:02.153117] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick33/test3_33
[2017-06-08 13:12:17.157265] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick34/test3_34
[2017-06-08 13:12:32.161184] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick35/test3_35
[2017-06-08 13:12:47.164727] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick36/test3_36
[2017-06-08 13:13:02.168547] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick37/test3_37
[2017-06-08 13:13:17.172140] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick38/test3_38
[2017-06-08 13:13:32.175261] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick39/test3_39
[2017-06-08 13:13:47.178395] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick40/test3_40
[2017-06-08 13:14:02.181805] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick41/test3_41
[2017-06-08 13:14:17.186312] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick42/test3_42
[2017-06-08 13:14:32.189988] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick43/test3_43
[2017-06-08 13:14:47.193370] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick44/test3_44
[2017-06-08 13:15:02.197284] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick45/test3_45
[2017-06-08 13:15:17.200764] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick46/test3_46
[2017-06-08 13:15:32.204242] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick48/test3_48
[2017-06-08 13:15:47.207613] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick49/test3_49
[2017-06-08 13:16:02.211184] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick50/test3_50
[2017-06-08 13:16:17.215043] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick51/test3_51
[2017-06-08 13:16:32.218976] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick60/test3_60
[2017-06-08 13:16:47.222271] I [glusterd-utils.c:5229:find_compat_brick_in_vol]
0-management: cleaning up dead brick 10.70.35.45:/rhs/brick61/test3_61
[2017-06-08 13:16:47.222872] I [MSGID: 106144]
[glusterd-pmap.c:377:pmap_registry_remove] 0-pmap: removing brick
/rhs/brick47/test3_47 on port 49152
[2017-06-08 13:16:47.228031] I [socket.c:2465:socket_event_handler]
0-transport: EPOLLERR - dis


Actual results:
The volume start force command fails with "Error : Request timed out".


Expected results:
The volume starts successfully without the command timing out.

Additional info:

--- Additional comment from Atin Mukherjee on 2017-06-09 02:24:22 EDT ---

This only happens when the brick process was killed with SIGKILL, not SIGTERM.
My guess is that because the brick process was killed with SIGKILL, the signal
handler wasn't invoked and the usual cleanup wasn't triggered, so we ended up
with a stale socket file; this is why we see a constant series of stale
disconnects (filed as a separate issue, 1459900). Looking at the process stack
further below, thread 8 captures the stale disconnect, whereas thread 6 tries
to attach to a brick, which is again wrong: we should have spawned a new brick
process given that no bricks are running. But this happened because glusterd
didn't receive a pmap_signout event (the brick was killed abruptly), and since
glusterd doesn't receive per-brick disconnect events for multiplexed bricks,
the brickinfo status was never set to stopped.
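
As a generic aside (not glusterd code): the difference matters because a
SIGTERM handler can run cleanup such as removing the brick's unix socket file,
whereas SIGKILL terminates the process before any handler runs. A minimal
sketch, with a hypothetical socket path:

/*
 * Generic illustration only -- not glusterd code.  A SIGTERM handler
 * gives the process a chance to clean up (e.g. unlink its unix socket
 * file) before exiting, while SIGKILL cannot be caught, so no cleanup
 * ever runs and the socket file is left behind.  SOCK_PATH is a
 * hypothetical path used only for this example.
 */
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define SOCK_PATH "/tmp/example-brick.socket"

static volatile sig_atomic_t stopping;

static void on_sigterm(int sig)
{
    (void)sig;
    stopping = 1;                   /* ask the main loop to exit */
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGTERM, &sa, NULL);  /* SIGKILL can never be handled */

    while (!stopping)
        pause();                    /* stand-in for the real event loop */

    unlink(SOCK_PATH);              /* skipped entirely under SIGKILL */
    return 0;
}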

Thread 8 (Thread 0x7f435db30700 (LWP 2934)):
#0  0x00007f4365ae1bc0 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f4366cb5e13 in __synclock_lock (lock=lock@entry=0x7f4366fdd7f8) at
syncop.c:927
#2  0x00007f4366cb89d6 in synclock_lock (lock=lock@entry=0x7f4366fdd7f8) at
syncop.c:953
#3  0x00007f4361c4967b in glusterd_big_locked_notify (rpc=0x7f4358062f90,
mydata=0x7f435800e6f0, 
    event=RPC_CLNT_DISCONNECT, data=0x0, notify_fn=0x7f4361c45f90
<__glusterd_brick_rpc_notify>)
    at glusterd-handler.c:68
#4  0x00007f4366a4cd2b in rpc_clnt_handle_disconnect (conn=0x7f4358062fc0,
clnt=0x7f4358062f90) at rpc-clnt.c:876
#5  rpc_clnt_notify (trans=<optimized out>, mydata=0x7f4358062fc0,
event=RPC_TRANSPORT_DISCONNECT, 
    data=<optimized out>) at rpc-clnt.c:939
#6  0x00007f4366a49433 in rpc_transport_notify (this=this@entry=0x7f43581610a0,
    event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7f43581610a0)
at rpc-transport.c:538
#7  0x00007f435f52942f in socket_event_poll_err (idx=<optimized out>,
gen=<optimized out>, this=0x7f43581610a0)
    at socket.c:1206
#8  socket_event_handler (fd=6, idx=<optimized out>, gen=<optimized out>,
data=0x7f43581610a0, 
    poll_in=<optimized out>, poll_out=4, poll_err=16) at socket.c:2476
#9  0x00007f4366cd4a9d in event_dispatch_epoll_handler (event=0x7f435db2fe90,
event_pool=0x13c7080)
    at event-epoll.c:572
#10 event_dispatch_epoll_worker (data=0x141f850) at event-epoll.c:648
#11 0x00007f4365adc5ba in start_thread () from /lib64/libpthread.so.0
#12 0x00007f43653b57cd in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f435e331700 (LWP 2933)):
#0  0x00007f4365ae1bc0 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f4361cf0deb in hooks_worker (args=<optimized out>) at
glusterd-hooks.c:529
#2  0x00007f4365adc5ba in start_thread () from /lib64/libpthread.so.0
#3  0x00007f43653b57cd in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f43627ad700 (LWP 2805)):
#0  0x00007f436537b14d in nanosleep () from /lib64/libc.so.6
#1  0x00007f436537b09a in sleep () from /lib64/libc.so.6
#2  0x00007f4361c689ef in find_compat_brick_in_vol
(conf=conf@entry=0x7f4366fb4050, 
    srch_vol=srch_vol@entry=0x7f435801ab10,
comp_vol=comp_vol@entry=0x7f4358133a30, 
    brickinfo=brickinfo@entry=0x7f4358144230) at glusterd-utils.c:5236
#3  0x00007f4361c72e33 in find_compatible_brick (other_vol_p=<synthetic
pointer>, brickinfo=0x7f4358144230, 
    volinfo=0x7f4358133a30, conf=0x7f4366fb4050) at glusterd-utils.c:5302
#4  glusterd_brick_start (volinfo=volinfo@entry=0x7f4358133a30,
brickinfo=brickinfo@entry=0x7f4358144230, 
    wait=wait@entry=_gf_true) at glusterd-utils.c:5486
#5  0x00007f4361cdfce1 in glusterd_start_volume (volinfo=0x7f4358133a30,
flags=<optimized out>, 
    wait=wait@entry=_gf_true) at glusterd-volume-ops.c:2500
#6  0x00007f4361ce00b6 in glusterd_op_start_volume
(dict=dict@entry=0x7f434c1810d0, 
    op_errstr=op_errstr@entry=0x7f4350204e08) at glusterd-volume-ops.c:2604
#7  0x00007f4361d16a7b in gd_mgmt_v3_commit_fn (op=op@entry=GD_OP_START_VOLUME,
dict=dict@entry=0x7f434c1810d0, 
    op_errstr=op_errstr@entry=0x7f4350204e08,
op_errno=op_errno@entry=0x7f4350204dfc, 
    rsp_dict=rsp_dict@entry=0x7f434c092960) at glusterd-mgmt.c:319
#8  0x00007f4361d19854 in glusterd_mgmt_v3_commit
(op=op@entry=GD_OP_START_VOLUME, 
    op_ctx=op_ctx@entry=0x7f434c180570, req_dict=0x7f434c1810d0,
op_errstr=op_errstr@entry=0x7f4350204e08, 
    op_errno=op_errno@entry=0x7f4350204dfc,
txn_generation=txn_generation@entry=0) at glusterd-mgmt.c:1524
#9  0x00007f4361d1b3c0 in glusterd_mgmt_v3_initiate_all_phases
(req=req@entry=0x7f43500031d0, 
    op=op@entry=GD_OP_START_VOLUME, dict=0x7f434c180570) at
glusterd-mgmt.c:2141
#10 0x00007f4361cd9362 in __glusterd_handle_cli_start_volume
(req=req@entry=0x7f43500031d0)
    at glusterd-volume-ops.c:551
#11 0x00007f4361c496ed in glusterd_big_locked_handler (req=0x7f43500031d0, 
    actor_fn=0x7f4361cd9100 <__glusterd_handle_cli_start_volume>) at
glusterd-handler.c:81
#12 0x00007f4366cb5d20 in synctask_wrap () at syncop.c:375
#13 0x00007f43652fb2c0 in ?? () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()


We'd need to see how we can handle the disconnect path better for multiplexed
bricks so that this scenario is covered, but certainly not in this release, as
per our agreement.

--- Additional comment from Atin Mukherjee on 2017-08-17 06:45:02 EDT ---

The timeout is seen because all the brickinfos are still marked as started,
and while finding a compatible brick we wait for one of the brick processes
to come up, which never happens here since we are only starting a single
volume and all the other bricks are down. This is because when a brick
process goes down, only the brick for which the process originally came up
gets an rpc disconnect notification and has its brickinfo->status marked as
stopped; the other bricks that were attached to the same process remain in
the started state.
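
To make the delay concrete, here is a simplified sketch, not the actual
find_compat_brick_in_vol code; the probe, retry interval and bound are
assumptions for illustration. The point is the shape of the delay: every brick
that is still wrongly marked as started adds a fixed wait before it is given
up on, which is what produces the evenly spaced "cleaning up dead brick"
messages in the log above and, eventually, the CLI timeout.

/*
 * Simplified illustration, NOT the glusterd source.  Types, the probe
 * and the timing values are assumptions; only the shape of the delay
 * matters: one full wait per dead-but-"started" brick.
 */
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct brick {                   /* stand-in for glusterd_brickinfo_t */
    const char *path;
    bool        marked_started;  /* status still says "started" */
};

/* Hypothetical probe: does the brick process answer? */
static bool brick_is_alive(const struct brick *b)
{
    (void)b;
    return false;                /* process was SIGKILLed, never answers */
}

/* Wait a bounded time for the brick, then treat it as dead. */
static bool wait_for_brick(const struct brick *b)
{
    for (int i = 0; i < 15; i++) {   /* illustrative 15 x 1s bound */
        if (brick_is_alive(b))
            return true;
        sleep(1);
    }
    printf("cleaning up dead brick %s\n", b->path);
    return false;
}

int main(void)
{
    /* With ~20 such bricks, the per-brick waits add up to several
     * minutes, far beyond the CLI's request timeout. */
    struct brick bricks[] = {
        { "/rhs/brick32/test3_32", true },
        { "/rhs/brick33/test3_33", true },
    };

    for (unsigned i = 0; i < sizeof(bricks) / sizeof(bricks[0]); i++)
        if (bricks[i].marked_started)
            wait_for_brick(&bricks[i]);
    return 0;
}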

While processing the disconnect event, if we can identify the list of bricks
which are attached to the same process (probably through
glusterd_brick_proc_t?) and mark the respective bricks as stopped, this issue
can be avoided.

Samikshan - can this be implemented?

--- Additional comment from Samikshan Bairagya on 2017-09-04 07:13:09 EDT ---

(In reply to Atin Mukherjee from comment #4)
> The timeout is seen because all the brickinfos are still marked as
> started, and while finding a compatible brick we wait for one of the
> brick processes to come up, which never happens here since we are only
> starting a single volume and all the other bricks are down. This is
> because when a brick process goes down, only the brick for which the
> process originally came up gets an rpc disconnect notification and has
> its brickinfo->status marked as stopped; the other bricks that were
> attached to the same process remain in the started state.
> 
> While processing the disconnect event, if we can identify the list of bricks
> which are attached to the same process (probably through
> glusterd_brick_proc_t?) and mark the respective bricks as stopped, this
> issue can be avoided.
> 
> Samikshan - can this be implemented?

This can be implemented.

--- Additional comment from Worker Ant on 2017-10-08 09:31:17 EDT ---

REVIEW: https://review.gluster.org/18444 (glusterd:disconnect event in brick
multiplexing) posted (#4) for review on master by Sanju Rakonde
(srakonde at redhat.com)

--- Additional comment from Worker Ant on 2017-10-09 03:39:24 EDT ---

REVIEW: https://review.gluster.org/18444 (glusterd:Marking all the brick status
as stopped when a process goes down in brick multiplexing) posted (#5) for
review on master by Sanju Rakonde (srakonde at redhat.com)

--- Additional comment from Worker Ant on 2017-10-09 14:54:16 EDT ---

REVIEW: https://review.gluster.org/18444 (glusterd:Marking all the brick status
as stopped when a process goes down in brick multiplexing) posted (#6) for
review on master by Sanju Rakonde (srakonde at redhat.com)

--- Additional comment from Worker Ant on 2017-10-11 01:12:58 EDT ---

REVIEW: https://review.gluster.org/18444 (glusterd:Marking all the brick status
as stopped when a process goes down in brick multiplexing) posted (#7) for
review on master by Sanju Rakonde (srakonde at redhat.com)

--- Additional comment from Worker Ant on 2017-10-11 15:36:34 EDT ---

REVIEW: https://review.gluster.org/18444 (glusterd:Marking all the brick status
as stopped when a process goes down in brick multiplexing) posted (#8) for
review on master by Sanju Rakonde (srakonde at redhat.com)

--- Additional comment from Worker Ant on 2017-10-12 00:52:05 EDT ---

COMMIT: https://review.gluster.org/18444 committed in master by Atin Mukherjee
(amukherj at redhat.com) 
------
commit 9422446d72bc054962d72ace9912ecb885946d49
Author: Sanju Rakonde <srakonde at redhat.com>
Date:   Sat Oct 7 03:33:40 2017 +0530

    glusterd:Marking all the brick status as stopped when a process goes down
in brick multiplexing

    In a brick multiplexing environment, if a brick process goes down,
    i.e., if we kill it with SIGKILL, only the status of the brick for
    which the process originally came up changes to stopped; all other
    brick statuses remain started. This happens because the process was
    killed abruptly with SIGKILL, so the signal handler wasn't invoked
    and no further cleanup was triggered.

    When we then try to start a volume using force, the command fails
    with "Request timed out": since all the brickinfo->status values are
    still in the started state, we keep waiting for one of the brick
    processes to come up, which is never going to happen because the
    brick process was killed.

    To resolve this, in the disconnect event we check all the brick
    processes to find the one that the disconnected brick belongs to.
    Once we find that process, we call glusterd_mark_bricks_stopped_by_proc()
    with the brick_proc_t object as an argument.

    From the glusterd_brick_proc_t we can get all the bricks attached to
    that process, but these are duplicated entries. To get the original
    brickinfo we read the volinfo from each brick; the volinfo holds the
    original brickinfo copies, and we change brickinfo->status to stopped
    for all of these bricks.

    Change-Id: Ifb9054b3ee081ef56b39b2903ae686984fe827e7
    BUG: 1499509
    Signed-off-by: Sanju Rakonde <srakonde at redhat.com>
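
A minimal self-contained sketch of the approach the commit message describes.
It uses simplified stand-in types rather than the real glusterd structures
(glusterd_brick_proc_t, glusterd_volinfo_t, glusterd_brickinfo_t), so the field
names and layout are assumptions; only the marking logic is illustrative.

/*
 * Stand-ins for the glusterd types named above; field names and
 * layout are assumptions for illustration only.
 */
#include <stddef.h>
#include <string.h>

enum brick_status { BRICK_STOPPED, BRICK_STARTED };

struct brickinfo {                       /* ~ glusterd_brickinfo_t */
    char              path[256];
    enum brick_status status;
};

struct volinfo {                         /* ~ glusterd_volinfo_t */
    struct brickinfo bricks[8];
    size_t           brick_count;
};

struct attached_brick {                  /* duplicated entry held by the proc */
    struct volinfo *vol;
    const char     *path;
};

struct brick_proc {                      /* ~ glusterd_brick_proc_t */
    int                   pid;
    struct attached_brick attached[8];
    size_t                attached_count;
};

/*
 * ~ glusterd_mark_bricks_stopped_by_proc(): for every brick attached to
 * the dead process, resolve the original brickinfo through its volinfo
 * and mark it stopped, so a later "volume start force" no longer waits
 * for a process that will never come back.
 */
static void mark_bricks_stopped_by_proc(struct brick_proc *proc)
{
    for (size_t i = 0; i < proc->attached_count; i++) {
        struct volinfo *vol = proc->attached[i].vol;
        for (size_t j = 0; j < vol->brick_count; j++)
            if (strcmp(vol->bricks[j].path, proc->attached[i].path) == 0)
                vol->bricks[j].status = BRICK_STOPPED;
    }
}

int main(void)
{
    static struct volinfo vol = {
        .bricks      = { { "/rhs/brick32/test3_32", BRICK_STARTED },
                         { "/rhs/brick33/test3_33", BRICK_STARTED } },
        .brick_count = 2,
    };
    struct brick_proc proc = {
        .pid            = 1234,
        .attached       = { { &vol, "/rhs/brick32/test3_32" },
                            { &vol, "/rhs/brick33/test3_33" } },
        .attached_count = 2,
    };

    /* On the disconnect event, glusterd finds the process the brick
     * belonged to and marks every brick attached to it as stopped. */
    mark_bricks_stopped_by_proc(&proc);
    return 0;
}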

--- Additional comment from Worker Ant on 2017-10-12 03:02:41 EDT ---

REVIEW: https://review.gluster.org/18499 (glusterd:Marking all the brick status
as stopped when a process goes down in brick multiplexing) posted (#2) for
review on release-3.12 by Sanju Rakonde (srakonde at redhat.com)

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

