[Bugs] [Bug 1460225] Not cleaning up stale socket file is resulting in spamming glusterd logs with warnings of "got disconnect from stale rpc"

bugzilla at redhat.com bugzilla at redhat.com
Fri Jun 9 12:23:50 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1460225



--- Comment #1 from Atin Mukherjee <amukherj at redhat.com> ---
Description of problem:
======================
On a brick mux setup, when a brick is killed and the volumes are then started
with "volume start ... force", the stale socket file left behind results in
glusterd spamming its log with the message below:
[2017-06-08 13:36:09.699089] W
[glusterd-handler.c:5678:__glusterd_brick_rpc_notify] 0-management: got
disconnect from stale rpc on /rhs/brick31/test3_31

How reproducible:
=========
always

Steps to Reproduce:
1. Have brick mux enabled and create 30 volumes, say v1..v30.
2. Kill brick b1 with SIGKILL (let us say the base volume for that glusterfsd
was v1; see the RCA below for why the signal matters).
3. Do a vol start force of, say, v25 (not the base volume v1).
4. Do a vol start force of all the volumes, e.g.:
   for i in $(gluster v list); do gluster v start $i force; done
5. From here on the glusterd log is spammed indefinitely with the error below,
because the base volume's glusterfsd socket file is stale but still present
(in my case the stale socket file belongs to base volume test3_31):

[2017-06-08 13:37:45.712948] W
[glusterd-handler.c:5678:__glusterd_brick_rpc_notify] 0-management: got
disconnect from stale rpc on /rhs/brick31/test3_31
[2017-06-08 13:37:48.713752] W
[glusterd-handler.c:5678:__glusterd_brick_rpc_notify] 0-management: got
disconnect from stale rpc on /rhs/brick31/test3_31
[2017-06-08 13:37:51.713870] W
[glusterd-handler.c:5678:__glusterd_brick_rpc_notify] 0-management: got
disconnect from stale rpc on /rhs/brick31/test3_31


Workaround
=======
Delete the old stale socket file.
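
The workaround above is manual. For illustration only, the daemon-side
equivalent boils down to an unlink() of the leftover unix-socket path before
the brick is restarted. Below is a minimal POSIX sketch under that assumption,
not the actual glusterd helper; the function name and the example socket path
are made up.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Remove a leftover brick socket if it exists; treat "already gone"
 * as success. */
static int
remove_stale_brick_socket (const char *sockpath)
{
        if (unlink (sockpath) == 0 || errno == ENOENT)
                return 0;

        fprintf (stderr, "failed to remove stale socket %s: %s\n",
                 sockpath, strerror (errno));
        return -1;
}

int
main (void)
{
        /* Example path only; the real socket name is derived from the
         * brick path and lives under glusterd's run directory. */
        if (remove_stale_brick_socket ("/var/run/gluster/example-brick.socket"))
                return 1;
        return 0;
}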

RCA:

This only happens when the brick process was killed with SIGKILL, not SIGTERM.
Because SIGKILL cannot be caught, the brick's signal handler was never invoked
and its cleanup path never ran, so we ended up with a stale socket file; that
stale socket is what produces the constant series of stale-disconnect
notifications. To avoid the flood, I can convert the gf_log instance to
gf_log_occasionally (see the sketch below).
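
For reference, here is a self-contained sketch of that counter-based
throttling idea: only every Nth identical warning is emitted instead of one
per reconnect attempt. The macro, the interval and the simulated loop are
illustrative stand-ins, not the actual gf_log_occasionally change in
__glusterd_brick_rpc_notify.

#include <stdio.h>

/* Emit only every Nth occurrence of a repeated message; the name and the
 * interval are stand-ins for the real logging macro. */
#define LOG_EVERY_NTH 42

#define LOG_OCCASIONALLY(counter, fmt, ...)                             \
        do {                                                            \
                if (((counter)++ % LOG_EVERY_NTH) == 0)                 \
                        fprintf (stderr, "W [demo] " fmt "\n",          \
                                 ##__VA_ARGS__);                        \
        } while (0)

int
main (void)
{
        static int stale_disconnect_count = 0;  /* hypothetical throttle counter */
        int        i;

        /* Simulate the periodic reconnect attempts hitting the stale
         * socket: 200 disconnect events produce only a handful of
         * warnings instead of 200. */
        for (i = 0; i < 200; i++)
                LOG_OCCASIONALLY (stale_disconnect_count,
                                  "got disconnect from stale rpc on %s",
                                  "/rhs/brick31/test3_31");

        return 0;
}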
