[Bugs] [Bug 1264290] New: glusterd lost track of brick port numbers after brick daemon dies
bugzilla at redhat.com
Fri Sep 18 06:50:22 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1264290
Bug ID: 1264290
Summary: glusterd lost track of brick port numbers after brick
daemon dies
Product: GlusterFS
Version: mainline
Component: glusterd
Keywords: Triaged
Severity: high
Assignee: bugs at gluster.org
Reporter: amukherj at redhat.com
CC: amukherj at redhat.com, bugs at gluster.org,
eivind at pacbell.net, gluster-bugs at redhat.com
Depends On: 1264245
+++ This bug was initially created as a clone of Bug #1264245 +++
Description of problem:
After a brick daemon dies, glusterd loses track of the listen ports of
new/future bricks.
Two different error scenarios can happen:
a) A replacement brick on the same node where a brick daemon previously died
will not be healed.
b) A new volume created using a brick from the same server where a brick daemon
previously died will not be replicated (by the client).
Version-Release number of selected component (if applicable):
3.7.4
How reproducible:
Every time.
Steps to Reproduce (scenarios a and b):
1a. Create a distributed-replicated 1x2 volume
2a. kill -9 <brick-pid>
3a. stop + delete volume
4a. replace-brick with another brick on same node where <brick-pid> died
(healing works)
5a. kill -9 <replacement-brick-pid>
6a. replace-brick with yet another brick (healing fails because the wrong port
is used to connect to the new brick)
7a. grep "Connection refused" /var/log/glusterfs/glustershd.log
1b. Create a distributed-replicated 1x2 volume
2b. kill -9 <brick-pid>
3b. stop + delete volume
4b. Create new 1x2 volume using same (cleaned) bricks as in 1b
5b. mount it.
6b. On client, grep "Connection refused" /var/log/glusterfs/<volname>.log
Actual results:
a. # grep "Connection refused" /var/log/glusterfs/glustershd.log
[2015-09-18 00:55:24.717023] E [socket.c:2278:socket_connect_finish]
0-voltest-client-0: connection to 192.168.1.3:49152 failed (Connection refused)
b. # grep "Connection refused" /var/log/glusterfs/voltest.log
[2015-09-18 00:44:59.117344] E [socket.c:2278:socket_connect_finish]
4-voltest-client-0: connection to 192.168.1.3:49152 failed (Connection refused)
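When comparing several logs, it can help to pull the refused endpoints out of the log text instead of reading grep output by eye. The following is a minimal Python sketch, assuming the log entries have the same shape as the two quoted above; the function name refused_endpoints is hypothetical and not part of GlusterFS.

```python
import re

# Matches socket_connect_finish error entries shaped like the ones quoted above.
REFUSED = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+E\s+"
    r"\[socket\.c:\d+:socket_connect_finish\]\s+"
    r"(?P<client>\S+):\s+connection to (?P<host>[\d.]+):(?P<port>\d+) failed "
    r"\(Connection refused\)"
)


def refused_endpoints(log_text):
    """Return a (client, host, port) tuple for each refused connection."""
    # Collapse all whitespace so an entry wrapped across lines still matches.
    flat = " ".join(log_text.split())
    return [
        (m["client"], m["host"], int(m["port"]))
        for m in REFUSED.finditer(flat)
    ]
```

Calling refused_endpoints(open("/var/log/glusterfs/glustershd.log").read()) would list every client translator that was refused, together with the port it tried, which is the port to compare against what the brick actually listens on.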
Expected results:
Additional info:
Restarting glusterd after the brick daemon is killed prevents the
"Connection refused" errors in both a) and b).
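To check whether a given brick port is really refusing connections, before or after restarting glusterd, a plain TCP probe is enough. This is only a sketch: the helper name port_accepts_connections is hypothetical, and the host and port in the usage note are the ones from the log excerpts in this report.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError (including ConnectionRefusedError
        # and timeouts) when nothing is accepting connections on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_accepts_connections("192.168.1.3", 49152) returning False while the port shown by gluster volume status returns True would be consistent with clients having been handed a stale port.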
--- Additional comment from Atin Mukherjee on 2015-09-17 23:45:47 EDT ---
Requesting the AFR team to check this.
--- Additional comment from Atin Mukherjee on 2015-09-18 00:19:31 EDT ---
Scenario b is reproducible. We will keep you posted once we have the RCA.
Thanks for filing the bug.
--- Additional comment from Vijay Bellur on 2015-09-18 01:23:39 EDT ---
REVIEW: http://review.gluster.org/12189 (glusterd: Use GF_PMAP_PORT_BRICKSERVER
in pmap_registry_remove from brick disconnects) posted (#1) for review on
master by Atin Mukherjee (amukherj at redhat.com)
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1264245
[Bug 1264245] glusterd lost track of brick port numbers after brick daemon
dies