[Bugs] [Bug 1543711] New: glustershd/glusterd is not using right port when connecting to glusterfsd process
bugzilla at redhat.com
Fri Feb 9 04:08:37 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1543711
Bug ID: 1543711
Summary: glustershd/glusterd is not using right port when connecting to glusterfsd process
Product: GlusterFS
Version: 4.0
Component: glusterd
Keywords: Triaged
Severity: high
Assignee: bugs at gluster.org
Reporter: amukherj at redhat.com
CC: bugs at gluster.org, zz.sh.cynthia at gmail.com
Depends On: 1537362
Blocks: 1537346
+++ This bug was initially created as a clone of Bug #1537362 +++
+++ This bug was initially created as a clone of Bug #1537346 +++
Description of problem:
Sometimes, after one of the sn nodes is rebooted, the output of the command “gluster v heal mstate info” shows:
[root@testsn-1:/var/log/glusterfs/bricks]
# gluster v heal mstate info
Brick testsn-0.local:/mnt/bricks/mstate/brick
/testas-0/var/lib/ntp/drift
/testas-2/var/lib/ntp/drift
/.install-done
/testas-0/var/lib/ntp
/testmn-1/var/lib/ntp
/testas-2/var/lib/ntp
/testmn-0/var/lib/ntp
/testmn-1/var/lib/ntp/drift
/testas-1/var/lib/ntp
/testas-1/var/lib/ntp/drift
/testmn-0/var/lib/ntp/drift
Status: Connected
Number of entries: 11
Brick testsn-1.local:/mnt/bricks/mstate/brick
Status: Transport endpoint is not connected
Number of entries: -
glustershd cannot connect to the local brick process! When I check the glustershd
process, I find that it always fails when trying to connect to the glusterfsd
process on port 49155:
[2018-01-18 10:42:29.891811] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 0-mstate-client-1: changing port to 49155 (from 0)
[2018-01-18 10:42:29.892120] E [socket.c:2369:socket_connect_finish] 0-mstate-client-1: connection to 192.168.1.3:49155 failed (Connection refused); disconnecting
However, the local mstate glusterfsd process is listening on port 49153!
Version-Release number of selected component (if applicable):
glusterfs 3.12.3
How reproducible:
Reboot an sn node.
Steps to Reproduce:
1. Reboot an sn node.
Actual results:
glustershd cannot connect to one local glusterfsd brick process, as can be seen
from the following netstat output:
[root@testsn-1:/var/log/glusterfs/bricks]
# ps -ef | grep glustershd
root 1295 1 0 Jan18 ? 00:00:18 /usr/sbin/glusterfs -s
testsn-1.local --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/178dba826edae38df4ba67f25beeb1e6.socket --xlator-option
*replicate*.node-uuid=9ccea6b1-4d81-4020-a4ba-ee6821268ba8
root 19900 27911 0 04:10 pts/1 00:00:00 grep glustershd
[root@testsn-1:/var/log/glusterfs/bricks]
# netstat -p | grep 1295
tcp 0 0 testsn-1.local:49098 testsn-0.local:49154 ESTABLISHED 1295/glusterfs
tcp 0 0 testsn-1.local:49099 testsn-0.local:49152 ESTABLISHED 1295/glusterfs
tcp 0 0 testsn-1.local:49140 testsn-1.local:24007 ESTABLISHED 1295/glusterfs
tcp 0 0 testsn-1.local:49097 testsn-0.local:49153 ESTABLISHED 1295/glusterfs
tcp 0 0 testsn-1.local:49096 testsn-0.local:49155 ESTABLISHED 1295/glusterfs
tcp 0 0 testsn-1.local:49120 testsn-1.local:49156 ESTABLISHED 1295/glusterfs
tcp 0 0 testsn-1.local:49121 testsn-1.local:49152 ESTABLISHED 1295/glusterfs
tcp 0 0 testsn-1.local:49126 testsn-1.local:49154 ESTABLISHED 1295/glusterfs
unix 3 [ ] STREAM CONNECTED 36264 1295/glusterfs /var/run/gluster/178dba826edae38df4ba67f25beeb1e6.socket
unix 2 [ ] DGRAM 36258 1295/glusterfs
Expected results:
glustershd should be able to connect to the local brick process.
Additional info:
# gluster v status mstate
Status of volume: mstate
Gluster process                               TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick testsn-0.local:/mnt/bricks/mstate/brick 49154     0          Y       1113
Brick testsn-1.local:/mnt/bricks/mstate/brick 49155     0          Y       1117
Self-heal Daemon on localhost                 N/A       N/A        Y       1295
Self-heal Daemon on testsn-2.local            N/A       N/A        Y       1813
Self-heal Daemon on testsn-0.local            N/A       N/A        Y       1135
Task Status of Volume mstate
------------------------------------------------------------------------------
There are no active volume tasks
It is quite strange that the listen port of the mstate brick process is shown as
49155 in “gluster v status mstate” but as 49153 in the ps output!
[root@testsn-1:/var/log/glusterfs/bricks]
# ps -ef | grep -i glusterfsd | grep mstate
root 1117 1 0 Jan18 ? 00:00:05 /usr/sbin/glusterfsd -s
testsn-1.local --volfile-id mstate.testsn-1.local.mnt-bricks-mstate-brick -p
/var/run/gluster/vols/mstate/testsn-1.local-mnt-bricks-mstate-brick.pid -S
/var/run/gluster/b520b934b415e6a68776cc4852901a77.socket --brick-name
/mnt/bricks/mstate/brick -l
/var/log/glusterfs/bricks/mnt-bricks-mstate-brick.log --xlator-option
*-posix.glusterd-uuid=9ccea6b1-4d81-4020-a4ba-ee6821268ba8 --brick-port 49153
--xlator-option mstate-server.listen-port=49153 --xlator-option
transport.socket.bind-address=testsn-1.local
--- Additional comment from Worker Ant on 2018-01-22 21:29:06 EST ---
REVIEW: https://review.gluster.org/19263 (glusterd: process pmap sign in only
when port is marked as free) posted (#3) for review on master by Atin Mukherjee
--- Additional comment from Worker Ant on 2018-01-22 21:45:23 EST ---
REVIEW: https://review.gluster.org/19263 (glusterd: process pmap sign in only
when port is marked as free) posted (#4) for review on master by Atin Mukherjee
--- Additional comment from Worker Ant on 2018-01-25 03:01:51 EST ---
COMMIT: https://review.gluster.org/19263 committed in master by "Atin
Mukherjee" <amukherj at redhat.com> with a commit message- glusterd: process pmap
sign in only when port is marked as free
Because of a race in the volume start code path, caused by friend handshaking
with volumes that have quorum enabled, we might end up in a situation where
glusterd starts a brick, gets a disconnect, and then immediately tries to start
the same brick instance based on another friend update request. Then, even if
the process for the very first brick start doesn't come up, a sign-in event
eventually gets sent, and we end up with two duplicate portmap entries for the
same brick. Since brick start marks the previous port as free, it's better to
treat a sign-in request as a no-op if the corresponding port type is marked as
free.
Change-Id: I995c348c7b6988956d24b06bf3f09ab64280fc32
BUG: 1537362
Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1537346
[Bug 1537346] glustershd/glusterd is not using right port when connecting
to glusterfsd process
https://bugzilla.redhat.com/show_bug.cgi?id=1537362
[Bug 1537362] glustershd/glusterd is not using right port when connecting
to glusterfsd process
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.