[Gluster-users] BUG: After stop and start wrong port is advertised
Jo Goossens
jo.goossens at hosted-power.com
Thu Sep 21 08:38:32 UTC 2017
Hi,
We use glusterfs 3.10.5 on Debian 9.
When we stop or restart the service, e.g.: service glusterfs-server restart
We see that the wrong port gets advertised afterwards. For example:
Before restart:
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       5932
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499
Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks
After restarting the service on one of the nodes (192.168.140.43), the advertised port appears to have changed, although the brick process itself did not move to a new port:
root@app3:/var/log/glusterfs# gluster volume status
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       4628
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777
Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks
However, the active brick process STILL has the same PID (5913) and is STILL listening on the old port:
root@192.168.140.43:/var/log/glusterfs# netstat -tapn | grep gluster
tcp 0 0 0.0.0.0:49152 0.0.0.0:* LISTEN 5913/glusterfsd
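To make the mismatch easier to spot on a node, the two outputs above can be cross-checked mechanically. The following is only a rough sketch, not part of GlusterFS: the function names are made up for illustration, the parsing assumes the column layout shown in this report, and the sample strings are copied from the output above rather than read from a live system.

```python
import re

# Sample text copied from this report; in practice these would be the captured
# output of `gluster volume status` and `netstat -tapn` on the restarted node.
STATUS = """\
Brick 192.168.140.41:/gluster/public 49153 0 Y 6364
Brick 192.168.140.42:/gluster/public 49152 0 Y 1483
Brick 192.168.140.43:/gluster/public 49154 0 Y 5913
"""
NETSTAT = "tcp 0 0 0.0.0.0:49152 0.0.0.0:* LISTEN 5913/glusterfsd\n"

# Assumed layout: Brick <host:path> <tcp port> <rdma port> <Y|N> <pid>
BRICK_LINE = re.compile(r"^Brick\s+(\S+)\s+(\d+)\s+\S+\s+[YN]\s+(\d+)\s*$")


def advertised_ports(status_text):
    """Map brick name -> (advertised TCP port, pid) from volume status text."""
    bricks = {}
    for line in status_text.splitlines():
        m = BRICK_LINE.match(line.strip())
        if m:
            bricks[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return bricks


def listening_ports(netstat_text):
    """Map pid -> set of TCP ports in LISTEN state from `netstat -tapn` text."""
    ports = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        pid = fields[6].split("/", 1)[0]
        if not pid.isdigit():  # netstat prints "-" when the pid is not visible
            continue
        port = int(fields[3].rsplit(":", 1)[1])
        ports.setdefault(int(pid), set()).add(port)
    return ports


def mismatched_bricks(status_text, netstat_text, local_host):
    """Return local bricks whose advertised port is not one their pid listens on."""
    listening = listening_ports(netstat_text)
    result = []
    for brick, (port, pid) in advertised_ports(status_text).items():
        if not brick.startswith(local_host + ":"):
            continue  # netstat only shows sockets of the local node
        actual = listening.get(pid, set())
        if port not in actual:
            result.append((brick, port, sorted(actual)))
    return result


print(mismatched_bricks(STATUS, NETSTAT, "192.168.140.43"))
```

Each entry in the result is a tuple of brick name, advertised port, and the ports the brick's pid is really listening on; an empty list means the advertised and actual ports agree for the local bricks.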
The other nodes' logs fill up with errors because they can no longer reach the daemon. They try to reach it on the "new" port instead of the old one:
[2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
So they now try 49154 instead of the old 49152.
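For anyone who wants to see at a glance which endpoint the clients keep retrying, the log lines above can be summarised with a few lines of Python. This is just an illustrative sketch: the regular expression is modelled on the `socket_connect_finish` messages quoted in this mail, the function name is invented, and the sample is a shortened copy of the log above.

```python
import re
from collections import Counter

# Shortened sample copied from the client log quoted in this report.
LOG = """\
[2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
"""

# Assumed message shape: "connection to <host>:<port> failed (<reason>)"
FAILED = re.compile(r"connection to (\S+):(\d+) failed \(([^)]*)\)")


def failed_connections(log_text):
    """Count failed connection attempts per (host, port, reason)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = FAILED.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)), m.group(3))] += 1
    return counts


for (host, port, reason), n in failed_connections(LOG).items():
    print(f"{host}:{port} {reason} x{n}")
```

Comparing the port reported here with the one the brick process is actually listening on shows whether the clients were handed a stale port.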
Is this also by design? We had a lot of issues because of this recently. We don't understand why it starts advertising a completely wrong port after stop/start.
Regards
Jo Goossens