<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html>
<head>
<meta name="Generator" content="Zarafa WebAccess v7.1.14-51822">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>BUG: After stop and start wrong port is advertised</title>
<style type="text/css">
body
{
font-family: Arial, Verdana, Sans-Serif ! important;
font-size: 12px;
padding: 5px 5px 5px 5px;
margin: 0px;
border-style: none;
background-color: #ffffff;
}
p, ul, li
{
margin-top: 0px;
margin-bottom: 0px;
}
</style>
</head>
<body>
<p>Hi,</p><p> </p><p> </p><p>We use glusterfs 3.10.5 on Debian 9.</p><p> </p><p>When we stop or restart the service, e.g.: service glusterfs-server restart</p><p> </p><p>We see that the wrong port gets advertised afterwards. For example:</p><p> </p><p>Before restart:</p><p> </p><pre>Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       5932
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499

Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks</pre><div> </div><div> </div><div>After restarting the service on one of the nodes (192.168.140.43), the advertised port appears to have changed (but the actual port didn't):</div><div> </div><div><pre>root@app3:/var/log/glusterfs# gluster volume status
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       4628
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777

Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks</pre><div> </div></div><div> 
</div><div>However, the active process STILL has the same PID AND is still listening on the old port:</div><div> </div><div><div>root@192.168.140.43:/var/log/glusterfs# netstat -tapn | grep gluster</div><div>tcp 0 0 0.0.0.0:49152 0.0.0.0:* LISTEN 5913/glusterfsd</div><div> </div></div><div> </div><div>The other nodes' logs fill up with errors because they can no longer reach the daemon. They try to reach it on the "new" port instead of the old one:</div><div> </div><div><div>[2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket</div><div>[2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)</div><div>[2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket</div><div>[2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)</div><div>[2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket</div><div>[2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)</div><div>[2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket</div><div>[2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)</div><div>[2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket</div><div> </div></div><div>So they now try 49154 instead of the old 49152.</div><div> </div><div>Is this 
also by design? We had a lot of issues because of this recently. We don't understand why it starts advertising a completely wrong port after stop/start.</div><div> </div><div> </div><div> </div><div> </div><p> </p><p>Regards</p><p>Jo Goossens</p><p> </p>
</body>
</html>