[Gluster-users] BUG: After stop and start wrong port is advertised
    Mike Hulsman 
    mike.hulsman at proxy.nl
       
    Wed Nov  8 11:28:16 UTC 2017
    
    
  
Hi, 
This bug is hitting me hard at two different clients, on RHGS 3.3 and on glusterfs 3.10.2 on CentOS 7.4. 
In one case I had 59 mismatches out of a total of 203 bricks. 
I wrote a quick and dirty script that checks, for every local brick, whether the port recorded in the brick file matches the port of the running glusterfsd process: 
#!/bin/bash 
# Compare the listen-port recorded in each brick file under 
# /var/lib/glusterd/vols/<vol>/bricks with the listen-port option the 
# corresponding running glusterfsd process was started with. 

Host=$(uname -n | awk -F"." '{print $1}') 

# Volume names, derived from the last argument of each running glusterfsd 
GlusterVol=$(ps -eaf | grep /usr/sbin/glusterfsd | grep -v grep | 
             awk '{print $NF}' | awk -F"-server" '{print $1}' | sort | uniq) 

for Volumes in ${GlusterVol}; do 
    cd /var/lib/glusterd/vols/${Volumes}/bricks || continue 
    Bricks=$(ls ${Host}*) 
    for Brick in ${Bricks}; do 
        # listen-port=<port> line stored on disk for this brick 
        Onfile=$(grep ^listen-port "${Brick}") 
        # listen-port=<port> option of the running brick process 
        BrickDir=$(echo "${Brick}" | awk -F":" '{print $2}' | cut -c2-) 
        Daemon=$(ps -eaf | grep "${BrickDir}.pid" | grep -v grep | 
                 awk '{print $NF}' | awk -F"." '{print $2}') 
        if [ "${Onfile}" = "${Daemon}" ]; then 
            echo "OK For ${Brick}" 
        else 
            echo "!!! NOT OK For ${Brick}" 
        fi 
    done 
done 
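For reference, a minimal way to run it (the file name here is just an example; run it as root on every node): 

chmod +x check-brick-ports.sh 
./check-brick-ports.sh    # prints one "OK For ..." / "!!! NOT OK For ..." line per local brick 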
Kind regards, 
Mike Hulsman 
Proxy Managed Services B.V. | www.proxy.nl | Enterprise IT-Infra, Open Source and Cloud Technology 
Delftweg 128 3043 NB Rotterdam The Netherlands | +31 10 307 0965 
From: "Jo Goossens" <jo.goossens at hosted-power.com> 
To: "Atin Mukherjee" <amukherj at redhat.com> 
Cc: gluster-users at gluster.org 
Sent: Friday, October 27, 2017 11:06:35 PM 
Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised 
Hello Atin, 
I just read it and I'm very happy you found the issue. We really hope this will be fixed in the upcoming 3.10.7 release! 
PS: Wow, nice, all that C code and those "goto out" statements (not always considered clean, but often the best way, I think). I can still remember the days when I wrote kernel drivers myself in C :) 
Regards 
Jo Goossens 
-----Original message----- 
From: Atin Mukherjee <amukherj at redhat.com> 
Sent: Fri 27-10-2017 21:01 
Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised 
To: Jo Goossens <jo.goossens at hosted-power.com>; 
CC: gluster-users at gluster.org; 
We (finally) figured out the root cause, Jo! 
Patch https://review.gluster.org/#/c/18579 posted upstream for review. 
On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens < jo.goossens at hosted-power.com > wrote: 
Hi, 
We use glusterfs 3.10.5 on Debian 9. 
When we stop or restart the service (e.g. service glusterfs-server restart), we see that the wrong port gets advertised afterwards. For example: 
Before restart: 
Status of volume: public 
Gluster process                              TCP Port  RDMA Port  Online  Pid 
------------------------------------------------------------------------------ 
Brick 192.168.140.41:/gluster/public         49153     0          Y       6364 
Brick 192.168.140.42:/gluster/public         49152     0          Y       1483 
Brick 192.168.140.43:/gluster/public         49152     0          Y       5913 
Self-heal Daemon on localhost                N/A       N/A        Y       5932 
Self-heal Daemon on 192.168.140.42           N/A       N/A        Y       13084 
Self-heal Daemon on 192.168.140.41           N/A       N/A        Y       15499 

Task Status of Volume public 
------------------------------------------------------------------------------ 
There are no active volume tasks 
After restarting the service on one of the nodes (192.168.140.43), the port seems to have changed (but it didn't): 
root@app3:/var/log/glusterfs# gluster volume status 
Status of volume: public 
Gluster process                              TCP Port  RDMA Port  Online  Pid 
------------------------------------------------------------------------------ 
Brick 192.168.140.41:/gluster/public         49153     0          Y       6364 
Brick 192.168.140.42:/gluster/public         49152     0          Y       1483 
Brick 192.168.140.43:/gluster/public         49154     0          Y       5913 
Self-heal Daemon on localhost                N/A       N/A        Y       4628 
Self-heal Daemon on 192.168.140.42           N/A       N/A        Y       3077 
Self-heal Daemon on 192.168.140.41           N/A       N/A        Y       28777 

Task Status of Volume public 
------------------------------------------------------------------------------ 
There are no active volume tasks 
However, the active process STILL has the same pid AND is still listening on the old port: 
root@192.168.140.43:/var/log/glusterfs# netstat -tapn | grep gluster 
tcp        0      0 0.0.0.0:49152          0.0.0.0:*              LISTEN      5913/glusterfsd 
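One way to line up all three views on the affected node; the brick file name below is an assumption based on the usual /var/lib/glusterd layout (host part as used in the volume definition, slashes in the brick path replaced by dashes): 

grep ^listen-port /var/lib/glusterd/vols/public/bricks/192.168.140.43:-gluster-public   # port recorded on disk by glusterd 
gluster volume status public 192.168.140.43:/gluster/public                             # port glusterd advertises for this brick 
ss -ltnp | grep glusterfsd                                                              # port the brick process really listens on 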
The other nodes' logs fill up with errors because they can't reach the daemon anymore; they try to reach it on the "new" port instead of the old one: 
[2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket 
[2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0) 
[2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket 
[2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0) 
[2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket 
[2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0) 
[2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket 
[2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0) 
[2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket 
So they now try 49154 instead of the old 49152. 
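A quick reachability check from one of the other nodes (assuming nc is available) shows the same mismatch; 49154 is what glusterd now advertises, 49152 is where the brick still listens: 

nc -zv 192.168.140.43 49154   # refused, matching the client log errors above 
nc -zv 192.168.140.43 49152   # succeeds, the brick process is still serving here 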
Is this also by design? We had a lot of issues because of this recently. We don't understand why it starts advertising a completely wrong port after stop/start. 
Regards 
Jo Goossens 
_______________________________________________ 
Gluster-users mailing list 
Gluster-users at gluster.org 
http://lists.gluster.org/mailman/listinfo/gluster-users 