[Gluster-users] BUG: After stop and start wrong port is advertised

Thu Nov 9 07:17:09 UTC 2017

Thanks. good to know. 

Met vriendelijke groet, 

Mike Hulsman 

Proxy Managed Services B.V. | www.proxy.nl | Enterprise IT-Infra, Open Source and Cloud Technology 
Delftweg 128 3043 NB Rotterdam The Netherlands | +31 10 307 0965 

> From: "Atin Mukherjee" <amukherj at redhat.com>
> To: "Mike Hulsman" <mike.hulsman at proxy.nl>
> Cc: "Jo Goossens" <jo.goossens at hosted-power.com>, "gluster-users"
> <gluster-users at gluster.org>
> Sent: Wednesday, November 8, 2017 2:12:02 PM
> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised

> We've a fix in release-3.10 branch which is merged and should be available in
> the next 3.10 update.

> On Wed, Nov 8, 2017 at 4:58 PM, Mike Hulsman < mike.hulsman at proxy.nl > wrote:

>> Hi,

>> This bug is hitting me hard on two different clients.
>> In RHGS 3.3 and on glusterfs 3.10.2 on Centos 7.4
>> in once case I had 59 differences in a total of 203 bricks.

>> I wrote a quick and dirty script to check all ports against the brick file and
>> the running process.
>> #!/bin/bash

>> Host=`uname -n| awk -F"." '{print $1}'`
>> GlusterVol=`ps -eaf | grep /usr/sbin/glusterfsd| grep -v grep | awk '{print
>> $NF}'| awk -F"-server" '{print $1}'|sort | uniq`
>> Port=`ps -eaf | grep /usr/sbin/glusterfsd| grep -v grep | awk '{print $NF}'| awk
>> -F"." '{print $NF}'`

>> for Volumes in ${GlusterVol};
>> do
>> cd /var/lib/glusterd/vols/${Volumes}/bricks
>> Bricks=`ls ${Host}*`
>> for Brick in ${Bricks};
>> do
>> Onfile=`grep ^listen-port "${Brick}"`
>> BrickDir=`echo "${Brick}"| awk -F":" '{print $2}'| cut -c2-`
>> Daemon=`ps -eaf | grep "\${BrickDir}.pid" |grep -v grep | awk '{print $NF}' |
>> awk -F"." '{print $2}'`
>> #echo Onfile: ${Onfile}
>> #echo Daemon: ${Daemon}
>> if [ "${Onfile}" = "${Daemon}" ]; then
>> echo "OK For ${Brick}"
>> else
>> echo "!!! NOT OK For ${Brick}"
>> fi
>> done
>> done

>> Met vriendelijke groet,

>> Mike Hulsman

>> Proxy Managed Services B.V. | www.proxy.nl | Enterprise IT-Infra, Open Source
>> and Cloud Technology
>> Delftweg 128 3043 NB Rotterdam The Netherlands | +31 10 307 0965

>>> From: "Jo Goossens" < jo.goossens at hosted-power.com >
>>> To: "Atin Mukherjee" < amukherj at redhat.com >
>>> Cc: gluster-users at gluster.org
>>> Sent: Friday, October 27, 2017 11:06:35 PM
>>> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised

>>> RE: [Gluster-users] BUG: After stop and start wrong port is advertised

>>> Hello Atin,

>>> I just read it and very happy you found the issue. We really hope this will be
>>> fixed in the next 3.10.7 version!

>>> PS: Wow nice all that c code and those "goto out" statements (not always
>>> considered clean but the best way often I think). Can remember the days I wrote
>>> kernel drivers myself in c :)

>>> Regards

>>> Jo Goossens

>>>> -----Original message-----
>>>> From: Atin Mukherjee < amukherj at redhat.com >
>>>> Sent: Fri 27-10-2017 21:01
>>>> Subject: Re: [Gluster-users] BUG: After stop and start wrong port is advertised
>>>> To: Jo Goossens < jo.goossens at hosted-power.com >;
>>>> CC: gluster-users at gluster.org ;
>>>> We (finally) figured out the root cause, Jo!
>>>> Patch https://review.gluster.org/#/c/18579 posted upstream for review.

>>>> On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens < jo.goossens at hosted-power.com >
>>>> wrote:

>>>>> Hi,

>>>>> We use glusterfs 3.10.5 on Debian 9.

>>>>> When we stop or restart the service, e.g.: service glusterfs-server restart

>>>>> We see that the wrong port get's advertised afterwards. For example:

>>>>> Before restart:

>>>>> Status of volume: public
>>>>> Gluster process TCP Port RDMA Port Online Pid
>>>>> ------------------------------------------------------------------------------
>>>>> Brick 192.168.140.41:/gluster/public 49153 0 Y 6364
>>>>> Brick 192.168.140.42:/gluster/public 49152 0 Y 1483
>>>>> Brick 192.168.140.43:/gluster/public 49152 0 Y 5913
>>>>> Self-heal Daemon on localhost N/A N/A Y 5932
>>>>> Self-heal Daemon on 192.168.140.42 N/A N/A Y 13084
>>>>> Self-heal Daemon on 192.168.140.41 N/A N/A Y 15499
>>>>> Task Status of Volume public
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>> After restart of the service on one of the nodes (192.168.140.43) the port seems
>>>>> to have changed (but it didn't):
>>>>> root at app3:/var/log/glusterfs# gluster volume status
>>>>> Status of volume: public
>>>>> Gluster process TCP Port RDMA Port Online Pid
>>>>> ------------------------------------------------------------------------------
>>>>> Brick 192.168.140.41:/gluster/public 49153 0 Y 6364
>>>>> Brick 192.168.140.42:/gluster/public 49152 0 Y 1483
>>>>> Brick 192.168.140.43:/gluster/public 49154 0 Y 5913
>>>>> Self-heal Daemon on localhost N/A N/A Y 4628
>>>>> Self-heal Daemon on 192.168.140.42 N/A N/A Y 3077
>>>>> Self-heal Daemon on 192.168.140.41 N/A N/A Y 28777
>>>>> Task Status of Volume public
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>> However the active process is STILL the same pid AND still listening on the old
>>>>> port
>>>>> root at 192.168.140.43:/var/log/glusterfs# netstat -tapn | grep gluster
>>>>> tcp 0 0 0.0.0.0:49152 0.0.0.0:* LISTEN 5913/glusterfsd
>>>>> The other nodes logs fill up with errors because they can't reach the daemon
>>>>> anymore. They try to reach it on the "new" port instead of the old one:
>>>>> [2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish]
>>>>> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
>>>>> refused); disconnecting socket
>>>>> [2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>>>> 0-public-client-2: changing port to 49154 (from 0)
>>>>> [2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish]
>>>>> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
>>>>> refused); disconnecting socket
>>>>> [2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>>>> 0-public-client-2: changing port to 49154 (from 0)
>>>>> [2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish]
>>>>> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
>>>>> refused); disconnecting socket
>>>>> [2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>>>> 0-public-client-2: changing port to 49154 (from 0)
>>>>> [2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish]
>>>>> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
>>>>> refused); disconnecting socket
>>>>> [2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>>>> 0-public-client-2: changing port to 49154 (from 0)
>>>>> [2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish]
>>>>> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
>>>>> refused); disconnecting socket
>>>>> So they now try 49154 instead of the old 49152
>>>>> Is this also by design? We had a lot of issues because of this recently. We
>>>>> don't understand why it starts advertising a completely wrong port after
>>>>> stop/start.

>>>>> Regards

>>>>> Jo Goossens

>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users

>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20171109/81779dbe/attachment.html>