[Gluster-users] BUG: After stop and start wrong port is advertised

Atin Mukherjee amukherj at redhat.com
Wed Nov 8 13:12:02 UTC 2017


We have a fix merged in the release-3.10 branch; it should be available in
the next 3.10 update.

On Wed, Nov 8, 2017 at 4:58 PM, Mike Hulsman <mike.hulsman at proxy.nl> wrote:

> Hi,
>
> This bug is hitting me hard at two different clients, one on RHGS 3.3 and
> one on glusterfs 3.10.2 on CentOS 7.4.
> In one case I had 59 mismatches out of a total of 203 bricks.
>
> I wrote a quick and dirty script to check, for each brick on a host, the
> port recorded in the brick file against the port of the running glusterfsd
> process:
> #!/bin/bash
> # Quick and dirty check: for every brick on this host, compare the
> # listen-port recorded in the brick file under /var/lib/glusterd with the
> # port the running glusterfsd process was actually started with.
>
> Host=$(uname -n | awk -F"." '{print $1}')
>
> # Volume names, derived from the volfile-id of every running glusterfsd
> GlusterVol=$(ps -eaf | grep /usr/sbin/glusterfsd | grep -v grep | \
>     awk '{print $NF}' | awk -F"-server" '{print $1}' | sort | uniq)
>
> for Volume in ${GlusterVol}; do
>     cd /var/lib/glusterd/vols/${Volume}/bricks || continue
>     Bricks=$(ls ${Host}*)
>     for Brick in ${Bricks}; do
>         # Port as recorded in the brick file (listen-port=NNNNN)
>         Onfile=$(grep ^listen-port "${Brick}")
>         # Port the matching brick daemon was actually started with
>         BrickDir=$(echo "${Brick}" | awk -F":" '{print $2}' | cut -c2-)
>         Daemon=$(ps -eaf | grep "${BrickDir}.pid" | grep -v grep | \
>             awk '{print $NF}' | awk -F"." '{print $2}')
>         if [ "${Onfile}" = "${Daemon}" ]; then
>             echo "OK For ${Brick}"
>         else
>             echo "!!! NOT OK For ${Brick}"
>         fi
>     done
> done
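>
> On a node with a drifted brick the output looks something like this (the
> script name and the brick file names below are only examples for
> illustration, not taken from the affected systems):
>
> # ./check-brick-ports.sh
> OK For app1:-gluster-public
> !!! NOT OK For app1:-gluster-data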
>
>
> Kind regards,
>
> Mike Hulsman
>
> Proxy Managed Services B.V. | www.proxy.nl | Enterprise IT-Infra, Open
> Source and Cloud Technology
> Delftweg 128, 3043 NB Rotterdam, The Netherlands | +31 10 307 0965
>
> ------------------------------
>
> *From: *"Jo Goossens" <jo.goossens at hosted-power.com>
> *To: *"Atin Mukherjee" <amukherj at redhat.com>
> *Cc: *gluster-users at gluster.org
> *Sent: *Friday, October 27, 2017 11:06:35 PM
> *Subject: *Re: [Gluster-users] BUG: After stop and start wrong port is
> advertised
>
> Hello Atin,
>
> I just read it and I'm very happy you found the issue. We really hope this
> will be fixed in the upcoming 3.10.7 release!
>
> PS: Nice to see all that C code and those "goto out" statements (not always
> considered clean, but often the best way, I think). I can still remember the
> days when I wrote kernel drivers in C myself :)
>
> Regards
>
> Jo Goossens
>
> -----Original message-----
> *From:* Atin Mukherjee <amukherj at redhat.com>
> *Sent:* Fri 27-10-2017 21:01
> *Subject:* Re: [Gluster-users] BUG: After stop and start wrong port is
> advertised
> *To:* Jo Goossens <jo.goossens at hosted-power.com>;
> *CC:* gluster-users at gluster.org;
> We (finally) figured out the root cause, Jo!
>
> Patch https://review.gluster.org/#/c/18579 has been posted upstream for review.
>
> On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <jo.goossens at hosted-power.com> wrote:
>
> Hi,
>
> We use glusterfs 3.10.5 on Debian 9.
>
> When we stop or restart the service, e.g.: service glusterfs-server restart
>
> We see that the wrong port gets advertised afterwards. For example:
>
> Before restart:
>
>
> Status of volume: public
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
> Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
> Self-heal Daemon on localhost               N/A       N/A        Y       5932
> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499
>
> Task Status of Volume public
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> After restarting the service on one of the nodes (192.168.140.43), the
> advertised port seems to have changed (but in reality it didn't):
>
> root@app3:/var/log/glusterfs# gluster volume status
> Status of volume: public
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
> Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
> Self-heal Daemon on localhost               N/A       N/A        Y       4628
> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777
>
> Task Status of Volume public
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> However, the active process is STILL the same PID AND is still listening on
> the old port:
>
> root@192.168.140.43:/var/log/glusterfs# netstat -tapn | grep gluster
> tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd
>
>
> The other nodes' logs fill up with errors because they can't reach the
> daemon anymore. They try to reach it on the "new" port instead of the old
> one:
>
> [2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish]
> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
> refused); disconnecting socket
> [2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish]
> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
> refused); disconnecting socket
> [2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish]
> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
> refused); disconnecting socket
> [2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish]
> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
> refused); disconnecting socket
> [2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish]
> 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
> refused); disconnecting socket
>
> So they now try 49154 instead of the old 49152.
>
> Is this also by design? We had a lot of issues because of this recently.
> We don't understand why it starts advertising a completely wrong port after
> stop/start.
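>
> As a side question: until a fixed version is released, would killing the
> brick process and then forcing the volume to start again be a safe way to
> get the advertised port and the real port back in sync? Something like this
> (just a guess on our side, the PID is the one from the output above):
>
> # on the affected node 192.168.140.43
> kill 5913                          # glusterfsd serving /gluster/public
> gluster volume start public force  # respawn the brick that is now down
>
> Or is there a better/official way to recover manually?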
>
> Regards
>
> Jo Goossens
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>