[Gluster-users] BUG: After stop and start wrong port is advertised

Mon Jan 22 04:34:23 UTC 2018

The patch was definitely there in 3.12.3. Do you have the glusterd and
brick logs from when this happened handy?

On Sun, Jan 21, 2018 at 10:21 PM, Alan Orth <alan.orth at gmail.com> wrote:

> For what it's worth, I just updated some CentOS 7 servers from GlusterFS
> 3.12.1 to 3.12.4 and hit this bug. Did the patch make it into 3.12.4? I had
> to use Mike Hulsman's script to check the daemon port against the port in
> the volume's brick info, update the port, and restart glusterd on each
> node. Luckily I only have four servers! Hoping I don't have to do this
> every time I reboot!
>
> Regards,
>
> On Sat, Dec 2, 2017 at 5:23 PM Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> On Sat, 2 Dec 2017 at 19:29, Jo Goossens <jo.goossens at hosted-power.com>
>> wrote:
>>
>>> Hello Atin,
>>>
>>>
>>>
>>>
>>>
>>> Could you confirm this should have been fixed in 3.10.8? If so we'll
>>> test it for sure!
>>>
>>
>> The fix should be part of 3.10.8, which is awaiting its release announcement.
>>
>>
>>>
>>> Regards
>>>
>>> Jo
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----Original message-----
>>> *From:* Atin Mukherjee <amukherj at redhat.com>
>>>
>>> *Sent:* Mon 30-10-2017 17:40
>>> *Subject:* Re: [Gluster-users] BUG: After stop and start wrong port is
>>> advertised
>>> *To:* Jo Goossens <jo.goossens at hosted-power.com>;
>>> *CC:* gluster-users at gluster.org;
>>>
>>> On Sat, 28 Oct 2017 at 02:36, Jo Goossens <jo.goossens at hosted-power.com>
>>> wrote:
>>>
>>> Hello Atin,
>>>
>>>
>>>
>>>
>>>
>>> I just read it and am very happy you found the issue. We really hope this
>>> will be fixed in the upcoming 3.10.7 release!
>>>
>>>
>>> Not 3.10.7, I guess, as the patch is still in review and 3.10.7 is
>>> getting tagged today. You'll get this fix in 3.10.8.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> PS: Wow, nice, all that C code and those "goto out" statements (not always
>>> considered clean, but often the best way, I think). I can remember the days
>>> I wrote kernel drivers in C myself :)
>>>
>>>
>>>
>>>
>>>
>>> Regards
>>>
>>> Jo Goossens
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----Original message-----
>>> *From:* Atin Mukherjee <amukherj at redhat.com>
>>> *Sent:* Fri 27-10-2017 21:01
>>> *Subject:* Re: [Gluster-users] BUG: After stop and start wrong port is
>>> advertised
>>> *To:* Jo Goossens <jo.goossens at hosted-power.com>;
>>> *CC:* gluster-users at gluster.org;
>>>
>>> We (finally) figured out the root cause, Jo!
>>>
>>> Patch https://review.gluster.org/#/c/18579 posted upstream for review.
>>>
>>> On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <
>>> jo.goossens at hosted-power.com> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>>
>>>
>>> We use glusterfs 3.10.5 on Debian 9.
>>>
>>>
>>>
>>> When we stop or restart the service, e.g.: service glusterfs-server
>>> restart
>>>
>>>
>>>
>>> We see that the wrong port gets advertised afterwards. For example:
>>>
>>>
>>>
>>> Before restart:
>>>
>>>
>>> Status of volume: public
>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
>>> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
>>> Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
>>> Self-heal Daemon on localhost               N/A       N/A        Y       5932
>>> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
>>> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499
>>>
>>> Task Status of Volume public
>>> ------------------------------------------------------------------------------
>>> There are no active volume tasks
>>>
>>>
>>> After restarting the service on one of the nodes (192.168.140.43), the
>>> advertised port appears to have changed (but the brick's actual port did not):
>>>
>>> root@app3:/var/log/glusterfs# gluster volume status
>>> Status of volume: public
>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
>>> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
>>> Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
>>> Self-heal Daemon on localhost               N/A       N/A        Y       4628
>>> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
>>> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777
>>>
>>> Task Status of Volume public
>>> ------------------------------------------------------------------------------
>>> There are no active volume tasks
>>>
>>>
>>> However, the active process is STILL the same pid AND is still listening on
>>> the old port:
>>>
>>> root@192.168.140.43:/var/log/glusterfs# netstat -tapn | grep gluster
>>> tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd
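
The manual comparison above (advertised brick port versus the port the brick pid actually listens on) could be automated roughly as follows. This is only a minimal sketch, not the fix from the patch and not Mike Hulsman's script: the function names, the regular expressions, and the sample constants are assumptions made for illustration, and the parsing only targets output shaped like the `gluster volume status` and `netstat -tapn` excerpts quoted in this thread.

```python
import re

# One online brick row: name, TCP port, RDMA port, online flag, pid.
# \s+ also spans line breaks, so rows whose Pid column wrapped onto the
# next line still parse. Rows showing N/A (offline bricks) are skipped.
BRICK_ROW = re.compile(r"Brick\s+(\S+)\s+(\d+)\s+(\d+)\s+([YN])\s+(\d+)")

# One listening TCP socket owned by a glusterfsd process (netstat -tapn).
LISTEN_ROW = re.compile(
    r"tcp\S*\s+\d+\s+\d+\s+\S+:(\d+)\s+\S+\s+LISTEN\s+(\d+)/glusterfsd"
)

# Sample data copied from the outputs quoted in this thread.
STATUS = """
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
"""
NETSTAT = (
    "tcp        0      0 0.0.0.0:49152           0.0.0.0:*"
    "               LISTEN      5913/glusterfsd\n"
)


def listening_ports(netstat_text):
    """Return {pid: set of TCP ports} for glusterfsd listeners."""
    ports = {}
    for port, pid in LISTEN_ROW.findall(netstat_text):
        ports.setdefault(int(pid), set()).add(int(port))
    return ports


def find_mismatches(status_text, netstat_text, host):
    """List (brick, advertised_port, actual_ports) for bricks on `host`
    whose advertised port is not among the ports their pid listens on."""
    listening = listening_ports(netstat_text)
    mismatches = []
    for brick, port, _rdma, _online, pid in BRICK_ROW.findall(status_text):
        if not brick.startswith(host + ":"):
            continue  # netstat output only covers the local node
        actual = listening.get(int(pid), set())
        if int(port) not in actual:
            mismatches.append((brick, int(port), sorted(actual)))
    return mismatches


if __name__ == "__main__":
    for brick, advertised, actual in find_mismatches(
        STATUS, NETSTAT, "192.168.140.43"
    ):
        print(f"{brick}: advertised {advertised}, listening on {actual}")
```

The netstat text has to come from the same node whose bricks are being checked, since it only shows local listeners.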
>>>
>>>
>>> The other nodes' logs fill up with errors because they can't reach the
>>> daemon anymore. They try to reach it on the "new" port instead of the old
>>> one:
>>>
>>> [2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish]
>>> 0-public-client-2: connection to 192.168.140.43:49154 failed
>>> (Connection refused); disconnecting socket
>>> [2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 0-public-client-2: changing port to 49154 (from 0)
>>> [2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish]
>>> 0-public-client-2: connection to 192.168.140.43:49154 failed
>>> (Connection refused); disconnecting socket
>>> [2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 0-public-client-2: changing port to 49154 (from 0)
>>> [2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish]
>>> 0-public-client-2: connection to 192.168.140.43:49154 failed
>>> (Connection refused); disconnecting socket
>>> [2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 0-public-client-2: changing port to 49154 (from 0)
>>> [2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish]
>>> 0-public-client-2: connection to 192.168.140.43:49154 failed
>>> (Connection refused); disconnecting socket
>>> [2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 0-public-client-2: changing port to 49154 (from 0)
>>> [2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish]
>>> 0-public-client-2: connection to 192.168.140.43:49154 failed
>>> (Connection refused); disconnecting socket
>>>
>>> So they now try 49154 instead of the old 49152.
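
To see at a glance which endpoint the clients keep failing against, the "Connection refused" entries in a client log like the one above can be tallied per host and port. This is a rough sketch under the assumption that the log lines look like the excerpt in this thread; the `REFUSED` pattern, the `refused_endpoints` name, and the `LOG` sample are illustrative, not part of GlusterFS.

```python
import re
from collections import Counter

# Matches the failure message from the excerpt; \s+ tolerates the line
# break that mail wrapping put between "failed" and "(Connection refused)".
REFUSED = re.compile(
    r"connection to (\S+):(\d+) failed\s+\(Connection refused\)"
)

# Two of the log entries quoted in this thread, plus one reconfig line.
LOG = """[2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish]
0-public-client-2: connection to 192.168.140.43:49154 failed
(Connection refused); disconnecting socket
[2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish]
0-public-client-2: connection to 192.168.140.43:49154 failed
(Connection refused); disconnecting socket
"""


def refused_endpoints(log_text):
    """Count refused connection attempts per (host, port)."""
    return Counter(
        (host, int(port)) for host, port in REFUSED.findall(log_text)
    )


if __name__ == "__main__":
    for (host, port), count in refused_endpoints(LOG).most_common():
        print(f"{host}:{port} refused {count} time(s)")
```

A port that shows up here but is missing from the brick's actual listeners points at the stale advertisement described in this report.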
>>>
>>> Is this also by design? We had a lot of issues because of this recently.
>>> We don't understand why it starts advertising a completely wrong port after
>>> stop/start.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Regards
>>>
>>> Jo Goossens
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>> --
>>> - Atin (atinm)
>>>
>>> --
>> - Atin (atinm)
>
>
>
> --
>
> Alan Orth
> alan.orth at gmail.com
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
>