[Gluster-users] BUG: After stop and start wrong port is advertised

Alan Orth alan.orth at gmail.com
Mon Jan 22 09:45:48 UTC 2018


Ouch! Yes, I see two port-related fixes in the GlusterFS 3.12.3 release
notes[0][1][2]. I've attached a tarball of all yesterday's logs from
/var/log/glusterd on one of the affected nodes (called "wingu3"). I hope
that's what you need.

[0]
https://github.com/gluster/glusterfs/blob/release-3.12/doc/release-notes/3.12.3.md
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1507747
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1507748
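
For anyone else hitting this before a fixed release lands: the workaround I mention further down (checking the daemon's real port against the one recorded in the volume's brick info) boils down to something like the sketch below. To be clear, this is my own rough paraphrase in Python, not Mike Hulsman's actual script; the parsing assumes the `gluster volume status` and netstat output formats quoted later in this thread, and all function names are made up for illustration.

```python
# Sketch of the "check the advertised port" half of the workaround.
# NOT Mike Hulsman's script -- a rough paraphrase for illustration only.
# Parsing assumes the `gluster volume status` / netstat output formats
# quoted later in this thread.
import re

def advertised_ports(volume_status):
    """Brick name -> TCP port that glusterd advertises in `gluster volume status`."""
    ports = {}
    for line in volume_status.splitlines():
        m = re.match(r"Brick\s+(\S+)\s+(\d+)", line)
        if m:
            ports[m.group(1)] = int(m.group(2))
    return ports

def glusterfsd_ports(netstat_output):
    """PID -> TCP port that a glusterfsd process is actually listening on."""
    ports = {}
    for line in netstat_output.splitlines():
        m = re.search(r":(\d+)\s+\S+\s+LISTEN\s+(\d+)/glusterfsd", line)
        if m:
            ports[int(m.group(2))] = int(m.group(1))
    return ports

def stale_bricks(advertised, listening):
    """Bricks whose advertised port no local glusterfsd is actually bound to."""
    bound = set(listening.values())
    return {brick: port for brick, port in advertised.items() if port not in bound}
```

On a real node the two inputs would come from `gluster volume status <volname>` and `netstat -tlnp` (restricting the brick list to the local host); any brick reported by `stale_bricks` is in the broken state described below, and the workaround then corrects the port in the volume's brick info and restarts glusterd.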

Thanks,

On Mon, Jan 22, 2018 at 6:34 AM Atin Mukherjee <amukherj at redhat.com> wrote:

> The patch was definitely there in 3.12.3. Do you have the glusterd and
> brick logs handy with you when this happened?
>
> On Sun, Jan 21, 2018 at 10:21 PM, Alan Orth <alan.orth at gmail.com> wrote:
>
>> For what it's worth, I just updated some CentOS 7 servers from GlusterFS
>> 3.12.1 to 3.12.4 and hit this bug. Did the patch make it into 3.12.4? I had
>> to use Mike Hulsman's script to check the daemon port against the port in
>> the volume's brick info, update the port, and restart glusterd on each
>> node. Luckily I only have four servers! Hoping I don't have to do this
>> every time I reboot!
>>
>> Regards,
>>
>> On Sat, Dec 2, 2017 at 5:23 PM Atin Mukherjee <amukherj at redhat.com>
>> wrote:
>>
>>> On Sat, 2 Dec 2017 at 19:29, Jo Goossens <jo.goossens at hosted-power.com>
>>> wrote:
>>>
>>>> Hello Atin,
>>>>
>>>> Could you confirm this should have been fixed in 3.10.8? If so, we'll
>>>> test it for sure!
>>>>
>>>
>>> The fix should be part of 3.10.8, which is awaiting the release announcement.
>>>
>>>
>>>>
>>>> Regards
>>>>
>>>> Jo
>>>>
>>>> -----Original message-----
>>>> *From:* Atin Mukherjee <amukherj at redhat.com>
>>>>
>>>> *Sent:* Mon 30-10-2017 17:40
>>>> *Subject:* Re: [Gluster-users] BUG: After stop and start wrong port is
>>>> advertised
>>>> *To:* Jo Goossens <jo.goossens at hosted-power.com>;
>>>> *CC:* gluster-users at gluster.org;
>>>>
>>>> On Sat, 28 Oct 2017 at 02:36, Jo Goossens <jo.goossens at hosted-power.com>
>>>> wrote:
>>>>
>>>> Hello Atin,
>>>>
>>>>
>>>> I just read it and I'm very happy you found the issue. We really hope
>>>> this will be fixed in the next 3.10.7 version!
>>>>
>>>>
>>>> 3.10.7 - no, I'm afraid, as the patch is still in review and 3.10.7 is
>>>> getting tagged today. You'll get this fix in 3.10.8.
>>>>
>>>> PS: Wow, nice - all that C code and those "goto out" statements (not
>>>> always considered clean, but often the best way, I think). I can remember
>>>> the days when I wrote kernel drivers in C myself :)
>>>>
>>>> Regards
>>>>
>>>> Jo Goossens
>>>>
>>>>
>>>> -----Original message-----
>>>> *From:* Atin Mukherjee <amukherj at redhat.com>
>>>> *Sent:* Fri 27-10-2017 21:01
>>>> *Subject:* Re: [Gluster-users] BUG: After stop and start wrong port is
>>>> advertised
>>>> *To:* Jo Goossens <jo.goossens at hosted-power.com>;
>>>> *CC:* gluster-users at gluster.org;
>>>>
>>>> We (finally) figured out the root cause, Jo!
>>>>
>>>> Patch https://review.gluster.org/#/c/18579 posted upstream for review.
>>>>
>>>> On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <
>>>> jo.goossens at hosted-power.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We use glusterfs 3.10.5 on Debian 9.
>>>>
>>>>
>>>>
>>>> When we stop or restart the service, e.g.: service glusterfs-server
>>>> restart
>>>>
>>>>
>>>>
>>>> We see that the wrong port gets advertised afterwards. For example:
>>>>
>>>>
>>>>
>>>> Before restart:
>>>>
>>>>
>>>> Status of volume: public
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
>>>> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
>>>> Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
>>>> Self-heal Daemon on localhost               N/A       N/A        Y       5932
>>>> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
>>>> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499
>>>>
>>>> Task Status of Volume public
>>>> ------------------------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>>
>>>> After the restart of the service on one of the nodes (192.168.140.43),
>>>> the advertised port seems to have changed (but it actually didn't):
>>>>
>>>> root@app3:/var/log/glusterfs# gluster volume status
>>>> Status of volume: public
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
>>>> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
>>>> Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
>>>> Self-heal Daemon on localhost               N/A       N/A        Y       4628
>>>> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
>>>> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777
>>>>
>>>> Task Status of Volume public
>>>> ------------------------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>>
>>>> However, the active process is STILL the same PID AND is still listening
>>>> on the old port:
>>>>
>>>> root@192.168.140.43:/var/log/glusterfs# netstat -tapn | grep gluster
>>>> tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd
>>>>
>>>>
>>>> The other nodes' logs fill up with errors because they can't reach the
>>>> daemon anymore. They try to reach it on the "new" port instead of the
>>>> old one:
>>>>
>>>> [2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish]
>>>> 0-public-client-2: connection to 192.168.140.43:49154 failed
>>>> (Connection refused); disconnecting socket
>>>> [2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>>> 0-public-client-2: changing port to 49154 (from 0)
>>>> [2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish]
>>>> 0-public-client-2: connection to 192.168.140.43:49154 failed
>>>> (Connection refused); disconnecting socket
>>>> [2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>>> 0-public-client-2: changing port to 49154 (from 0)
>>>> [2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish]
>>>> 0-public-client-2: connection to 192.168.140.43:49154 failed
>>>> (Connection refused); disconnecting socket
>>>> [2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>>> 0-public-client-2: changing port to 49154 (from 0)
>>>> [2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish]
>>>> 0-public-client-2: connection to 192.168.140.43:49154 failed
>>>> (Connection refused); disconnecting socket
>>>> [2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>>> 0-public-client-2: changing port to 49154 (from 0)
>>>> [2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish]
>>>> 0-public-client-2: connection to 192.168.140.43:49154 failed
>>>> (Connection refused); disconnecting socket
>>>>
>>>> So they now try 49154 instead of the old 49152.
>>>>
>>>> Is this also by design? We had a lot of issues because of this
>>>> recently. We don't understand why it starts advertising a completely wrong
>>>> port after stop/start.
>>>>
>>>>
>>>> Regards
>>>>
>>>> Jo Goossens
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>> --
>>>> - Atin (atinm)
>>>>
>>>> --
>>> - Atin (atinm)
>>
>>
>>
>> --
>>
>> Alan Orth
>> alan.orth at gmail.com
>> https://picturingjordan.com
>> https://englishbulgaria.net
>> https://mjanja.ch
>>
>
>

-- 

Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gluster-logs-node-wingu3-2017-01-22.tar.gz
Type: application/x-gzip
Size: 250831 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20180122/ce5b317b/attachment.gz>

