[Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

Hu Bert revirii at googlemail.com
Tue Mar 5 07:23:59 UTC 2019


Interestingly, 'gluster volume status' misses gluster1, while the heal
statistics do show gluster1:

gluster volume status workdata
Status of volume: workdata
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster2:/gluster/md4/workdata        49153     0          Y       1723
Brick gluster3:/gluster/md4/workdata        49153     0          Y       2068
Self-heal Daemon on localhost               N/A       N/A        Y       1732
Self-heal Daemon on gluster3                N/A       N/A        Y       2077

vs.

gluster volume heal workdata statistics heal-count
Gathering count of entries to be healed on volume workdata has been successful

Brick gluster1:/gluster/md4/workdata
Number of entries: 0

Brick gluster2:/gluster/md4/workdata
Number of entries: 10745

Brick gluster3:/gluster/md4/workdata
Number of entries: 10744
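
A quick way to confirm that the peers really disagree about the volume
definition (this matches the cksum error quoted below; the path assumes
glusterd's default working directory) is to compare the stored volume
checksum on every node; the values should be identical:

cat /var/lib/glusterd/vols/workdata/cksum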

On Tue, 5 Mar 2019 at 08:18, Hu Bert <revirii at googlemail.com> wrote:
>
> Hi Miling,
>
> well, there are such entries, but they weren't a problem during the
> install or during the last kernel update+reboot. The entries look like:
>
> PUBLIC_IP  gluster2.alpserver.de gluster2
>
> 192.168.0.50 gluster1
> 192.168.0.51 gluster2
> 192.168.0.52 gluster3
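>
> A quick sanity check of which address actually wins during resolution
> (getent uses the same NSS lookup path as the daemons) would be:
>
> getent hosts gluster1 gluster2 gluster3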
>
> 'ping gluster2' resolves to the LAN IP; I removed the last entry from
> the 1st line and rebooted ... no, that didn't help. From
> /var/log/glusterfs/glusterd.log on gluster2:
>
> [2019-03-05 07:04:36.188128] E [MSGID: 106010]
> [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management:
> Version of Cksums persistent differ. local cksum = 3950307018, remote
> cksum = 455409345 on peer gluster1
> [2019-03-05 07:04:36.188314] I [MSGID: 106493]
> [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd:
> Responded to gluster1 (0), ret: 0, op_ret: -1
>
> Interestingly, there are no entries in the brick logs of the rejected
> server. Not surprising, though, as no brick process is running there.
> The server gluster1 is still in the rejected state.
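>
> The usual recovery for a peer stuck in "Peer Rejected" after a cksum
> mismatch (per the Gluster docs; run on the rejected node only, and
> only after backing up /var/lib/glusterd; the service name may differ
> per distribution) is roughly:
>
> systemctl stop glusterd
> cd /var/lib/glusterd
> find . -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
> systemctl start glusterd
> gluster peer probe gluster2
> systemctl restart glusterd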
>
> 'gluster volume start workdata force' starts the brick process on
> gluster1, and some heals are happening on gluster2+3, but according to
> 'gluster volume status workdata' the volume status is still incomplete:
>
> gluster1:
> ------------------------------------------------------------------------------
> Brick gluster1:/gluster/md4/workdata        49152     0          Y       2523
> Self-heal Daemon on localhost               N/A       N/A        Y       2549
>
> gluster2:
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick gluster2:/gluster/md4/workdata        49153     0          Y       1723
> Brick gluster3:/gluster/md4/workdata        49153     0          Y       2068
> Self-heal Daemon on localhost               N/A       N/A        Y       1732
> Self-heal Daemon on gluster3                N/A       N/A        Y       2077
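>
> To keep an eye on the heal backlog while the bricks catch up, a simple
> loop (assuming watch is available) could be used:
>
> watch -n 60 'gluster volume heal workdata statistics heal-count'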
>
>
> Hubert
>
> On Tue, 5 Mar 2019 at 07:58, Milind Changire <mchangir at redhat.com> wrote:
> >
> > There are probably DNS entries or /etc/hosts entries with the public IP addresses that the host names (gluster1, gluster2, gluster3) are being resolved to.
> > /etc/resolv.conf would tell you the default domain searched for the node names and which DNS servers respond to the queries.
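> >
> > To tell whether an answer comes from /etc/hosts or from DNS, both
> > lookup paths can be compared (dig bypasses /etc/hosts and queries the
> > DNS servers from /etc/resolv.conf; +search applies the search domain):
> >
> > getent hosts gluster1         # NSS path, includes /etc/hosts
> > dig +short +search gluster1   # DNS only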
> >
> >
> > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert <revirii at googlemail.com> wrote:
> >>
> >> Good morning,
> >>
> >> I have a replica 3 setup with 2 volumes, running version 5.3 on
> >> Debian Stretch. This morning I upgraded one server to version 5.4 and
> >> rebooted the machine; after the restart I noticed that:
> >>
> >> - no brick process is running
> >> - gluster volume status only shows the server itself:
> >> gluster volume status workdata
> >> Status of volume: workdata
> >> Gluster process                             TCP Port  RDMA Port  Online  Pid
> >> ------------------------------------------------------------------------------
> >> Brick gluster1:/gluster/md4/workdata        N/A       N/A        N       N/A
> >> NFS Server on localhost                     N/A       N/A        N       N/A
> >>
> >> - 'gluster peer status' on the upgraded server:
> >> gluster peer status
> >> Number of Peers: 2
> >>
> >> Hostname: gluster3
> >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
> >> State: Peer Rejected (Connected)
> >>
> >> Hostname: gluster2
> >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27
> >> State: Peer Rejected (Connected)
> >>
> >> - 'gluster peer status' on the other 2 servers:
> >> gluster peer status
> >> Number of Peers: 2
> >>
> >> Hostname: gluster1
> >> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef
> >> State: Peer Rejected (Connected)
> >>
> >> Hostname: gluster3
> >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
> >> State: Peer in Cluster (Connected)
> >>
> >> I noticed that, in the brick logs, the public IP is used instead of
> >> the LAN IP. Brick logs from one of the volumes:
> >>
> >> rejected node: https://pastebin.com/qkpj10Sd
> >> connected nodes: https://pastebin.com/8SxVVYFV
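> >>
> >> To verify which addresses the brick processes are actually bound to
> >> (on nodes where bricks are up; assumes iproute2's ss is installed),
> >> something like this might help:
> >>
> >> ss -tlnp | grep -i gluster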
> >>
> >> Why is the public IP suddenly used instead of the LAN IP? Killing all
> >> gluster processes and rebooting (again) didn't help.
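> >>
> >> A possible workaround (only a sketch; 192.168.0.50 is gluster1's LAN
> >> IP from the hosts file above) would be to pin glusterd to the LAN
> >> address via the transport.socket.bind-address option inside the
> >> "volume management" block of /etc/glusterfs/glusterd.vol, then
> >> restart glusterd:
> >>
> >> option transport.socket.bind-address 192.168.0.50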
> >>
> >>
> >> Thx,
> >> Hubert
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users at gluster.org
> >> https://lists.gluster.org/mailman/listinfo/gluster-users
> >
> >
> >
> > --
> > Milind
> >

