[Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

Hari Gowtham hgowtham at redhat.com
Tue Mar 5 08:26:09 UTC 2019


There are plans to revert the patch causing this error and rebuild 5.4.
This should happen soon; the rebuilt 5.4 should be free of this upgrade issue.

In the meantime, you can use 5.3 for this cluster.
Downgrading to 5.3 will work if only one node was upgraded to 5.4
and the other nodes are still on 5.3.
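
On Debian the downgrade would be roughly the following (just a sketch; it
assumes the 5.3 packages are still available from your configured apt repo,
and the exact version string may differ, check 'apt-cache madison glusterfs-server'):

systemctl stop glusterd
apt-get install --allow-downgrades glusterfs-server=5.3-1 glusterfs-common=5.3-1 glusterfs-client=5.3-1
systemctl start glusterd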

On Tue, Mar 5, 2019 at 1:07 PM Hu Bert <revirii at googlemail.com> wrote:
>
> Hi Hari,
>
> thx for the hint. Do you know when this will be fixed? Is a downgrade
> 5.4 -> 5.3 a possibility to fix this?
>
> Hubert
>
> Am Di., 5. März 2019 um 08:32 Uhr schrieb Hari Gowtham <hgowtham at redhat.com>:
> >
> > Hi,
> >
> > This is a known issue we are working on.
> > As the checksum differs between the updated and non-updated nodes, the
> > peers are getting rejected.
> > The bricks aren't coming up because of the same issue.
> >
> > More about the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685120
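> >
> > If you want to see the mismatch yourself, you can compare the volume
> > checksum glusterd has stored on each node (path is a sketch assuming the
> > default working directory /var/lib/glusterd):
> >
> > cat /var/lib/glusterd/vols/workdata/cksum
> >
> > Running that on the upgraded node and on a non-upgraded node should show
> > different values.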
> >
> > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert <revirii at googlemail.com> wrote:
> > >
> > > Interestingly: gluster volume status misses gluster1, while heal
> > > statistics show gluster1:
> > >
> > > gluster volume status workdata
> > > Status of volume: workdata
> > > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > > ------------------------------------------------------------------------------
> > > Brick gluster2:/gluster/md4/workdata        49153     0          Y       1723
> > > Brick gluster3:/gluster/md4/workdata        49153     0          Y       2068
> > > Self-heal Daemon on localhost               N/A       N/A        Y       1732
> > > Self-heal Daemon on gluster3                N/A       N/A        Y       2077
> > >
> > > vs.
> > >
> > > gluster volume heal workdata statistics heal-count
> > > Gathering count of entries to be healed on volume workdata has been successful
> > >
> > > Brick gluster1:/gluster/md4/workdata
> > > Number of entries: 0
> > >
> > > Brick gluster2:/gluster/md4/workdata
> > > Number of entries: 10745
> > >
> > > Brick gluster3:/gluster/md4/workdata
> > > Number of entries: 10744
> > >
> > > Am Di., 5. März 2019 um 08:18 Uhr schrieb Hu Bert <revirii at googlemail.com>:
> > > >
> > > > Hi Milind,
> > > >
> > > > well, there are such entries, but those haven't been a problem during
> > > > install and the last kernel update+reboot. The entries look like:
> > > >
> > > > PUBLIC_IP  gluster2.alpserver.de gluster2
> > > >
> > > > 192.168.0.50 gluster1
> > > > 192.168.0.51 gluster2
> > > > 192.168.0.52 gluster3
> > > >
> > > > 'ping gluster2' resolves to the LAN IP; I removed the last entry in the
> > > > 1st line and did a reboot ... no, that didn't help. From
> > > > /var/log/glusterfs/glusterd.log on gluster2:
> > > >
> > > > [2019-03-05 07:04:36.188128] E [MSGID: 106010]
> > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management:
> > > > Version of Cksums persistent differ. local cksum = 3950307018, remote
> > > > cksum = 455409345 on peer gluster1
> > > > [2019-03-05 07:04:36.188314] I [MSGID: 106493]
> > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd:
> > > > Responded to gluster1 (0), ret: 0, op_ret: -1
> > > >
> > > > Interestingly, there are no entries in the brick logs of the rejected
> > > > server; not surprising, as no brick process is running. The server
> > > > gluster1 is still in the rejected state.
> > > >
> > > > 'gluster volume start workdata force' starts the brick process on
> > > > gluster1, and some heals are happening on gluster2+3, but 'gluster
> > > > volume status workdata' still doesn't show the volume as complete.
> > > >
> > > > gluster1:
> > > > ------------------------------------------------------------------------------
> > > > Brick gluster1:/gluster/md4/workdata        49152     0          Y       2523
> > > > Self-heal Daemon on localhost               N/A       N/A        Y       2549
> > > >
> > > > gluster2:
> > > > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > > > ------------------------------------------------------------------------------
> > > > Brick gluster2:/gluster/md4/workdata        49153     0          Y       1723
> > > > Brick gluster3:/gluster/md4/workdata        49153     0          Y       2068
> > > > Self-heal Daemon on localhost               N/A       N/A        Y       1732
> > > > Self-heal Daemon on gluster3                N/A       N/A        Y       2077
> > > >
> > > >
> > > > Hubert
> > > >
> > > > Am Di., 5. März 2019 um 07:58 Uhr schrieb Milind Changire <mchangir at redhat.com>:
> > > > >
> > > > > There are probably DNS or /etc/hosts entries that resolve the host names (gluster1, gluster2, gluster3) to the public IP addresses.
> > > > > /etc/resolv.conf would show the default search domain for the node names and the DNS servers that answer the queries.
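> > > > >
> > > > > To check what each name actually resolves to on a node, something like
> > > > > the following should do (getent goes through the nsswitch order, i.e.
> > > > > /etc/hosts first on a default Debian setup; dig, from the dnsutils
> > > > > package, queries DNS only and bypasses /etc/hosts):
> > > > >
> > > > > getent hosts gluster1 gluster2 gluster3
> > > > > dig +short gluster2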
> > > > >
> > > > >
> > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert <revirii at googlemail.com> wrote:
> > > > >>
> > > > >> Good morning,
> > > > >>
> > > > >> I have a replica 3 setup with 2 volumes, running on version 5.3 on
> > > > >> Debian stretch. This morning I upgraded one server to version 5.4 and
> > > > >> rebooted the machine; after the restart I noticed that:
> > > > >>
> > > > >> - no brick process is running
> > > > >> - gluster volume status only shows the server itself:
> > > > >> gluster volume status workdata
> > > > >> Status of volume: workdata
> > > > >> Gluster process                             TCP Port  RDMA Port  Online  Pid
> > > > >> ------------------------------------------------------------------------------
> > > > >> Brick gluster1:/gluster/md4/workdata        N/A       N/A        N       N/A
> > > > >> NFS Server on localhost                     N/A       N/A        N       N/A
> > > > >>
> > > > >> - gluster peer status on the upgraded server:
> > > > >> gluster peer status
> > > > >> Number of Peers: 2
> > > > >>
> > > > >> Hostname: gluster3
> > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
> > > > >> State: Peer Rejected (Connected)
> > > > >>
> > > > >> Hostname: gluster2
> > > > >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27
> > > > >> State: Peer Rejected (Connected)
> > > > >>
> > > > >> - gluster peer status on the other 2 servers:
> > > > >> gluster peer status
> > > > >> Number of Peers: 2
> > > > >>
> > > > >> Hostname: gluster1
> > > > >> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef
> > > > >> State: Peer Rejected (Connected)
> > > > >>
> > > > >> Hostname: gluster3
> > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
> > > > >> State: Peer in Cluster (Connected)
> > > > >>
> > > > >> I noticed that in the brick logs the public IP is used
> > > > >> instead of the LAN IP. Brick logs from one of the volumes:
> > > > >>
> > > > >> rejected node: https://pastebin.com/qkpj10Sd
> > > > >> connected nodes: https://pastebin.com/8SxVVYFV
> > > > >>
> > > > >> Why is the public IP suddenly used instead of the LAN IP? Killing all
> > > > >> gluster processes and rebooting (again) didn't help.
> > > > >>
> > > > >>
> > > > >> Thx,
> > > > >> Hubert
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Milind
> > > > >
> >
> >
> >
> > --
> > Regards,
> > Hari Gowtham.



-- 
Regards,
Hari Gowtham.

