[Bugs] [Bug 1544600] New: 3.8 -> 3.10 rolling upgrade fails ( same for 3.12 or 3.13) on Ubuntu 14

bugzilla at redhat.com bugzilla at redhat.com
Tue Feb 13 02:38:06 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1544600

            Bug ID: 1544600
           Summary: 3.8 -> 3.10 rolling upgrade fails (same for 3.12 or
                    3.13) on Ubuntu 14
           Product: GlusterFS
           Version: mainline
         Component: glusterd
          Assignee: bugs at gluster.org
          Reporter: amukherj at redhat.com
                CC: alexandrumarcu at gmail.com, amukherj at redhat.com,
                    bugs at gluster.org, hgowtham at redhat.com
        Depends On: 1544461



+++ This bug was initially created as a clone of Bug #1544461 +++

Description of problem: Unable to upgrade a Gluster cluster from 3.8.15 to
3.10.10 (same for 3.12 and 3.13); I think it is related to
https://bugzilla.redhat.com/show_bug.cgi?id=1511903.


Version-Release number of selected component (if applicable): old version
3.8.15, new version 3.10.10


How reproducible: Always (also tried with 3.12 and 3.13)


Steps to Reproduce:
1. Install 3.8.15 on Ubuntu 14 from the PPA.
2. Upgrade one of those nodes to the latest 3.10 (currently 3.10.10).
3. The newly upgraded node is rejected from the Gluster cluster.

Actual results: The node is rejected from the cluster.


Expected results: The node must be accepted.


Additional info:
I have a 5-brick replicated volume on Ubuntu 14.
I am trying to update GlusterFS. I started on 3.7 and tried multiple
scenarios; all of them failed when going directly to the newer GlusterFS
versions (3.10, 3.12, 3.13). I then noticed that 3.8 works fine, so I updated
from 3.7.20 to 3.8.15 as an intermediate version. While updating to the next
LTM release, 3.10 (I only updated 1 of the 5 servers to 3.10.10 while the rest
remain at 3.8.15), the updated node throws the following error:

"Version of Cksums gluster_volume differ. local cksum = 3272345312, remote
cksum = 469010668 on peer 1-gls-dus21-ci-efood-real-de.openstacklocal" 


Also, all peers are now in the "Peer Rejected (Connected)" state after the update.
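For illustration, the rejection comes from glusterd comparing a per-volume
checksum of each volume's configuration between peers; if the files differ by
even one line, the checksums no longer match. A toy sketch (using plain
`cksum` as a stand-in, not glusterd's actual internal checksum routine):

```shell
# Toy illustration only: two volume info files that differ by a single
# extra line (e.g. tier-enabled=0) produce different checksums, which is
# the per-volume comparison that makes glusterd reject the peer.
old_info=$(mktemp)
new_info=$(mktemp)
printf 'type=2\ncount=5\nversion=30\n' > "$old_info"
printf 'type=2\ncount=5\nversion=30\ntier-enabled=0\n' > "$new_info"

old_sum=$(cksum < "$old_info" | awk '{print $1}')
new_sum=$(cksum < "$new_info" | awk '{print $1}')
echo "old cksum: $old_sum"
echo "new cksum: $new_sum"
```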

Volume Name: gluster_volume
Type: Replicate
Volume ID: 2e6bd6ba-37c8-4808-9156-08545cea3e3e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 5 = 5
Transport-type: tcp
Bricks:
Brick1: 2-gls-dus10-ci-efood-real-de.openstack.local:/export_vdb
Brick2: 1-gls-dus10-ci-efood-real-de.openstack.local:/export_vdb
Brick3: 1-gls-dus21-ci-efood-real-de:/export_vdb
Brick4: 3-gls-dus10-ci-efood-real-de.openstack.local:/export_vdb
Brick5: 2-gls-dus21-ci-efood-real-de.openstacklocal:/export_vdb
Options Reconfigured:
features.barrier: off
performance.readdir-ahead: on
auth.allow:
10.96.213.245,10.96.214.101,10.97.177.132,10.97.177.127,10.96.214.93,10.97.177.139,10.96.214.119,10.97.177.106,10.96.210.69,10.96.214.94,10.97.177.118,10.97.177.128,10.96.214.98
nfs.disable: on
performance.cache-size: 2GB
performance.cache-max-file-size: 1MB
cluster.self-heal-window-size: 64
performance.io-thread-count: 32


root at 1-gls-dus21-ci-efood-real-de:/home/ubuntu# gluster peer status
Number of Peers: 4

Hostname: 3-gls-dus10-ci-efood-real-de.openstack.local
Uuid: 3d141235-9b93-4798-8e03-82a758216b0b
State: Peer in Cluster (Connected)

Hostname: 1-gls-dus10-ci-efood-real-de.openstack.local
Uuid: 00839049-2ade-48f8-b5f3-66db0e2b9377
State: Peer in Cluster (Connected)

Hostname: 2-gls-dus10-ci-efood-real-de.openstack.local
Uuid: 1617cd54-9b2a-439e-9aa6-30d4ecf303f8
State: Peer in Cluster (Connected)

Hostname: 2-gls-dus21-ci-efood-real-de.openstacklocal
Uuid: 0c698b11-9078-441a-9e7f-442befeef7a9
State: Peer Rejected (Connected)



Volume status from one of the nodes that was not updated:

root at 1-gls-dus21-ci-efood-real-de:/home/ubuntu# gluster volume status
Status of volume: gluster_volume
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 2-gls-dus10-ci-efood-real-de.openstac
k.local:/export_vdb                         49153     0          Y       30521
Brick 1-gls-dus10-ci-efood-real-de.openstac
k.local:/export_vdb                         49152     0          Y       23166
Brick 1-gls-dus21-ci-efood-real-de:/export_
vdb                                         49153     0          Y       2322
Brick 3-gls-dus10-ci-efood-real-de.openstac
k.local:/export_vdb                         49153     0          Y       10854
Self-heal Daemon on localhost               N/A       N/A        Y       4931
Self-heal Daemon on 3-gls-dus10-ci-efood-re
al-de.openstack.local                       N/A       N/A        Y       16591
Self-heal Daemon on 2-gls-dus10-ci-efood-re
al-de.openstack.local                       N/A       N/A        Y       4621
Self-heal Daemon on 1-gls-dus10-ci-efood-re
al-de.openstack.local                       N/A       N/A        Y       3487

Task Status of Volume gluster_volume
------------------------------------------------------------------------------
There are no active volume tasks

And from the updated one:

root at 2-gls-dus21-ci-efood-real-de:/var/log/glusterfs# gluster volume status
Status of volume: gluster_volume
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 2-gls-dus21-ci-efood-real-de.openstac
klocal:/export_vdb                          N/A       N/A        N       N/A
NFS Server on localhost                     N/A       N/A        N       N/A

Task Status of Volume gluster_volume
------------------------------------------------------------------------------
There are no active volume tasks




[2018-02-12 13:35:53.400122] E [MSGID: 106010]
[glusterd-utils.c:3043:glusterd_compare_friend_volume] 0-management: Version of
Cksums gluster_volume differ. local cksum = 3272345312, remote cksum =
469010668 on peer 1-gls-dus10-ci-efood-real-de.openstack.local
[2018-02-12 13:35:53.400211] I [MSGID: 106493]
[glusterd-handler.c:3866:glusterd_xfer_friend_add_resp] 0-glusterd: Responded
to 1-gls-dus10-ci-efood-real-de.openstack.local (0), ret: 0, op_ret: -1
[2018-02-12 13:35:53.417588] I [MSGID: 106163]
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2018-02-12 13:35:53.430748] I [MSGID: 106490]
[glusterd-handler.c:2606:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: 3d141235-9b93-4798-8e03-82a758216b0b
[2018-02-12 13:35:53.431024] E [MSGID: 106010]
[glusterd-utils.c:3043:glusterd_compare_friend_volume] 0-management: Version of
Cksums gluster_volume differ. local cksum = 3272345312, remote cksum =
469010668 on peer 3-gls-dus10-ci-efood-real-de.openstack.local
[2018-02-12 13:35:53.431121] I [MSGID: 106493]
[glusterd-handler.c:3866:glusterd_xfer_friend_add_resp] 0-glusterd: Responded
to 3-gls-dus10-ci-efood-real-de.openstack.local (0), ret: 0, op_ret: -1
[2018-02-12 13:35:53.473344] I [MSGID: 106493]
[glusterd-rpc-ops.c:485:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: 7488286f-6bfa-46f8-bc50-9ee815e96c66, host:
1-gls-dus21-ci-efood-real-de.openstacklocal, port: 0


I do not have the file `/var/lib/glusterd/vols/remote/info` on any of the
servers, but I have attached the
`/var/lib/glusterd/vols/gluster_volume/info` file from the upgraded node and
from a node that was not upgraded.


The 3.7 version ran fine for quite some time, so we can exclude network
issues, SELinux, etc.

--- Additional comment from Marc on 2018-02-12 09:51:24 EST ---

I see that the new node has the new "tier-enabled=0" entry; could it also be
related to this: https://www.spinics.net/lists/gluster-users/msg33329.html?
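A quick way to confirm this would be to diff the volume's info file between an
upgraded and a non-upgraded node. A minimal sketch with fabricated file
contents (on real nodes the file is
/var/lib/glusterd/vols/gluster_volume/info):

```shell
# Sketch with fabricated contents: diff an upgraded node's info file
# against a non-upgraded one to surface any extra lines, such as the
# tier-enabled entry introduced by the newer release.
upgraded=$(mktemp)
not_upgraded=$(mktemp)
printf 'type=2\ncount=5\nstatus=1\n' > "$not_upgraded"
printf 'type=2\ncount=5\nstatus=1\ntier-enabled=0\n' > "$upgraded"

# Keep only the lines unique to the upgraded node's file.
extra_lines=$(diff "$not_upgraded" "$upgraded" | grep '^>' || true)
echo "$extra_lines"
```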

--- Additional comment from Atin Mukherjee on 2018-02-12 10:07:17 EST ---

This is indeed a bug, and we managed to root-cause it a couple of days back.
I am assigning it to one of my colleagues, Hari, who is aware of this issue
and the fix required. For the time being, please remove tier-enabled=0 from
all the info files on the node that has been upgraded, and then, once all
nodes are upgraded, bump up the cluster.op-version.
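The info-file cleanup described in the comment above might look like the
following sketch. It is demonstrated on a scratch directory so it is safe to
run anywhere; on a real upgraded node you would stop glusterd first, run the
sed over /var/lib/glusterd/vols/*/info, restart glusterd, and only raise
cluster.op-version after every node has been upgraded:

```shell
# Demonstrated on a scratch directory; on a real node, point vols_dir at
# /var/lib/glusterd/vols and stop glusterd before editing its state files.
vols_dir=$(mktemp -d)
mkdir -p "$vols_dir/gluster_volume"
printf 'type=2\ncount=5\ntier-enabled=0\nversion=30\n' \
    > "$vols_dir/gluster_volume/info"

# Remove the tier-enabled line from every volume's info file.
for f in "$vols_dir"/*/info; do
    sed -i '/^tier-enabled=/d' "$f"
done

cat "$vols_dir/gluster_volume/info"
```

After all nodes are on the new version, the final step from the comment would
be `gluster volume set all cluster.op-version <op-version>` on any one node;
the exact op-version value depends on the target release.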

@Hari - we need to send this fix to the 3.10, 3.12 and 4.0 branches by
changing the op-version check to 3.11 instead of 3.7.6.


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1544461
[Bug 1544461] 3.8 -> 3.10 rolling upgrade fails (same for 3.12 or 3.13) on
Ubuntu 14
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

