[Bugs] [Bug 1544637] New: 3.8 -> 3.10 rolling upgrade fails (same for 3.12 or 3.13) on Ubuntu 14
bugzilla at redhat.com
Tue Feb 13 05:49:03 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1544637
Bug ID: 1544637
Summary: 3.8 -> 3.10 rolling upgrade fails (same for 3.12 or
3.13) on Ubuntu 14
Product: GlusterFS
Version: 3.12
Component: glusterd
Assignee: bugs at gluster.org
Reporter: hgowtham at redhat.com
CC: alexandrumarcu at gmail.com, amukherj at redhat.com,
bugs at gluster.org, hgowtham at redhat.com
Depends On: 1544461, 1544600
+++ This bug was initially created as a clone of Bug #1544600 +++
+++ This bug was initially created as a clone of Bug #1544461 +++
Description of problem: Unable to upgrade a Gluster cluster from 3.8.15 to
3.10.10 (same for 3.12 and 3.13). I think this is related to
https://bugzilla.redhat.com/show_bug.cgi?id=1511903
Version-Release number of selected component (if applicable): old one 3.8.15,
new one 3.10.10
How reproducible: Always (also tried with 3.12 and 3.13)
Steps to Reproduce:
1. Install a GlusterFS 3.8.15 cluster on Ubuntu 14 from the PPA.
2. Upgrade one of those nodes to the latest 3.10 (now 3.10.10); see the sketch
below.
3. The newly upgraded node is rejected from the cluster.
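For reference, step 2 on the node being upgraded looks roughly like this (a
sketch; the PPA name below is an assumption based on the community Launchpad
packaging, so substitute your actual package source):

# On the node being upgraded (Ubuntu 14.04); PPA name is illustrative
sudo add-apt-repository ppa:gluster/glusterfs-3.10
sudo apt-get update
sudo apt-get install glusterfs-server    # pulls in the 3.10.10 packages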
Actual results: The node is rejected from the cluster.
Expected results: The node must be accepted.
Additional info:
I have a 5x replicated volume on Ubuntu 14.
I am trying to update GlusterFS. I started on 3.7, from which I tried
multiple scenarios; all failed when going directly to the newer GlusterFS
versions (3.10, 3.12, 3.13). I then noticed that 3.8 works fine, so I updated
from 3.7.20 to 3.8.15 as an intermediate version. While updating to the next
LTM release, 3.10 (I updated only 1 of the 5 servers to 3.10.10, while the
rest stayed at 3.8.15), the updated node throws the following error:
"Version of Cksums gluster_volume differ. local cksum = 3272345312, remote
cksum = 469010668 on peer 1-gls-dus21-ci-efood-real-de.openstacklocal"
Also, all peers are now in "Peer Rejected (Connected)" state after the update.
Volume Name: gluster_volume
Type: Replicate
Volume ID: 2e6bd6ba-37c8-4808-9156-08545cea3e3e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 5 = 5
Transport-type: tcp
Bricks:
Brick1: 2-gls-dus10-ci-efood-real-de.openstack.local:/export_vdb
Brick2: 1-gls-dus10-ci-efood-real-de.openstack.local:/export_vdb
Brick3: 1-gls-dus21-ci-efood-real-de:/export_vdb
Brick4: 3-gls-dus10-ci-efood-real-de.openstack.local:/export_vdb
Brick5: 2-gls-dus21-ci-efood-real-de.openstacklocal:/export_vdb
Options Reconfigured:
features.barrier: off
performance.readdir-ahead: on
auth.allow:
10.96.213.245,10.96.214.101,10.97.177.132,10.97.177.127,10.96.214.93,10.97.177.139,10.96.214.119,10.97.177.106,10.96.210.69,10.96.214.94,10.97.177.118,10.97.177.128,10.96.214.98
nfs.disable: on
performance.cache-size: 2GB
performance.cache-max-file-size: 1MB
cluster.self-heal-window-size: 64
performance.io-thread-count: 32
root@1-gls-dus21-ci-efood-real-de:/home/ubuntu# gluster peer status
Number of Peers: 4
Hostname: 3-gls-dus10-ci-efood-real-de.openstack.local
Uuid: 3d141235-9b93-4798-8e03-82a758216b0b
State: Peer in Cluster (Connected)
Hostname: 1-gls-dus10-ci-efood-real-de.openstack.local
Uuid: 00839049-2ade-48f8-b5f3-66db0e2b9377
State: Peer in Cluster (Connected)
Hostname: 2-gls-dus10-ci-efood-real-de.openstack.local
Uuid: 1617cd54-9b2a-439e-9aa6-30d4ecf303f8
State: Peer in Cluster (Connected)
Hostname: 2-gls-dus21-ci-efood-real-de.openstacklocal
Uuid: 0c698b11-9078-441a-9e7f-442befeef7a9
State: Peer Rejected (Connected)
Volume status from one of the nodes which was not updated:
root@1-gls-dus21-ci-efood-real-de:/home/ubuntu# gluster volume status
Status of volume: gluster_volume
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 2-gls-dus10-ci-efood-real-de.openstac
k.local:/export_vdb 49153 0 Y 30521
Brick 1-gls-dus10-ci-efood-real-de.openstac
k.local:/export_vdb 49152 0 Y 23166
Brick 1-gls-dus21-ci-efood-real-de:/export_
vdb 49153 0 Y 2322
Brick 3-gls-dus10-ci-efood-real-de.openstac
k.local:/export_vdb 49153 0 Y 10854
Self-heal Daemon on localhost N/A N/A Y 4931
Self-heal Daemon on 3-gls-dus10-ci-efood-re
al-de.openstack.local N/A N/A Y 16591
Self-heal Daemon on 2-gls-dus10-ci-efood-re
al-de.openstack.local N/A N/A Y 4621
Self-heal Daemon on 1-gls-dus10-ci-efood-re
al-de.openstack.local N/A N/A Y 3487
Task Status of Volume gluster_volume
------------------------------------------------------------------------------
There are no active volume tasks
And from the updated one:
root@2-gls-dus21-ci-efood-real-de:/var/log/glusterfs# gluster volume status
Status of volume: gluster_volume
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 2-gls-dus21-ci-efood-real-de.openstac
klocal:/export_vdb N/A N/A N N/A
NFS Server on localhost N/A N/A N N/A
Task Status of Volume gluster_volume
------------------------------------------------------------------------------
There are no active volume tasks
[2018-02-12 13:35:53.400122] E [MSGID: 106010]
[glusterd-utils.c:3043:glusterd_compare_friend_volume] 0-management: Version of
Cksums gluster_volume differ. local cksum = 3272345312, remote cksum =
469010668 on peer 1-gls-dus10-ci-efood-real-de.openstack.local
[2018-02-12 13:35:53.400211] I [MSGID: 106493]
[glusterd-handler.c:3866:glusterd_xfer_friend_add_resp] 0-glusterd: Responded
to 1-gls-dus10-ci-efood-real-de.openstack.local (0), ret: 0, op_ret: -1
[2018-02-12 13:35:53.417588] I [MSGID: 106163]
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2018-02-12 13:35:53.430748] I [MSGID: 106490]
[glusterd-handler.c:2606:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: 3d141235-9b93-4798-8e03-82a758216b0b
[2018-02-12 13:35:53.431024] E [MSGID: 106010]
[glusterd-utils.c:3043:glusterd_compare_friend_volume] 0-management: Version of
Cksums gluster_volume differ. local cksum = 3272345312, remote cksum =
469010668 on peer 3-gls-dus10-ci-efood-real-de.openstack.local
[2018-02-12 13:35:53.431121] I [MSGID: 106493]
[glusterd-handler.c:3866:glusterd_xfer_friend_add_resp] 0-glusterd: Responded
to 3-gls-dus10-ci-efood-real-de.openstack.local (0), ret: 0, op_ret: -1
[2018-02-12 13:35:53.473344] I [MSGID: 106493]
[glusterd-rpc-ops.c:485:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: 7488286f-6bfa-46f8-bc50-9ee815e96c66, host:
1-gls-dus21-ci-efood-real-de.openstacklocal, port: 0
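Note that the handshake above still negotiates op-version 30800, i.e. the 3.8
level. As a quick sanity check, the operating version glusterd has persisted
on a node can be read from its store (standard glusterd path):

# Prints the op-version this glusterd is running at
grep operating-version /var/lib/glusterd/glusterd.info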
I do not have this file on any of the servers:
`/var/lib/glusterd/vols/remote/info`, but I attached the
`/var/lib/glusterd/vols/gluster_volume/info` from the upgraded node and from a
server which was not upgraded.
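To see exactly what differs between the two attached info files, they can be
compared across nodes, e.g. (a sketch; the peer hostname is taken from the
volume info above, adjust as needed):

# From the upgraded node: diff the local volume info file against the
# copy held by a non-upgraded peer
ssh 1-gls-dus10-ci-efood-real-de.openstack.local \
    cat /var/lib/glusterd/vols/gluster_volume/info | \
    diff /var/lib/glusterd/vols/gluster_volume/info -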
The 3.7 version ran fine for quite some time, so we can exclude network
issues, SELinux, etc.
--- Additional comment from Marc on 2018-02-12 09:51:24 EST ---
I see that the new node has the new "tier-enabled=0" entry. Could it also be
related to this: https://www.spinics.net/lists/gluster-users/msg33329.html?
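A quick way to check which volumes on a node carry that flag (paths as in
this report):

# Lists every volume info file that contains a tier-enabled line
grep -l '^tier-enabled=' /var/lib/glusterd/vols/*/info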
--- Additional comment from Atin Mukherjee on 2018-02-12 10:07:17 EST ---
This is indeed a bug, and we managed to root-cause it a couple of days back.
I am assigning it to my colleague Hari, who is aware of this issue and of the
fix required. For the time being, please remove tier-enabled=0 from all the
info files on the node which has been upgraded, and then, once all nodes are
upgraded, bump up the cluster.op-version.
@Hari - we need to send this fix to the 3.10, 3.12 and 4.0 branches by
changing the op-version check to 3.11 instead of 3.7.6.
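A minimal sketch of that workaround (assumes the stock /var/lib/glusterd
paths and the Ubuntu glusterfs-server service name; 31000 is the 3.10
op-version, so adjust it to the version the rolling upgrade finishes on):

# On the upgraded node: drop the tier-enabled line from every volume's
# info file, then restart glusterd so it re-reads its store
sed -i '/^tier-enabled=/d' /var/lib/glusterd/vols/*/info
service glusterfs-server restart

# Only after ALL nodes are upgraded, bump the cluster op-version once:
gluster volume set all cluster.op-version 31000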
--- Additional comment from Worker Ant on 2018-02-12 21:39:11 EST ---
REVIEW: https://review.gluster.org/19552 (glusterd: fix tier-enabled flag
op-version check) posted (#1) for review on master by Atin Mukherjee
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1544461
[Bug 1544461] 3.8 -> 3.10 rolling upgrade fails (same for 3.12 or 3.13) on
Ubuntu 14
https://bugzilla.redhat.com/show_bug.cgi?id=1544600
[Bug 1544600] 3.8 -> 3.10 rolling upgrade fails (same for 3.12 or 3.13) on
Ubuntu 14
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.