[Bugs] [Bug 1685120] New: upgrade from 3.12, 4.1 and 5 to 6 broken
bugzilla at redhat.com
Mon Mar 4 11:42:45 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1685120
Bug ID: 1685120
Summary: upgrade from 3.12, 4.1 and 5 to 6 broken
Product: GlusterFS
Version: mainline
Status: NEW
Whiteboard: gluster-test-day
Component: core
Severity: urgent
Priority: high
Assignee: bugs at gluster.org
Reporter: srakonde at redhat.com
CC: amukherj at redhat.com, bugs at gluster.org,
hgowtham at redhat.com, pasik at iki.fi, srakonde at redhat.com
Depends On: 1684029
Blocks: 1672818 (glusterfs-6.0)
Target Milestone: ---
Classification: Community
+++ This bug was initially created as a clone of Bug #1684029 +++
Description of problem:
While trying to upgrade from older versions such as 3.12, 4.1, and 5 to the gluster 6
RC, the upgrade ends with the nodes becoming peer rejected, one node after another.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Create a replica 3 volume on an older version (3.12, 4.1, or 5).
2. Kill the gluster processes on one node and install gluster 6.
3. Start glusterd.
Actual results:
The upgraded node gets peer rejected, and the brick processes are not started by
glusterd.
Expected results:
Peer rejection should not happen; the cluster should stay healthy.
Additional info:
Volume status shows only the bricks on that particular node, with N/A as their
status. Other nodes aren't visible.
This looks like a volfile mismatch.
The new volfile has "option transport.socket.ssl-enabled off" added, while the
old volfile lacks it.
The order of quick-read and open-behind also differs between the old and new
versions.
These differences cause the volfile mismatch and break the cluster.
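To make the failure mode concrete, here is a minimal sketch of why either difference is enough to trigger rejection. It assumes, for illustration only, that peers compare a digest of the raw volfile text; SHA-256 via Python's `hashlib` is a stand-in, not glusterd's actual checksum routine, and the volfile fragments and the `peers_agree` helper are hypothetical and heavily abbreviated.

```python
import hashlib

# Hypothetical, abbreviated volfile fragments (real glusterd volfiles are much longer).
OLD_VOLFILE = """volume testvol-client-0
    type protocol/client
end-volume
volume testvol-quick-read
    type performance/quick-read
end-volume
volume testvol-open-behind
    type performance/open-behind
end-volume
"""

# The same graph as a newer version might generate it: one extra default option,
# and quick-read/open-behind emitted in the opposite order.
NEW_VOLFILE = """volume testvol-client-0
    type protocol/client
    option transport.socket.ssl-enabled off
end-volume
volume testvol-open-behind
    type performance/open-behind
end-volume
volume testvol-quick-read
    type performance/quick-read
end-volume
"""


def volfile_checksum(text: str) -> str:
    """Stand-in checksum over the volfile bytes (assumption, not glusterd's algorithm)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def peers_agree(local: str, remote: str) -> bool:
    """Hypothetical handshake rule: accept the peer only if both checksums match."""
    return volfile_checksum(local) == volfile_checksum(remote)


print(peers_agree(OLD_VOLFILE, OLD_VOLFILE))  # True
print(peers_agree(OLD_VOLFILE, NEW_VOLFILE))  # False
```

In this model any byte-level difference changes the digest, even one that only adds an explicit default option, so a plain checksum comparison cannot tell a harmless difference from a real configuration divergence.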
--- Additional comment from Sanju on 2019-02-28 17:25:57 IST ---
The peers are going into the rejected state because there is a mismatch in the
volfiles. The differences are:
1. Newer volfiles have "option transport.socket.ssl-enabled off", whereas
older volfiles do not have this option.
2. The order of quick-read and open-behind has changed.
Commit 4e0fab4 introduced this issue. Previously there was no default
value for the option transport.socket.ssl-enabled, so this option was not
captured in the volfile. With the above commit, a default value is added,
so the option now gets captured in the volfile.
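The mechanism described above can be modeled roughly as follows: a volfile generator writes an `option` line for every option that resolves to a value, either user-set or default. This is a simplified, hypothetical model (the `render_options` helper and the option tables are illustrative, not glusterd's volgen code); it only shows why giving `transport.socket.ssl-enabled` a default value makes the option appear in freshly generated volfiles.

```python
from typing import Dict, List, Optional


def render_options(option_defaults: Dict[str, Optional[str]],
                   user_set: Dict[str, str]) -> List[str]:
    """Emit 'option <key> <value>' for each known option that has a user-set value or a default."""
    lines = []
    for key, default in option_defaults.items():
        value = user_set.get(key, default)
        if value is not None:  # options with neither a value nor a default are omitted
            lines.append(f"option {key} {value}")
    return lines


# Hypothetical option tables before and after a default value is introduced.
BEFORE = {"transport.socket.ssl-enabled": None}
AFTER = {"transport.socket.ssl-enabled": "off"}

print(render_options(BEFORE, {}))  # []
print(render_options(AFTER, {}))   # ['option transport.socket.ssl-enabled off']
```

With no user setting, the older table produces no line at all while the newer one produces `option transport.socket.ssl-enabled off`, which is exactly the extra line seen in the newer volfiles.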
Commit 4e0fab4 is a fix for
https://bugzilla.redhat.com/show_bug.cgi?id=1651059. I feel this commit is of
low significance, so we can revert it. If we do so, the first problem is
resolved.
I'm not sure why the order of quick-read and open-behind changed.
Atin, do let me know your thoughts on the proposal to revert commit 4e0fab4.
Thanks,
Sanju
--- Additional comment from Sanju on 2019-03-04 14:58:55 IST ---
Root cause:
Commit 5a152a changed the mechanism for computing the checksum. Because of this
change, in a heterogeneous cluster, glusterd on the upgraded node uses the new
mechanism to compute the cksum, while the non-upgraded nodes use the old
mechanism. As a result, the cksum on the upgraded node doesn't match that of the
non-upgraded nodes, which results in the peer rejection issue.
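The root cause can be illustrated with a minimal sketch in which every peer holds identical volume information, but the upgraded node computes the checksum differently. Both `cksum_old` and `cksum_new` are hypothetical stand-ins (what commit 5a152a actually changed is not described here); they differ only in how the same key/value pairs are serialized before hashing, and the `handshake` helper is likewise illustrative.

```python
import hashlib
from typing import Dict

# Hypothetical volume information, identical on every peer.
VOLUME_INFO: Dict[str, str] = {"type": "replicate", "replica_count": "3", "brick_count": "3"}


def cksum_old(info: Dict[str, str]) -> str:
    """Hypothetical old mechanism: hash sorted 'key=value' lines."""
    data = "".join(f"{k}={v}\n" for k, v in sorted(info.items()))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def cksum_new(info: Dict[str, str]) -> str:
    """Hypothetical new mechanism: hash the same pairs with a different serialization."""
    data = ";".join(f"{k}:{v}" for k, v in sorted(info.items()))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def handshake(local_cksum: str, remote_cksum: str) -> str:
    """Accept the peer only when both sides report the same checksum."""
    return "accepted" if local_cksum == remote_cksum else "rejected"


# Upgraded node (new mechanism) vs. non-upgraded node (old mechanism).
print(handshake(cksum_new(VOLUME_INFO), cksum_old(VOLUME_INFO)))  # rejected
# Two non-upgraded nodes agree because they use the same mechanism.
print(handshake(cksum_old(VOLUME_INFO), cksum_old(VOLUME_INFO)))  # accepted
```

Because the inputs are identical, the mismatch in this sketch comes entirely from the computation itself, so the nodes only agree once they all compute the checksum the same way.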
Thanks,
Sanju
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1672818
[Bug 1672818] GlusterFS 6.0 tracker
https://bugzilla.redhat.com/show_bug.cgi?id=1684029
[Bug 1684029] upgrade from 3.12, 4.1 and 5 to 6 broken