[Bugs] [Bug 1685120] New: upgrade from 3.12, 4.1 and 5 to 6 broken

bugzilla at redhat.com bugzilla at redhat.com
Mon Mar 4 11:42:45 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1685120

            Bug ID: 1685120
           Summary: upgrade from 3.12, 4.1 and 5 to 6 broken
           Product: GlusterFS
           Version: mainline
            Status: NEW
        Whiteboard: gluster-test-day
         Component: core
          Severity: urgent
          Priority: high
          Assignee: bugs at gluster.org
          Reporter: srakonde at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    hgowtham at redhat.com, pasik at iki.fi, srakonde at redhat.com
        Depends On: 1684029
            Blocks: 1672818 (glusterfs-6.0)
  Target Milestone: ---
    Classification: Community



+++ This bug was initially created as a clone of Bug #1684029 +++

Description of problem:
When upgrading from older versions (3.12, 4.1 and 5) to the Gluster 6 RC, the
nodes end up in the Peer Rejected state one after another.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Create a replica 3 volume on an older version (3.12, 4.1 or 5).
2. Kill the gluster processes on one node and install Gluster 6.
3. Start glusterd.

Actual results:
The upgraded node gets peer rejected, and its brick processes are not started
by glusterd.

Expected results:
Peer rejection should not happen; the cluster should stay healthy.

Additional info:
Status shows only the bricks on that particular node, with N/A as their
status. The other nodes aren't visible.
This looks like a volfile mismatch:
- The new volfile contains "option transport.socket.ssl-enabled off", which
the old volfile lacks.
- The order of quick-read and open-behind differs between the old and new
versions.

These differences cause the volfile mismatch and break the cluster.

--- Additional comment from Sanju on 2019-02-28 17:25:57 IST ---

The peers are going into the rejected state because the volfiles do not match.
The differences are:
1. Newer volfiles contain "option transport.socket.ssl-enabled off", while
older volfiles do not have this option.
2. The order of quick-read and open-behind has changed.

Commit 4e0fab4 introduced this issue. Previously there was no default value
for the option transport.socket.ssl-enabled, so the option was not written to
the volfile. With the above commit we add a default value, so the option now
gets written to the volfile.
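To make the failure mode concrete, here is a minimal sketch, assuming peers compare a checksum of the rendered volfile text. The toy volfile layout, the helper names (render_volfile, volfile_checksum, peers_agree) and the use of SHA-256 are all hypothetical stand-ins, not glusterd's actual code; only the option name and the quick-read/open-behind translators come from this report. It shows that writing out a default value, or reordering two translators, changes the bytes and therefore the checksum, even though the effective configuration is the same.

```python
import hashlib


def render_volfile(xlators, client_options):
    """Render a toy volfile: one client block, then the given translators."""
    lines = ["volume demo-client-0", "    type protocol/client"]
    lines += [f"    option {key} {value}" for key, value in client_options]
    lines.append("end-volume")
    for name in xlators:
        lines += [f"volume demo-{name}", f"    type performance/{name}", "end-volume"]
    return "\n".join(lines) + "\n"


def volfile_checksum(text):
    # Hypothetical stand-in: SHA-256 of the text, not glusterd's real checksum.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def peers_agree(local, remote):
    # A handshake that compares checksums rejects on any byte-level difference.
    return volfile_checksum(local) == volfile_checksum(remote)


# Old node: the option had no default value, so it is not written out.
old = render_volfile(["quick-read", "open-behind"], [])
# Upgraded node: the default value ("off") is now written out explicitly.
new_option = render_volfile(["quick-read", "open-behind"],
                            [("transport.socket.ssl-enabled", "off")])
# Upgraded node: same translators, different order.
new_order = render_volfile(["open-behind", "quick-read"], [])

print(peers_agree(old, old))
print(peers_agree(old, new_option))
print(peers_agree(old, new_order))
```

The first comparison succeeds and the other two fail, which mirrors the two volfile differences listed above.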

Commit 4e0fab4 is a fix for
https://bugzilla.redhat.com/show_bug.cgi?id=1651059. I feel this commit is of
low significance, so we can revert it. If we do so, the first problem goes
away.

I'm not sure why the order of quick-read and open-behind has changed.

Atin, do let me know your thoughts on the proposal to revert commit 4e0fab4.

Thanks,
Sanju

--- Additional comment from Sanju on 2019-03-04 14:58:55 IST ---

Root cause:
Commit 5a152a changed the mechanism for computing the checksum. Because of this
change, in a heterogeneous cluster, glusterd on the upgraded node uses the new
mechanism to compute the cksum while the non-upgraded nodes use the old
mechanism. So the cksum on the upgraded node doesn't match the one on the
non-upgraded nodes, which results in the peer rejection issue.
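A minimal sketch of why a mixed-version cluster rejects peers even when the configuration is identical, assuming each node compares its own checksum against the one a peer reports. The two functions old_cksum and new_cksum are hypothetical stand-ins (SHA-1 and SHA-256 hex digests) chosen only so the two "mechanisms" visibly differ; they are not what commit 5a152a actually changed.

```python
import hashlib

# Toy volfile content; assumed to be byte-for-byte identical on every node.
VOLFILE = "volume demo-client-0\n    type protocol/client\nend-volume\n"


def old_cksum(text):
    # Hypothetical "old mechanism" used by non-upgraded nodes.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def new_cksum(text):
    # Hypothetical "new mechanism" used by the upgraded node.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


upgraded_node = new_cksum(VOLFILE)
old_node = old_cksum(VOLFILE)

# Same content, different checksum functions: the values never match,
# so a checksum-based handshake would reject the peer.
print(upgraded_node == old_node)
# Once every node runs the same version (same mechanism), they agree again.
print(new_cksum(VOLFILE) == upgraded_node)
```

Here the mismatch comes purely from the change in how the checksum is computed, not from any difference in the volume configuration.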

Thanks,
Sanju


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1672818
[Bug 1672818] GlusterFS 6.0 tracker
https://bugzilla.redhat.com/show_bug.cgi?id=1684029
[Bug 1684029] upgrade from 3.12, 4.1 and 5 to 6 broken