[Gluster-Maintainers] Release 4.0: Unable to complete rolling upgrade tests

Fri Mar 2 01:56:28 UTC 2018

Hi Pranith/Ravi,

So, to keep a long story short, post upgrading 1 node in a 3 node 3.13
cluster, self-heal is not able to catch the heal backlog and this is a
very simple synthetic test anyway, but the end result is that upgrade
testing is failing.

Here are the details,

- Using
https://hackmd.io/GYIwTADCDsDMCGBaArAUxAY0QFhBAbIgJwCMySIwJmAJvGMBvNEA#
I setup 3 server containers to install 3.13 first as follows (within the
containers)

(inside the 3 server containers)
yum -y update; yum -y install centos-release-gluster313; yum install
glusterfs-server; glusterd

(inside centos-glfs-server1)
gluster peer probe centos-glfs-server2
gluster peer probe centos-glfs-server3
gluster peer status
gluster v create patchy replica 3 centos-glfs-server1:/d/brick1
centos-glfs-server2:/d/brick2 centos-glfs-server3:/d/brick3
centos-glfs-server1:/d/brick4 centos-glfs-server2:/d/brick5
centos-glfs-server3:/d/brick6 force
gluster v start patchy
gluster v status

Create a client container as per the document above, and mount the above
volume and create 1 file, 1 directory and a file within that directory.

Now we start the upgrade process (as laid out for 3.13 here
http://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.13/ ):
- killall glusterfs glusterfsd glusterd
- yum install
http://cbs.centos.org/kojifiles/work/tasks/1548/311548/centos-release-gluster40-0.9-1.el7.centos.x86_64.rpm
- yum upgrade --enablerepo=centos-gluster40-test glusterfs-server

< Go back to the client and edit the contents of one of the files and
change the permissions of a directory, so that there are things to heal
when we bring up the newly upgraded server>

- gluster --version
- glusterd
- gluster v status
- gluster v heal patchy

The above starts failing as follows,
[root at centos-glfs-server1 /]# gluster v heal patchy
Launching heal operation to perform index self heal on volume patchy has
been unsuccessful:
Commit failed on centos-glfs-server2.glfstest20. Please check log file
for details.
Commit failed on centos-glfs-server3. Please check log file for details.

>From here, if further files or directories are created from the client,
they just get added to the heal backlog, and heal does not catchup.

As is obvious, I cannot proceed, as the upgrade procedure is broken. The
issue itself may not be selfheal deamon, but something around
connections, but as the process fails here, looking to you guys to
unblock this as soon as possible, as we are already running a day's slip
in the release.

Thanks,
Shyam