[Gluster-Maintainers] Release 4.0: Unable to complete rolling upgrade tests

Fri Mar 2 07:41:47 UTC 2018

On 03/02/2018 11:04 AM, Anoop C S wrote:
> On Fri, 2018-03-02 at 10:11 +0530, Ravishankar N wrote:
>> + Anoop.
>>
>> It looks like clients on the old (3.12) nodes are not able to talk to
>> the upgraded (4.0) node. I see messages like these on the old clients:
>>
>>    [2018-03-02 03:49:13.483458] W [MSGID: 114007]
>> [client-handshake.c:1197:client_setvolume_cbk] 0-testvol-client-2:
>> failed to find key 'clnt-lk-version' in the options
> It seems we need to set clnt-lk-version from the server side too, similar to what we did for the client via
> https://review.gluster.org/#/c/19560/. Can you try with the attached patch?
Thanks, self-heal works with this. You might want to get it merged in 
4.0 ASAP.
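One way to check that the heal backlog actually drains with the patch applied is to watch the per-brick entry counts from `gluster volume heal <VOLNAME> info`. This is a minimal sketch that assumes the 3.x output format: each brick block contains a "Number of entries: N" line, and a brick that is down reports "-" instead of a number. The function names and the default volume name patchy (taken from Shyam's setup) are illustrative and are not part of any Gluster tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for watching a volume's heal backlog.
# Meant to be sourced; nothing runs at load time.

# Sum the "Number of entries:" counts read on stdin (heal info output).
# Bricks that are down report "-" rather than a number and are skipped,
# so the total only covers bricks that answered.
count_heal_entries() {
    awk -F': *' '/^Number of entries:/ && $2 ~ /^[0-9]+$/ { sum += $2 } END { print sum + 0 }'
}

# Poll the backlog of a volume: watch_heal_backlog [VOL] [INTERVAL_S] [ROUNDS]
# Defaults (patchy, 10 s, 30 rounds) are arbitrary choices for this sketch.
watch_heal_backlog() {
    local vol=${1:-patchy} interval=${2:-10} rounds=${3:-30} i
    for (( i = 0; i < rounds; i++ )); do
        echo "$(date +%T) pending entries: $(gluster volume heal "$vol" info | count_heal_entries)"
        sleep "$interval"
    done
}
```

After sourcing the file, `watch_heal_backlog patchy 10 30` prints the running total. If heal is keeping up, the total should fall back to 0. If it keeps growing as the client creates files, that matches the failure described further down the thread.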

I still got the mkdir error on a plain distribute volume that I mentioned
in the other email in this thread. For anyone interested in trying it
out, the steps are:
- Create a 2-node, 2x1 plain distribute volume on 3.13 and fuse-mount it on node-1.
- Upgrade the 2nd node to 4.0 and wait until it is up and running.
- Perform a mkdir from the mount on node-1; this returns EIO.
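The steps above can be sketched as a script run from node-1. This is an illustration under stated assumptions. The hostnames (node1, node2), brick paths, volume name and mount point are all hypothetical. The script also assumes passwordless ssh to node2 and that the 4.0 repository is already enabled there. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Hypothetical repro of the plain-distribute mkdir EIO after a partial upgrade.
# Hostnames, brick paths, volume name and mount point are placeholders.
# Assumes passwordless ssh to node2 and the 4.0 repo already enabled there.

VOL=distvol
MNT=/mnt/distvol

# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
    echo "+ $*"
    if [[ "${DRY_RUN:-1}" == "0" ]]; then
        "$@"
    fi
}

repro_mkdir_eio() {
    # 1. 2x1 plain distribute volume on 3.13, fuse-mounted on node1
    run gluster volume create "$VOL" node1:/bricks/b1 node2:/bricks/b2 force
    run gluster volume start "$VOL"
    run mkdir -p "$MNT"
    run mount -t glusterfs node1:/"$VOL" "$MNT"

    # 2. Upgrade node2 to 4.0 and restart glusterd there
    run ssh node2 'killall glusterfs glusterfsd glusterd; yum -y upgrade glusterfs-server && glusterd'
    run gluster volume status "$VOL"

    # 3. mkdir from the node1 mount; per the report this fails with EIO
    #    (mkdir prints the error on stderr and exits non-zero)
    run mkdir "$MNT/newdir" || echo "mkdir failed with exit status $?"
}
```

After sourcing the file, `repro_mkdir_eio` prints the plan and `DRY_RUN=0 repro_mkdir_eio` runs it against real nodes. The dry-run default makes it safe to review the commands before anything touches the cluster.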

Thanks
Ravi
PS: I'm feeling a bit under the weather, so I might not be online again today.

>
>> Is there something more to be done on BZ 1544366?
>>
>> -Ravi
>> On 03/02/2018 08:44 AM, Ravishankar N wrote:
>>> On 03/02/2018 07:26 AM, Shyam Ranganathan wrote:
>>>> Hi Pranith/Ravi,
>>>>
>>>> To keep a long story short: after upgrading 1 node in a 3-node 3.13
>>>> cluster, self-heal is unable to catch up with the heal backlog. This is a
>>>> very simple synthetic test, but the end result is that upgrade
>>>> testing is failing.
>>> Let me try this now and get back to you. I did something similar when
>>> testing the FIPS patch, and the rolling upgrade worked then.
>>> Thanks,
>>> Ravi
>>>> Here are the details,
>>>>
>>>> - Using
>>>> https://hackmd.io/GYIwTADCDsDMCGBaArAUxAY0QFhBAbIgJwCMySIwJmAJvGMBvNEA#
>>>> I set up 3 server containers and first installed 3.13 in them as
>>>> follows:
>>>>
>>>> (inside the 3 server containers)
>>>> yum -y update; yum -y install centos-release-gluster313; yum install glusterfs-server; glusterd
>>>>
>>>> (inside centos-glfs-server1)
>>>> gluster peer probe centos-glfs-server2
>>>> gluster peer probe centos-glfs-server3
>>>> gluster peer status
>>>> gluster v create patchy replica 3 centos-glfs-server1:/d/brick1 \
>>>> centos-glfs-server2:/d/brick2 centos-glfs-server3:/d/brick3 \
>>>> centos-glfs-server1:/d/brick4 centos-glfs-server2:/d/brick5 \
>>>> centos-glfs-server3:/d/brick6 force
>>>> gluster v start patchy
>>>> gluster v status
>>>>
>>>> Create a client container as per the document above, mount the above
>>>> volume, and create 1 file, 1 directory, and a file within that directory.
>>>>
>>>> Now we start the upgrade process (as laid out for 3.13 here
>>>> http://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.13/ ):
>>>> - killall glusterfs glusterfsd glusterd
>>>> - yum install http://cbs.centos.org/kojifiles/work/tasks/1548/311548/centos-release-gluster40-0.9-1.el7.centos.x86_64.rpm
>>>>
>>>> - yum upgrade --enablerepo=centos-gluster40-test glusterfs-server
>>>>
>>>> < Go back to the client and edit the contents of one of the files and
>>>> change the permissions of a directory, so that there are things to heal
>>>> when we bring up the newly upgraded server>
>>>>
>>>> - gluster --version
>>>> - glusterd
>>>> - gluster v status
>>>> - gluster v heal patchy
>>>>
>>>> The heal command then fails as follows:
>>>> [root@centos-glfs-server1 /]# gluster v heal patchy
>>>> Launching heal operation to perform index self heal on volume patchy has
>>>> been unsuccessful:
>>>> Commit failed on centos-glfs-server2.glfstest20. Please check log file
>>>> for details.
>>>> Commit failed on centos-glfs-server3. Please check log file for details.
>>>>
>>>> From here on, if more files or directories are created from the client,
>>>> they just get added to the heal backlog, and heal does not catch up.
>>>>
>>>> I cannot proceed, as the upgrade procedure is broken. The issue itself
>>>> may not be in the self-heal daemon but in something around connections.
>>>> Since the process fails here, though, I am looking to you to unblock
>>>> this as soon as possible, as we are already running a day's slip in the
>>>> release.
>>>>
>>>> Thanks,
>>>> Shyam