[Bugs] [Bug 1687051] gluster volume heal failed when online upgrading from 3.12 to 5.x and when rolling back online upgrade from 4.1.4 to 3.12.15

bugzilla at redhat.com bugzilla at redhat.com
Thu Mar 21 15:20:15 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1687051



--- Comment #36 from Amgad <amgad.saleh at nokia.com> ---
Thanks Sanju and Shyam.

I went ahead and built the 5.5 RPMs and re-ran the online upgrade/rollback
tests from 3.12.15 to 5.5 and back. I hit the same issue with the online
rollback. Here is the data (logs are attached as well):
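For reference, each node is upgraded or rolled back one at a time, roughly
with the following per-node sequence (a sketch only; exact package names and
repo setup depend on the locally built RPMs):

  # on the node being upgraded or rolled back
  systemctl stop glusterd
  pkill glusterfs; pkill glusterfsd        # stop remaining brick/self-heal processes
  yum -y install ./glusterfs*-5.5*.rpm     # or the 3.12.15 RPMs when rolling back
  systemctl start glusterd
  gluster peer status                      # wait until all peers show Connected
  gluster volume status                    # confirm all bricks are back online
  for i in glustervol1 glustervol2 glustervol3; do gluster volume heal $i; done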

Case 1) online upgrade from 3.12.15 to 5.5 - upgrade started right after: Thu
Mar 21 14:01:06 UTC 2019
==========================================
A) I have the same cluster of 3 replicas: gfs-1 (10.76.153.206), gfs-2
(10.76.153.213), and gfs-3new (10.76.153.207), running 3.12.15.
When gfs-1 was online upgraded from 3.12.15 to 5.5, all bricks were online and
heal succeeded. Continuing with the online upgrade of gfs-2 and then gfs-3new,
heal succeeded as well.

1) Here's the output after gfs-1 was online upgraded from 3.12.15 to 5.5:
Logs uploaded are: gfs-1_gfs1_upg_log.tgz, gfs-2_gfs1_upg_log.tgz, and
gfs-3new_gfs1_upg_log.tgz.

All volumes/bricks are online and heal succeeded.

[root@gfs-1 ansible2]# gluster volume status
Status of volume: glustervol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.76.153.206:/mnt/data1/1            49155     0          Y       19559
Brick 10.76.153.213:/mnt/data1/1            49152     0          Y       11171
Brick 10.76.153.207:/mnt/data1/1            49152     0          Y       25740
Self-heal Daemon on localhost               N/A       N/A        Y       19587
Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       11161
Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       25730

Task Status of Volume glustervol1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: glustervol2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.76.153.206:/mnt/data2/2            49156     0          Y       19568
Brick 10.76.153.213:/mnt/data2/2            49153     0          Y       11180
Brick 10.76.153.207:/mnt/data2/2            49153     0          Y       25749
Self-heal Daemon on localhost               N/A       N/A        Y       19587
Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       11161
Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       25730

Task Status of Volume glustervol2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: glustervol3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.76.153.206:/mnt/data3/3            49157     0          Y       19578
Brick 10.76.153.213:/mnt/data3/3            49154     0          Y       11189
Brick 10.76.153.207:/mnt/data3/3            49154     0          Y       25758
Self-heal Daemon on localhost               N/A       N/A        Y       19587
Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       25730
Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       11161

Task Status of Volume glustervol3
------------------------------------------------------------------------------
There are no active volume tasks

[root@gfs-1 ansible2]# for i in glustervol1 glustervol2 glustervol3; do gluster volume heal $i; done
Launching heal operation to perform index self heal on volume glustervol1 has been successful
Use heal info commands to check status.
Launching heal operation to perform index self heal on volume glustervol2 has been successful
Use heal info commands to check status.
Launching heal operation to perform index self heal on volume glustervol3 has been successful
Use heal info commands to check status.
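As the output suggests, the heal status can be cross-checked per volume with
heal info, e.g.:

  for i in glustervol1 glustervol2 glustervol3; do gluster volume heal $i info; done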

Case 2) online rollback from 5.5 to 3.12.15 - rollback started right after: Thu
Mar 21 14:20:01 UTC 2019
===========================================
A) Here are the outputs after gfs-1 was online rolled back from 5.5 to
3.12.15. The rollback succeeded and all bricks were online, but "gluster
volume heal" was unsuccessful:
Logs uploaded are: gfs-1_gfs1_rollbk_log.tgz, gfs-2_gfs1_rollbk_log.tgz, and
gfs-3new_gfs1_rollbk_log.tgz


[root@gfs-1 glusterfs]# gluster volume status
Status of volume: glustervol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.76.153.206:/mnt/data1/1            49152     0          Y       21586
Brick 10.76.153.213:/mnt/data1/1            49155     0          Y       9772 
Brick 10.76.153.207:/mnt/data1/1            49155     0          Y       12139
Self-heal Daemon on localhost               N/A       N/A        Y       21576
Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       9799 
Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       12166

Task Status of Volume glustervol1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: glustervol2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.76.153.206:/mnt/data2/2            49153     0          Y       21595
Brick 10.76.153.213:/mnt/data2/2            49156     0          Y       9781 
Brick 10.76.153.207:/mnt/data2/2            49156     0          Y       12148
Self-heal Daemon on localhost               N/A       N/A        Y       21576
Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       9799 
Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       12166

Task Status of Volume glustervol2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: glustervol3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.76.153.206:/mnt/data3/3            49154     0          Y       21604
Brick 10.76.153.213:/mnt/data3/3            49157     0          Y       9790 
Brick 10.76.153.207:/mnt/data3/3            49157     0          Y       12157
Self-heal Daemon on localhost               N/A       N/A        Y       21576
Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       9799 
Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       12166

Task Status of Volume glustervol3
------------------------------------------------------------------------------
There are no active volume tasks

[root@gfs-1 glusterfs]# for i in glustervol1 glustervol2 glustervol3; do gluster volume heal $i; done
Launching heal operation to perform index self heal on volume glustervol1 has been unsuccessful:
Commit failed on 10.76.153.207. Please check log file for details.
Commit failed on 10.76.153.213. Please check log file for details.
Launching heal operation to perform index self heal on volume glustervol2 has been unsuccessful:
Commit failed on 10.76.153.207. Please check log file for details.
Commit failed on 10.76.153.213. Please check log file for details.
Launching heal operation to perform index self heal on volume glustervol3 has been unsuccessful:
Commit failed on 10.76.153.207. Please check log file for details.
Commit failed on 10.76.153.213. Please check log file for details.
[root@gfs-1 glusterfs]#
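The commit failures come from the two peers that are still on 5.5
(10.76.153.207 and 10.76.153.213). The corresponding errors should be visible
in glusterd's log on those nodes, with something like (the log file is named
etc-glusterfs-glusterd.vol.log on some builds):

  grep -iE 'commit|heal' /var/log/glusterfs/glusterd.log | tail -20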

B) Same "heal" failure after rolling back gfs-2 from 5.5 to 3.12.15
===================================================================

[root@gfs-2 glusterfs]# gluster volume status
Status of volume: glustervol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.76.153.206:/mnt/data1/1            49152     0          Y       21586
Brick 10.76.153.213:/mnt/data1/1            49152     0          Y       11313
Brick 10.76.153.207:/mnt/data1/1            49155     0          Y       12139
Self-heal Daemon on localhost               N/A       N/A        Y       11303
Self-heal Daemon on 10.76.153.206           N/A       N/A        Y       21576
Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       12166

Task Status of Volume glustervol1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: glustervol2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.76.153.206:/mnt/data2/2            49153     0          Y       21595
Brick 10.76.153.213:/mnt/data2/2            49153     0          Y       11322
Brick 10.76.153.207:/mnt/data2/2            49156     0          Y       12148
Self-heal Daemon on localhost               N/A       N/A        Y       11303
Self-heal Daemon on 10.76.153.206           N/A       N/A        Y       21576
Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       12166

Task Status of Volume glustervol2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: glustervol3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.76.153.206:/mnt/data3/3            49154     0          Y       21604
Brick 10.76.153.213:/mnt/data3/3            49154     0          Y       11331
Brick 10.76.153.207:/mnt/data3/3            49157     0          Y       12157
Self-heal Daemon on localhost               N/A       N/A        Y       11303
Self-heal Daemon on 10.76.153.206           N/A       N/A        Y       21576
Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       12166

Task Status of Volume glustervol3
------------------------------------------------------------------------------
There are no active volume tasks

[root@gfs-2 glusterfs]# for i in glustervol1 glustervol2 glustervol3; do gluster volume heal $i; done
Launching heal operation to perform index self heal on volume glustervol1 has been unsuccessful:
Commit failed on 10.76.153.207. Please check log file for details.
Launching heal operation to perform index self heal on volume glustervol2 has been unsuccessful:
Commit failed on 10.76.153.207. Please check log file for details.
Launching heal operation to perform index self heal on volume glustervol3 has been unsuccessful:
Commit failed on 10.76.153.207. Please check log file for details.
[root@gfs-2 glusterfs]#
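At this stage only 10.76.153.207 (gfs-3new) is still on 5.5, and it is the
only node failing the commit. If needed, the per-node version can be confirmed
with something like:

  gluster --version | head -1
  grep operating-version /var/lib/glusterd/glusterd.info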

C) After rolling back gfs-3new from 5.5 to 3.12.15 (all nodes are on 3.12.15
now), heal succeeded.
Logs uploaded are: gfs-1_all_rollbk_log.tgz, gfs-2_all_rollbk_log.tgz, and
gfs-3new_all_rollbk_log.tgz

[root@gfs-3new glusterfs]# gluster volume status
Status of volume: glustervol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.76.153.206:/mnt/data1/1            49152     0          Y       21586
Brick 10.76.153.213:/mnt/data1/1            49152     0          Y       11313
Brick 10.76.153.207:/mnt/data1/1            49152     0          Y       13724
Self-heal Daemon on localhost               N/A       N/A        Y       13714
Self-heal Daemon on 10.76.153.206           N/A       N/A        Y       21576
Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       11303

Task Status of Volume glustervol1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: glustervol2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.76.153.206:/mnt/data2/2            49153     0          Y       21595
Brick 10.76.153.213:/mnt/data2/2            49153     0          Y       11322
Brick 10.76.153.207:/mnt/data2/2            49153     0          Y       13733
Self-heal Daemon on localhost               N/A       N/A        Y       13714
Self-heal Daemon on 10.76.153.206           N/A       N/A        Y       21576
Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       11303

Task Status of Volume glustervol2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: glustervol3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.76.153.206:/mnt/data3/3            49154     0          Y       21604
Brick 10.76.153.213:/mnt/data3/3            49154     0          Y       11331
Brick 10.76.153.207:/mnt/data3/3            49154     0          Y       13742
Self-heal Daemon on localhost               N/A       N/A        Y       13714
Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       11303
Self-heal Daemon on 10.76.153.206           N/A       N/A        Y       21576

Task Status of Volume glustervol3
------------------------------------------------------------------------------
There are no active volume tasks

[root@gfs-3new glusterfs]# for i in glustervol1 glustervol2 glustervol3; do gluster volume heal $i; done
Launching heal operation to perform index self heal on volume glustervol1 has been successful
Use heal info commands to check status.
Launching heal operation to perform index self heal on volume glustervol2 has been successful
Use heal info commands to check status.
Launching heal operation to perform index self heal on volume glustervol3 has been successful
Use heal info commands to check status.
[root@gfs-3new glusterfs]#
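For completeness, once all nodes are back on 3.12.15 the cluster op-version
can be checked with, e.g.:

  gluster volume get all cluster.op-version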

Regards,
Amgad
