[Bugs] [Bug 1687051] gluster volume heal failed when online upgrading from 3.12 to 5.x and when rolling back online upgrade from 4.1.4 to 3.12.15

bugzilla at redhat.com bugzilla at redhat.com
Thu Mar 21 15:22:27 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1687051



--- Comment #37 from Amgad <amgad.saleh at nokia.com> ---
(In reply to Amgad from comment #36)
> Thanks Sanju and Shyam.
> 
> I went ahead and built the 5.5 RPMs and redid the online upgrade/rollback
> tests from 3.12.15 to 5.5 and back. I got the same issue with online
> rollback.
> Here is the data (logs are attached as well):
> 
> Case 1) online upgrade from 3.12.15 to 5.5 - upgrades started right after:
> Thu Mar 21 14:01:06 UTC 2019
> ==========================================
> A) I have the same cluster of 3 replicas: gfs-1 (10.76.153.206), gfs-2
> (10.76.153.213), and gfs-3new (10.76.153.207), running 3.12.15.
> When gfs-1 was online-upgraded from 3.12.15 to 5.5, all bricks were online
> and heal succeeded. Continuing the online upgrade with gfs-2 and then
> gfs-3new, heal succeeded as well.
> 
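> For reference, the per-node online upgrade is the usual rolling pattern. A
> minimal sketch of the kind of steps involved on each node (assuming
> systemd-managed, yum/RPM-based installs; the exact procedure used here may
> differ) is:
> 
>     systemctl stop glusterd                 # stop the management daemon
>     pkill glusterfs; pkill glusterfsd       # stop remaining brick/client processes on this node
>     yum -y install ./glusterfs*-5.5*.rpm    # install the locally built 5.5 RPMs
>     systemctl start glusterd                # bricks and self-heal daemon come back up
>     for i in glustervol1 glustervol2 glustervol3; do gluster volume heal $i; done
> 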
> 1) Here's the output after gfs-1 was online upgraded from 3.12.15 to 5.5:
> Logs uploaded are: gfs-1_gfs1_upg_log.tgz, gfs-2_gfs1_upg_log.tgz, and
> gfs-3new_gfs1_upg_log.tgz.
> 
> All volumes/bricks are online and heal succeeded.
> 
> [root@gfs-1 ansible2]# gluster volume status
> Status of volume: glustervol1
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.76.153.206:/mnt/data1/1            49155     0          Y       19559
> Brick 10.76.153.213:/mnt/data1/1            49152     0          Y       11171
> Brick 10.76.153.207:/mnt/data1/1            49152     0          Y       25740
> Self-heal Daemon on localhost               N/A       N/A        Y       19587
> Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       11161
> Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       25730
> 
> Task Status of Volume glustervol1
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> Status of volume: glustervol2
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.76.153.206:/mnt/data2/2            49156     0          Y       19568
> Brick 10.76.153.213:/mnt/data2/2            49153     0          Y       11180
> Brick 10.76.153.207:/mnt/data2/2            49153     0          Y       25749
> Self-heal Daemon on localhost               N/A       N/A        Y       19587
> Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       11161
> Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       25730
> 
> Task Status of Volume glustervol2
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> Status of volume: glustervol3
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.76.153.206:/mnt/data3/3            49157     0          Y       19578
> Brick 10.76.153.213:/mnt/data3/3            49154     0          Y       11189
> Brick 10.76.153.207:/mnt/data3/3            49154     0          Y       25758
> Self-heal Daemon on localhost               N/A       N/A        Y       19587
> Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       25730
> Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       11161
> 
> Task Status of Volume glustervol3
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> [root@gfs-1 ansible2]# for i in glustervol1 glustervol2 glustervol3; do gluster volume heal $i; done
> Launching heal operation to perform index self heal on volume glustervol1 has been successful
> Use heal info commands to check status.
> Launching heal operation to perform index self heal on volume glustervol2 has been successful
> Use heal info commands to check status.
> Launching heal operation to perform index self heal on volume glustervol3 has been successful
> Use heal info commands to check status.
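> As the messages say, the heal info variant of the same loop is what confirms
> that nothing is left pending, e.g.:
> 
>     for i in glustervol1 glustervol2 glustervol3; do gluster volume heal $i info; done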
> 
> Case 2) online rollback from 5.5 to 3.12.15 - rollback started right after:
> Thu Mar 21 14:20:01 UTC 2019
> ===========================================
> A) Here are the outputs after gfs-1 was online rolled back from 5.5 to
> 3.12.15 - the rollback succeeded. All bricks were online, but "gluster volume
> heal" was unsuccessful:
> Logs uploaded are: gfs-1_gfs1_rollbk_log.tgz, gfs-2_gfs1_rollbk_log.tgz, and
> gfs-3new_gfs1_rollbk_log.tgz
> 
> 
> [root@gfs-1 glusterfs]# gluster volume status
> Status of volume: glustervol1
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.76.153.206:/mnt/data1/1            49152     0          Y       21586
> Brick 10.76.153.213:/mnt/data1/1            49155     0          Y       9772
> Brick 10.76.153.207:/mnt/data1/1            49155     0          Y       12139
> Self-heal Daemon on localhost               N/A       N/A        Y       21576
> Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       9799
> Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       12166
> 
> Task Status of Volume glustervol1
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> Status of volume: glustervol2
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.76.153.206:/mnt/data2/2            49153     0          Y       21595
> Brick 10.76.153.213:/mnt/data2/2            49156     0          Y       9781
> Brick 10.76.153.207:/mnt/data2/2            49156     0          Y       12148
> Self-heal Daemon on localhost               N/A       N/A        Y       21576
> Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       9799
> Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       12166
> 
> Task Status of Volume glustervol2
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> Status of volume: glustervol3
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.76.153.206:/mnt/data3/3            49154     0          Y       21604
> Brick 10.76.153.213:/mnt/data3/3            49157     0          Y       9790
> Brick 10.76.153.207:/mnt/data3/3            49157     0          Y       12157
> Self-heal Daemon on localhost               N/A       N/A        Y       21576
> Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       9799
> Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       12166
> 
> Task Status of Volume glustervol3
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> [root@gfs-1 glusterfs]# for i in glustervol1 glustervol2 glustervol3; do gluster volume heal $i; done
> Launching heal operation to perform index self heal on volume glustervol1 has been unsuccessful:
> Commit failed on 10.76.153.207. Please check log file for details.
> Commit failed on 10.76.153.213. Please check log file for details.
> Launching heal operation to perform index self heal on volume glustervol2 has been unsuccessful:
> Commit failed on 10.76.153.207. Please check log file for details.
> Commit failed on 10.76.153.213. Please check log file for details.
> Launching heal operation to perform index self heal on volume glustervol3 has been unsuccessful:
> Commit failed on 10.76.153.207. Please check log file for details.
> Commit failed on 10.76.153.213. Please check log file for details.
> [root@gfs-1 glusterfs]#
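> The "Commit failed" is reported by glusterd on the two peers, so the detail
> should be in their glusterd logs. Something along these lines on
> 10.76.153.213 and 10.76.153.207 (assuming the default log location; on some
> builds the file is named etc-glusterfs-glusterd.vol.log) pulls the relevant
> entries:
> 
>     grep -iE 'commit|heal' /var/log/glusterfs/glusterd.log | tail -n 50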
> 
> B) Same "heal" failure after rolling back gfs-2 from 5.5 to 3.12.15
> ===================================================================
> 
> [root@gfs-2 glusterfs]# gluster volume status
> Status of volume: glustervol1
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.76.153.206:/mnt/data1/1            49152     0          Y       21586
> Brick 10.76.153.213:/mnt/data1/1            49152     0          Y       11313
> Brick 10.76.153.207:/mnt/data1/1            49155     0          Y       12139
> Self-heal Daemon on localhost               N/A       N/A        Y       11303
> Self-heal Daemon on 10.76.153.206           N/A       N/A        Y       21576
> Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       12166
> 
> Task Status of Volume glustervol1
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> Status of volume: glustervol2
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.76.153.206:/mnt/data2/2            49153     0          Y       21595
> Brick 10.76.153.213:/mnt/data2/2            49153     0          Y       11322
> Brick 10.76.153.207:/mnt/data2/2            49156     0          Y       12148
> Self-heal Daemon on localhost               N/A       N/A        Y       11303
> Self-heal Daemon on 10.76.153.206           N/A       N/A        Y       21576
> Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       12166
> 
> Task Status of Volume glustervol2
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> Status of volume: glustervol3
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.76.153.206:/mnt/data3/3            49154     0          Y       21604
> Brick 10.76.153.213:/mnt/data3/3            49154     0          Y       11331
> Brick 10.76.153.207:/mnt/data3/3            49157     0          Y       12157
> Self-heal Daemon on localhost               N/A       N/A        Y       11303
> Self-heal Daemon on 10.76.153.206           N/A       N/A        Y       21576
> Self-heal Daemon on 10.76.153.207           N/A       N/A        Y       12166
> 
> Task Status of Volume glustervol3
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> [root@gfs-2 glusterfs]# for i in glustervol1 glustervol2 glustervol3; do gluster volume heal $i; done
> Launching heal operation to perform index self heal on volume glustervol1 has been unsuccessful:
> Commit failed on 10.76.153.207. Please check log file for details.
> Launching heal operation to perform index self heal on volume glustervol2 has been unsuccessful:
> Commit failed on 10.76.153.207. Please check log file for details.
> Launching heal operation to perform index self heal on volume glustervol3 has been unsuccessful:
> Commit failed on 10.76.153.207. Please check log file for details.
> [root@gfs-2 glusterfs]#
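> At this point the cluster is mixed: gfs-1 and gfs-2 are back on 3.12.15 while
> gfs-3new (10.76.153.207) is still on 5.5, and it is the only peer refusing
> the commit. A quick comparison worth capturing on each node (assuming the
> option is readable from both versions) is the installed version versus the
> cluster op-version:
> 
>     gluster --version | head -1
>     gluster volume get all cluster.op-version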
> 
> C) After rolling back gfs-3new from 5.5 to 3.12.15 (all nodes are on 3.12.15
> now), heal succeeded.
> Logs uploaded are: gfs-1_all_rollbk_log.tgz, gfs-2_all_rollbk_log.tgz, and
> gfs-3new_all_rollbk_log.tgz
> 
> [root@gfs-3new glusterfs]# gluster volume status
> Status of volume: glustervol1
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.76.153.206:/mnt/data1/1            49152     0          Y       21586
> Brick 10.76.153.213:/mnt/data1/1            49152     0          Y       11313
> Brick 10.76.153.207:/mnt/data1/1            49152     0          Y       13724
> Self-heal Daemon on localhost               N/A       N/A        Y       13714
> Self-heal Daemon on 10.76.153.206           N/A       N/A        Y       21576
> Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       11303
> 
> Task Status of Volume glustervol1
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> Status of volume: glustervol2
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.76.153.206:/mnt/data2/2            49153     0          Y       21595
> Brick 10.76.153.213:/mnt/data2/2            49153     0          Y       11322
> Brick 10.76.153.207:/mnt/data2/2            49153     0          Y       13733
> Self-heal Daemon on localhost               N/A       N/A        Y       13714
> Self-heal Daemon on 10.76.153.206           N/A       N/A        Y       21576
> Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       11303
> 
> Task Status of Volume glustervol2
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> Status of volume: glustervol3
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.76.153.206:/mnt/data3/3            49154     0          Y       21604
> Brick 10.76.153.213:/mnt/data3/3            49154     0          Y       11331
> Brick 10.76.153.207:/mnt/data3/3            49154     0          Y       13742
> Self-heal Daemon on localhost               N/A       N/A        Y       13714
> Self-heal Daemon on 10.76.153.213           N/A       N/A        Y       11303
> Self-heal Daemon on 10.76.153.206           N/A       N/A        Y       21576
> 
> Task Status of Volume glustervol3
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> [root@gfs-3new glusterfs]# for i in glustervol1 glustervol2 glustervol3; do gluster volume heal $i; done
> Launching heal operation to perform index self heal on volume glustervol1 has been successful
> Use heal info commands to check status.
> Launching heal operation to perform index self heal on volume glustervol2 has been successful
> Use heal info commands to check status.
> Launching heal operation to perform index self heal on volume glustervol3 has been successful
> Use heal info commands to check status.
> [root@gfs-3new glusterfs]#
> 
> Regards,
> Amgad

Comment seems to be duplicated.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

