[Bugs] [Bug 1511271] Rebalance estimate (ETA) shows wrong details (as initial message of 10min wait reappears) when still in progress

Thu Nov 9 03:24:39 UTC 2017

https://bugzilla.redhat.com/show_bug.cgi?id=1511271

--- Comment #1 from Nithya Balachandran <nbalacha at redhat.com> ---

Description of problem:
==============================
I ran a remove-brick operation to convert a 2x2 volume to 1x2 while I/O was
running from 3 different Ganesha mounts.

I noticed that at a later stage (possibly >80% complete), the message "The
estimated time for rebalance to complete will be unavailable for the first 10
minutes." appears again.

I think this happens when the estimated time has run out but the rebalance
itself has not yet completed.

Last login: Tue Aug  8 19:32:38 2017 from 10.70.35.77
[root@server1 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost             5145         7.4MB         10594             0             0          in progress        0:06:38
                                 server2             4142        21.7MB          8722             0             0          in progress        0:06:38
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@server1 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost             5993        31.3MB         11970             0             0          in progress        0:08:38
                                 server2             5050        26.6MB         10415             0             0          in progress        0:08:38
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success

[root@server1 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost             8059        62.0MB         16022             0             0          in progress        0:13:13
                                 server2             7208        76.2MB         14071             0             0          in progress        0:13:13
Estimated time left for rebalance to complete :        0:47:28
volume rebalance: nrep2: success
[root@server1 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            10699       110.9MB         21188             0             0          in progress        0:19:58
                                 server2             9949       119.4MB         16739             0             0          in progress        0:19:58
Estimated time left for rebalance to complete :        0:47:25
volume rebalance: nrep2: success

[root@server1 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            16839       151.7MB         28114             0             0          in progress        0:33:23
                                 server2            16754       184.3MB         27528             0             0          in progress        0:33:23
Estimated time left for rebalance to complete :        0:00:48
volume rebalance: nrep2: success

[root@server1 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            20687       192.2MB         32058             0             0          in progress        0:39:16
                                 server2            20965       189.6MB         32669             0             0          in progress        0:39:16
Estimated time left for rebalance to complete :        0:00:06
volume rebalance: nrep2: success

============== SEE FROM BELOW ==================

[root@server1 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            21521       192.8MB         33069             0             0          in progress        0:40:28
                                 server2            22456       189.6MB         35708             0             0          in progress        0:40:28
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@server1 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            21669       192.8MB         33372             0             0          in progress        0:40:36
                                 server2            22614       189.6MB         35708             0             0          in progress        0:40:36
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@server1 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            21718       192.8MB         33372             0             0          in progress        0:40:40
                                 server2            22667       189.6MB         36020             0             0          in progress        0:40:40
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@server1 ~]#
[root@server1 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            23842       194.1MB         37488             0             0          in progress        0:43:47
                                 server2            23440       285.5MB         39635             0             0            completed        0:43:29
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
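
A quick arithmetic check on the samples above is consistent with the reading
that the message returns once the estimate has run out. The sketch below adds
each displayed estimate to its run time to get a projected finish time. The
helper names (to_seconds, to_hms, projected_finish) are illustrative only and
are not part of gluster; the values are copied from the status output in this
report.

```python
# Illustrative only: values are copied from the status samples in this report.

def to_seconds(hms: str) -> int:
    """Convert an 'h:m:s' string, as printed by the status command, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Format a number of seconds back into 'h:mm:ss'."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def projected_finish(run_time: str, time_left: str) -> str:
    """Run time so far plus the displayed estimate of time left."""
    return to_hms(to_seconds(run_time) + to_seconds(time_left))


# (run time, estimated time left) for the four samples that showed an estimate
SAMPLES = [
    ("0:13:13", "0:47:28"),
    ("0:19:58", "0:47:25"),
    ("0:33:23", "0:00:48"),
    ("0:39:16", "0:00:06"),
]

if __name__ == "__main__":
    for run_time, time_left in SAMPLES:
        print(f"run time {run_time} -> projected finish "
              f"{projected_finish(run_time, time_left)}")
```

The last estimate projects completion at 0:39:22. The next sample, at 0:40:28,
is past that point, still reports "in progress", and is the first one to show
the 10-minute message again.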

Version-Release number of selected component (if applicable):
[root at server1 ~]# rpm -qa|grep gluster
glusterfs-api-3.8.4-38.el7rhgs.x86_64
python-gluster-3.8.4-34.el7rhgs.noarch
glusterfs-server-3.8.4-38.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-16.el7rhgs.x86_64
glusterfs-3.8.4-38.el7rhgs.x86_64
glusterfs-cli-3.8.4-38.el7rhgs.x86_64
glusterfs-rdma-3.8.4-38.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.2.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
glusterfs-libs-3.8.4-38.el7rhgs.x86_64
glusterfs-fuse-3.8.4-38.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-38.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-38.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-38.el7rhgs.x86_64

Steps to Reproduce:
1. Had a 1x2 volume; used add-brick to convert it to 2x2 and ran a rebalance
(with some files skipped).
2. Ran a Linux untar from one client and lookups from another client (running
until the end). Also ran rename, move, chmod, and chgrp from another client,
but only for some time; these operations completed well before the rebalance
reached this state.
3. Observed the rebalance ETA.

Actual results:
==========
The ETA output again shows the initial 10-minute wait message.

--- Additional comment from Worker Ant on 2017-11-07 00:32:20 EST ---

COMMIT: https://review.gluster.org/18000 committed in master by  

------------- cli: correct rebalance status elapsed check

Check that elapsed time has crossed 10 mins for at least
one rebalance process before displaying the estimates.

Change-Id: Ib357a6f0d0125a178e94ede1e31514fdc6ce3593
BUG: 1479528
Signed-off-by: N Balachandran <nbalacha at redhat.com>
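
The check described in this commit message can be sketched roughly as follows.
This is a minimal Python illustration, not the actual CLI code (which is
written in C). The function name eta_message, the (elapsed, time_left) tuple
layout, the use of the largest per-process estimate as the overall figure, and
the wording of the fallback message when no estimate is available are all
assumptions made for the example.

```python
# Hypothetical sketch of the display decision; not the gluster CLI source.
ESTIMATE_START_SECS = 10 * 60  # estimates are withheld for the first 10 minutes

WAIT_MSG = ("The estimated time for rebalance to complete will be "
            "unavailable for the first 10 minutes.")
# Assumed wording for the case where no estimate exists after 10 minutes.
UNAVAILABLE_MSG = "Rebalance estimated time unavailable"


def eta_message(nodes):
    """Pick the ETA line for a list of (elapsed_secs, time_left_secs) tuples,
    one tuple per rebalance process."""
    max_elapsed = max((elapsed for elapsed, _ in nodes), default=0)
    max_left = max((left for _, left in nodes), default=0)

    # Show the wait message only while no process has crossed 10 minutes.
    if max_elapsed <= ESTIMATE_START_SECS:
        return WAIT_MSG
    # Past 10 minutes with nothing left to estimate: do not fall back to the
    # initial wait message, which is the behaviour reported in this bug.
    if max_left == 0:
        return UNAVAILABLE_MSG
    hours, rest = divmod(max_left, 3600)
    minutes, seconds = divmod(rest, 60)
    return ("Estimated time left for rebalance to complete : "
            f"{hours:8d}:{minutes:02d}:{seconds:02d}")


if __name__ == "__main__":
    # Elapsed times match samples in the report; the per-process time-left
    # values are illustrative inputs, since the CLI prints one overall figure.
    print(eta_message([(398, 0), (398, 0)]))        # 0:06:38 elapsed
    print(eta_message([(793, 2848), (793, 2848)]))  # 0:13:13 elapsed
    print(eta_message([(2428, 0), (2428, 0)]))      # 0:40:28 elapsed
```

With this ordering, the 10-minute message depends only on elapsed time, so a
run at 0:40:28 with an exhausted estimate can no longer produce it.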
