[Bugs] [Bug 1352771] New: [DHT]: Rebalance info for remove brick operation is not showing after glusterd restart

Tue Jul 5 03:44:52 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1352771

            Bug ID: 1352771
           Summary: [DHT]: Rebalance info for remove brick  operation is
                    not showing after glusterd restart
           Product: GlusterFS
           Version: 3.8.0
         Component: distribute
          Keywords: ZStream
          Assignee: bugs at gluster.org
          Reporter: sabansal at redhat.com
                CC: amukherj at redhat.com, bsrirama at redhat.com,
                    bugs at gluster.org, byarlaga at redhat.com,
                    kramdoss at redhat.com, nbalacha at redhat.com,
                    sabansal at redhat.com, sasundar at redhat.com,
                    smohan at redhat.com, storage-qa-internal at redhat.com
        Depends On: 1296796, 1351021

+++ This bug was initially created as a clone of Bug #1351021 +++

+++ This bug was initially created as a clone of Bug #1296796 +++

Description of problem:
=======================
Had a two-node cluster (node-1 and node-2) with a Distributed volume (1*2),
mounted it over FUSE and started IO. While IO was in progress, started a
remove-brick operation and restarted glusterd on the node hosting the brick
being removed. After the glusterd restart, the rebalance info
("Rebalanced-files", "size", "scanned") is no longer displayed; all of these
fields show zeros.

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.5-14

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a two-node cluster (node-1 and node-2).
2. Create a Distributed volume using bricks from both nodes (1*2).
3. Mount the volume over FUSE and start IO.
4. While IO is in progress, start remove-brick for the node-2 brick.
5. Check the remove-brick status // it shows the rebalance info.
6. Stop and start glusterd on node-2.
7. Check the remove-brick status again on both nodes // it no longer shows the
   rebalance info.
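The steps above can be written out as a concrete command sequence. The Python sketch below is a dry run that only builds and prints the commands; the helper `repro_commands`, the volume name, host names, brick paths and mount point are all illustrative assumptions (loosely borrowed from the console log), not part of the original report. `gluster volume create` may additionally need `force` if the bricks sit on the root filesystem, the mount point must already exist, and IO has to be started on the mount separately before the remove-brick step.

```python
import shlex
from typing import List, Tuple


# Hypothetical helper: returns (host_to_run_on, argv) pairs for the repro steps.
# Volume, host, brick and mount-point names are illustrative assumptions.
def repro_commands(vol: str, node1: str, node2: str,
                   mnt: str = "/mnt/dis") -> List[Tuple[str, List[str]]]:
    brick1 = f"{node1}:/bricks/brick0/abc0"
    brick2 = f"{node2}:/bricks/brick0/abc2"
    return [
        # Step 2: distributed volume with one brick per node.
        (node1, ["gluster", "volume", "create", vol, brick1, brick2]),
        (node1, ["gluster", "volume", "start", vol]),
        # Step 3: FUSE mount (start IO on `mnt` before continuing).
        (node1, ["mount", "-t", "glusterfs", f"{node1}:/{vol}", mnt]),
        # Steps 4-5: start removing node-2's brick, then check progress.
        (node1, ["gluster", "volume", "remove-brick", vol, brick2, "start"]),
        (node1, ["gluster", "volume", "remove-brick", vol, brick2, "status"]),
        # Step 6: restart glusterd on the node hosting the brick being removed.
        (node2, ["systemctl", "stop", "glusterd"]),
        (node2, ["systemctl", "start", "glusterd"]),
        # Step 7: check the status on both nodes; counters should not reset.
        (node1, ["gluster", "volume", "remove-brick", vol, brick2, "status"]),
        (node2, ["gluster", "volume", "remove-brick", vol, brick2, "status"]),
    ]


if __name__ == "__main__":
    for host, argv in repro_commands("Dis", "node-1", "node-2"):
        print(f"[{host}] {shlex.join(argv)}")
```

Returning argument lists rather than shell strings keeps the sketch easy to feed into `subprocess.run` (or an ssh wrapper) later without quoting problems.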

Actual results:
===============
No rebalance info is displayed after the glusterd restart; the counters are reset to zero.

Expected results:
=================
It should show the rebalance info even after a glusterd restart.
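To compare the expected and actual results mechanically, the data rows of the `remove-brick status` table can be parsed and a before/after pair compared. This is a minimal Python sketch, assuming the whitespace-separated column layout seen in the console log (Node, Rebalanced-files, size, scanned, failures, skipped, status, run time in secs) and assuming that the counters of a single remove-brick operation should never decrease. `NodeStatus`, `parse_status_rows` and `counters_went_backwards` are hypothetical names, not part of any gluster tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStatus:
    node: str
    rebalanced_files: int
    size: str
    scanned: int
    failures: int
    skipped: int
    status: str
    run_time_secs: float


def parse_status_rows(output: str) -> List[NodeStatus]:
    """Parse the per-node data rows of `gluster volume remove-brick ... status`."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # Skip short lines, the column header and the dashed separator row.
        if len(tokens) < 8 or tokens[0] == "Node" or set(tokens[0]) == {"-"}:
            continue
        try:
            rows.append(NodeStatus(
                node=tokens[0],
                rebalanced_files=int(tokens[1]),
                size=tokens[2],
                scanned=int(tokens[3]),
                failures=int(tokens[4]),
                skipped=int(tokens[5]),
                status=" ".join(tokens[6:-1]),  # may contain spaces, e.g. "in progress"
                run_time_secs=float(tokens[-1]),
            ))
        except ValueError:
            continue  # shell prompts and other non-data lines
    return rows


def counters_went_backwards(before: NodeStatus, after: NodeStatus) -> bool:
    """The symptom in this bug: progress counters drop after glusterd restarts."""
    return (after.rebalanced_files < before.rebalanced_files
            or after.scanned < before.scanned)


# Rows taken from the Dis volume console log (before and after the restart).
BEFORE = "10.70.43.35  140  978.1KB  340  0  0  in progress  6.00"
AFTER = "10.70.43.35    0   0Bytes    0  0  0  in progress  1.00"

before = parse_status_rows(BEFORE)[0]
after = parse_status_rows(AFTER)[0]
print(counters_went_backwards(before, after))  # True: the bug reproduces
```

With the fix in place, the same comparison across a glusterd restart is expected to print `False`.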

Console log:
============

[root at dhcp42-84 ~]# gluster volume status
Status of volume: Dis
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.84:/bricks/brick0/abc0       49272     0          Y       2916 
Brick 10.70.42.84:/bricks/brick1/abc1       49273     0          Y       2935 
Brick 10.70.43.35:/bricks/brick0/abc2       49155     0          Y       30032
NFS Server on localhost                     2049      0          Y       3804 
NFS Server on 10.70.43.35                   2049      0          Y       30324

Task Status of Volume Dis
------------------------------------------------------------------------------
There are no active volume tasks

[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 start
volume remove-brick start: success
ID: b2e6507e-838f-4cc4-9061-aa7ba84d9b30
[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35              102       411.8KB           275             0             0          in progress               4.00
[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35              140       978.1KB           340             0             0          in progress               6.00
[root at dhcp42-84 ~]# 

Stop and Start GlusterD:
========================
[root at dhcp43-35 ~]# systemctl stop glusterd
[root at dhcp43-35 ~]# 
[root at dhcp43-35 ~]# 
[root at dhcp43-35 ~]# systemctl start  glusterd

[root at dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
[root at dhcp42-84 ~]# 

[root at dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35                0        0Bytes             0             0             0          in progress               1.00
[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35                0        0Bytes             0             0             0          in progress               2.00
[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35                0        0Bytes             0             0             0          in progress               4.00

[root at dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35                0        0Bytes             0             0             0            completed              13.00
[root at dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35                0        0Bytes             0             0             0            completed              13.00
[root at dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35                0        0Bytes             0             0             0            completed              13.00
[root at dhcp42-84 ~]#

[root at dhcp42-84 ~]# gluster volume info

Volume Name: Dis-Rep
Type: Distributed-Replicate
Volume ID: 69667c02-408f-41a9-b83e-c1684e69ef03
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.42.84:/bricks/brick0/sbr00
Brick2: 10.70.42.84:/bricks/brick1/sbr11
Brick3: 10.70.43.35:/bricks/brick0/sbr22
Brick4: 10.70.43.35:/bricks/brick1/sbr33
Options Reconfigured:
performance.readdir-ahead: on
[root at dhcp42-84 ~]# 

[root at dhcp42-84 ~]# gluster volume status
Status of volume: Dis-Rep
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.84:/bricks/brick0/sbr00      49282     0          Y       3129 
Brick 10.70.42.84:/bricks/brick1/sbr11      49283     0          Y       3148 
Brick 10.70.43.35:/bricks/brick0/sbr22      49165     0          Y       7257 
Brick 10.70.43.35:/bricks/brick1/sbr33      49166     0          Y       7276 
NFS Server on localhost                     2049      0          Y       3170 
Self-heal Daemon on localhost               N/A       N/A        Y       3175 
NFS Server on 10.70.43.35                   2049      0          Y       7298 
Self-heal Daemon on 10.70.43.35             N/A       N/A        Y       7303 

Task Status of Volume Dis-Rep
------------------------------------------------------------------------------
There are no active volume tasks

[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# gluster volume 
unrecognized command
[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep  replica 2 10.70.43.35:/bricks/brick0/sbr22  10.70.43.35:/bricks/brick1/sbr33  start
volume remove-brick start: success
ID: 5ca18e2e-43c9-481f-ab5a-aae02240bb97
[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep  replica 2 10.70.43.35:/bricks/brick0/sbr22  10.70.43.35:/bricks/brick1/sbr33  status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35               50       335.1KB           200             0             0          in progress               4.00
[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep  replica 2 10.70.43.35:/bricks/brick0/sbr22  10.70.43.35:/bricks/brick1/sbr33  status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35              108       548.9KB           372             0             0          in progress               9.00
[root at dhcp42-84 ~]# 

Stop and Start GlusterD:
========================
[root at dhcp43-35 ~]# systemctl stop glusterd
[root at dhcp43-35 ~]# 
[root at dhcp43-35 ~]# 
[root at dhcp43-35 ~]# systemctl start  glusterd
[root at dhcp43-35 ~]# 

[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep  replica 2 10.70.43.35:/bricks/brick0/sbr22  10.70.43.35:/bricks/brick1/sbr33  status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35                0        0Bytes             0             0             0          in progress               0.00
[root at dhcp42-84 ~]# 
[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep  replica 2 10.70.43.35:/bricks/brick0/sbr22  10.70.43.35:/bricks/brick1/sbr33  status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35                0        0Bytes             0             0             0          in progress               0.00
[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep  replica 2 10.70.43.35:/bricks/brick0/sbr22  10.70.43.35:/bricks/brick1/sbr33  status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35                0        0Bytes             0             0             0          in progress               0.00
[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep  replica 2 10.70.43.35:/bricks/brick0/sbr22  10.70.43.35:/bricks/brick1/sbr33  status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                             10.70.43.35                0        0Bytes             0             0             0          in progress               0.00
[root at dhcp42-84 ~]# 

Thanks

This bug affects all volume types. Only the rebalance status is stored in the
node_state.info file; when glusterd restarts, it reads that status back and
displays it. The other values, such as rebalance_files and scanned_files, are
not stored in node_state.info, so they are unavailable after the restart and
the status output shows them as zeros.
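The direction of the fix is to persist every rebalance counter alongside the status, so that all of them can be read back on restart. The Python sketch below illustrates that idea with a plain key=value file; it is not glusterd's actual store code, and the key names in `REBALANCE_KEYS` (like the helpers `store_node_state` and `restore_node_state`) are hypothetical rather than the exact keys glusterd writes to node_state.info. Writing to a temporary file and renaming it into place is a common pattern for avoiding a half-written file if the daemon dies mid-write.

```python
import os
import tempfile
from typing import Dict

# Hypothetical key names; glusterd's real node_state.info keys may differ.
REBALANCE_KEYS = ("status", "rebalanced-files", "size", "scanned",
                  "failures", "skipped", "run-time")


def store_node_state(path: str, state: Dict[str, str]) -> None:
    """Persist all rebalance fields, not just the status, as key=value lines."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        for key in REBALANCE_KEYS:
            f.write(f"{key}={state[key]}\n")
    os.replace(tmp_path, path)  # swap the new file in with a single rename


def restore_node_state(path: str) -> Dict[str, str]:
    """Read the fields back, e.g. when the daemon starts up again."""
    state = {}
    with open(path) as f:
        for line in f:
            key, sep, value = line.rstrip("\n").partition("=")
            if sep:
                state[key] = value
    return state


if __name__ == "__main__":
    # Values copied from the Dis volume status shown before the restart.
    saved = {"status": "in progress", "rebalanced-files": "140", "size": "978.1KB",
             "scanned": "340", "failures": "0", "skipped": "0", "run-time": "6.00"}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "node_state.info")
        store_node_state(path, saved)
        # After a simulated restart, the counters come back instead of resetting to 0.
        print(restore_node_state(path) == saved)
```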

--- Additional comment from Vijay Bellur on 2016-06-29 03:25:15 EDT ---

REVIEW: http://review.gluster.org/14827 (glusterd: glusterd must store all
rebalance related information) posted (#1) for review on master by Sakshi
Bansal

--- Additional comment from Vijay Bellur on 2016-07-04 04:17:12 EDT ---

REVIEW: http://review.gluster.org/14827 (glusterd: glusterd must store all
rebalance related information) posted (#2) for review on master by Sakshi
Bansal

--- Additional comment from Vijay Bellur on 2016-07-04 08:35:00 EDT ---

COMMIT: http://review.gluster.org/14827 committed in master by Atin Mukherjee
(amukherj at redhat.com) 
------
commit 0cd287189e5e9f876022a8c6481195bdc63ce5f8
Author: Sakshi Bansal <sabansal at redhat.com>
Date:   Wed Jun 29 12:09:06 2016 +0530

    glusterd: glusterd must store all rebalance related information

    Change-Id: I8404b864a405411e3af2fbee46ca20330e656045
    BUG: 1351021
    Signed-off-by: Sakshi Bansal <sabansal at redhat.com>
    Reviewed-on: http://review.gluster.org/14827
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Atin Mukherjee <amukherj at redhat.com>

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1296796
[Bug 1296796] [DHT]: Rebalance info for remove brick  operation is  not
showing after glusterd restart
https://bugzilla.redhat.com/show_bug.cgi?id=1351021
[Bug 1351021] [DHT]: Rebalance info for remove brick  operation is  not
showing after glusterd restart