[Bugs] [Bug 1600145] [geo-rep]: Worker still ACTIVE after killing bricks

bugzilla at redhat.com
Fri Jul 13 12:56:25 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1600145

Kotresh HR <khiremat at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |khiremat at redhat.com



--- Comment #2 from Kotresh HR <khiremat at redhat.com> ---

Description of problem:
=======================
The bricks backing the ACTIVE workers of a geo-replication session were
killed, but the workers remain ACTIVE even after the bricks have gone down.

Before the bricks were killed:
-----------------------------
[root@dhcp42-18 scripts]# gluster volume geo-replication master 10.70.43.116::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:09:32
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:06:17
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:06:17
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
[root@dhcp42-18 scripts]# gluster v status
Status of volume: gluster_shared_storage
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
----------------------------------------------------------------------------------------------------
Brick 10.70.41.239:/var/lib/glusterd/ss_brick                      49152     0          Y       28814
Brick 10.70.43.179:/var/lib/glusterd/ss_brick                      49152     0          Y       27173
Brick dhcp42-18.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick  49152     0          Y       9969
Self-heal Daemon on localhost                                      N/A       N/A        Y       10879
Self-heal Daemon on 10.70.41.239                                   N/A       N/A        Y       29525
Self-heal Daemon on 10.70.43.179                                   N/A       N/A        Y       27892

Task Status of Volume gluster_shared_storage
-----------------------------------------------------------------------------



After the bricks were killed using gf_attach:
---------------------------------------------
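For reference, the ACTIVE node's bricks were detached from their brick
process along these lines (a sketch; the exact socket name under
/var/run/gluster varies per setup and is a placeholder here):

# gf_attach -d <uds_path> <brick_path> detaches a brick from a
# (multiplexed) brick process directly, without going through glusterd
gf_attach -d /var/run/gluster/<brick-process>.socket /rhs/brick1/b1
gf_attach -d /var/run/gluster/<brick-process>.socket /rhs/brick2/b4
gf_attach -d /var/run/gluster/<brick-process>.socket /rhs/brick3/b7
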
[root@dhcp42-18 scripts]# gluster v status
Status of volume: gluster_shared_storage
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
----------------------------------------------------------------------------------------------------
Brick 10.70.41.239:/var/lib/glusterd/ss_brick                      49152     0          Y       28814
Brick 10.70.43.179:/var/lib/glusterd/ss_brick                      49152     0          Y       27173
Brick dhcp42-18.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick  49152     0          Y       9969
Self-heal Daemon on localhost                                      N/A       N/A        Y       10879
Self-heal Daemon on 10.70.41.239                                   N/A       N/A        Y       29525
Self-heal Daemon on 10.70.43.179                                   N/A       N/A        Y       27892

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: master
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.18:/rhs/brick1/b1            N/A       N/A        N       N/A  
Brick 10.70.41.239:/rhs/brick1/b2           49152     0          Y       28814
Brick 10.70.43.179:/rhs/brick1/b3           49152     0          Y       27173
Brick 10.70.42.18:/rhs/brick2/b4            N/A       N/A        N       N/A  
Brick 10.70.41.239:/rhs/brick2/b5           49152     0          Y       28814
Brick 10.70.43.179:/rhs/brick2/b6           49152     0          Y       27173
Brick 10.70.42.18:/rhs/brick3/b7            N/A       N/A        N       N/A  
Brick 10.70.41.239:/rhs/brick3/b8           49152     0          Y       28814
Brick 10.70.43.179:/rhs/brick3/b9           49152     0          Y       27173
Self-heal Daemon on localhost               N/A       N/A        Y       10879
Self-heal Daemon on 10.70.41.239            N/A       N/A        Y       29525
Self-heal Daemon on 10.70.43.179            N/A       N/A        Y       27892

Task Status of Volume master
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp42-18 scripts]# gluster volume geo-replication master 10.70.43.116::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:11:33
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:12:02
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-10 01:12:18
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A



Version-Release number of selected component (if applicable):
=============================================================
mainline

How reproducible:
=================
2/2


Steps to Reproduce:
1. Create a geo-replication session between 3x3 master and slave volumes
2. Mount the master and slave volumes
3. Create files on the master
4. Kill the bricks of the ACTIVE node using gf_attach (see the sketch below)
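
A rough end-to-end sketch of the steps above (hostnames, brick paths, the
mount point, and the file count are assumptions modelled on the status
outputs in this report):

# 1. 3x3 (distributed-replicate) master volume plus a geo-rep session
gluster volume create master replica 3 \
    10.70.42.18:/rhs/brick1/b1 10.70.41.239:/rhs/brick1/b2 10.70.43.179:/rhs/brick1/b3 \
    10.70.42.18:/rhs/brick2/b4 10.70.41.239:/rhs/brick2/b5 10.70.43.179:/rhs/brick2/b6 \
    10.70.42.18:/rhs/brick3/b7 10.70.41.239:/rhs/brick3/b8 10.70.43.179:/rhs/brick3/b9
gluster volume start master
# (a matching 3x3 "slave" volume is created and started on the slave cluster)
gluster volume geo-replication master 10.70.43.116::slave create push-pem
gluster volume geo-replication master 10.70.43.116::slave start

# 2. Mount the master volume (and the slave, for verification)
mount -t glusterfs 10.70.42.18:/master /mnt/master

# 3. Create files on the master
for i in $(seq 1 100); do echo data > /mnt/master/file$i; done

# 4. Detach the ACTIVE node's bricks with gf_attach -d, as sketched
#    earlier in this report (socket path is a placeholder)
gf_attach -d /var/run/gluster/<brick-process>.socket /rhs/brick1/b1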

Actual results:
===============
The workers remain ACTIVE even though their bricks are down.


Expected results:
================
The 3 ACTIVE workers should go to FAULTY, and 3 PASSIVE workers should
become ACTIVE and take over the syncing.
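
After the bricks are detached, a status query would be expected to look
roughly like this (hypothetical, abridged output; which PASSIVE worker in
each replica set gets promoted can vary):

gluster volume geo-replication master 10.70.43.116::slave status
# 10.70.42.18     master    /rhs/brick1/b1    ...    Faulty
# 10.70.41.239    master    /rhs/brick1/b2    ...    Active    Changelog Crawl
# (similarly for the b4/b5/b6 and b7/b8/b9 replica sets)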
