[Gluster-users] Remove brick on 3.7.11 distributed-disperse volume doesn't rebalance

Thu May 26 17:09:54 UTC 2016

Hi,

I have a distributed-disperse 7 x (2 + 1) volume that I want
to remove three bricks from:

   Volume Name: glance
   Type: Distributed-Disperse
   Volume ID: 34a962cc-be73-480e-a9f7-8dbd9c7ca066
   Status: Started
   Number of Bricks: 7 x (2 + 1) = 21 
   Transport-type: tcp
   Bricks:
   Brick1: block8:/dpool/gluster/brick1/glance
   Brick2: block15:/dpool/gluster/brick1/glance
   Brick3: dxl3:/dpool/gluster/brick1/glance
   Brick4: block1:/dpool/gluster/brick1/glance
   Brick5: block9:/dpool/gluster/brick1/glance
   Brick6: block16:/dpool/gluster/brick1/glance
   Brick7: block2:/dpool/gluster/brick1/glance
   Brick8: block10:/dpool/gluster/brick1/glance
   Brick9: block20:/dpool/gluster/brick1/glance
   Brick10: block3:/dpool/gluster/brick1/glance
   Brick11: block11:/dpool/gluster/brick1/glance
   Brick12: block21:/dpool/gluster/brick1/glance
   Brick13: dxl1:/dpool/gluster/brick1/glance
   Brick14: block12:/dpool/gluster/brick1/glance
   Brick15: block17:/dpool/gluster/brick1/glance
   Brick16: dxl2:/dpool/gluster/brick1/glance
   Brick17: block13:/dpool/gluster/brick1/glance
   Brick18: block22:/dpool/gluster/brick1/glance
   Brick19: block14:/dpool/gluster/brick1/glance
   Brick20: block18:/dpool/gluster/brick1/glance
   Brick21: block23:/dpool/gluster/brick1/glance
   Options Reconfigured:
   cluster.min-free-disk: 200GB
   performance.readdir-ahead: on

In this case, I've chosen dxl2, block13, and block22
(Brick16 through Brick18 above, which together form one disperse set).

   # gluster volume remove-brick glance block22:/dpool/gluster/brick1/glance block13:/dpool/gluster/brick1/glance dxl2:/dpool/gluster/brick1/glance start
   # gluster volume remove-brick glance block22:/dpool/gluster/brick1/glance block13:/dpool/gluster/brick1/glance dxl2:/dpool/gluster/brick1/glance status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                                    dxl2                0        0Bytes             0             0             0            completed        0:0:1
                                 block13                0        0Bytes             0             0             0            completed        0:0:1
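
Note that the status table above reports 0 rebalanced and 0 scanned
files on both nodes listed. A minimal sketch for totalling those two
columns from the status output, assuming the column layout shown above
(the function name sum_migrated is made up for illustration and is not
part of gluster):

```shell
#!/bin/sh
# Sum the "Rebalanced-files" (column 2) and "scanned" (column 4) values
# from `gluster volume remove-brick ... status` output read on stdin.
# Header, separator, and prompt lines are skipped because their second
# and fourth fields are not plain integers.
sum_migrated() {
    awk '$2 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ { moved += $2; scanned += $4 }
         END { printf "%d %d\n", moved, scanned }'
}

# Hypothetical usage (BRICKS would hold the three brick paths):
#   gluster volume remove-brick glance $BRICKS status | sum_migrated
```

A result of "0 0" means no files were scanned or migrated, which would
be a reason to hold off on committing.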

Before starting this, I created 4096 random files in a directory.
565 of them are present under /dpool/gluster/brick1/glance on each of
those three bricks and on no other brick.
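
A count like this can be reproduced on each brick host with something
along the following lines. This is only a sketch: the helper name
count_brick_files is made up for illustration, and it assumes that
everything outside the brick's internal .glusterfs directory is user
data.

```shell
#!/bin/sh
# Count regular files under a brick directory, skipping GlusterFS's
# internal .glusterfs metadata tree so that only user-visible files
# are counted.
count_brick_files() {
    find "$1" -name .glusterfs -type d -prune -o -type f -print \
        | wc -l | tr -d ' '
}

# Hypothetical usage, run locally on dxl2, block13, and block22:
#   count_brick_files /dpool/gluster/brick1/glance
```

Running the same count before and after the remove-brick, on both the
bricks being removed and the remaining ones, shows whether any files
actually moved.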

After the remove-brick start finished, the files were still on those
three bricks and had not been migrated to any other brick, even though
the rebalance log reports that the rebalance completed:

   [dxl2] # tail -100 /var/log/glusterfs/glance-rebalance.log
   [2016-05-26 16:25:35.409904] I [MSGID: 109028] [dht-rebalance.c:3831:gf_defrag_status_get] 0-glance-dht: Rebalance is completed. Time taken is 1.00 secs

Attempts to start a rebalance manually failed because the remove-brick
task had not yet been committed.

If I commit the remove-brick, the volume switches to a 6 x (2 + 1)
configuration, which is what I wanted, but all 565 of the files that
were on the now-removed disperse set are no longer available anywhere.

Is this a bug?  Or am I misunderstanding how to remove a disperse
set from a distributed-disperse volume?  I'm running 3.7.11 on 
CentOS 7.

Any guidance would be greatly appreciated.

Thanks,

Chris