[Gluster-users] volume failing to heal after replace-brick
Jordan Tomkinson
jordan at catalyst-au.net
Fri Feb 6 03:37:46 UTC 2015
Solving my own issue here. Gluster can be frustrating at times.
The documentation, bug tracker and mailing list all say that the various
replace-brick functions are deprecated in 3.6 and that the self-heal daemon
is responsible for migrating data. The CLI even tells you so when you try
any replace-brick command:
All replace-brick commands except commit force are deprecated. Do you
want to continue? (y/n)
Yet it seems replace-brick still works, and is in fact the only way to
migrate data from one brick to another, as the self-heal daemon clearly
does not do it - see my previous message below.
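For anyone hitting the same problem: before falling back to the old
workflow it is worth confirming that the self-heal daemon really is up and
has nothing queued. These are the standard checks (the log path is the
stock /var/log/glusterfs location, adjust if yours differs):

$ gluster volume status test        # Self-heal Daemon should show Online = Y on both nodes
$ gluster volume heal test info     # lists entries the SHD thinks still need healing
$ tail -f /var/log/glusterfs/glustershd.log    # watch for any crawl/heal activity

In my case all of that looked healthy (SHD online, zero entries, nothing in
the log) and still nothing was copied to the new brick, which is what
pushed me back to replace-brick.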
Round 2 - aka ignoring everything everybody says and doing it the old way
$ gluster volume info test
Volume Name: test
Type: Replicate
Volume ID: 34716d2f-8fd1-40a4-bcab-ecfe3f6d116b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ds1:/export/test
Brick2: ds2:/export/test
$ gluster volume replace-brick test ds2:/export/test ds2:/export/test2 start force
All replace-brick commands except commit force are deprecated. Do you
want to continue? (y/n) y
volume replace-brick: success: replace-brick started successfully
ID: 6f8054cc-03b8-4a1e-afc5-2aecf70d909b
$ gluster volume replace-brick test ds2:/export/test ds2:/export/test2 status
All replace-brick commands except commit force are deprecated. Do you
want to continue? (y/n) y
volume replace-brick: success: Number of files migrated = 161
Migration complete
$ gluster volume replace-brick test ds2:/export/test ds2:/export/test2 commit
All replace-brick commands except commit force are deprecated. Do you
want to continue? (y/n) y
volume replace-brick: success: replace-brick commit successful
$ gluster volume info test
Volume Name: test
Type: Replicate
Volume ID: 34716d2f-8fd1-40a4-bcab-ecfe3f6d116b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ds1:/export/test
Brick2: ds2:/export/test2
FUSE clients can now see all the data as expected.
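If anyone wants to sanity-check the result before trusting it, a rough
comparison of the old and new brick (ignoring gluster's internal .glusterfs
directory; the paths are just my bricks from above) would be something like:

$ find /export/test -path '*/.glusterfs' -prune -o -type f -print | wc -l   # old brick on ds2
$ find /export/test2 -path '*/.glusterfs' -prune -o -type f -print | wc -l  # new brick on ds2
$ gluster volume heal test info    # should report 0 entries for both bricks

The file counts should match and heal info should stay at zero.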
I can't begin to express how frustrating this experience has been.
On 06/02/15 10:51, Jordan Tomkinson wrote:
> Hi,
>
> Using Gluster 3.6.1, I'm trying to replace a brick but after issuing a
> volume heal nothing gets healed and my clients see an empty volume.
>
> I have reproduced this on a test volume, shown here.
>
> $ gluster volume status test
>
> Status of volume: test
> Gluster process                      Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick ds1:/export/test               49153   Y       7093
> Brick ds2:/export/test               49154   Y       11472
> NFS Server on localhost              2049    Y       11484
> Self-heal Daemon on localhost        N/A     Y       11491
> NFS Server on 10.42.0.207            2049    Y       7110
> Self-heal Daemon on 10.42.0.207      N/A     Y       7117
>
> Task Status of Volume test
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> I then mount the volume from a client and store some files.
>
> Now I replace ds2:/export/test with an empty disk mounted on
> ds2:/export/test2
>
> $ gluster volume replace-brick test ds2:/export/test ds2:/export/test2 commit force
> volume replace-brick: success: replace-brick commit successful
>
> At this point, doing an ls on the volume mounted from a fuse client
> shows an empty volume, basically the contents of the new empty brick.
>
> So I issue a volume heal full.
>
> $ gluster volume heal test full
> Launching heal operation to perform full self heal on volume test has
> been successful
> Use heal info commands to check status
>
> $ gluster volume heal test info
> Gathering list of entries to be healed on volume test has been successful
>
> Brick ds1:/export/test
> Number of entries: 0
>
> Brick ds2:/export/test2
> Number of entries: 0
>
>
> Nothing gets healed from ds1:/export/test to ds2:/export/test2 and my
> clients still see an empty volume.
>
> I can see the data on ds1:/export/test if I look inside the brick
> directory, but nothing on ds2:/export/test2.
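In hindsight, this is the point where I should have looked at the replica
xattrs on both brick roots; getfattr shows the volume-id, gfid and any
pending trusted.afr.* markers (the paths are my bricks from above):

$ getfattr -d -m . -e hex /export/test     # on ds1, the surviving brick
$ getfattr -d -m . -e hex /export/test2    # on ds2, the new empty brick

As far as I understand it, without pending trusted.afr.test-client-* markers
on the good brick the regular index heal has nothing to go on, and a "heal
full" crawl is supposed to cover that case - which is exactly what did not
happen here.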
>
> Tailing glustershd.log, nothing is printed after running the heal command.
>
> The only log entry is a single line in etc-glusterfs-glusterd.vol.log:
> [2015-02-06 02:48:39.906224] I
> [glusterd-volume-ops.c:482:__glusterd_handle_cli_heal_volume]
> 0-management: Received heal vol req for volume test
>
> Any ideas?
>
>
>
--
Jordan Tomkinson
System Administrator
Catalyst IT Australia