[Gluster-users] Extra work in gluster volume rebalance and odd reporting

Joel Young jdy at cryregarder.com
Fri Oct 25 21:29:10 UTC 2013


A couple more things:

1.  For the work volume, the failures are caused by hard links, which
can't be rebalanced.  It is odd, though, that the hard links still show
up in the Rebalanced-files count even though they failed.  (A quick way
to list such files is sketched after the volume info below.)

2. Output of gluster volume info:
Volume Name: home
Type: Distributed-Replicate
Volume ID: 83fa39a6-6e68-4e1c-8fae-3c3e30b1bd66
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: ir0:/lhome/gluster_home
Brick2: ir1:/lhome/gluster_home
Brick3: ir2:/lhome/gluster_home
Brick4: ir3:/raid/gluster_home
Options Reconfigured:
cluster.lookup-unhashed: no
performance.client-io-threads: on
performance.cache-size: 512MB
server.statedump-path: /tmp

Volume Name: work
Type: Distribute
Volume ID: 823816bb-2e60-4b37-a142-ba464a77bfdc
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: ir0:/raid/gluster_work
Brick2: ir1:/raid/gluster_work
Brick3: ir2:/raid/gluster_work
Options Reconfigured:
performance.client-io-threads: on
performance.cache-size: 1GB
performance.write-behind-window-size: 3MB
performance.flush-behind: on
server.statedump-path: /tmp
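
On point 1, here is a quick way to list the offending files, run
directly on a brick (I'm using my work brick path on this node; this
assumes GNU find and the glusterfs 3.x on-disk layout, where every
regular file already carries one extra hard link under .glusterfs,
hence the -links +2):

  # files with user-created hard links, most-linked first
  find /raid/gluster_work -path /raid/gluster_work/.glusterfs -prune -o \
    -type f -links +2 -printf '%n %p\n' | sort -rn | head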


Thanks,

Joel


On Fri, Oct 25, 2013 at 11:49 AM, Joel Young <jdy at cryregarder.com> wrote:
> Folks,
>
> With gluster 1.4.0 on Fedora 19:
>
> I have a four node gluster peer group (ir0, ir1, ir2, ir3).  I've got
> two distributed filesystems on the cluster.
>
> One (work) is purely distributed, with bricks on ir0, ir1, and ir2.
> The other (home) is distributed-replicated, with replica pairs
> (ir0, ir3) and (ir1, ir2).
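>
> (For clarity on the layout, a generic sketch with made-up server and
> brick names: with replica 2, consecutive bricks in the create command
> become a replica pair.)
>
>   gluster volume create myvol replica 2 \
>       serverA:/brick serverB:/brick serverC:/brick serverD:/brick
>   # replica pairs: (serverA, serverB) and (serverC, serverD)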
>
> When I run gluster volume rebalance home start and gluster volume
> rebalance work start, rebalance operations run on every node in the
> peer group.  For work, a rebalance ran on ir3 even though there is no
> work brick on ir3.  For home, rebalances ran on ir1 and ir3 but moved
> no files on those nodes.
>
> [root@ir0]# gluster volume rebalance home status; gluster volume rebalance work status
>      Node   Rebalanced-files      size   scanned   failures        status   run time in secs
> ---------   ----------------   -------   -------   --------   -----------   ----------------
> localhost              33441     2.3GB    120090          0   in progress           67154.00
>       ir2              12878    32.7GB    234395          0     completed           29569.00
>       ir3                  0    0Bytes    234367          0     completed            1581.00
>       ir1                  0    0Bytes    234367          0     completed            1569.00
> volume rebalance: home: success:
>      Node   Rebalanced-files      size   scanned   failures        status   run time in secs
> ---------   ----------------   -------   -------   --------   -----------   ----------------
> localhost                  0    0Bytes   1862936          0     completed            4444.00
>       ir2                417    10.4GB   1862936        417     completed            4466.00
>       ir3                  0    0Bytes   1862936          0     completed            4454.00
>       ir1                  4   282.8MB   1862936          4     completed            4438.00
>
>
> Sometimes I would get:
>
> volume rebalance: work: success:
> [root@ir0 ghenders]# gluster volume rebalance home status; gluster volume rebalance work status
>      Node   Rebalanced-files      size   scanned   failures        status   run time in secs
> ---------   ----------------   -------   -------   --------   -----------   ----------------
> localhost              31466     2.3GB    114290          0   in progress           63194.00
> localhost              31466     2.3GB    114290          0   in progress           63194.00
> localhost              31466     2.3GB    114290          0   in progress           63194.00
> localhost              31466     2.3GB    114290          0   in progress           63194.00
>       ir3                  0    0Bytes    234367          0     completed            1581.00
> volume rebalance: home: success:
>      Node   Rebalanced-files      size   scanned   failures        status   run time in secs
> ---------   ----------------   -------   -------   --------   -----------   ----------------
> localhost                  0    0Bytes   1862936          0     completed            4444.00
> localhost                  0    0Bytes   1862936          0     completed            4444.00
> localhost                  0    0Bytes   1862936          0     completed            4444.00
> localhost                  0    0Bytes   1862936          0     completed            4444.00
>       ir1                  4   282.8MB   1862936          4     completed            4438.00
>
>
> That is, the same localhost row is repeated four times and only one of
> the other peers is reported at all.
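>
> (A cross-check, assuming the stock log path: each node's rebalance
> daemon writes a per-volume log, so tailing it on every peer shows what
> that node is actually doing, independent of the status table:)
>
>   tail -f /var/log/glusterfs/home-rebalance.log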
>
> Should I file bugs on these?
>
> Joel


