[Gluster-users] How does GlusterFS handle data traffic when it does 'rebalance layout and migrate'?

JinHwan Hwang calanchue at gmail.com
Fri Feb 7 08:10:27 UTC 2014


I have read architecture documents about GlusterFS, Ceph and Swift. What I
mostly can't understand, and what worries me, is GlusterFS's rebalancing.
From my understanding, it looks like GlusterFS moves too much data when it
does 'rebalance layout and migrate', especially compared with the approaches
of Ceph and Swift. I'm not sure that I understand it correctly.

It looks like Ceph and Swift map files to disks in similar ways. They map a
file to a disk like this:

filename -> hashed filename -> PG (placement group in Ceph) or
partition (in Swift) -> PG or partition to disk
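
As a toy illustration of that two-level scheme (the fixed PG count, md5 as
the hash, and the round-robin PG-to-disk assignment are all my own
simplifications, not Ceph's CRUSH or Swift's ring builder), in Python:

    import hashlib

    NUM_PGS = 256  # fixed up front; stays constant as disks come and go

    def file_to_pg(filename):
        # filename -> hashed filename -> PG (or partition)
        digest = hashlib.md5(filename.encode()).hexdigest()
        return int(digest, 16) % NUM_PGS

    def build_pg_map(disks):
        # PG (or partition) -> disk; only this small map changes on growth
        return {pg: disks[pg % len(disks)] for pg in range(NUM_PGS)}

    pg_map = build_pg_map(["d0", "d1", "d2"])
    print(pg_map[file_to_pg("hello.txt")])  # the disk holding hello.txt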


And if I have not misunderstood, GlusterFS does it this way:

filename -> hashed filename -> map directly to disk
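
A matching sketch of the direct mapping, in the spirit of GlusterFS's DHT
(the equal-width split of the 32-bit hash space and md5 are my assumptions
for illustration; real DHT keeps per-directory layout ranges in extended
attributes and uses its own hash function):

    import hashlib

    def file_to_disk(filename, disks):
        # filename -> 32-bit hash -> disk owning that hash range (no PG layer)
        h = int(hashlib.md5(filename.encode()).hexdigest(), 16) % (1 << 32)
        width = (1 << 32) // len(disks)  # equal-width hash range per disk
        return disks[min(h // width, len(disks) - 1)]

    print(file_to_disk("hello.txt", ["d0", "d1", "d2"]))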

The existence of the PG (or partition) layer means only minimal data
migration from old disks to new disks. That is how they flatten the storage
cluster with minimal data movement. Theoretically, in the worst case, Ceph
and Swift move an amount of data roughly equal to the capacity of the newly
added disk (at this point I assume the files are homogeneous).
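
To make that concrete, here is how growth could work with the hypothetical
PG map sketched above: reassign just enough PGs to the new disk to even out
the load, and leave every other PG where it is (my own toy rebalance, not
Ceph's or Swift's actual algorithm):

    def add_disk(pg_map, new_disk):
        # Move only enough PGs to give the new disk its fair share;
        # every other PG keeps its old disk, so migration stays minimal.
        new_map = dict(pg_map)
        n_disks = len(set(pg_map.values())) + 1
        for pg in range(NUM_PGS // n_disks):
            new_map[pg] = new_disk
        return new_map

    old = build_pg_map(["d0", "d1", "d2"])
    new = add_disk(old, "d3")
    moved = sum(old[pg] != new[pg] for pg in range(NUM_PGS))
    print(moved / NUM_PGS)  # 0.25: exactly the new disk's share, no more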

But GlusterFS, from my understanding, just adds the new disk to the tail of
the hash ring. And I found an article written by Jeff Darcy in which he says
this issue really exists:

http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/
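
Measuring the same 3-to-4-disk growth with the direct-mapping sketch above
gives roughly the 50% movement the article describes (synthetic filenames;
the exact figure depends on the hash):

    names = ["file%d" % i for i in range(100000)]
    moved = sum(file_to_disk(n, ["d0", "d1", "d2"])
                != file_to_disk(n, ["d0", "d1", "d2", "d3"])
                for n in names)
    print(moved / len(names))  # ~0.5: about half of all files change disks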

As the storage cluster gets larger and larger, 50% data movement becomes a
more and more serious issue, but I could not find anyone worrying about it.
How does this work out in the field? I'm worried that there will be too much
data traffic during 'rebalance and migration'. Maybe rebalancing only the
layout and skipping migration could be a solution; trading a little lookup
latency might relieve the migration traffic and time. But this looks like a
patchwork solution, and moreover, without migration and flattening, write
requests will converge on the newly added disk.

Do I understand GlusterFS correctly? The article above is quite old, and I
couldn't find any other documents about the architecture. If my
understanding is correct, how do GlusterFS users solve these kinds of
issues (huge migration traffic, write requests converging on the new disk)?
Are there design decisions or target use cases behind this?

Thanks in advance for any help.