[Gluster-users] How to correctly distribute OpenStack VM files...
Gowrishankar Rajaiyan
gsr at redhat.com
Mon Aug 5 13:01:57 UTC 2013
On 08/02/2013 06:22 AM, Xavier Trilla wrote:
>
> Hi,
>
> We have been experimenting with GlusterFS for a while (now with version
> 3.4). We are running tests to check whether GlusterFS can really be
> used as the distributed storage for OpenStack block storage (Cinder),
> since new features in KVM, GlusterFS and OpenStack point to GlusterFS
> as the future of OpenStack open source block and object storage.
>
> But we found a problem as soon as we started playing with GlusterFS:
> the way the distribute translator (DHT) balances the load. We
> understand and see the benefits of a metadata-less setup. Using hashes
> based on filenames and assigning a hash range to each brick is clever,
> reliable and fast, but as far as we understand there is a big problem
> when it comes to storing the VM images of an OpenStack deployment.
>
> OpenStack Block Storage (Cinder) assigns a name (a GUID) to each volume
> it creates, so GlusterFS hashes the filename and decides on which
> brick it should be stored. But as in this scenario we don't have many
> files (we would have just one big file per VM), we may end up with
> really unbalanced storage.
>
> Let's say we have a 4-brick setup with DHT distribute, and we want to
> store 100 VMs there. The ideal scenario would be:
>
> Brick1: 25 VMs
>
> Brick2: 25 VMs
>
> Brick3: 25 VMs
>
> Brick4: 25 VMs
>
> As VMs are IO intensive, it's really important to balance the load
> correctly, since each brick has a limited number of IOPS. But as DHT
> is based only on a filename hash, we could end up with something like
> the following scenario (or even worse):
>
> Brick1: 50 VMs
>
> Brick2: 10 VMs
>
> Brick3: 35 VMs
>
> Brick4: 5 VMs
>
> And if we scale this out, things may get even worse: we may end up
> with almost all VM files on one or two bricks and all the other bricks
> almost empty. And if we use growing VM disk image files like qcow2,
> the "min-free-disk" option will not prevent all VM disk image files
> from being stored on the same brick. So, I understand DHT works well
> for a large number of small files, but for a few big IO-intensive files
> it doesn't seem to be a really good solution... (We are looking for a
> solution able to handle around 32 bricks and around 1500 VMs for the
> initial deployment, and able to scale up to 256 bricks and 12000
> VMs :/ )
>
> So, does anybody have a suggestion about how to handle this? So far we
> only see two options: either use the legacy unify translator with the
> ALU scheduler, or use the cluster/stripe translator with a big
> block-size so that at least the load gets balanced across all bricks
> in some way. But obviously we don't like unify as it needs a namespace
> brick, and striping seems to have an impact on performance and
> really complicates backup/restore/recovery strategies.
>
>
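Another option you may want to try: have your GlusterFS nodes also serve
as the OpenStack Cinder nodes, and use NUFA [1]. NUFA prefers a brick on
the local node when a file is created, so each volume would land on the
node that creates it rather than wherever its filename hashes to.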
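
If you want a rough feel for how far plain filename hashing is likely to
drift from the ideal 25/25/25/25 split, a small simulation can help. The
sketch below is only an approximation under stated assumptions: it uses
MD5 as a stand-in for the hash (GlusterFS DHT uses its own hash function
and per-directory layouts), gives every brick an equal hash range, and
generates random GUID-style names. The function names place() and
distribution() are made up for this example. It counts files per brick
only; it does not model file size or IO load.

```python
import hashlib
import random
import uuid
from collections import Counter


def place(name: str, n_bricks: int) -> int:
    """Map a filename to a brick index using equal-size hash ranges.

    Assumption: MD5 stands in for the real DHT hash function.
    """
    h = int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")
    # h is a 32-bit value; scale it into [0, n_bricks).
    return (h * n_bricks) >> 32


def distribution(n_files: int, n_bricks: int, seed: int = 0) -> list:
    """Return the number of files that land on each brick."""
    rng = random.Random(seed)
    # Random GUID-style names, similar in shape to Cinder volume names.
    names = [
        str(uuid.UUID(int=rng.getrandbits(128), version=4))
        for _ in range(n_files)
    ]
    counts = Counter(place(n, n_bricks) for n in names)
    return [counts.get(b, 0) for b in range(n_bricks)]


if __name__ == "__main__":
    for files, bricks in [(100, 4), (1500, 32), (12000, 256)]:
        per_brick = distribution(files, bricks)
        print(files, "files on", bricks, "bricks:",
              "min", min(per_brick), "max", max(per_brick))
```

Running it with different seeds shows the spread you could expect from
hashing alone for the 4, 32 and 256 brick cases mentioned above.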
~shanks
[1] http://gluster.org/community/documentation/index.php/Translators/cluster/nufa