[Gluster-users] How to correctly distribute OpenStack VM files...

Mon Aug 5 13:01:57 UTC 2013

On 08/02/2013 06:22 AM, Xavier Trilla wrote:
>
> Hi,
>
> We have been experimenting with GlusterFS for a while (currently
> version 3.4). We are running tests to check whether GlusterFS can
> really be used as the distributed storage for OpenStack block storage
> (Cinder), as new features in KVM, GlusterFS and OpenStack point to
> GlusterFS as the future of OpenStack open source block and object
> storage.
>
> But we found a problem as soon as we started working with
> GlusterFS: the way the distribute translator (DHT) balances the load.
> We understand and see the benefits of a metadata-less setup. Using
> hashes based on filenames and assigning a hash range to each brick is
> clever, reliable and fast, but as we understand it there is a big
> problem when it comes to storing the VM images of an OpenStack
> deployment.
>
> OpenStack Block Storage (Cinder) assigns a name (a GUID) to each
> volume it creates, so GlusterFS hashes the filename and decides on
> which brick the file should be stored. But as in this scenario we
> don't have many files (we would have just one big file per VM), we
> may end up with really unbalanced storage.
>
> Let's say we have a 4-brick setup with DHT distribute, and we want to
> store 100 VMs there. The ideal scenario would be:
>
> Brick1: 25 VMs
> Brick2: 25 VMs
> Brick3: 25 VMs
> Brick4: 25 VMs
>
> As VMs are IO intensive, it's really important to balance the load
> correctly, as each brick has a limited number of IOPS. But as DHT is
> based only on a filename hash, we could end up with something like
> the following scenario (or even worse):
>
> Brick1: 50 VMs
> Brick2: 10 VMs
> Brick3: 35 VMs
> Brick4: 5 VMs
>
> And if we scale this out, things may get even worse: we may end up
> with almost all VM files on one or two bricks and all the other
> bricks almost empty. And if we use growing VM disk image files like
> qcow2, the "min-free-disk" option will not prevent all VM disk image
> files from being stored on the same brick. So, I understand DHT works
> well for a large number of small files, but for a few big IO-intensive
> files it doesn't seem to be a really good solution. (We are looking
> for a solution able to handle around 32 bricks and around 1500 VMs
> for the initial deployment, and able to scale up to 256 bricks and
> 12000 VMs :/ )
>
> So, does anybody have a suggestion about how to handle this? So far
> we only see two options: either use the legacy unify translator with
> the ALU scheduler, or use the cluster/stripe translator with a big
> block-size so that the load at least gets balanced across all bricks
> in some way. But obviously we don't like unify, as it needs a
> namespace brick, and striping seems to have an impact on performance
> and really complicates backup/restore/recovery strategies.
>
>

Another suggestion that you may want to try: have each GlusterFS node
also serve as an OpenStack Cinder node, and use NUFA[1].

~shanks

[1] http://gluster.org/community/documentation/index.php/Translators/cluster/nufa
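
To get a feel for the placement concern raised in the quoted message,
here is a minimal simulation sketch of filename-hash placement. It is
not GlusterFS code: MD5 stands in for the real DHT hash, the 32-bit hash
space is simply cut into equal ranges, and the "volume-<uuid>" file
naming as well as the function names brick_for and simulate are
assumptions made for illustration only.

```python
import hashlib
import random
import uuid
from collections import Counter


def brick_for(filename: str, num_bricks: int) -> int:
    """Pick a brick by cutting a 32-bit hash space into equal ranges.

    MD5 is only a stand-in hash here (an assumption); GlusterFS DHT
    uses its own hash function and per-directory layouts.
    """
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    hash32 = int.from_bytes(digest[:4], "big")  # value in [0, 2**32)
    range_size = 2**32 // num_bricks
    # Clamp so the leftover tail of the hash space maps to the last brick.
    return min(hash32 // range_size, num_bricks - 1)


def simulate(num_files: int, num_bricks: int, seed: int = 0) -> list:
    """Return the number of files that land on each brick."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    counts = Counter()
    for _ in range(num_files):
        # Hypothetical volume file name built from a random UUID.
        vol_id = uuid.UUID(int=rng.getrandbits(128), version=4)
        counts[brick_for("volume-" + str(vol_id), num_bricks)] += 1
    return [counts[b] for b in range(num_bricks)]


if __name__ == "__main__":
    # The two deployment sizes mentioned in the quoted message.
    for files, bricks in [(100, 4), (1500, 32)]:
        per_brick = simulate(files, bricks)
        print(files, "files on", bricks, "bricks:",
              "min", min(per_brick), "max", max(per_brick))
```

If the hash spreads names uniformly, the per-brick count is roughly
binomial, with a standard deviation of about sqrt(n * p * (1 - p)) for
n files and p = 1/bricks. Note that this only counts files; it says
nothing about how file size or IOPS end up distributed, which is the
part that matters for VM images.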
