[Gluster-users] Newbie: Exploring Gluster for large-scale deployment in AWS, large media files, high performance I/O

Tue Jul 14 18:49:59 UTC 2015

I'm exploring solutions to help us achieve high throughput and 
scalability within the AWS environment.   Specifically, I work in a 
department where we handle and produce video content that results in 
very large files (around 30 GB each) that must be written to NFS, chopped up and 
copied over on the same mount (there are some odd limits to the code we 
use, but that's outside the scope of this question).

Currently, we're using a commercial vendor with AWS, with dedicated 
Direct Connect instances as the back end to our production.   We're 
maxing out at 350 to 380 MB/s, which is not enough.  We expect our 
capacity needs to double or even triple when we bring on more classes or 
even other entities, and we need to find a way to squeeze out as much I/O 
as we can.
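
For a rough sense of scale, here is a back-of-envelope sketch of what 
doubling or tripling would mean.  It is only my own arithmetic, assuming 
decimal units (1 GB = 1000 MB) and that demand scales linearly from the 
current 350 to 380 MB/s ceiling; the function names are just illustrative.

```python
# Back-of-envelope throughput targets, using the figures quoted above.
# Assumption: decimal units (1 GB = 1000 MB) and linear scaling of demand.

FILE_SIZE_GB = 30           # typical file size mentioned above
CURRENT_MBPS = (350, 380)   # observed ceiling today, in MB/s


def scaled_targets(current, factors=(2, 3)):
    """Return {factor: (low, high)} throughput in MB/s for each growth factor."""
    return {f: tuple(c * f for c in current) for f in factors}


def seconds_to_write(file_gb, mb_per_s):
    """Seconds needed to write one file of file_gb gigabytes at mb_per_s."""
    return file_gb * 1000 / mb_per_s


if __name__ == "__main__":
    for factor, (low, high) in scaled_targets(CURRENT_MBPS).items():
        print(f"{factor}x load: {low}-{high} MB/s")
    for rate in (350, 700, 1000):
        secs = seconds_to_write(FILE_SIZE_GB, rate)
        print(f"30 GB at {rate} MB/s: {secs:.0f} s")
```

So doubling lands around 700 to 760 MB/s and tripling a bit over 1 GB/s, 
which is where the targets further down come from.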

Our software model depends on NFS; there's no way around that at present.

Since the native GlusterFS client uses FUSE, I'm concerned about 
performance, which is a key issue for us.   It sounds like a striped 
volume would be appropriate.
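
To make clear what I mean by striping, here is my mental model as a small 
sketch.  This is generic round-robin striping, not Gluster's actual 
implementation; the block size and brick count are illustrative values I 
picked, and the function names are hypothetical.

```python
# Generic round-robin striping model (NOT Gluster's real code).
# Assumption: fixed-size blocks are assigned to bricks in rotation.

BLOCK_SIZE = 128 * 1024  # illustrative block size in bytes


def brick_for_offset(offset, n_bricks, block_size=BLOCK_SIZE):
    """Index of the brick that would hold the byte at this file offset."""
    return (offset // block_size) % n_bricks


def bytes_per_brick(file_size, n_bricks, block_size=BLOCK_SIZE):
    """How many bytes of one file would land on each brick."""
    full_blocks, tail = divmod(file_size, block_size)
    totals = []
    for brick in range(n_bricks):
        # Every brick gets full_blocks // n_bricks blocks; the first
        # (full_blocks % n_bricks) bricks get one extra block.
        extra = 1 if brick < full_blocks % n_bricks else 0
        totals.append((full_blocks // n_bricks + extra) * block_size)
    if tail:
        # The final partial block goes to the next brick in the rotation.
        totals[full_blocks % n_bricks] += tail
    return totals


if __name__ == "__main__":
    size = 30 * 1000**3  # one 30 GB file, decimal units
    for brick, nbytes in enumerate(bytes_per_brick(size, n_bricks=4)):
        print(f"brick {brick}: {nbytes / 1000**3:.2f} GB")
```

If that model is roughly right, a single large file would be spread 
about evenly over the bricks, which is the behavior I'm hoping for.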

My basic understanding of Gluster is that it can combine several 
"bricks", which could be either dedicated EBS volumes or even multiple 
instances of the above commercial vendor, and export them over NFS so 
that client connections transparently see a single namespace.   The I/O 
could then be distributed across the bricks.

I wonder if someone here with more experience with the above might 
elaborate on whether GlusterFS could be used in this scenario.  
Specifically, I'm interested in I/O performance.  We'd really like to 
push throughput as high as possible: 700 MB/s, or 1 GB/s and up if possible.

Thanks in advance.