[Gluster-users] Newbie: Exploring Gluster for large-scale deployment in AWS, large media files, high performance I/O

Mathieu Chateau mathieu.chateau at lotp.fr
Tue Jul 14 20:18:00 UTC 2015


By NFS, I think you just mean "all servers seeing and changing the same
files"? That can be done with the FUSE client, without NFS.
NFS is harder to fail over, while failover is automatic with FUSE (no need
for dynamic DNS or a virtual IP).
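
For example (just a sketch, with placeholder server and volume names), the
two mount styles look like this on a client:

    # Native FUSE mount: the client fetches the volume layout from any server
    # and then talks to all bricks directly, so server failover is built in
    mount -t glusterfs server1:/mediavol /mnt/media

    # Gluster's built-in NFS (NFSv3): the client stays pinned to one server,
    # so failover needs a virtual IP or round-robin DNS in front
    mount -t nfs -o vers=3,tcp server1:/mediavol /mnt/media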

By redundancy I mean: which failures do you want to survive?

   - Losing a disk
   - Filesystem corruption
   - Server lost or in maintenance
   - Whole region down

Depending on your needs, you may have to replicate data across Gluster
bricks or even use a geo-replicated volume.
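
For example (again just a sketch, with placeholder names), a replicated
volume and an asynchronous geo-replicated copy would look like:

    # Replicated volume across two servers (survives losing one server or its disks)
    gluster volume create mediavol replica 2 \
        server1:/data/brick1 server2:/data/brick1
    gluster volume start mediavol

    # Geo-replication to another region (survives a whole region going down)
    gluster volume geo-replication mediavol drhost::mediavol-dr create push-pem
    gluster volume geo-replication mediavol drhost::mediavol-dr start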

Will the network between your servers and nodes be able to handle that
traffic (380 MB/s = 3,040 Mb/s)?

I guess Gluster can handle that load; you are using big files, and this is
where Gluster delivers its highest throughput. Nevertheless, you will need
many disks to provide that I/O, even more if using replicated bricks.
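
As a back-of-the-envelope sizing (assuming roughly 120 MB/s of sequential
throughput per disk, which is only a guess for your hardware):

    # disks needed just for 380 MB/s of raw sequential throughput
    echo $(( (380 + 119) / 120 ))        # -> 4
    # with replica 2 every write lands on two bricks, so roughly double that
    echo $(( 2 * (380 + 119) / 120 ))    # -> 8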


Cordialement,
Mathieu CHATEAU
http://www.lotp.fr

2015-07-14 21:15 GMT+02:00 Forrest Aldrich <forrie at gmail.com>:

>  Sorry, I should have noted that.  380 MB/s is both read and write (I
> confirmed this with a developer).
>
> We do need the NFS stack, as that's how all the code and the various
> instances work -- we have several "workers" that chop up video on the same
> namespace.  It's not efficient, but that's how it has to be for now.
>
> Redundancy, in terms of the server?   We have RAIDED volumes if that's
> what you're referring to.
>
> Here's a basic outline of the flow (as I understand it):
>
>
> Video Capture Agent sends in a large video file (30 GB +/-)
>
> Administrative host receives and writes to NFS
>
> A process copies this over to another point in the namespace
>
> Another instance picks up the file, reads it, and starts processing and
> writing (FFmpeg is involved)
>
>
> Something like that -- I may not have all the steps, but essentially
> there's a ton of I/O going on.   I know our code model is not efficient,
> but it's complicated and can't just be changed (it's based on an open
> source product and there's some code baggage).
>
> We looked into another product that allegedly scaled out using multiple
> NFS heads with massive local cache (AWS instances) and sharing the same
> space, but it was horrible and just didn't work for us.
>
>
>
> Thank you.
>
>
>
>
> On 7/14/15 3:06 PM, Mathieu Chateau wrote:
>
> Hello,
>
>  Is it 380 MB/s in read or write? What level of redundancy do you need?
> Do you really need the NFS stack, or just a mount point (and so be able to
> use the native Gluster protocol)?
>
>  Gluster load is mostly put on clients, not servers (clients do the
> synchronous writes to all replicas and do the memory caching).
>
>
>  Cordialement,
> Mathieu CHATEAU
> http://www.lotp.fr
>
> 2015-07-14 20:49 GMT+02:00 Forrest Aldrich <forrie at gmail.com>:
>
>> I'm exploring solutions to help us achieve high throughput and
>> scalability within the AWS environment.   Specifically, I work in a
>> department where we handle and produce video content that results in very
>> large files (30GB etc) that must be written to NFS, chopped up and copied
>> over on the same mount (there are some odd limits to the code we use, but
>> that's outside the scope of this question).
>>
>> Currently, we're using a commercial vendor with AWS, with dedicated
>> Direct Connect instances as the back end to our production.   We're maxing
>> out at 350 to 380 MB/s, which is not enough.  We expect our capacity will
>> double or even triple when we bring on more classes or even other entities,
>> and we need to find a way to squeeze out as much I/O as we can.
>>
>> Our software model depends on NFS, there's no way around that presently.
>>
>> Since GlusterFS uses FUSE, I'm concerned about performance, which is a
>> key issue.   It sounds like a striped volume would be appropriate.
>>
>> My basic understanding of Gluster is that it can combine several
>> "bricks" (which could be multiple dedicated EBS volumes, or even multiple
>> instances of the above commercial vendor served up via NFS) into what is
>> transparently a single namespace to client connections.   The I/O could
>> be distributed in this manner.
>>
>> I wonder if someone here with more experience with the above might
>> elaborate on whether GlusterFS could be used in the above scenario.
>> Specifically, I/O performance.  We'd really like to get as much as
>> possible, on the order of 700 MB/s to 1 GB/s and up if possible.
>>
>>
>>
>> Thanks in advance.
>>
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>

