[Gluster-devel] Architecture advice

Thu Jan 8 15:13:26 UTC 2009

On Jan 8, 2009, at 5:23 AM, Joe Landman wrote:

> Dan Parsons wrote:
>> Now that I'm upgrading to gluster 1.4/2.0, I'm going to take the time
>> and rearchitect things.
>> Hardware: Gluster servers: 4 blades connected via 4gbit fc to fast,
>> dedicated storage. Each server has two bonded Gig-E links to the rest
>> of my network, for 8gbit/s theoretical throughput.
>
> Just make sure the channel bonded gigabits are a) not broadcom  
> based, b) not using anything other than mode 0 (long story, but contact me  
> offline if you want to hear some horror stories of hard-to-fix  
> crashes).  If you have access to 10 GbE/IB, these would be superior  
> solutions (entire system ... storage and clients).
>

I've actually been using this stripe + bonding setup for 8 months
with no problems related to either technology, even though I'm using
Broadcom chips (forced to by what's on our blades). Not only that,
but the blade chassis switch is a Dell PowerConnect. I would have
much preferred a Cisco, but the replacement cost is kind of high and
I did eventually get the PowerConnect to do the bonding properly. I'm
using mode 802.3ad, which I believe is also known as mode 4. As I
said, it's worked great for 8 months. With iptraf I can saturate the
2 Gbit/s link with very little impact on system performance, and IRQ
generation is reasonable too. I'd rather have Intel, but I can't say
the Broadcom stuff is causing any problems for me.
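
For anyone who wants to confirm which mode a bond is really running, here
is a minimal Python sketch. It assumes the Linux bonding driver's status
file (typically /proc/net/bonding/bond0; the interface name bond0 is an
assumption) and its "Bonding Mode:" line. The helper names are made up for
illustration and are not part of any library.

```python
import re

# Mode names and numbers as used by the Linux bonding driver's "mode" option.
MODE_NUMBERS = {
    "balance-rr": 0,
    "active-backup": 1,
    "balance-xor": 2,
    "broadcast": 3,
    "802.3ad": 4,
    "balance-tlb": 5,
    "balance-alb": 6,
}


def parse_bonding_mode(status_text):
    """Return the text after 'Bonding Mode:' in bonding status output."""
    match = re.search(r"^Bonding Mode:\s*(.+)$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Bonding Mode:' line found")
    return match.group(1).strip()


def is_lacp(status_text):
    """True if the reported mode mentions 802.3ad (mode 4)."""
    return "802.3ad" in parse_bonding_mode(status_text)


def read_status(path="/proc/net/bonding/bond0"):
    """Read the status file; the default path assumes a bond named bond0."""
    with open(path) as handle:
        return handle.read()
```

Calling is_lacp(read_status()) on a server should return True when the
bond is running 802.3ad.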

>> Gluster clients: 33 blades each with one, gig-e connection. They use
>> local storage for OS and gluster for input/output files.
>> Specific questions: (1) There are many times, in our workflow, when
>> more than a few nodes will want the same file at the same time. This
>> made me want to use the stripe xlator. In this way, when a client
>> node saturates its gig-e link reading the file, each gluster server
>> is using only 250mbit/s, leaving room for more clients. If I wasn't
>> using stripe, this hypothetical file would be on just one server
>> node, and it would get slammed if more than two client nodes talked
>> to it. Is there a better way of doing this? Did I make the correct
>> decision in using stripe xlator for this purpose? Can I achieve the
>> same thing using just afr?
>
> Without spending money to fix the storage architecture, you really  
> will need to look at afr, as stripe may help on single requests more  
> than multiple (guessing).  You should be able to benchmark/test  
> this, but I would imagine that AFR would help you with multiple  
> simultaneous read/only access to specific files.  Read/write will be  
> more complex.
>
> If you can spend money to fix the storage architecture, 10GbE or IB  
> everywhere (storage nodes, client nodes, ...).  You won't regret it.
>

Stripe has worked flawlessly for me, helping enormously when 33 nodes
each want the same pile of multi-gigabyte files at the same time. I
asked the question to find out whether there is a better way of doing
this with gluster 2.0. The upgrade cost for IB was almost twice my
department's annual budget, and I believe from what Krishna said that
AFR-based load balancing does not yet exist in 2.0. Again, no problems
with this setup; I'm just making sure I'm doing it the best way (with
regard to gluster).
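
To spell out the bandwidth arithmetic behind the stripe decision, here is
a rough Python sketch. It uses only the link speeds from this thread (4
servers with 2 Gbit/s bonded links each, clients with 1 Gbit/s each) and
assumes ideal conditions: no protocol overhead, perfectly even striping,
and no bottleneck on the FC side. The function names are made up.

```python
def per_server_load_mbit(client_link_mbit, num_servers, striped):
    """Load that one client reading at full link speed puts on one server.

    With striping the read is spread evenly over all servers; without it,
    the whole read lands on the single server holding the file.
    """
    if striped:
        return client_link_mbit / num_servers
    return client_link_mbit


def full_speed_clients(server_link_mbit, client_link_mbit, num_servers, striped):
    """How many clients can read the same file at full link speed at once."""
    load = per_server_load_mbit(client_link_mbit, num_servers, striped)
    return int(server_link_mbit // load)


SERVERS = 4
SERVER_LINK_MBIT = 2000  # two bonded Gig-E links per server
CLIENT_LINK_MBIT = 1000  # one Gig-E link per client

print(per_server_load_mbit(CLIENT_LINK_MBIT, SERVERS, striped=True))
print(full_speed_clients(SERVER_LINK_MBIT, CLIENT_LINK_MBIT, SERVERS, striped=True))
print(full_speed_clients(SERVER_LINK_MBIT, CLIENT_LINK_MBIT, SERVERS, striped=False))
```

Under those assumptions a striped read costs each server 250 Mbit/s per
client, so 8 clients can read the same file at full speed, versus 2 when
the file lives on a single server.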

>> (2) I would like to architect the system such that if one node goes
>> down, the others can keep serving the data, even if overall
>> throughput is less. This means that all data would need to be
>> accessible from all clients. Is this something I would use afr xlator
>> for? If so, do I even need stripe anymore, to handle my need to have
>
> Server side AFR.  Stripe may not help the reliability here.

ZOMG - please point me at docs on how to set this up. Stripe on top of
AFR sounds nice, but I believe that would make all my client nodes do
more work, right? So having the servers handle AFR would be
beautiful.
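
To make the reliability point concrete, here is a toy Python model (not
Gluster code). It treats a layout as a list of stripe members, each of
which is a set of replica servers, and assumes a file is readable only if
every stripe member still has at least one live replica. The server names
s1..s4 and both layouts are hypothetical.

```python
SERVERS = {"s1", "s2", "s3", "s4"}

# Pure stripe: four stripe members, one server each, no replication.
PURE_STRIPE = [{"s1"}, {"s2"}, {"s3"}, {"s4"}]

# Stripe over AFR: two stripe members, each mirrored on a pair of servers.
STRIPE_OVER_AFR = [{"s1", "s2"}, {"s3", "s4"}]


def readable(layout, up_servers):
    """True if every stripe member has at least one replica that is up."""
    return all(any(server in up_servers for server in replicas)
               for replicas in layout)


def survives_any_single_failure(layout):
    """True if the file stays readable no matter which one server is down."""
    return all(readable(layout, SERVERS - {down}) for down in SERVERS)


print(survives_any_single_failure(PURE_STRIPE))
print(survives_any_single_failure(STRIPE_OVER_AFR))
```

In this model pure stripe fails as soon as any one server is down, while
stripe over mirrored pairs survives any single failure. It says nothing
about whether the replication work happens on the clients or the servers.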

>
>
>> multiple servers capable of sending different chunks of the same
>> file? And how does the HA xlator play into this?
>> We have a mix of (small quantity of gigantic files) and (extremely
>> gigantic quantity of small files), so I'm sure there will need to be
>> some parameter tuning.
>> Thanks in advance. If this question would be better addressed under
>> some sort of support agreement, please let me know.
>> Dan Parsons
>

Finally, the only remaining "issue" I have with gluster is the memory
leak in the io-cache translator. I'm told it is fixed in 2.0, which is
why I'm upgrading.

>
> -- 
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web  : http://www.scalableinformatics.com
>       http://jackrabbit.scalableinformatics.com
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
>