[Gluster-users] some thoughts please on setting up a software archive based on glusterfs

Keith Freedman freedman at FreeFormIT.com
Tue Jul 29 21:29:59 UTC 2008

At 12:21 PM 7/29/2008, webmaster at securitywonks.org wrote:
>happy to hear you posted in "Who's using Gluster", all the best :) please
>let me know, once your experiment results are stable and all manual things
>are automated :)

Our goal is to have, around November, a fully (or mostly) 
self-managing clustered cpanel installation (running over gluster).

hopefully that timeframe will work.

>if I have to add multiple gluster clients, how to do? whether the only way
>is to use multiple dedicated servers for the cause?
>or is it economical to setup multiple VPS on a physical server and use for
>multiple gluster clients ? (I am just trying to make it economical if
>possible, while trying to gain some extra performance). just trying to
>think in different methods to do this economically, what do you say sir?

I do understand the love affair with VPSs and virtual machines; 
however, they don't usually solve performance issues, and generally 
result in reduced performance.

Let's take your scenario:
You use memcached to cache database info.  This uses ram.  You also 
will want to use local disk to cache gluster files.

If you take a server with 8GB of ram and a 500GB drive, you can 
cache 400GB+ of filesystem data and load up multiple memcached 
processes (I think each one can address 2GB of ram?), so in a single 
machine you can cache 6-7GB of DB stuff in memory. (Also, the OS will 
use whatever extra ram it has to cache.)

Now, split that up into 4 virtual machines.
You now have, per machine, 100GB of disk cache, and less than 2GB of 
memory for caching (more likely 1 or 1.5).

So, in your case, running one larger instance is going to provide 
MUCH better performance than splitting your resources.
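
To make the arithmetic concrete, here is a rough sketch in Python. The 
helper name and the 1GB-per-instance OS overhead are my own assumptions 
for illustration, not measured values; the 8GB / 400GB figures are the 
ones from the scenario above.

```python
# Hypothetical helper: evenly divides one host's cache resources among
# a number of OS instances (1 = bare metal, 4 = four cloned VMs).
# os_overhead_gb is an assumed per-instance cost of running an OS,
# not a measured value.
def per_instance_cache(total_ram_gb, disk_cache_gb, instances, os_overhead_gb=1.0):
    ram_each = total_ram_gb / instances
    return {
        # RAM left over for memcached after the OS takes its share
        "memcached_ram_gb": max(ram_each - os_overhead_gb, 0.0),
        # local disk available for caching gluster files
        "disk_cache_gb": disk_cache_gb / instances,
    }

one_big = per_instance_cache(total_ram_gb=8, disk_cache_gb=400, instances=1)
four_vms = per_instance_cache(total_ram_gb=8, disk_cache_gb=400, instances=4)

print("single instance:", one_big)
print("each of 4 VMs:  ", four_vms)
```

Under these assumptions the OS overhead is paid once per instance, and 
if the VMs are clones serving the same content, each one is likely to 
fill its smaller cache with the same popular files.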

The only advantage to virtualizing in this manner is if you're also 
partitioning your data, and then you might want different virtual 
machines doing different things, so you can optimize each unit for 
each particular function.

For example... you might have one webserver instance serving small 
image files, another serving large software package files, and 
possibly another for the database.  This way you can allocate more 
ram to the DB instance, and more disk for caching to the file server 
instances.  Possibly more for the image server and less for the large 
file server: since the latter is likely to get more cache misses 
anyway, just focus on optimizing the gluster config for the large 
files and allow more caching space for the small ones.

But simply virtualizing your hardware and cloning your config will 
have a negative impact on performance overall.

>need to think about Alu translator, once again then, thanks for your input
>on this,

I'm sure those with more familiarity with the translator can give 
better advice, but as I understand things, it may not help in your 
particular situation.

>what is the hardware configuration, it will be helpful, to know, share the
>configuration details if you like, we will be glad to know

One server pair is Athlon 64 uniprocessors with 2GB of ram each.
Another pair has slightly slower processors with 1GB of ram each.
Admittedly my configuration isn't the most powerful, but it works just 
fine, and I get reasonable performance out of them.

I don't load 1000s of hosts onto my cpanel servers, as I'm not in 
the commodity web sale business.  I'd imagine for those customers 
memory and possibly cpu would need to be much improved.

>but if we do cache on all gluster clients, end of the day, I doubt, these
>may become like regular file servers know, any updates may not only
>stimulate synchronisation between glusterfs servers, but also updation of
>gluster client cache know, please share your thoughts and observations
>further, thank you

The clients behave the same when caching as the servers with afr, as 
far as I know...
When a file is requested, the gluster client asks the server for the 
file's version and timestamp and compares them with its own.  If its 
copy is the same (or newer, I presume), it serves from the local disk 
cache; if not, it fetches from the server and updates its cache.
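
As a rough illustration of that check, here is a small Python sketch. 
This is not gluster code or its API: FakeServer, CachingClient and 
FileMeta are hypothetical names, everything lives in memory, and the 
"same or newer" comparison is just my understanding as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    version: int
    mtime: float


class FakeServer:
    """Hypothetical stand-in for a gluster server (in-memory only)."""

    def __init__(self):
        self.files = {}

    def put(self, path, data, version, mtime):
        self.files[path] = (FileMeta(version, mtime), data)

    def stat(self, path):
        # Cheap call: metadata only, no file contents
        return self.files[path][0]

    def fetch(self, path):
        # Expensive call: metadata plus file contents
        return self.files[path]


class CachingClient:
    """Hypothetical client that validates its local copy before serving."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.fetches = 0  # counts full transfers from the server

    def read(self, path):
        remote = self.server.stat(path)
        cached = self.cache.get(path)
        if cached is not None:
            local, data = cached
            # Same or newer (assumed rule): serve from the local cache
            if (local.version, local.mtime) >= (remote.version, remote.mtime):
                return data
        # Missing or stale: fetch from the server and update the cache
        meta, data = self.server.fetch(path)
        self.cache[path] = (meta, data)
        self.fetches += 1
        return data


server = FakeServer()
client = CachingClient(server)
server.put("pkg.tar.gz", b"release 1", version=1, mtime=100.0)

first = client.read("pkg.tar.gz")   # not cached yet: fetched from server
second = client.read("pkg.tar.gz")  # unchanged: served from local cache
server.put("pkg.tar.gz", b"release 2", version=2, mtime=200.0)
third = client.read("pkg.tar.gz")   # server copy is newer: fetched again
print(client.fetches)
```

Note that every read still costs one metadata round trip to the server; 
the cache only saves the transfer of the file contents.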

If you have an environment where your files are updated constantly, 
you won't benefit from the cache, but as you describe your 
environment, I'd imagine mostly things are loaded once and left alone.
You're serving software downloads--the software doesn't change 
frequently.  You'll simply add more, right?

>what is your recommended configuration of gluster clients?

I think if they're simply clients with a web server, you would 
benefit more from larger disk (if you'll benefit from caching) and 
wouldn't need as much memory.

>memcached needs code changes, it will be helpful, try it, it will support
>the cause, many big sites use it

yes.. I'm not in control of most of the code my clients run, so I 
haven't bothered with it.

>thank you my friend, will you notify me, when your cpanel setup is ready
>with automation?

I'll send you a note, but again, it likely won't be until after October.

