[Gluster-users] some thoughts please on setting up a software archive based on glusterfs
freedman at FreeFormIT.com
Tue Jul 29 21:29:59 UTC 2008
At 12:21 PM 7/29/2008, webmaster at securitywonks.org wrote:
>happy to hear you posted in "Who's using Gluster", all the best :) please
>let me know, once your experiment results are stable and all manual things
>are automated :)
Our goal is to have, around November, a fully (or mostly)
self-managing clustered cPanel installation running over gluster.
Hopefully that timeframe will work.
>if I have to add multiple gluster clients, how to do? whether the only way
>is to use multiple dedicated servers for the cause?
>or is it economical to setup multiple VPS on a physical server and use for
>multiple gluster clients ? (I am just trying to make it economical if
>possible, while trying to gain some extra performance). just trying to
>think in different methods to do this economically, what do you say sir?
I do understand the love affair with VPSs and virtual machines;
however, they don't usually solve performance issues, and generally
result in reduced performance.
Let's take your scenario:
You use memcached to cache database info. This uses ram. You also
will want to use local disk to cache gluster files.
If you take a server with 8GB of ram, and a 500GB drive, you now can
cache 400GB+ of filesystem data and you can load up multiple
memcached processes (I think each one can address 2GB of ram?) so in
a single machine you can cache 6-7GB of DB stuff in memory (also, the
OS will use whatever extra RAM it has to cache).
Now, split that up into 4 virtual machines.
You now have, per machine, about 100GB of disk cache, and less than
2GB of memory for caching (more likely 1 or 1.5).
So, in your case, running one larger instance is going to provide
MUCH better performance than splitting your resources.
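
The arithmetic above can be sketched in a few lines of Python. This is
only an illustration: the function name and the per-instance overhead
figures (1GB of RAM and 25GB of disk reserved for each OS install) are
assumptions chosen to land near the numbers in this message, not
measurements.

```python
def per_instance_cache(total_ram_gb, total_disk_gb, instances,
                       os_ram_gb=1.0, os_disk_gb=25.0):
    """Return (ram_gb, disk_gb) left for caching in each instance.

    os_ram_gb and os_disk_gb are assumed per-instance overheads;
    every virtual machine pays them again.
    """
    ram = total_ram_gb / instances - os_ram_gb
    disk = total_disk_gb / instances - os_disk_gb
    return max(ram, 0.0), max(disk, 0.0)


# One large instance on the 8GB / 500GB box from the example.
single = per_instance_cache(8, 500, 1)

# The same box split into 4 virtual machines.
split = per_instance_cache(8, 500, 4)

print("single instance:", single)
print("each of 4 VMs:  ", split)
```

Each virtual machine also caches independently, so a popular file can
end up stored once per instance instead of once per box.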
The only advantage to virtualizing in this manner is if you're also
partitioning your data, and then you might want different virtual
machines doing different things, so you can optimize each unit for
each particular function.
For example... you might have one webserver instance serving small
image files, another serving large software package files, and
possibly another for the database. This way you can allocate more
ram to the DB instance, and more disk for caching to the file server
instances. Possibly more for the image server and less for the large
file server: since the large files are likely to get more cache
misses anyway, just focus on optimizing the gluster config for them
and allow more caching space for the small ones.
But simply virtualizing your hardware and cloning your config will
have a negative impact on performance overall.
>need to think about Alu translator, once again then, thanks for your input
I'm sure those with more familiarity with the translator can give
better advice, but as I understand things, it may not help in your
>what is the hardware configuration, it will be helpful, to know, share the
>configuration details if you like, we will be glad to know
One server pair are Athlon 64 uniprocessors with 2GB of RAM each.
Another pair are slightly less speedy processors with 1GB of RAM each.
Admittedly my configuration isn't the most powerful, but it works just
fine, and I get reasonable performance out of them.
I don't load thousands of hosts onto my cPanel servers, as I'm not in
the commodity web sale business. I'd imagine for those customers
memory and possibly CPU would need to be much improved.
>but if we do cache on all gluster clients, end of the day, I doubt, these
>may become like regular file servers know, any updates may not only
>stimulate synchronisation between glusterfs servers, but also updation of
>gluster client cache know, please share your thoughts and observations
>further, thank you
As far as I know, the clients behave the same way when caching as
the servers do with AFR.
When a file is requested, the gluster client asks the server for the
file's version and timestamp and compares them with its own. If its
copy is the same (or newer, I presume), it serves from the local disk
cache; if not, it fetches from the server and updates its cache.
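
A minimal sketch of that check, in Python. It is not gluster code: the
names FileMeta, is_fresh and read_file are hypothetical, the server
and cache are plain dictionaries standing in for the real thing, and
treating "same or newer" as "both version and timestamp are at least
the server's" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    version: int
    mtime: float


def is_fresh(local, remote):
    """True if the cached copy is the same as, or newer than, the server's."""
    return local.version >= remote.version and local.mtime >= remote.mtime


def read_file(path, server, cache):
    """Return (data, source); server and cache map path -> (FileMeta, bytes)."""
    remote_meta, _ = server[path]        # stand-in for the metadata request
    cached = cache.get(path)
    if cached is not None and is_fresh(cached[0], remote_meta):
        return cached[1], "cache"        # serve from the local disk cache
    meta, data = server[path]            # stand-in for the full fetch
    cache[path] = (meta, data)           # update the local cache
    return data, "server"


server = {"pkg.tar.gz": (FileMeta(1, 100.0), b"v1")}
cache = {}

first = read_file("pkg.tar.gz", server, cache)    # miss: fetched from server
second = read_file("pkg.tar.gz", server, cache)   # hit: served from cache

server["pkg.tar.gz"] = (FileMeta(2, 200.0), b"v2")
third = read_file("pkg.tar.gz", server, cache)    # stale copy: refetched
```

Files that are written once and then only read stay on the "cache"
path after the first request, which is the case described for a
software download archive.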
If you have an environment where your files are updated constantly,
you won't benefit from the cache, but as you describe your
environment, I'd imagine things are mostly loaded once and left alone.
You're serving software downloads--the software doesn't change
frequently. You'll simply add more, right?
>what is your recommended configuration of gluster clients?
I think if they're simply clients with a web server, you would
benefit more from a larger disk (if you'll benefit from caching) and
wouldn't need as much memory.
>memcached needs code changes, it will be helpful, try it, it will support
>the cause, many big sites use it
Yes, but I'm not in control of most of the code my clients run, so I
haven't bothered with it.
>thank you my friend, will you notify me, when your cpanel setup is ready
I'll send you a note, but again, it likely won't be until after October.