[Gluster-users] some thoughts please on setting up a software archive based on glusterfs
Keith Freedman
freedman at FreeFormIT.com
Tue Jul 29 11:03:16 UTC 2008
At 08:51 PM 7/28/2008, webmaster at securitywonks.org wrote:
>Dear Keith
>
>I am thinking on to start with 2 gluster servers, anyhow, if possible in
>the first step, I will consider 3 gluster servers as well for more
>redundancy, have to see, how effective I can get this implemented in the
>first time.
>
>just another request: please also tell me, if we can stimulate FIND
>command more regularly to keep all glusterfs servers in sync almost all
>the time.
Well, I'm not sure it's necessary. You technically "could" run it
via cron, but realize it's pretty I/O intensive. Each file
access causes gluster to look at the underlying filesystem AND ask
each of the other servers for the xattrs (versions) of the
file. So if you do this often over a very large filesystem, I'm
guessing it'll have a negative impact on performance.
You really only need to trigger auto-healing if something's gone
wrong. If it's this important to you, perhaps you could write a
script to watch the gluster log for server disconnects. If a server
experiences one, run the find; otherwise there's no need.
Likewise, if a server goes down, run the find after it's back up.
Otherwise, there shouldn't be a need to do this.
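
A rough sketch of that kind of watcher, in Python. Everything specific here
is an assumption rather than something taken from gluster itself: the
"disconnect" pattern is a placeholder for whatever your log actually prints,
the mount point is whatever you pass in, and the find shown (reading one byte
of every file) is just one way to touch each file. Substitute the find
command you already use.

```python
import re
import subprocess

# Assumption: a placeholder pattern. Check your own gluster log for the
# exact wording of a disconnect message and adjust it.
DISCONNECT = re.compile(r"disconnect", re.IGNORECASE)


def saw_disconnect(log_lines):
    """Return True if any of the given log lines mentions a disconnect."""
    return any(DISCONNECT.search(line) for line in log_lines)


def find_command(mount_point):
    """Build a find that reads one byte of every file under mount_point."""
    return ["find", mount_point, "-type", "f",
            "-exec", "head", "-c", "1", "{}", ";"]


def heal_if_needed(log_lines, mount_point, run=subprocess.run):
    """Run the find only when the new log lines show a disconnect."""
    if not saw_disconnect(log_lines):
        return False
    run(find_command(mount_point), stdout=subprocess.DEVNULL, check=False)
    return True
```

You would feed heal_if_needed() only the lines added to the log since the
last check, so a quiet log costs you nothing and the expensive find runs
only after a disconnect.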
> > Honestly, I wouldn't risk this. Unless your files are HUGE, the
> > performance gain won't be worth the risk in my opinion.
> >
>
>the sizes of the files that I am going to host on our website range
>from a few kilobytes to a few hundred megabytes (even up to 700 MB, and
>sometimes DVD files as well). Now, what do you suggest, sir?
You may want to experiment with striping and the .. I forgot
what it's called, but the volume which allows you to specify what
filetypes go on what volume. You can create an AFR'ed stripe volume
for the .mpeg/.vob files and have the rest of the files use normal
non-striped AFR. However, if you're mostly reading these files, you
might be better off with just a normal AFR and maybe a caching volume.
Hopefully someone else reading can give you better advice on this subject.
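
To make the filetype idea concrete, here is a small Python illustration of
the decision such a volume makes. It is not gluster configuration, and the
volume names are made up for the example; it only shows pattern-based
routing with a default for everything else.

```python
from fnmatch import fnmatch

# Hypothetical volume names, chosen only for this illustration.
ROUTES = [
    ("*.mpeg", "afr-stripe"),
    ("*.vob", "afr-stripe"),
]
DEFAULT_VOLUME = "afr-plain"


def volume_for(filename):
    """Return the volume of the first pattern that matches the filename."""
    lowered = filename.lower()  # treat .VOB and .vob the same way
    for pattern, volume in ROUTES:
        if fnmatch(lowered, pattern):
            return volume
    return DEFAULT_VOLUME
```

Large media files land on the striped volume and everything else stays on
plain AFR, which is the split described above.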
>I am thinking on to use single hard drive like 500GB SATA per glusterFS
>server with 2 to 4GB RAM each. Thinking about Virtual systems as well, I
>mean, how it will be if we host gluster servers as individual virtual
>containers on gogrid.com , amazon utility hosting or some other service as
>well. That is one thought I am considering, even though for now, I am
>towards dedicated servers mainly.
I'm not familiar with the gogrid.com offering, so I can't speak to that.
You could likely use any utility hosting; you just have to figure out what
works best for your situation. Honestly, I think once you get
things configured it's very low maintenance, and in the long run it will
be cheaper to run your own setup.
>can we use CPANEL/Direct Admin as control panel on glusterfs servers? I
>mean, will glusterfs work on control panel based servers?
I use cPanel.
I'm working on building a multi-server cPanel package.
Right now, I have the user home directories on a gluster filesystem, and I
use unison to sync certain cPanel files.
Presently, I have to copy the httpd.conf config (changing IPs) to
the other server, along with the new password, group, and shadow entries when
new accounts are created.
I then have to add the other server's IP address to their DNS record.
This works pretty well for me and I have a load-balanced (via round
robin DNS) cPanel setup.
The goal is to automate all the processes I do manually, and then
I'll have a situation where I can have one cPanel installation and
scale it across an infinite number of servers.
I assume it will work similarly with Plesk or any other control panel.
I've also set php.ini to use a temp folder on the gluster filesystem
instead of /tmp so that user sessions are shared among the
machines. This way, if the browser bounces to the other server, the
user's session doesn't disappear.
>I hope to hear more thoughts on this (single Gluster client accessing
>multiple Glusterfs servers)
In my cPanel configuration, I have 2 servers, each AFR'ed to
the other. I set the local read volume to the local disk.
It would work just as well to have multiple servers and a single (or
multiple) cPanel client(s).
>I had looked at different HA solutions like mysql replication, drbd setup,
>cluster, other commercial mysql High Availability options too, not able to
>decide which way to go. At one point, I felt interested to try to use
>HYPERTABLE (http://www.hypertable.org ) hosted on glusterFS, but as it is
>young and as I do not have further info about its php api and for similar
>reasons, I currently stick to MySQL only. Since I wish to use Memcache, I
>am starting with one dedicated server for webserver and database server
>together along with gluster client.
I would shy away from any database using shared storage. I'm not
sure they're mature enough, and I think there may be unpleasant
performance issues related to the speed of the locking mechanism.
If you're really worried, you could run MySQL Cluster; however, as
far as I know, this is still an in-memory database, which won't give
you much space for your database.
I can't say my mysql replication setup is trouble free, but it's
pretty dependable.
I have some scripts which monitor the slave status and notify me if
the replication breaks. I check, and often it's just a matter of
skipping one statement and restarting the slave process.
There have been cases where I had to copy certain database tables
from one machine to the other.
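
For illustration, a minimal Python sketch of the checking part of such a
monitor (not the actual scripts mentioned above). It assumes you have already
captured the vertical-format output of SHOW SLAVE STATUS as text; how you
capture it and how you send the notification are left out.

```python
def parse_slave_status(text):
    """Turn vertical SHOW SLAVE STATUS output into a dict of field -> value."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon (row separators) are skipped
            status[key.strip()] = value.strip()
    return status


def replication_problems(status):
    """Return a list of human-readable problems; empty means healthy."""
    problems = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status.get(thread) != "Yes":
            problems.append(f"{thread} is {status.get(thread, 'missing')}")
    if status.get("Last_Error"):
        problems.append("Last_Error: " + status["Last_Error"])
    return problems
```

If replication_problems() returns anything, that is the point where a
notification would go out and you would look at whether skipping a statement
is enough.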
I'm not sure what effect it'll have on performance, but you may be
able to run mysql over gluster only to have a kind of live/hot backup
of the database. I'm not sure how it'll work in practice, though, and
it won't protect against data corruption.
>please tell me, can we use ALU (least connections method) and round robin
>translators together or we need to use only one translator?
I'm not sure. Hopefully one of the devs can shed light.
I *believe* you can intermix the translators almost any way you
want, but I'm not sure.
>which translators you generally use?
In my configuration, I use posix locks, io-threads, and AFR (with local
read volume).
I'm not using any of the other performance-related translators, as I
just don't understand them well enough to know if I'll get any benefit
from them in my configuration.
>I currently use round robin method for routing dns requests shared by my
>download servers.
This is how my cPanel servers are set up.
>i wish to know, which one will be more effective when we go with GlusterFS
>servers.
Round robin is easiest. As for effective, probably springing for a
real load balancer which has load monitoring daemons on the
webservers will get you the best results, but it really depends on
your situation.
If you're processing data in a way that can be CPU intensive, then this is
the best option; if you're just serving normal web pages, then
round-robin is fine.
If you're streaming mpegs, then you'll want to be able to balance
over network load or disk I/O.
However, in any of those cases, I'd start with round robin and, if you
find it's insufficient, then spend the money and insert a load
balancer of some sort.
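
The difference between the two approaches can be shown in a few lines of
Python. This is only an illustration of the selection logic, not of any
particular load balancer; the server names and load figures are invented.

```python
from itertools import cycle


def round_robin(servers):
    """Hand out servers in a fixed rotation, ignoring how busy they are."""
    return cycle(servers)


def least_loaded(loads):
    """Pick the server with the lowest reported load.

    `loads` maps a server name to whatever a monitoring daemon reports
    (CPU, network, or disk I/O, depending on what you balance over).
    """
    return min(loads, key=loads.get)
```

Round robin needs no information from the servers at all, which is why it
is the easy starting point; least_loaded() only helps once something is
actually reporting load.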
>I also like to know about Geographical replication setup using this AFR
>method. For example, if I place two GlusterFS servers in one datacenter,
>two glusterfs servers in another datacenter, can we use the same AFR setup
>for content replication effectively? and use geographical check in php and
>try to route user download request to nearest datacenter (having our
>glusterfs servers) using my single gluster client?
Currently, for each pair of servers, each server is in a different
datacenter. Both datacenters are in the same city for now;
however, my ultimate plan is to move the servers to different geographic zones.
The only concern I have here would be how network latency affects gluster.
My suspicion is that it's going to be just fine in my case, since
there aren't a lot of file updates, so gluster won't have to do much
more chattering than the AFR auto-heal checking it normally does.
>just some more thoughts: here, which translator will be used (either alu
>or round robin or both depends on configuration setup and our main http
>download request will be to the Gluster Client, which selects particular
>GlusterFS server (based on backend configuration) and deliver the software
>file know?
>
>how it will be, if we host multiple gluster clients in multiple servers,
>in which situation, if we use round robin method to input "file download
>request" to a gluster client among the list of gluster clients from which,
>based on the default selected translator (ALU : least connections method)
>for example, glusterfs server is selected and file delivered accordingly,
>what do you say sir, will this method work?
I'm not sure how to answer. Again, I'd recommend you hire the
gluster.com folks to help with your implementation design. However,
I would think you could just AFR 2-3 servers; on the clients, round
robin is fine. Add some disk to the clients so you can use the
caching translator to speed up subsequent requests, and you should be
doing all right.
>may be, I need to get this point done correctly (apache virtual hosts
>using gluster mount point correctly)
All my apache virtual hosts point to /home/USER/public_html.
/home is my gluster mountpoint.
The other files which cPanel uses for user info I sync periodically
through unison via cron.
>I am ok with "IP based Auth" the only worry I have is about "hotlinking",
>other than that, I am fine ok.
Hotlink protection is handled by apache. I wouldn't worry about
someone trying to "mount" your gluster filesystem by spoofing the IP.
A future solution would be to add an encryption translator
when one is available.
(I'd love to see a compression translator, but newer filesystems such as
ZFS do this for you, so maybe it doesn't need to happen at the gluster level.)
>what I mean is to discuss the different doubts and once finalised, try to
>write them together and ask for a quote for initial implementation. Just I
>am trying to get answers to my newbie questions for clear linkup and how
>communication occur between web server, gluster client, glusterfs servers
>etc all info, when my request to a consultant can be meaningful, I mean,
>they can more perfectly understand what I require, I hope.
Good plan.
>thanks you guys, both Daniel and you keith for your valuable thoughts. I
>wish to get some more clarity on other points that I had mentioned above,
>
>thank you guys :)
>
>With Best Regards
>Raghu Veer
You're very welcome.