[Gluster-users] some thoughts please on setting up a software archive based on glusterfs

Keith Freedman freedman at FreeFormIT.com
Tue Jul 29 11:03:16 UTC 2008

At 08:51 PM 7/28/2008, webmaster at securitywonks.org wrote:
>Dear Keith
>I am thinking on to start with 2 gluster servers, anyhow, if possible in
>the first step, I will consider 3 gluster servers as well for more
>redundancy, have to see, how effective I can get this implemented in the
>first time.
>just another request: please also tell me, if we can stimulate FIND
>command more regularly to keep all glusterfs servers in sync almost all
>the time.

Well, I'm not sure it's necessary. You technically could run it via
cron, but realize it's pretty I/O intensive. Each file access causes
gluster to look at the underlying filesystem AND ask each of the other
servers for the xattrs (versions) of the file. So if you do this often
over a very large filesystem, I'm guessing it'll have a negative
impact on performance.

You really only need to stimulate auto-healing if something's gone
wrong. If it's this important to you, perhaps you could write a
script to watch the gluster log for server disconnects. If a server
experiences one, run the find; otherwise, no need.
And if a server goes down, run the find after it's back up.

Otherwise, there shouldn't be a need to do this.
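
As a rough illustration of the "watch the log, then run the find" idea,
here is a minimal Python sketch. It rests on assumptions: the pattern used
to spot a disconnect in the gluster log is a guess (the wording differs
between versions, so check your own logs), the function names are made up
for this example, and it assumes that opening each file and reading one
byte is enough to make AFR check that file's versions.

```python
import os
import re

# Assumption: the wording of a disconnect message depends on the gluster
# version; adjust this pattern to whatever your own log actually prints.
DISCONNECT_PATTERN = re.compile(r"disconnect", re.IGNORECASE)


def saw_disconnect(log_lines, pattern=DISCONNECT_PATTERN):
    # True if any of the given log lines looks like a server disconnect.
    return any(pattern.search(line) for line in log_lines)


def touch_all_files(mount_point):
    # Walk the mount and read one byte from every regular file, roughly
    # what a find over the whole tree would do. Returns the file count.
    visited = 0
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs, broken symlinks
            try:
                with open(path, "rb") as handle:
                    handle.read(1)
                visited += 1
            except OSError:
                continue  # file vanished or is unreadable; carry on
    return visited


def heal_if_needed(log_lines, mount_point):
    # Only pay for the expensive walk when the log shows a disconnect.
    if saw_disconnect(log_lines):
        return touch_all_files(mount_point)
    return 0
```

You would feed `heal_if_needed` the log lines written since the last check
and the path of the gluster mount. When nothing has disconnected it returns
0 without touching the filesystem, which avoids the I/O cost described above.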

> > Honestly, I wouldn't risk this..  Unless your files are HUGE the
> > performance gain wont be worth the risk in my opinion.
> >
>the file sizes of the files that I am going to host on our website range
>from few Kilo Bytes to few hundred Mega Bytes (even upto 700 MB and
>sometimes DVD files as well). Now, how do you suggest sir?

You may want to experiment with striping and the... I forgot
what it's called, but the volume which allows you to specify which
file types go on which volume. You can create an AFR'ed stripe volume
for the .mpeg/.vob files and have the rest of the files use normal
non-striped AFR. However, if you're mostly reading these files, you
might be better off with just a normal AFR and maybe a caching volume.
Hopefully someone else reading can give you better advice on this subject.

>I am thinking on to use single hard drive like 500GB SATA per glusterFS
>server with 2 to 4GB RAM each. Thinking about Virtual systems as well, I
>mean, how it will be if we host gluster servers as individual virtual
>containers on gogrid.com , amazon utility hosting or some other service as
>well. That is one thought I am considering, even though for now, I am
>towards dedicated servers mainly.

I'm not familiar with the gogrid.com offering, so I can't speak to that.
You could likely use any utility hosting; you just have to figure out
what works best for your situation. Honestly, I think once you get
things configured it's very low maintenance, and in the long run it
will be cheaper to run your own setup.

>can we use CPANEL/Direct Admin as control panel on glusterfs servers? I
>mean, will glusterfs work on control panel based servers?

I use CPanel.
I'm working on building a multi-server cpanel package.
Right now, I have the user home directories on a gluster filesystem, and
I use unison to sync certain cpanel files.
Presently, I have to copy the httpd.conf config (changing IPs) to
the other server, along with the new passwd, group, and shadow entries
when new accounts are created.
I then have to add the other server's IP address to their DNS record.

This works pretty well for me and I have a load-balanced (via round 
robin DNS) cpanel setup.

The goal is to automate all the processes I do manually, and then 
I'll have a situation where I can have one cpanel installation and 
scale it across an infinite number of servers.

I assume it will work similarly with plesk or any other control panel.

I've also set php.ini to use a temp folder on the gluster filesystem,
instead of /tmp so that user sessions are shared amongst the 
machines--this way if the browser bounces to the other server, the 
user's session doesn't disappear.

>I hope to hear more thoughts on this (single Gluster client accessing
>multiple Glusterfs servers)

In my cpanel configuration, I have 2 servers, each AFR'ed to the
other. I set the local read volume to the local disk.

It would work just as well to have multiple servers and a single (or
multiple) cpanel client(s).

>I had observed different HA solutions like mysql replication, drbd setup,
>cluster, other commercial mysql High Availability options too, not able to
>decide which way to go. In one point, I felt interested to try to use
>HYPERTABLE (http://www.hypertable.org ) hosted on glusterFS, but as it is
>young and as I do not have further info about its PHP API and similar
>reasons, I currently stick to MySQL only. Since I wish to use Memcache, I
>am starting with one dedicated server for webserver and database server
>together along with gluster client.

I would shy away from any database using shared storage. I'm not
sure they're mature enough, and I think there may be unpleasant
performance issues related to the speed of the locking mechanism.

If you're really worried, you could run mysql cluster. However, as
far as I know, this is still an in-memory database, which won't give
you much space for your database.

I can't say my mysql replication setup is trouble free, but it's
pretty dependable.
I have some scripts which monitor the slave status and notify me if
the replication breaks. I check, and often it's just a matter of
skipping one statement and restarting the slave process.
There have been cases where I had to copy certain database tables
from one machine to the other.
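
To give an idea of what such a monitoring script can look like, here is a
small Python sketch (not the actual scripts described above). It assumes
you already have the text output of SHOW SLAVE STATUS in the vertical \G
format, for example captured from the mysql command-line client, and it
only decides whether something looks broken. The function names are
invented for this example, and how you get notified (mail, pager, etc.) is
left out.

```python
def parse_slave_status(text):
    # Turn the "Key: value" lines of SHOW SLAVE STATUS (vertical format)
    # into a dict. Lines without a colon, such as the row banner, are skipped.
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def replication_problems(status):
    # Return a list of human-readable problems; an empty list means the
    # slave looks healthy as far as these fields can tell.
    problems = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        state = status.get(thread, "missing")
        if state != "Yes":
            problems.append(f"{thread} is {state}")
    if status.get("Last_Error"):
        problems.append(f"Last_Error: {status['Last_Error']}")
    return problems
```

Run from cron, a script like this would send a notification whenever
`replication_problems` returns a non-empty list. The actual fix (skipping
a statement, restarting the slave, or copying tables) stays a manual
decision.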

I'm not sure what effect it'll have on performance, but you may be
able to run mysql over gluster only to have a kind of live/hot backup
of the database. I'm not sure how it'll work in practice, though, and
it won't protect against data corruption.

>please tell me, can we use ALU (least connections method) and round robin
>translators together or we need to use only one translator?

I'm not sure. Hopefully one of the devs can shed some light.
I *believe* you can intermix the translators almost any way you
want, but I'm not certain.

>which translators you generally use?

In my configuration, I use posix locks, io-threads, and AFR (with local
read volume).
I'm not using any of the other performance-related translators, as I
just don't understand them well enough to know if I'll get any benefit
from them in my configuration.

>I currently use round robin method for routing dns requests shared by my
>download servers.

this is how my cpanel servers are set up.

>i wish to know, which one will be more effective when we go with GlusterFS

Round robin is easiest. As for effective, probably springing for a
real load balancer which has load monitoring daemons on the
webservers will get you the best results, but it really depends on
your situation.
If you're processing data in a way that can be CPU intensive, then a
load balancer is the best option. If you're just serving normal web
pages, then round robin is fine.
If you're streaming mpegs, then you'll want to be able to balance
on network load or disk I/O.

However, in any of those cases, I'd start with round robin, and if you
find it's insufficient, then spend the money and insert a load
balancer of some sort.
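
The difference between the two approaches can be sketched in a few lines
of Python. This is only an illustration of the selection logic, not how
DNS or any particular load balancer implements it. The server names are
hypothetical, and the "load" figure stands for whatever a monitoring
daemon reports (CPU, network load, or disk I/O).

```python
from itertools import cycle


def round_robin(servers):
    # Hand out servers in a fixed rotation, ignoring how busy each one is.
    return cycle(servers)


def least_loaded(load_by_server):
    # Pick the server whose monitoring daemon reports the lowest load.
    # On a tie, the first server in the mapping wins.
    return min(load_by_server, key=load_by_server.get)
```

With `rr = round_robin(["web1", "web2"])`, each call to `next(rr)`
alternates between the two names no matter what they are doing, whereas
`least_loaded({"web1": 0.9, "web2": 0.2})` picks "web2" because it reports
less load at that moment.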

>I also like to know about Geographical replication setup using this AFR
>method. For example, if I place two GlusterFS servers in one datacenter,
>two glusterfs servers in another datacenter, can we use the same AFR setup
>for content replication effectively? and use geographical check in php and
>try to route user download request to nearest datacenter (having our
>glusterfs servers) using my single gluster client?

Currently, for each pair of servers, each server is in a different
datacenter. Both datacenters are in the same city for now; however,
my ultimate plan is to move the servers to different geographic zones.
The only concern I have here would be how network latency affects gluster.
My suspicion is that it's going to be just fine in my case, since
there aren't a lot of file updates, so gluster won't have to do much
more chattering than the AFR auto-heal checking it normally does.

>just some more thoughts: here, which translator will be used (either alu
>or round robin or both depends on configuration setup and our main http
>download request will be to the Gluster Client, which selects particular
>GlusterFS server (based on backend configuration) and deliver the software
>file know?
>how it will be, if we host multiple gluster clients in multiple servers,
>in which situation, if we use round robin method to input "file download
>request" to a gluster client among the list of gluster clients from which,
>based on the default selected translator (ALU : least connections method)
>for example, glusterfs server is selected and file delivered accordingly,
>what do you say sir, will this method work?

I'm not sure how to answer. Again, I'd recommend you hire the
gluster.com folks to help with your implementation design. However,
I would think you could just AFR 2-3 servers; on the clients, round
robin is fine. Add some disk to the clients so you can use the
caching translator to speed up subsequent requests, and you should be
doing all right.

>may be, I need to get this point done correctly (apache virtual hosts
>using gluster mount point correctly)

all my apache virtual hosts point to /home/USER/public_html
/home is my gluster mountpoint.

the other files which cpanel uses for user info I sync periodically 
through UNISON via cron.

>I am ok with "IP based Auth" the only worry I have is about "hotlinking",
>other than that, I am fine ok.

hotlink protection is handled by apache.  I wouldn't worry about 
someone trying to "mount" your gluster filesystem by spoofing the IP.

a future solution would be to add in an encryption translator later 
when one is available.

(I'd love to see a compression translator, but new filesystems do 
this for you (zfs), so maybe it doesn't need to happen at the gluster level)

>what I mean is to discuss the different doubts and once finalised, try to
>write them together and ask for a quote for initial implementation. Just I
>am trying to get answers to my newbie questions for clear linkup and how
>communication occur between web server, gluster client, glusterfs servers
>etc all info, when my request to a consultant can be meaningful, I mean,
>they can more perfectly understand what I require, I hope.

good plan

>thanks you guys, both Daniel and you keith for your valuable thoughts. I
>wish to get some more clarity on other points that I had mentioned above,
>thank you guys :)
>With Best Regards
>Raghu Veer

very welcome.
