[Gluster-users] some thoughts please on setting up a software archive based on glusterfs

Tue Jul 29 03:51:03 UTC 2008

Dear Keith

> At 09:54 AM 7/28/2008, webmaster at securitywonks.org wrote:
>>Hopefully, if we define the number of copies in AFR, will it take care of
>>things and do the replication?
>
> See the AFR examples on the wiki, but basically, for each subvolume
> listed in the cluster/afr translator, there will be one copy.
> So if you list 3 servers, there will be 3 copies; if you list 8,
> there will be 8 copies.
> However, be aware that AFR does NOT do ACTIVE repairing. This means
> that if server 3 is down for a period of time and files change on servers
> 1 and 2, server 3 will be out of sync until those files are
> accessed. At that point, the AFR translator will notice that server 3 is
> out of sync and will update the files on it.
> Here's the downside:
> let's assume you have only a 2-server AFR setup.
> Server 1 goes down, and files are updated on server 2. Then server 1 comes back up.
> Those files are not accessed, so server 1 doesn't get fresh copies.
> Now server 2 goes down.
> When you go to access those initial files, they'll be read from
> server 1 and will be the older versions.
> This is where multiple mirrors come in handy. If you have 3 copies,
> the likelihood of this situation goes down.
> Also, one of the AFR wiki articles discusses a find command that
> will trigger the self-heal feature to bring the replicas back in sync.
>
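
To make the two-server failure case above concrete, here is a toy Python model of the behaviour described: every subvolume holds a copy, writes reach only the servers that are up, and a stale copy is repaired only when the file is read. It is a simulation under stated assumptions (integer version numbers, and a read that returns the newest copy among the reachable servers), not how AFR is implemented internally; the names ToyAFR, Replica and stale_read_scenario are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One AFR subvolume (server) in the toy model."""
    name: str
    up: bool = True
    files: dict = field(default_factory=dict)  # path -> version number


class ToyAFR:
    """Hypothetical model: one copy per subvolume, self-heal only on access."""

    def __init__(self, n_replicas: int) -> None:
        self.replicas = [Replica(f"server{i + 1}") for i in range(n_replicas)]

    def write(self, path: str, version: int) -> None:
        # Writes reach only the replicas that are currently up.
        for r in self.replicas:
            if r.up:
                r.files[path] = version

    def read(self, path: str) -> int:
        live = [r for r in self.replicas if r.up and path in r.files]
        if not live:
            raise FileNotFoundError(path)
        # Assumption: a read returns the newest copy among reachable replicas.
        newest = max(r.files[path] for r in live)
        # Self-heal on access: refresh every reachable replica that is stale.
        for r in self.replicas:
            if r.up:
                r.files[path] = newest
        return newest


def stale_read_scenario(n_replicas: int) -> int:
    """Replay the failure sequence from the mail; return the version read."""
    afr = ToyAFR(n_replicas)
    afr.write("file", 1)
    afr.replicas[0].up = False   # server 1 goes down
    afr.write("file", 2)         # file updated on the remaining server(s)
    afr.replicas[0].up = True    # server 1 returns; the file is not accessed
    afr.replicas[1].up = False   # now server 2 goes down
    return afr.read("file")


print(stale_read_scenario(2))  # 1: the old version is served
print(stale_read_scenario(3))  # 2: server 3 still holds the new version
```

With 2 copies the read returns version 1, the stale data; with 3 copies the third server still holds version 2, which is the "multiple mirrors" point made above.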

I am thinking of starting with 2 Gluster servers; if possible, though, I
will consider 3 Gluster servers in the first step for more redundancy. I
have to see how effectively I can get this implemented the first time.

Just another request: please also tell me whether we can run the find
command regularly to keep all GlusterFS servers in sync almost all
the time.
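
On running the sweep regularly: the idea behind the find command is simply to access every file through the Gluster mount so that AFR gets a chance to notice and repair stale copies. A minimal Python sketch of the same idea follows; it assumes, as described in the quoted text, that opening and reading a file through the mountpoint is enough to trigger self-heal, and the function name touch_all_files is hypothetical. Something like this could be scheduled periodically, for example from cron.

```python
import os
import sys


def touch_all_files(mountpoint: str) -> int:
    """Open and read one byte from every regular file under mountpoint.

    Assumption (from the mail): accessing a file through the Gluster mount
    is what lets AFR notice and repair stale replicas.
    Returns the number of files successfully read.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(mountpoint):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    fh.read(1)
            except OSError as exc:
                print(f"could not read {path}: {exc}", file=sys.stderr)
                continue
            count += 1
    return count
```

On a large archive a full sweep touches every file, so how often to schedule it is a trade-off between freshness and load.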

>>One more thing: I find that RAID5, RAID6, RAID10 or RAID60 is required.
>>I also read a statement that either AFR needs to be enabled or we need to
>>use RAID levels to have data redundancy.
>
> I don't think the Gluster devs plan to bring this level of RAID into a
> single translator.
> You can sort of simulate RAID 0+1, but not any higher RAID levels.
>
> I believe what you'd do to get RAID 0+1 is to set up the stripe
> translators before the AFR translator.
> So you might stripe across servers 1, 2 and 3, and have another stripe
> across servers 4, 5 and 6,
> then AFR stripe123 and stripe456.
>
> Honestly, I wouldn't risk this. Unless your files are HUGE, the
> performance gain won't be worth the risk, in my opinion.
>
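
To see what "stripe across servers 1, 2, 3 and 4, 5, 6, then AFR the two stripes" means for where data lives, here is a small Python sketch that maps byte offsets to servers. It assumes fixed-size blocks assigned round-robin within each stripe set, with both sets placing block i at the same position; the 128 KiB block size and the server names are arbitrary placeholders, not GlusterFS defaults.

```python
from math import ceil

BLOCK_SIZE = 128 * 1024  # placeholder block size, not a GlusterFS default
STRIPE_SETS = [
    ["server1", "server2", "server3"],  # stripe123
    ["server4", "server5", "server6"],  # stripe456 (AFR mirror of stripe123)
]


def servers_for_offset(offset: int, block_size: int = BLOCK_SIZE,
                       stripe_sets=STRIPE_SETS) -> list:
    """Return the server holding this offset in each mirrored stripe set."""
    block = offset // block_size
    return [members[block % len(members)] for members in stripe_sets]


def file_readable(size: int, failed: set, block_size: int = BLOCK_SIZE,
                  stripe_sets=STRIPE_SETS) -> bool:
    """True if every block of a file of this size still has a live copy."""
    n_blocks = max(1, ceil(size / block_size))
    for block in range(n_blocks):
        holders = servers_for_offset(block * block_size, block_size, stripe_sets)
        if all(h in failed for h in holders):
            return False
    return True


print(servers_for_offset(0))               # ['server1', 'server4']
print(servers_for_offset(4 * BLOCK_SIZE))  # ['server2', 'server5']
print(file_readable(700 * 2**20, {"server2", "server5"}))  # False
print(file_readable(100 * 1024, {"server2", "server5"}))   # True
```

Under this model a large file spans every server in both sets, so losing one matching pair (here servers 2 and 5) makes a 700 MB file unreadable, while a file smaller than one block is unaffected. That illustrates the kind of risk mentioned above for large files.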

The files that I am going to host on our website range from a few
kilobytes to a few hundred megabytes (even up to 700 MB, and sometimes
DVD files as well). Now, what do you suggest, sir?

>>Which one do you recommend?
>>
>>What is the minimum number of copies we can make using AFR for added
>>redundancy? (I read that Google stores 3 copies of its data for added
>>redundancy; can we follow that rule and keep 3 copies using AFR?) Or
>>should we keep more copies?
>
> More is always better. If you can afford it, store 10. It comes down
> to how many servers you want to manage, how much disk space you
> want to buy, etc.
>
>>Then, what are your thoughts about RAID levels, sir?
>>
>>Is RAID1 OK in the above situation, or alternatively, keeping economics
>>in mind, can we proceed with multiple AFR copies instead? Please share
>>your thoughts on this, thank you.
>
> Daniel may have a different opinion, but those are my thoughts for
> you to consider.
>
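
A rough way to put numbers on "more is always better": if each server is unavailable with some probability p, and failures are independent (a simplifying assumption; real failures are often correlated, e.g. servers in the same rack or datacenter), then the chance that all n copies are unavailable at once is p^n, while the raw disk you have to buy grows as n times the data size. A small Python sketch of that trade-off, with made-up example figures:

```python
def prob_all_copies_down(p_down: float, copies: int) -> float:
    """Chance that every replica is unavailable, assuming independent failures."""
    if not 0.0 <= p_down <= 1.0 or copies < 1:
        raise ValueError("need 0 <= p_down <= 1 and copies >= 1")
    return p_down ** copies


def raw_disk_needed_gb(data_gb: float, copies: int) -> float:
    """Each AFR subvolume holds a full copy, so raw space scales with copies."""
    return data_gb * copies


# Hypothetical figures: 1% chance a given server is down, 400 GB of files.
for n in (1, 2, 3, 10):
    print(f"{n:>2} copies: P(all down) = {prob_all_copies_down(0.01, n):.0e}, "
          f"raw disk = {raw_disk_needed_gb(400, n):.0f} GB")
```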

I am thinking of using a single hard drive (e.g. a 500 GB SATA disk) per
GlusterFS server, with 2 to 4 GB of RAM each. I am also thinking about
virtual systems: how would it work if we hosted the Gluster servers as
individual virtual containers on gogrid.com, Amazon utility hosting or some
other service? That is one option I am considering, even though for now I
am leaning mainly towards dedicated servers.

Can we use cPanel/DirectAdmin as the control panel on GlusterFS servers?
I mean, will GlusterFS work on control-panel-based servers?

>> >> I read in some document that FTP and SSH can be used for uploading
>> >> files to a GlusterFS-based system.
>> >
>> > A Gluster client process simply uses Fuse to create the mountpoint.
>> > Once the mountpoint exists, it can be accessed just like any other
>> > directory in the filesystem, thus any normal way of creating,
>> > modifying, or deleting files is usable.  Basically anything that can
>> > interact with filesystem objects can interact with a Gluster mountpoint
>> > (FTP and SCP included).
>>
>>I had read about FUSE before; from its website I came to know about these
>>userspace file systems. Please tell me: do we have to use one Gluster
>>client per server, or how do you count that?
>
> As far as I know, the Gluster server can serve volumes from a
> single .vol file.
> I think it's not recommended to access multiple different volumes
> from a single server, but my guess is that it might work. I presume
> Daniel will correct me if I'm wrong.
>
> You can have a separate client process running on a system, OR you
> can have a single combined client/server process.
>
> When a client .vol file is used to mount a filesystem, the last
> configured volume is used as the source of the mount.
> In this regard, it seems the primary use is that any given
> machine can be a single server and/or a single client, but you can't
> have one machine that acts as multiple Gluster servers.
>
> So in your situation, if you want to have 3 mirrors, you need 3
> machines running as Gluster servers.
>
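
On the "last configured volume" point, and on one client talking to several servers: a single client .vol file can define one protocol/client volume per server and put a cluster/afr volume over them; since it is defined last, that AFR volume is what gets mounted. The spec below is an illustrative sketch in the volume ... end-volume syntax used by the wiki examples (host addresses and volume names are placeholders; check the wiki for the exact options), and the hypothetical Python helpers just report which volume would be the mount source under the "last volume" rule described above.

```python
import re

# Illustrative client spec; addresses and volume names are placeholders.
CLIENT_VOL = """
volume server1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.11
  option remote-subvolume brick
end-volume

volume server2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.12
  option remote-subvolume brick
end-volume

# One copy of each file per subvolume listed here.
volume mirror
  type cluster/afr
  subvolumes server1 server2
end-volume
"""


def volume_names(spec: str) -> list:
    """Names from 'volume NAME' lines, in order ('end-volume' is not matched)."""
    return re.findall(r"^[ \t]*volume[ \t]+(\S+)[ \t]*$", spec, flags=re.MULTILINE)


def mount_source(spec: str) -> str:
    """Apply the 'last configured volume is mounted' rule from the text."""
    names = volume_names(spec)
    if not names:
        raise ValueError("no volume definitions found")
    return names[-1]


print(volume_names(CLIENT_VOL))  # ['server1', 'server2', 'mirror']
print(mount_source(CLIENT_VOL))  # mirror
```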

I hope to hear more thoughts on this (a single Gluster client accessing
multiple GlusterFS servers).

>>My initial plan for this website is to use one dedicated server (as the
>>web server and MySQL server). I wish to use the same machine as the
>>Gluster client as well, from which I will serve HTTP file downloads.
>>Likewise, I think I need to use the same server to upload files to the
>>GlusterFS-based storage servers.
>
> This is similar to the configuration I have.
> I have 2 machines, and I'm using the AFR translator to mirror the data
> across them.
> The AFR volume is mounted as /home,
> and I then have Apache virtual hosts all in /home.
> For MySQL, you would not want to put your MySQL database files on top
> of Gluster.
> Use MySQL replication. It does require some attention, but you
> really really really do NOT want to try to run multiple MySQL
> instances on top of shared db files.
>
I have looked at different HA solutions such as MySQL replication, a DRBD
setup, clustering and other commercial MySQL high-availability options, but
I am not able to decide which way to go. At one point I was interested in
trying Hypertable (http://www.hypertable.org) hosted on GlusterFS, but since
it is young and I don't have further info about its PHP API, among other
reasons, I am sticking with MySQL for now. Since I wish to use Memcache, I
am starting with one dedicated server acting as web server, database server
and Gluster client together.

Please tell me: can we use the ALU (least connections method) and round
robin translators together, or do we need to use only one translator?

Which translators do you generally use?

I currently use round-robin DNS to distribute requests across my
download servers.

I wish to know which one will be more effective when we move to
GlusterFS servers.
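
To compare the two ideas being asked about, here is a toy Python sketch of a round-robin picker and a least-connections picker. It only contrasts the two selection policies; it is not GlusterFS's scheduler code (the real ALU scheduler weighs several usage factors rather than a single counter), and the class names are hypothetical.

```python
import itertools


# Toy policies for illustration only; not GlusterFS's scheduler code.
class RoundRobinPicker:
    """Hands out servers in a fixed rotation, ignoring their current load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsPicker:
    """Picks the server with the fewest active requests (ties: first listed)."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self) -> str:
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def done(self, server: str) -> None:
        # Call when a request finishes so the server's count goes back down.
        self.active[server] -= 1


rr = RoundRobinPicker(["server1", "server2"])
print([rr.pick() for _ in range(3)])  # ['server1', 'server2', 'server1']

lc = LeastConnectionsPicker(["server1", "server2"])
first, second = lc.pick(), lc.pick()  # server1, then server2
lc.done(first)                        # server1's download finishes
print(lc.pick())                      # server1 (now the least busy)
```

Round robin spreads requests evenly only when requests cost roughly the same; with files ranging from kilobytes to DVD images, a load-aware policy tends to cope better with long downloads.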

I would also like to know about a geographical replication setup using
AFR. For example, if I place two GlusterFS servers in one datacenter and
two GlusterFS servers in another datacenter, can we use the same AFR
setup for content replication effectively, and then use a geographic
check in PHP to route each user's download request to the nearest
datacenter (hosting our GlusterFS servers) using my single Gluster client?
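
The "route to the nearest datacenter" step is independent of Gluster itself: given an approximate location for the user (from a GeoIP lookup, not shown), pick the datacenter with the smallest great-circle distance. The mail plans to do this in PHP; the same logic is sketched here in Python, with hypothetical datacenter names and coordinates.

```python
import math

# Hypothetical datacenter locations as (latitude, longitude) in degrees.
DATACENTERS = {
    "dc-east": (40.71, -74.01),
    "dc-west": (37.77, -122.42),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_datacenter(user_location, datacenters=DATACENTERS) -> str:
    """Name of the datacenter closest to the user's (lat, lon)."""
    return min(datacenters,
               key=lambda name: haversine_km(user_location, datacenters[name]))


print(nearest_datacenter((41.88, -87.63)))  # a user near Chicago
```

This only chooses where to send the user; keeping both datacenters' copies in sync over a WAN link is the AFR question above and is not addressed by this sketch.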

Just some more thoughts: here, which translator will be used (ALU, round
robin, or both, depending on the configuration)? Our main HTTP download
request will go to the Gluster client, which selects a particular
GlusterFS server (based on the backend configuration) and delivers the
software file, right?

How would it work if we host multiple Gluster clients on multiple servers?
In that case, we would use round robin to send each "file download
request" to one Gluster client from the list of Gluster clients; that
client would then select a GlusterFS server based on the configured
translator (for example ALU, the least connections method) and deliver
the file accordingly. What do you say, sir, will this method work?

>>I wish to use 2 GlusterFS storage servers initially and add more as
>>the site grows.
>
> This is my plan also.
> Once I get a pair working and stabilized, adding a third server
> should be fairly trivial.
>
> Since AFR does self-healing on access, it's possible to set up a server
> with an empty filesystem and add it to the AFR volumes, and it will copy
> data over from the other server(s) as it's requested.
>
>>Please share your thoughts, sir.
>>
>> >
>> >> I am currently trying to find out whether there is any other
>> >> documentation that clarifies this situation. I also need more info on
>> >> how we will construct a URL to the hosted files using HTTP, whether
>> >> they will be accessible directly or only with a password, and a lot
>> >> of other questions.
>> >
>> > http://httpd.apache.org/docs/2.2/
>>
>>Thanks for confirming this as well. So you mean we construct file
>>headers etc. in the same way as before. I read in a doc that client
>>authentication occurs either with a pre-defined list of IP addresses
>>(GlusterFS clients) or with a pre-defined list of username/password
>>combinations. Hopefully we have a better way of using it, thank you.
>
> I think you're asking 2 different questions.
> Configure Apache as you normally would; just make sure the
> directories the Apache virtual hosts are using are within the Gluster
> mount point.
>

Maybe I need to get this point right (Apache virtual hosts using the
Gluster mount point correctly).

> Your follow-up question is related to the Gluster server configuration,
> and there's lots of info in the wiki about that.
> I use IP-based auth, mostly because this is webserver data: if
> someone spoofs the IP and somehow grabs the Gluster stream,
> they're only going to get data they could get by using a web browser,
> for the most part, so I'm not overly concerned about that level of
> security.
>
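
For illustration, the idea behind IP-based auth is an allow-list check on the server side: the connecting client's address is compared against configured patterns and anything else is refused. The Python sketch below assumes comma-separated, shell-style wildcard patterns such as 192.168.0.*; that pattern format and the function name client_allowed are assumptions for the example, so check the wiki for the actual server option syntax. As noted above, this kind of check trusts the source address and does not protect against spoofing.

```python
import ipaddress
from fnmatch import fnmatchcase


def client_allowed(client_ip: str, allow: str) -> bool:
    """True if client_ip matches any pattern in a comma-separated allow-list.

    Assumed pattern format: shell-style wildcards, e.g. "192.168.0.*".
    """
    try:
        ipaddress.ip_address(client_ip)  # reject anything that is not an IP
    except ValueError:
        return False
    patterns = [p.strip() for p in allow.split(",") if p.strip()]
    return any(fnmatchcase(client_ip, p) for p in patterns)


ALLOW = "192.168.0.*, 127.0.0.1"  # hypothetical allow-list
for ip in ("192.168.0.23", "127.0.0.1", "10.1.2.3"):
    print(ip, client_allowed(ip, ALLOW))
```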

I am OK with "IP based Auth"; the only worry I have is about
"hotlinking". Other than that, I am fine.

>> >
>> >> Can we use PHP filesystem functions directly to deal with files
>> >> hosted on a GlusterFS-based system?
>> >
>> > Yes.
>>
>>I am a bit more relaxed after your written confirmation that I can
>>use PHP filesystem functions, FTP, SCP and SSH the same way as
>>before, even with the GlusterFS filesystem.
>
> Once mounted, a Gluster filesystem is the same as any other
> filesystem, so think of it as you would any other filesystem.
>
> Your applications (Apache, PHP, etc.) will be none the wiser.
>
> My advice would be to contact the zresearch folks (you can find them
> via www.gluster.com) and find out what their rates are for
> professional services.
> Given your knowledge level, it would probably be helpful to hire
> someone to help you get past your first configuration, after which
> you should be able to plug along just fine.
> (You can contact me for implementation consulting also, but since
> your issues are mostly Gluster-related, it's sometimes best to go
> straight to the source.)
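
To illustrate "your applications will be none the wiser": code that publishes a file into the archive uses the same standard calls it would use on a local disk. Below is a minimal Python sketch; the mount path /mnt/glusterfs and the function names publish and list_archive are placeholders, and nothing in it is Gluster-specific.

```python
import os
import shutil

MOUNTPOINT = "/mnt/glusterfs"  # placeholder for wherever the volume is mounted


def publish(src: str, relpath: str, mountpoint: str = MOUNTPOINT) -> int:
    """Copy a local file into the mounted volume and return its stored size."""
    dest = os.path.join(mountpoint, relpath)
    os.makedirs(os.path.dirname(dest), exist_ok=True)  # ordinary mkdir -p
    shutil.copyfile(src, dest)                         # ordinary file copy
    return os.path.getsize(dest)                       # ordinary stat


def list_archive(mountpoint: str = MOUNTPOINT) -> list:
    """Relative paths of all files under the mount, like a recursive ls."""
    found = []
    for dirpath, _dirs, files in os.walk(mountpoint):
        for name in files:
            found.append(os.path.relpath(os.path.join(dirpath, name), mountpoint))
    return sorted(found)
```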

What I mean is to discuss the different doubts and, once they are settled,
write them up together and ask for a quote for the initial implementation.
Right now I am just trying to get answers to my newbie questions so that I
clearly understand how the pieces link up and how the web server, Gluster
client and GlusterFS servers communicate. Then my request to a consultant
can be meaningful; I mean, they can better understand what I require, I hope.

Thank you both, Daniel and Keith, for your valuable thoughts. I would
like some more clarity on the other points I mentioned above.

Thank you guys :)

With Best Regards
Raghu Veer

> Keith
> p.s. If it wasn't clear, I'm just a Gluster user, not a developer,
> so my opinions are from an operational perspective.
>
>