[Gluster-users] Design Questions
Moritz Krinke
mkrinke at fotocommunity.net
Tue Jul 7 17:18:12 UTC 2009
Hello,
while doing some research on how to build a system that lets us easily
rsync backups from other systems onto it, I came across GlusterFS.
After several hours of reading, I'm quite impressed and looking forward
to implementing it ;)
Basically, I would like you to comment on the design I put together, as
there are lots of different ways to do things, and some might be
preferred over others.
We need:
15 TB of storage, stored at least in a RAID1-like fashion; RAID5/RAID6
would be preferable, but I think that is not possible with GlusterFS?!
While reading the docs I realized we could probably use this system
for hosting our images via HTTP as well, because of features like
- easy to expand with new storage / servers
- io-cache
- Lighttpd Plugin for direct FS access
This way we would gain not just a backup store for our pictures, which
are currently served by a mogilefs/varnish/lighttpd cluster, but also
a backup cluster that could serve files directly to our users.
(It's a community site with lots of pictures; file sizes vary, but most
files are 50 to 300 kilobytes, and we're planning to store files of
~10 MB too.)
Great :-)
We've planned to use the following hardware:
5 servers, each with:
- quad-core CPU
- 16 GB RAM
- 4 x 1.5 TB HDD, no RAID
- dedicated GBit Ethernet switched network
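As a quick sanity check on the capacity (my own illustrative arithmetic, not from any GlusterFS docs), assuming every file is stored twice via two-way replication:

```python
# Capacity sanity check: 5 servers x 4 drives x 1.5 TB, two-way replication.
servers = 5
drives_per_server = 4
drive_tb = 1.5

raw_tb = servers * drives_per_server * drive_tb  # total raw capacity
usable_tb = raw_tb / 2                           # each file stored on two disks

print(raw_tb, usable_tb)  # 30.0 TB raw, 15.0 TB usable
```

So the proposed hardware lands exactly on the 15 TB requirement, with no headroom beyond the replication factor.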
GlusterFS setup:
The same config on all nodes, each with a stack of
    volume posix -> volume locks -> volume io-threads -> volume
    write-behind -> volume io-cache with a cache size of 14 GB (so 2 GB
    is left for the system)
for each of the 4 drives/mountpoints.
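For one drive, I imagine the server-side volfile would look roughly like this (a sketch in GlusterFS 2.x volfile syntax; the volume names, the directory path, and the auth rule are placeholders I made up, so please correct me if the option names are off):

```
# Translator stack for one drive; repeat for each of the 4 mountpoints.
volume posix1
  type storage/posix
  option directory /data/disk1        # placeholder mountpoint
end-volume

volume locks1
  type features/locks
  subvolumes posix1
end-volume

volume iothreads1
  type performance/io-threads
  subvolumes locks1
end-volume

volume writebehind1
  type performance/write-behind
  subvolumes iothreads1
end-volume

volume brick1
  type performance/io-cache
  option cache-size 14GB              # as proposed; leaves ~2 GB for the OS
  subvolumes writebehind1
end-volume

volume server
  type protocol/server
  option transport-type tcp
  subvolumes brick1                   # add brick2..brick4 here as well
  option auth.addr.brick1.allow *     # placeholder; restrict in production
end-volume
```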
Then config entries for all 20 bricks, using tcp as the transport
type; then cluster/replicate volumes, each pairing 2 disks on
different servers; and a cluster/nufa volume with the 10 replicate
volumes as subvolumes.
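On the client side, I picture something like the following (again a sketch in 2.x volfile syntax; hostnames, volume names, and the local-volume-name option are assumptions on my part):

```
# Client side: 20 remote bricks -> 10 replicate pairs -> one nufa volume.
volume remote1a
  type protocol/client
  option transport-type tcp
  option remote-host server1          # placeholder hostname
  option remote-subvolume brick1
end-volume

volume remote1b
  type protocol/client
  option transport-type tcp
  option remote-host server2          # replica lives on a different server
  option remote-subvolume brick1
end-volume

# ... 18 more protocol/client volumes for the remaining bricks ...

volume replicate1
  type cluster/replicate
  subvolumes remote1a remote1b        # two disks on different servers
end-volume

# ... replicate2 .. replicate10 ...

volume nufa
  type cluster/nufa
  option local-volume-name replicate1 # prefer the local pair for new files
  subvolumes replicate1 replicate2    # list all 10 replicate volumes here
end-volume
```

A client would then mount this with something along the lines of `glusterfs -f /etc/glusterfs/client.vol /mnt/storage` (my assumption about the 2.x command line).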
As I understand it, this should provide me with the following:
- Data redundancy: if one disk fails, I can replace the disk and
GlusterFS automatically replicates all the lost data back to the new
disk; the same applies if a whole server is lost/broken.
- Distributed access: a read of a specific file will always go to the
same server/drive, regardless of which server requests it, and will
therefore be cached by the io-cache layer on the node that has the
file on disk. That means a little network overhead, but it's better
than putting the cache on top of the distribute volume, which would
result in the "same" cached content on all servers.
- A global cache of 70 GB with no duplicates (5 servers times 14 GB
of io-cache RAM per server).
-> How exactly does the io-cache work? Can I specify a TTL per file
pattern, or specify which files should not be cached at all? I can't
find any specific info on this.
- I can put apache/lighttpd on all the servers, which then have
direct access to the storage; no need for extra webservers to serve
static & cacheable content.
- Remote access: I can mount the FS from another location (another
DC), securely through some kind of VPN if I wish, and use it there
for backup purposes.
- Expandable: I can just bring 2 new servers online, each with
2/4/6/8 drives.
If you have read and understood ( :-) ) this, I would highly
appreciate it if you could answer my questions and/or share any
comments you might have.
Thanks a lot,
Moritz