[Gluster-devel] Crawling and indexing hardware
Marcus Herou
marcus.herou at tailsweep.com
Wed May 7 18:06:40 UTC 2008
Hi.
I really fancy GlusterFS since it truly feels like a system that can scale
out nicely on the storage side, which frankly most solutions don't. I've
worked with big HP SANs and NetApp filers.
Both were exported as NFS shares, and both had huge problems at peak
traffic times, even though they were high-end hardware.
Anyway, I'm not gonna walk down my NFS-storage memory lane for too long,
but I do want to ask some storage experts a question. I'm quite good at
scaling webapps, but I'm not experienced enough at scaling storage; that's
why I'm turning here.
I will use GlusterFS for storing and accessing crawled and indexed data
of, let's say, 1 billion entries. The data will consist of two types:
1. Big index files ~x Gig each
2. Many small files in a huge number of directories.
I will go for many cheap SATA disks, since I think SAS just won't do
capacity-wise.
Does anyone have experience with cost-effective storage enclosures in
conjunction with servers (e.g. Dell 1950 + MD1000, or HP DL160 + HP MSA60),
or more complete solutions like the Dell AFX150 or HP DL385?
Does anyone also know whether it is better to separate enclosures from
servers, to get the best of both worlds?
I would really like a solution where I can just buy another 2 enclosures
and 4 GlusterFS servers whenever performance or storage constraints kick in.
Scaling out easily and cheaply, always increasing performance, is my mantra :)
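(For what it's worth, my understanding is that adding servers in current
GlusterFS is mostly a client-side volfile change. A rough sketch, with every
hostname, address, and volume name made up, assuming the cluster/unify
translator with a round-robin scheduler:)

```
# client-side volfile sketch -- add one protocol/client block per new server
volume server1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.1       # hypothetical address
  option remote-subvolume brick
end-volume

volume server2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.2       # hypothetical address
  option remote-subvolume brick
end-volume

volume unify0
  type cluster/unify
  option scheduler rr                  # spread new files round-robin
  option namespace ns                  # unify also needs a namespace volume (not shown)
  subvolumes server1 server2
end-volume
```

If that's right, scaling out would mean racking the new boxes and appending
their client blocks to the subvolumes line; please correct me if I've got
the translator setup wrong.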
Finally, what tools would suit testing zillions of small files? Bonnie++?
And fewer big files? Still Bonnie++, or perhaps IOzone?
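(Before reaching for Bonnie++ or IOzone, I've been using a tiny script to
approximate the many-small-files pattern our crawler produces. A minimal
Python sketch, with all names and the dir/file counts hypothetical:)

```python
import os
import tempfile
import time

def small_file_benchmark(root, num_dirs=64, files_per_dir=64, size=4096):
    """Create many small files spread across directories and time it.

    A rough stand-in for the metadata-heavy workload of a crawler:
    lots of creates and writes of tiny files, fanned out over dirs.
    """
    payload = b"x" * size
    start = time.time()
    for d in range(num_dirs):
        dirpath = os.path.join(root, "dir%04d" % d)
        os.makedirs(dirpath, exist_ok=True)
        for f in range(files_per_dir):
            with open(os.path.join(dirpath, "f%04d" % f), "wb") as fh:
                fh.write(payload)
    elapsed = time.time() - start
    return num_dirs * files_per_dir, elapsed

if __name__ == "__main__":
    # Point root at the mounted GlusterFS volume in a real run;
    # a temp dir is used here just so the sketch is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        n, secs = small_file_benchmark(tmp)
        print("created %d files in %.2f s (%.0f files/s)" % (n, secs, n / secs))
```

It's obviously no substitute for a proper benchmark, but it gives a quick
files-per-second number on whatever mount you point it at.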
Kindly
/Marcus Herou
--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou at tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/