[Gluster-devel] Harddisk economy alternatives

Magnus Näslund magnus at arkivdigital.se
Wed Nov 9 16:50:00 UTC 2011

We're a digital archive that stores digital images of old records and 
books. We're about to evaluate glusterfs as a solution to our main 
storage needs. I'm soliciting advice from both the glusterfs crew and 
other users with similar needs.

Today we've got about 30 million original images. For each there is a 
high quality original and a batch-processed, highly compressed copy 
that's used by our customers.

So this gives 30 million large files (3-12MB) plus 30 million converted 
copies that land at about 500KB per image.
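
As a back-of-the-envelope check of what these figures mean in total 
volume, here is a minimal sketch. It assumes decimal units (1 MB = 10**6 
bytes) and treats 3MB and 12MB as lower and upper bounds for every 
original; the real size distribution is not given, so the true total 
lies somewhere in between. Replication is not included.

```python
# Rough single-copy capacity estimate from the figures in this post.
# Assumption: decimal units, and 3-12 MB taken as bounds per original.
N_IMAGES = 30_000_000
ORIGINAL_MIN_BYTES = 3 * 10**6
ORIGINAL_MAX_BYTES = 12 * 10**6
COPY_BYTES = 500 * 10**3
TB = 10**12


def to_tb(n_bytes):
    """Convert a byte count to decimal terabytes."""
    return n_bytes / TB


originals_min_tb = to_tb(N_IMAGES * ORIGINAL_MIN_BYTES)
originals_max_tb = to_tb(N_IMAGES * ORIGINAL_MAX_BYTES)
copies_tb = to_tb(N_IMAGES * COPY_BYTES)

print(f"originals: {originals_min_tb:.0f}-{originals_max_tb:.0f} TB")
print(f"compressed copies: {copies_tb:.0f} TB")
```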

The use-cases are a bit different: the big images will be written once 
and batch-read once or twice a year.
The small images will be written once or twice a year, but read-accessed 
24/7, and are more latency sensitive.

We want the data replicated at least 3 times physically (box-wise), so 
we've ordered 3 test servers with 24x3TB "enterprise" SATA disks each, 
with an Areca card + BBU. We'll probably be running the tests feeding 
RAID volumes to glusterfs, and from what I've seen this seems to be a 

Possible future:

Since our storage system will be in use for a really long time, we're 
looking at the total economics of the solution vs. the data safety concerns.

We've seen suggestions on letting glusterfs manage the disks directly.
The way I see it, this would be a win in that
	1) We would be using all disks, no RAID/spare storage overhead
	2) No RAID-rebuilds
	3) ...
	4) Profit
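
To put a number on point 1, here is a small sketch comparing usable 
space per 24x3TB box with and without RAID overhead. The RAID layout is 
an assumption made only for illustration (one RAID 6 set plus one hot 
spare per box); the actual layout of the test servers is not decided 
here. With 3 boxes and 3 replicas, usable cluster capacity equals the 
usable capacity of one box.

```python
# Figures from this post.
DISKS_PER_BOX = 24
DISK_TB = 3
BOXES = 3
REPLICAS = 3

# Hypothetical RAID layout, NOT from the post: RAID 6 plus one hot spare.
RAID_PARITY_DISKS = 2
HOT_SPARES = 1


def usable_tb_per_box(disks, disk_tb, overhead_disks):
    """Usable TB in one box after setting aside parity/spare disks."""
    return (disks - overhead_disks) * disk_tb


def cluster_usable_tb(per_box_tb, boxes, replicas):
    """Usable TB across the cluster when every file is stored `replicas` times."""
    return per_box_tb * boxes / replicas


raid_tb = usable_tb_per_box(DISKS_PER_BOX, DISK_TB,
                            RAID_PARITY_DISKS + HOT_SPARES)
direct_tb = usable_tb_per_box(DISKS_PER_BOX, DISK_TB, 0)

print("RAID-backed bricks:", cluster_usable_tb(raid_tb, BOXES, REPLICAS), "TB")
print("direct-disk bricks:", cluster_usable_tb(direct_tb, BOXES, REPLICAS), "TB")
```

Under these assumptions the direct-disk model gains 9TB per box; a 
different RAID level or spare policy changes the gap accordingly.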

Also, we know that any long-term system we build should be planned 
around replacing disks continuously.

So in my mind we could buy quality boxes with 24-36 disks run by 3-4 
SATA controller cards (Marvell?), using cheap and large desktop disks 
(maybe not the "green" variety). We could have a reporting system on top 
of glusterfs that reports defective disks that would be replaced as part 
of our on-duty maintenance. Since the storage is replicated over 3+ 
boxes, the breakage of a single disk would not compromise the data 
safety as long as the disks are replaced in a timely manner.
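
One possible shape for the disk-reporting part is sketched below. It 
does not talk to glusterfs at all; it assumes smartmontools is installed 
and uses the exit status of "smartctl -H", where bit 3 (value 8) means 
the SMART status check returned "DISK FAILING" according to the 
smartctl man page. The function names and the device list are 
hypothetical, and SMART alone will not catch every kind of disk failure.

```python
import subprocess

# Bit 3 (value 8) of smartctl's exit status: SMART status check
# returned "DISK FAILING" (see the smartctl man page).
SMART_DISK_FAILING = 1 << 3


def is_failing(exit_status):
    """True if smartctl's exit status has the DISK FAILING bit set."""
    return bool(exit_status & SMART_DISK_FAILING)


def failing_disks(devices):
    """Run `smartctl -H` on each device and return those reported failing.

    `devices` is a hypothetical list such as ["/dev/sda", "/dev/sdb"].
    Requires smartmontools and sufficient privileges to query the disks.
    """
    failing = []
    for dev in devices:
        result = subprocess.run(["smartctl", "-H", dev],
                                capture_output=True, text=True)
        if is_failing(result.returncode):
            failing.append(dev)
    return failing
```

The on-duty maintenance could run something like this from cron on each 
box and mail the returned list to whoever swaps the disks.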

I would be very interested to hear other people's experiences or ideas 
about storing this kind of data, and in particular about the pros/cons 
of the pass-thru/direct disk model.

Any constructive input is welcome!

Magnus Näslund
