[Gluster-users] Design/HW for cost-efficient NL archive >= 0.5PB?

Wed Dec 25 19:47:51 UTC 2013

I am new to Gluster, but so far it seems very attractive for my needs. I am
trying to assess its suitability for a cost-efficient storage problem I am
tackling. Hopefully someone can help me find the best way to solve it.

Capacity:
Start with around 0.5PB usable

Redundancy:
2 replicas without RAID is not sufficient. So either 3 replicas without RAID,
or some combination of 2 replicas and RAID?

File types:
Large files, around 400-1500MB each.

Usage pattern:
Archive (not sure whether this counts as nearline), with files being added
at around 200-300GB/day (300-400 files/day). Very few reads, on the order of
10 file accesses per day. Concurrent reads are highly unlikely.

The two main factors for me are cost and redundancy. Losing data is not an
option, since this is an archive solution. Cost per usable TB is the other
key factor, as we see growth estimates of 100-500TB/year.
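
As a rough sanity check on the ingest figures above, the sketch below
converts the stated daily ingest into yearly growth and an average write
rate. It assumes decimal units (1 TB = 1000 GB) and perfectly even ingest
over the day; the function names are made up for illustration.

```python
# Rough sizing from the ingest figures in this post.
# Assumptions: decimal units (1 TB = 1000 GB), ingest spread evenly over 24h.
DAILY_INGEST_GB = (200, 300)      # stated ingest range per day
SECONDS_PER_DAY = 24 * 60 * 60


def yearly_growth_tb(gb_per_day):
    """Yearly growth in TB for a constant daily ingest in GB."""
    return gb_per_day * 365 / 1000


def sustained_write_mb_s(gb_per_day):
    """Average write rate in MB/s if ingest is spread evenly over the day."""
    return gb_per_day * 1000 / SECONDS_PER_DAY


for gb in DAILY_INGEST_GB:
    print(f"{gb} GB/day: {yearly_growth_tb(gb):.1f} TB/year, "
          f"{sustained_write_mb_s(gb):.2f} MB/s average write rate")
```

At today's ingest this comes to roughly 73-110 TB/year and only a few MB/s
on average, so the 100-500TB/year estimate allows for the ingest rate itself
growing, and capacity rather than throughput is the constraint.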

Looking just at $/TB, a RAID-based approach sounds more efficient to me.
But RAID rebuild times with large arrays of large-capacity drives sound
really scary. Not sure whether something smart can be done, since we would
still have a replica left during the rebuild?
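
To make the $/TB comparison concrete, here is a minimal sketch of the raw
capacity each layout would need for 0.5PB usable, plus a naive rebuild-time
estimate. The RAID 6 group size (10 data + 2 parity), the 4 TB drive size
and the 50 MB/s rebuild rate are illustrative assumptions, not figures from
a real deployment, and filesystem overhead is ignored.

```python
# Raw capacity and rebuild-time estimate for two candidate layouts.
# All hardware figures below are assumptions chosen for illustration.
USABLE_TB = 500                   # 0.5 PB usable, decimal units


def raw_tb_needed(usable_tb, replicas, data_disks=1, parity_disks=0):
    """Raw TB required, given replica count and RAID group geometry.

    With the defaults (1 data disk, 0 parity) there is no RAID overhead.
    """
    raid_overhead = (data_disks + parity_disks) / data_disks
    return usable_tb * replicas * raid_overhead


def rebuild_hours(drive_tb, rebuild_mb_s):
    """Hours to rewrite one whole drive at a constant rebuild rate."""
    return drive_tb * 1_000_000 / rebuild_mb_s / 3600


replica3_plain = raw_tb_needed(USABLE_TB, replicas=3)
replica2_raid6 = raw_tb_needed(USABLE_TB, replicas=2,
                               data_disks=10, parity_disks=2)

print(f"3 replicas, no RAID:      {replica3_plain:.0f} TB raw")
print(f"2 replicas, RAID 6 10+2:  {replica2_raid6:.0f} TB raw")
print(f"Rebuild of one 4 TB drive at 50 MB/s: {rebuild_hours(4, 50):.1f} h")
```

Under these assumptions, three plain replicas need 1500 TB raw and two
replicas on RAID 6 need 1200 TB raw, while a single-drive rebuild takes
about 22 hours. Real rebuild rates depend heavily on the controller and on
competing I/O, so the last figure is only an order-of-magnitude guide.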

So, any suggestions on what would be possible and cost-efficient solutions?

- Any experience with dense servers? What is advisable: 24/36/50/60 slots?
- SAS expanders/storage pods?
- RAID vs non-RAID?
- Number of replicas etc?

Best,

Fredrik