[Gluster-users] ZFS + Linux + Glusterfs for a production ready 100+ TB NAS on cloud
Joe Landman
landman at scalableinformatics.com
Thu Sep 29 17:58:15 UTC 2011
On 09/29/2011 01:44 PM, David Miller wrote:
> On Thu, Sep 29, 2011 at 1:32 PM, David Miller <david3d at gmail.com
> <mailto:david3d at gmail.com>> wrote:
>
> Couldn't you accomplish the same thing with flashcache?
> https://github.com/facebook/flashcache/
>
>
> I should expand on that a little bit. Flashcache is a kernel module
> created by Facebook that uses the device mapper interface in Linux to
> provide a ssd cache layer to any block device.
>
> What I think would be interesting is using flashcache with a pcie ssd as
> the caching device. That would add about $500-$600 to the cost of each
> brick node but should be able to buffer the active IO from the spinning
> media pretty well.
Erp ... low-end PCIe flash with decent performance starts much higher
than $500-600 USD.
> Something like this.
> http://www.amazon.com/OCZ-Technology-Drive-240GB-Express/dp/B0058RECUE
> or something from FusionIO if you want something that's aimed more at
> the enterprise.
Flashcache is reasonably good, but there are many variables in using it,
and it's designed for a different use case. For most people writeback
mode may be reasonable, but other use cases would require different
configs.
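For reference, a minimal flashcache setup on a brick node would look
roughly like the following. This is a sketch only, with placeholder
device names and mount points; the main decision is writeback (-p back)
vs. writethrough/writearound:

  # build and load the module (assumes kernel headers are installed)
  git clone https://github.com/facebook/flashcache.git
  cd flashcache && make && make install
  modprobe flashcache

  # create a writeback cache: SSD in front of the spinning brick device
  # -p back = writeback, -p thru = writethrough, -p around = writearound
  flashcache_create -p back brick_cache /dev/ssd /dev/sdb

  # the cached device then appears under device mapper
  mkfs.xfs /dev/mapper/brick_cache
  mount /dev/mapper/brick_cache /export/brick1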
This said, please understand that these (flashcache, L2ARC, and other
similar things) are *not* silver bullets (i.e. not magical things that
will instantly make something far better at no cost/effort). They do
introduce additional complexity, and additional tuning points.
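On the ZFS side, for example, attaching an L2ARC device is a one-liner;
the complexity is in what actually gets cached and how long the cache
takes to warm. A hedged sketch with placeholder pool/device names:

  # attach an SSD as L2ARC (read cache) to an existing pool
  zpool add tank cache /dev/disk/by-id/ssd-cache-device

  # optionally add a separate log device (SLOG) for synchronous writes
  zpool add tank log /dev/disk/by-id/ssd-log-device

  # verify the layout and watch cache behavior over time
  zpool status tank
  zpool iostat -v tank 5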
The thing you cannot get rid of, the network traversal, is implicated in
much of the performance degradation for small files. Putting the file
system on a RAM disk (if that were possible; tmpfs doesn't support the
xattrs Gluster needs) wouldn't make the system much faster for small
files. Eliminating the network traversal and doing local distributed
caching of metadata on the client side ... could ... but this would be a
huge new complication, and I'd argue that it probably isn't worth it.
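(As an aside, a quick sanity check that a candidate brick filesystem
actually supports the extended attributes Gluster relies on; the
trusted.* namespace needs root, paths are placeholders, and tmpfs will
fail this test:

  touch /export/brick1/xattr_test
  setfattr -n trusted.glusterfs.test -v works /export/brick1/xattr_test
  getfattr -n trusted.glusterfs.test /export/brick1/xattr_test
  rm /export/brick1/xattr_test

If setfattr/getfattr succeed, the backend can at least hold Gluster's
metadata.)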
In the short term, small-file performance is going to be poor. You
might be able to play some games to make this performance better (L2ARC
etc. could help in some aspects, but they won't be universally much better).
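If you want to quantify "poor" before and after any tuning, a crude
small-file test from a client mount is enough to show the cost of the
network traversal; the mount point and file count below are placeholders:

  # create 10,000 4KB files on the gluster mount and time it
  mkdir -p /mnt/gluster/smallfile_test
  time bash -c 'for i in $(seq 1 10000); do
      dd if=/dev/zero of=/mnt/gluster/smallfile_test/f$i bs=4k count=1 2>/dev/null
  done'

  # run the same loop directly on a brick filesystem for comparison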
What matters most is a very good design on the storage backend (we are
biased, given what it is we sell/support), very good networking, and a
very good Gluster implementation/tuning. It's really easy to hit very
slow performance by missing critical elements. We field many inquiries
which usually start out with "we built our own and the performance isn't
that good." You won't get good performance out of the cluster file
system if the underlying file system and storage design isn't going to
give it to you in the first place.
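By "Gluster implementation/tuning" I mean things like the per-volume
translator options below. Treat the values as illustrative only; sane
settings depend entirely on the workload, and the volume name is a
placeholder:

  gluster volume set myvol performance.cache-size 256MB
  gluster volume set myvol performance.io-thread-count 16
  gluster volume set myvol performance.write-behind-window-size 1MB

  # confirm what is actually in effect
  gluster volume info myvol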
This said, please understand that there is a (significant) performance
cost to all those nice features in ZFS, and there is a reason why it's
not generally considered a high-performance file system. So if you
start building with it, you shouldn't necessarily assume that the whole
will be faster than the sum of the parts. It might be worse.
This is a caution from someone who has tested/shipped many different
file systems in the past, ZFS included, on Solaris and other platforms.
There is a very significant performance penalty you pay for using
some of these features. You have to decide if that penalty is worth it.
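If you do go down the ZFS road, at least be explicit about which
features you are paying for. Checking them, and switching off the
expensive ones where your data allows it, is straightforward; the
dataset name below is a placeholder:

  # see what the dataset is currently doing
  zfs get compression,dedup,atime,checksum tank/brick1

  # the usual first moves for performance: dedup off, atime off
  zfs set dedup=off tank/brick1
  zfs set atime=off tank/brick1

  # checksumming and copy-on-write stay; that is part of the price of ZFS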
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615