[Gluster-users] ZFS + Linux + Glusterfs for a production ready 100+ TB NAS on cloud

Sun Sep 25 07:56:41 UTC 2011

Hi,

I think you are onto an interesting project. Maybe you could share a
few more details. (1) What cloud are you planning to use: EC2 with EBS
volumes, or hosted infrastructure such as Rackspace? (2) What is your
motivation for using RAID10? On Amazon, for example, that would
increase your monthly bill from $10k to $20k for storage alone, not
counting I/O operations. (I am not suggesting you use RAID0, btw.) (3)
Is this for something like a web farm with one Unix user accessing it,
or is it a multi-user, HPC-like environment for which you need a POSIX
file system?
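
To make the RAID10 cost question concrete, here is a rough sketch of
the arithmetic. The server count and disk sizes come from the quoted
message below; the price of $0.10 per GB-month is an assumption picked
only because it reproduces the $10k figure for 100 TB, not a quoted
price, and the function names are made up for this example.

```python
# Back-of-the-envelope sizing for a replicated Gluster volume.
# The price used below is an assumption for illustration, not a quote.

def usable_tb(servers: int, tb_per_server: float, replicas: int) -> float:
    """Usable capacity when every file is stored `replicas` times."""
    return servers * tb_per_server / replicas


def monthly_storage_cost(usable: float, replicas: int,
                         usd_per_gb_month: float) -> float:
    """Monthly bill for the raw capacity behind `usable` TB (1 TB = 1000 GB)."""
    raw_gb = usable * 1000 * replicas
    return raw_gb * usd_per_gb_month


if __name__ == "__main__":
    # Layout from the quoted message: 16 servers with 2 x 8 TB each, replica 2.
    print(usable_tb(16, 16, 2), "TB usable")
    # Assumed price of $0.10 per GB-month, chosen to match the $10k figure.
    for replicas in (1, 2):
        cost = round(monthly_storage_cost(100, replicas, 0.10))
        print(replicas, "copy/copies:", cost, "USD/month")
```

Replication doubles the raw capacity you pay for, which is where the
jump from $10k to $20k comes from; I/O charges would come on top.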

So far the discussion has focused on XFS vs. ZFS. I admit that I am a
fan of ZFS, and I have only used XFS for performance reasons on MySQL
servers, where it did well. Reading something like
http://oss.sgi.com/archives/xfs/2011-08/msg00320.html makes me
not want to use XFS for big data. You can assume that this is a real,
recent bug because Joe is a smart guy who knows exactly what he is
doing. Joe and the Gluster guys are vendors who can work around these
issues and provide support. If XFS is the choice, maybe you should
hire them for this gig.

ZFS typically does not have these FS repair issues in the first place.
The motivation of Lawrence Livermore for porting ZFS to Linux was
quite clear:

http://zfsonlinux.org/docs/SC10_BoF_ZFS_on_Linux_for_Lustre.pdf

OK, they have 50 PB and we are talking about much smaller deployments.
However, I can confirm some of the limitations they report. Also,
recovering from a drive failure with the whole LVM/Linux RAID stack
is unpredictable. Hot swapping does not always work, and if you
prioritize the re-sync of data to the new drive you can strangle the
entire box (by default the priority of the re-sync process is low on
Linux). If you are a Linux expert you can handle this kind of thing
(or hire someone), but if you ever want to hand this setup to a storage
administrator you had better give them something they can use with
confidence (this may be less of an issue in the cloud).
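
To put the re-sync trade-off in numbers, here is a rough sketch. The
Linux md driver exposes a floor and a ceiling for the re-sync rate in
/proc/sys/dev/raid/speed_limit_min and speed_limit_max, both in KiB/s
per device. The values 1000 and 200000 used here are the defaults I
would expect on a stock kernel (an assumption, so check your own
system), and the 8 TB device size is hypothetical.

```python
# Rough estimate of how long a Linux md re-sync takes at a given rate.
# Rates are in KiB/s, the unit used by /proc/sys/dev/raid/speed_limit_*.

KIB = 1024


def resync_hours(device_bytes: int, rate_kib_per_s: int) -> float:
    """Hours needed to copy `device_bytes` at a constant re-sync rate."""
    return device_bytes / (rate_kib_per_s * KIB) / 3600


if __name__ == "__main__":
    drive = 8 * 10**12  # hypothetical 8 TB device
    # Assumed default floor and ceiling; verify them on your own box.
    for label, rate in (("floor", 1000), ("ceiling", 200000)):
        print(f"{label}: {resync_hours(drive, rate):.1f} h at {rate} KiB/s")
```

A real re-sync runs somewhere between the two limits depending on how
much other I/O the box is serving, which is exactly why the timing is
hard to predict: raise the floor and you finish sooner but starve the
clients, leave it low and a busy array stays degraded for a long time.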

Compare this to ZFS: resilvering works with very predictable results
and timing. There is a ton of info out there on this topic. I think
that Gluster users may be getting around many of the Linux RAID issues
by simply taking the entire node down (which is OK in mirrored-node
setups) or by using hardware RAID controllers (which are often not
available in the cloud).

Some in the Linux community seem to be slightly opposed to ZFS (I
assume because of the licensing issue) and sometimes make odd
suggestions ("You should use BTRFS").

As someone who is involved in managing hundreds of terabytes of
storage, I can say that if something goes wrong with a big chunk of
your storage, it is quite a pain to get it back. I would only feel
comfortable using Gluster on top of anything as my primary storage if
I had it mirrored to another datacenter, with the mirroring done by
something other than Gluster (hence my RAID10 question; maybe that is
what you are planning). Primary storage of that size without mirroring
I would put on a commercial product such as Isilon, IBM, or BlueArc,
where I get 24x7 support, etc. We are currently happy users of
GlusterFS: we use it as a caching tier for HPC (our users have managed
to bring it down) and for backup, and I love it for that. We are
currently testing ZFSOnLinux with Gluster 3.2.3 on good hardware
(8 cores, 64 GB RAM, SSD for caching) with ultra-cheap drives (WD
Caviar Green), and the performance results are very impressive. I am
currently not too concerned about stability. Should the kernel crash
(which has not happened yet), the data will be unaffected because none
of the Linux-specific code (the Solaris Porting Layer) actually touches
the hard drives.

dipe

On Sat, Sep 24, 2011 at 5:10 AM, RDP <rdp.com at gmail.com> wrote:
> Hello,
>   This question may have been addressed elsewhere, but I would like
> the opinions and experience of other users.
> I may be carrying some misconceptions, so please be kind enough to
> point them out. Any help, advice, and suggestions will be very highly
> appreciated.
> My goal is to get a greater than 100 TB gluster NAS up on the cloud. Each
> server will hold around 2x8 TB of disk. The export volume size (client disk
> mount size) would be greater than 20 TB.
> This is how I am planning to set it all up: 16 servers, each with 2x8=16 TB
> of space. The glusterfs volume will be replicated and distributed (RAID-10
> style). I would like to go with ZFS on Linux for the disks.
> The client machines will use the glusterfs client for mounting the volumes.
> ext4 is limited to 16 TB due to the userspace tools (e2fsprogs).
> Would this be considered a production-ready setup? The data housed on
> this cluster is critical, and hence I need to be very sure before I go
> ahead with this kind of setup.
> Or would using ZFS with Gluster make more sense on FreeBSD or illumos
> (ZFS is native there)?
> Thanks a lot