[Gluster-users] Inviting comments on my plans

Fernando Frediani (Qube) fernando.frediani at qubenet.net
Mon Nov 19 10:18:53 UTC 2012


I agree with the comment about Fedora and wouldn't choose it as a distribution, but if you are comfortable with it, go ahead; I don't think it will be your major pain.

RAID: I see where you are coming from in choosing not to have any RAID, and I have thought about doing the same myself, mainly for performance reasons. But, as mentioned, how are you going to handle drive swaps? If you think you can somehow automate them, please share it with us, as I believe running the disks independently is a major performance gain.

I am not quite sure that what you want to do with XFS+BTRFS will work as you expect. Ideally you need to take snapshots at the distributed filesystem level; otherwise you might think you are getting a consistent copy of the data when you are not, since you are not supposed to read or write anywhere other than on the Gluster mount.

Performance: Simple and short - if you can give up one disk per host AND choose not to go with independent disks (no RAID), go with RAID 5.
As your system grows, the reads and writes should (in theory) be distributed across all bricks. If a disk fails you can easily replace it, and even in the unlikely event that you lose two disks in a server and lose its data entirely, you still have a copy in another place and can rebuild it with a bit of patience, so no data loss.
Also, we have had more than enough reports of bad performance in Gluster for all kinds of configurations (including RAID 10), so I don't think anyone should expect Gluster to perform that well. Using RAID 5, 6 or 10 underneath therefore shouldn't make much difference, and RAID 10 would only waste space. If you are storing bulk data (multimedia, images, big files), great: the I/O will be streamed and sequential, and performance should be acceptable. But if you are storing things that do a lot of small I/O, or virtual machines, I'm not sure Gluster is the best choice for you and you should think carefully about it.


-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Brian Candler
Sent: 18 November 2012 12:19
To: Shawn Heisey
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Inviting comments on my plans

On Sat, Nov 17, 2012 at 11:04:33AM -0700, Shawn Heisey wrote:
> Dell R720xd servers with two internal OS drives and 12 hot-swap 
> external 3.5 inch bays.  Fedora 18 alpha, to be upgraded to Fedora
> 18 when it is released.

I would strongly recommend *against* Fedora in any production environment, simply because there are new releases every 6 months, and releases are only supported for 18 months from release.  You are therefore locked into a complete OS reinstall every 6 months (or at best, three upgrades every 18 months).

If you want something that's free and RPM-based for production, I suggest you use CentOS or Scientific Linux.

> 2TB simple LVM volumes for bricks.
> A combination of 4TB disks (two bricks per drive) and 2TB disks.

With no RAID, 100% reliant on gluster replication? You discussed this later but I would still advise against this.  If you go this route, you will need to be very sure about your procedures for (a) detecting failed drives, and (b) replacing failed drives.  It's certainly not a simple pull-out/push-in (or rebuild-on-hot-spare) as it would be with RAID.  You'll have to introduce a new drive, create the filesystem (or two filesystems on a 4TB drive), and reintroduce those filesystems as bricks into gluster: but not using replace-brick because the failed brick will have gone.  So you need to be confident in the abilities of your operational staff to do this.

If you do it this way, please test and document it for the rest of us.
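
The detection half, (a), can be partly scripted. What follows is only a minimal sketch, assuming each brick lives in a subdirectory of its own mounted filesystem (mount points such as /bricks/b1 are hypothetical names, not taken from the plan above). It checks that the mount is present and accepts a small write at the mount root, deliberately outside the brick directory so nothing is written behind Gluster's back. It will not catch every failure mode, so SMART monitoring and kernel logs would still be needed.

```python
import os
import tempfile


def check_mount(mount_point, is_mount=os.path.ismount):
    """Return None if the filesystem looks healthy, else a short reason string.

    mount_point is the filesystem root that *contains* the brick directory,
    not the brick directory itself (an assumption of this sketch).
    """
    if not os.path.isdir(mount_point):
        return "missing directory"
    if not is_mount(mount_point):
        return "not a mount point (filesystem not mounted?)"
    try:
        # A dying disk often shows up as an I/O error or a read-only remount.
        # The temporary file is removed again when the block exits.
        with tempfile.NamedTemporaryFile(dir=mount_point, prefix=".diskcheck-"):
            pass
    except OSError as exc:
        return "write test failed: %s" % exc
    return None


def failed_mounts(mount_points, is_mount=os.path.ismount):
    """Map each unhealthy mount point to the reason it was flagged."""
    problems = {}
    for mount_point in mount_points:
        reason = check_mount(mount_point, is_mount)
        if reason is not None:
            problems[mount_point] = reason
    return problems
```

Run from cron with the real list of mount points, a non-empty result from failed_mounts() would be the trigger for the replacement procedure, which still has to be written and tested by hand.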

> Now for the really controversial part of my plans: Left-hand brick 
> filesystems (listed first in each replica set) will be XFS, right-hand 
> bricks will be BTRFS.  The idea here is that we will have one copy of 
> the volume on a fully battle-tested and reliable filesystem, and 
> another copy of the filesystem stored in a way that we can create 
> periodic snapshots for last-ditch "oops" recovery.
> Because of the distributed nature of the filesystem, using those 
> snapshots will not be straightforward, but it will be POSSIBLE.

Of course it depends on your HA requirements, but another approach would be to have non-replicated volume (XFS) and then geo-replicate to another server with BTRFS, and do your snapshotting there. Then your "live" data is not dependent on BTRFS issues.

This also has the bonus that your BTRFS server could be network-remote.

> * Performance.
> RAID 5/6 comes with a severe penalty on performance during sustained 
> writes -- writing more data than will fit in your RAID controller's 
> cache memory.  Also, if you have a failed disk, all performance is 
> greatly impacted during the entire rebuild process, which for a 4TB 
> disk is likely to take a few days.

Actually, sustained sequential writes are the best case for RAID5/6. It's random writes which will kill you.
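
The usual explanation is the read-modify-write cycle: a small random write on RAID5 costs roughly four disk I/Os (read old data, read old parity, write new data, write new parity), RAID6 about six, and RAID10 two, whereas a full-stripe sequential write can compute parity without reading anything back. The sketch below is only back-of-envelope arithmetic. The penalty factors are the commonly quoted textbook ones, the 12 disks match the R720xd above, and the 100 IOPS per disk is an assumed round number for a 7200rpm drive, not a measurement. It ignores controller cache and Gluster replication.

```python
# Commonly quoted disk I/Os per small random write (textbook figures, not measured).
WRITE_PENALTY = {"none": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def random_write_iops(disks, iops_per_disk, level):
    """Rough random-write IOPS for one server's disks, ignoring any cache."""
    return disks * iops_per_disk // WRITE_PENALTY[level]


if __name__ == "__main__":
    # 12 disks at an assumed 100 IOPS each.
    for level in ("none", "raid10", "raid5", "raid6"):
        print(level, random_write_iops(12, 100, level))
```

With those assumed inputs the estimate falls from 1200 with independent disks to 600 for RAID10, 300 for RAID5 and 200 for RAID6.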

If random write performance is important I'd use RAID10 - which means for a fully populated server you'll get 24TB instead of 48TB.  Linux mdraid "far 2" layout will give you the same read performance as RAID0, indeed somewhat faster because all the seeks are within the first half of the drive, but with data replication.
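
The 24TB versus 48TB figure is just the mirroring overhead on 12 x 4TB disks. As a rough comparison of the other options, here is a small sketch assuming a single array spanning all 12 data bays with no hot spare (that layout is an assumption; a real one might differ). These are per-server numbers, before any Gluster replication.

```python
def usable_tb(disks, disk_tb, level):
    """Usable capacity in TB for one array of identical disks."""
    if level == "none":
        return disks * disk_tb            # independent disks, no redundancy
    if level == "raid5":
        return (disks - 1) * disk_tb      # one disk's worth of parity
    if level == "raid6":
        return (disks - 2) * disk_tb      # two disks' worth of parity
    if level == "raid10":
        return (disks // 2) * disk_tb     # every block is stored twice
    raise ValueError("unknown RAID level: %r" % level)


if __name__ == "__main__":
    for level in ("none", "raid5", "raid6", "raid10"):
        print(level, usable_tb(12, 4, level), "TB")
```

For 12 x 4TB this gives 48TB with no RAID, 44TB for RAID5, 40TB for RAID6 and 24TB for RAID10.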

With georeplication, your BTRFS backup server could be RAID5 or RAID6 though.

So it's down to the relative importance of various things:
- sufficient capacity
- sufficient performance
- acceptable cost
- ease of management (when a drive fails)
- data availability (if an entire server fails)

For me, "ease of management (when a drive fails)" comes very high on the list, because drive failures *will* happen, and you need to deal with them as a matter of course. You might not feel the same way.

I wrote "sufficient capacity/performance" rather than "maximum capacity/performance" because it depends what your business requirements are.  I mean, having no RAID might give you maximum performance on those 4TB drives, but is even that good enough for your needs?  If not, you might want to revisit and go with SSDs.  On the other hand, RAID6 might not be the *best* write performance, but it might actually be good enough depending on what you're doing.

