[Gluster-users] Inviting comments on my plans
Shawn Heisey
gluster at elyograg.org
Sat Nov 17 18:04:33 UTC 2012
I am planning the following new gluster 3.3.1 deployment; please let me
know whether I should rethink any of it. If you don't think what
I'm planning is a good idea, I will need concrete reasons.
* Dell R720xd servers with two internal OS drives and 12 external
hot-swap 3.5-inch bays, running Fedora 18 alpha, to be upgraded to
Fedora 18 when it is released.
* Simple 2TB LVM logical volumes for bricks.
* A combination of 4TB disks (two bricks per drive) and 2TB disks (one
brick per drive).
* Distributed-Replicated volume, replica 2.
* The initial two servers will have 4TB disks; as we free up existing
2TB SAN drives, additional servers will be added with those drives.
* Brick filesystems will be mounted under /bricks on each server, and
subdirectories within those filesystems will be used as the actual
brick paths in create/add-brick commands (a command sketch follows
below).
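To make the disk/brick layout concrete, here is a rough sketch of how
one 4TB data disk could be split into two 2TB bricks and joined into
the replica-2 volume. The device names, hostnames (server1/server2),
volume group, mount points, and volume name are all hypothetical, and
the mkfs options are only illustrative:

    # On each server: one volume group per 4TB data disk, split into two
    # LVs of roughly 2TB each; the second brick is prepared the same way
    # as the first.
    pvcreate /dev/sdb
    vgcreate vg_d1 /dev/sdb
    lvcreate -l 50%VG -n brick1 vg_d1
    lvcreate -l 100%FREE -n brick2 vg_d1

    # server1 holds the left-hand (XFS) copies:
    mkfs.xfs -i size=512 /dev/vg_d1/brick1
    # server2 holds the right-hand (BTRFS) copies:
    mkfs.btrfs /dev/vg_d1/brick1

    # Mount under /bricks (plus matching fstab entries), then use a
    # subdirectory as the actual brick path, so a filesystem that failed
    # to mount shows up as a missing brick directory instead of data
    # silently landing on the root filesystem.
    mkdir -p /bricks/d1a
    mount /dev/vg_d1/brick1 /bricks/d1a
    mkdir -p /bricks/d1a/brick

    # Replica sets are formed from bricks in the order listed, so putting
    # server1 first in each pair makes it the left-hand (XFS) member.
    gluster peer probe server2
    gluster volume create testvol replica 2 transport tcp \
        server1:/bricks/d1a/brick server2:/bricks/d1a/brick \
        server1:/bricks/d1b/brick server2:/bricks/d1b/brick
    gluster volume start testvol

    # Later, when 2TB SAN drives are freed up, new servers come in as
    # another replica pair, followed by a rebalance.
    gluster volume add-brick testvol \
        server3:/bricks/d1a/brick server4:/bricks/d1a/brick
    gluster volume rebalance testvol start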
Now for the really controversial part of my plan: left-hand brick
filesystems (those listed first in each replica set) will be XFS, and
right-hand bricks will be BTRFS. The idea is to keep one copy of the
volume on a fully battle-tested and reliable filesystem, and the other
copy on a filesystem where we can take periodic snapshots for
last-ditch "oops" recovery. Because of the distributed nature of the
volume, using those snapshots will not be straightforward, but it will
be POSSIBLE.
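As a sketch of what that last-ditch hook might look like on the BTRFS
side (brick mount points and snapshot names are hypothetical): a
periodic read-only snapshot of each right-hand brick filesystem, taken
outside the brick subdirectory so it never appears inside the gluster
volume itself.

    # On the BTRFS server, per brick filesystem, e.g. daily from cron:
    btrfs subvolume snapshot -r /bricks/d1a /bricks/d1a/snap-$(date +%Y%m%d)

    # Housekeeping: list snapshots and delete old ones.
    btrfs subvolume list /bricks/d1a
    btrfs subvolume delete /bricks/d1a/snap-20121101

Recovery would then mean pulling individual files out of the matching
snapshot on each brick, which is exactly the "not straightforward, but
possible" part.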
To answer another question that I'm sure will come up: Why no RAID?
There are a few reasons:
* We will not be able to fill all drive bays initially.
Although Dell lets you add disks and grow a RAID volume, and Linux can
probably grow the filesystem once that is done, it is a long, drawn-out
process with horrible performance penalties while it is happening. By
putting bricks directly on the disks, we avoid all of that; growing
just means adding bricks (as in the sketch above) rather than reshaping
arrays.
* Performance.
RAID 5/6 carries a severe performance penalty during sustained writes
-- that is, when writing more data than will fit in the RAID
controller's cache memory. Also, if a disk fails, performance is
greatly impacted for the entire rebuild, which for a 4TB disk is likely
to take a few days.
* Disk space loss.
With RAID5 on 4TB disks, we would lose 4TB of disk space for each server
pair. With RAID6, that would be 8TB per server pair. For a fully
populated server, that means 40TB instead of 48TB. The bean counters
are technically clueless, but they understand those numbers.
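For the record, here is the arithmetic behind those numbers, assuming
12 x 4TB bays per server and replica 2 (so usable space per pair is
roughly one server's worth):

    # usable TB per fully populated replica pair (decimal TB, ignoring overhead)
    echo "no RAID: $(( 12 * 4 )) TB"          # 48 TB
    echo "RAID5:   $(( (12 - 1) * 4 )) TB"    # 44 TB -- one parity disk per server
    echo "RAID6:   $(( (12 - 2) * 4 )) TB"    # 40 TB -- two parity disks per server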
Is this a reasonable plan? If not, why?
Thanks,
Shawn