[Gluster-users] New GlusterFS Config with 6 x Dell R720xd's and 12x3TB storage

Mon Dec 3 23:29:51 UTC 2012

Each of the 6 servers now has 10 3TB LUNs that physically exist on a single RAID 6 disk group.

I've created a single large distributed volume as a first test. Is this a typical configuration (breaking the large storage on a single server into smaller bricks), or is it more common to take the smaller LUNs and use LVM to create a single large logical volume that becomes the brick?

I'm thinking that using the smaller LUNs individually would be better from an fsck standpoint, but more cumbersome when creating GlusterFS volumes.

That said, another thing we are looking at doing is offering both distributed and distributed-replicated storage, depending on the user's requirements. As best I can tell, in order to do this in GlusterFS, I need two volumes, each with its own bricks? If that's the case, then we'd need to stick with multiple bricks per server.
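
A minimal dry-run sketch of the two-volume layout described above. It only prints the `gluster volume create` commands and runs nothing. The hostnames (gfs1 to gfs6), the /export/brickN mount points, the volume names and the 5/5 split of bricks are all assumptions made for illustration. With `replica 2`, consecutive bricks on the command line form a replica pair, so the list cycles through the servers to keep the two copies on different hosts.

```shell
#!/bin/sh
# Dry run: prints gluster commands, does not execute them.
# Hostnames, paths and volume names below are hypothetical placeholders.
servers="gfs1 gfs2 gfs3 gfs4 gfs5 gfs6"

# Print "host:/export/brickN/<subdir>" for brick numbers $2..$3,
# cycling through all servers for each brick number.
# $1 = per-volume subdirectory, $2 = first brick, $3 = last brick
brick_list() {
  n=$2
  while [ "$n" -le "$3" ]; do
    for host in $servers; do
      printf '%s:/export/brick%d/%s ' "$host" "$n" "$1"
    done
    n=$((n + 1))
  done
}

# Bricks 1-5 on every server: plain distributed volume (30 bricks).
echo gluster volume create dist_vol $(brick_list dist_vol 1 5)

# Bricks 6-10 on every server: distributed-replicated volume, 2 copies.
# Adjacent entries (gfs1/gfs2, gfs3/gfs4, gfs5/gfs6) become replica pairs.
echo gluster volume create repl_vol replica 2 $(brick_list repl_vol 6 10)
```

Removing the leading `echo` would run the commands for real, so check the printed brick order first.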

Mike

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Mike Hanby
Sent: Monday, December 03, 2012 9:40 AM
To: Brian Candler
Cc: Gluster-users at gluster.org
Subject: Re: [Gluster-users] New GlusterFS Config with 6 x Dell R720xd's and 12x3TB storage

Howdy Brian, thanks for the feedback.

I knew I forgot something. Each of these servers is connected via a single 10Gb Ethernet link. The clients are mostly going to attach via 1Gb.

Thanks for the tip on the RAM, I'll put that in our configuration. The systems were provisioned to serve as both storage nodes and condor compute nodes. Testing will tell us whether or not the storage servers can also perform compute processes without affecting the performance of the filesystem.

I'll share my configuration notes here in case anyone in the future stumbles on this and is interested in provisioning the storage via the command line and Dell's omconfig tool.

The R720xd's came preconfigured with a single large RAID6 virtual disk for the data drives. The following will remove that configuration.
1. First identify the virtual disk id
# omreport storage vdisk controller=0

ID                        : 1
Status                    : Ok
Name                      : Virtual Disk 1
State                     : Ready
Hot Spare Policy violated : Not Assigned
Encrypted                 : No
Layout                    : RAID-6
Size                      : 27,940.00 GB (30000346562560 bytes)
...

2. Delete the vdisk
# omconfig storage vdisk action=deletevdisk controller=0 vdisk=1

3. List the physical disks on controller 0
# omreport storage pdisk controller=0 |grep ^ID
ID                        : 0:1:0
ID                        : 0:1:1
ID                        : 0:1:2
...
ID                        : 0:1:11
ID                        : 0:1:12
ID                        : 0:1:13

4. Now list the physical disks that are assigned to vdisk 0 (the operating system mirror)
# omreport storage pdisk controller=0 vdisk=0|grep ^ID
ID                        : 0:1:12
ID                        : 0:1:13

5. Create 10 virtual disks using pdisks 0:1:0 through 0:1:11
# for n in {1..10}; do
   omconfig storage controller action=createvdisk controller=0 \
   raid=r6 \
   size=2794g \
   pdisk=0:1:0,0:1:1,0:1:2,0:1:3,0:1:4,0:1:5,0:1:6,0:1:7,0:1:8,0:1:9,0:1:10,0:1:11 \
   stripesize=128kb \
   diskcachepolicy=disabled \
   readpolicy=ara \
   writepolicy=wb \
   name=rcs_data${n};
 done

6. The virtual disks can be listed using:
# omreport storage vdisk controller=0

7. The OS disk is vdisk=0; the new vdisks are 1 through 10. Format the new disks using XFS
# unset devs && for n in {1..10}; do
   devs="$devs $(omreport storage vdisk controller=0 vdisk=$n | grep ^Device | awk '{print $4}')";
 done

# m=1 && for dev in $devs; do
   mkfs.xfs -i size=512 -L brick${m} ${dev};
   m=$((m + 1));
 done
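
As a sanity check on the sizes used in the steps above, the arithmetic can be sketched in a few lines of shell. The 12 data drives, RAID 6 and size=2794g come from this thread. Reading omreport's "GB" as 2^30 bytes is an assumption, though it does reproduce the byte count shown in step 1.

```shell
#!/bin/sh
disks=12        # data drives in the RAID 6 group (0:1:0 through 0:1:11)
parity=2        # RAID 6 uses two disks' worth of capacity for parity
vdisks=10       # virtual disks created in step 5
vdisk_gb=2794   # size=2794g per virtual disk

data_disks=$((disks - parity))
total_gb=$((vdisks * vdisk_gb))
# Assumption: omreport's "GB" means 2^30 bytes.
total_bytes=$((total_gb * 1024 * 1024 * 1024))

echo "data-bearing disks:    $data_disks"
echo "space used by vdisks:  $total_gb GB"
echo "same figure in bytes:  $total_bytes"
```

This prints 27940 GB and 30000346562560 bytes, matching the size of the original single RAID-6 virtual disk, so the ten 2794g virtual disks fill the disk group exactly.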

-----Original Message-----
From: Brian Candler [mailto:B.Candler at pobox.com] 
Sent: Sunday, December 02, 2012 1:03 PM
To: Mike Hanby
Cc: Gluster-users at gluster.org
Subject: Re: [Gluster-users] New GlusterFS Config with 6 x Dell R720xd's and 12x3TB storage

On Fri, Nov 30, 2012 at 07:21:54PM +0000, Mike Hanby wrote:
>    We have the following hardware that we are going to use for a GlusterFS
>    cluster.
> 
>    6 x Dell R720xd's (16 cores, 96G)

Heavily over-specified, especially the RAM. Having such large amounts of RAM
can even cause problems if you're not careful.  You probably want to use
sysctl and /etc/sysctl.conf to set

    vm.dirty_background_ratio=1
    vm.dirty_ratio=5   (or less)

so that dirty disk blocks are written to disk sooner, otherwise you may find
the system locks up for several minutes at a time as it flushes the enormous
disk cache.
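
A small sketch of how those two settings might be written out, assuming the 96 GB of RAM mentioned above. The function name is made up for illustration, and the script only prints text; it changes nothing on the system. The kernel applies these percentages to available (free plus reclaimable) memory rather than total RAM, so the megabyte figures are rough upper bounds.

```shell
#!/bin/sh
# Print the two settings in sysctl.conf syntax.
# $1 = vm.dirty_background_ratio, $2 = vm.dirty_ratio (both percentages)
dirty_settings() {
  printf 'vm.dirty_background_ratio = %d\n' "$1"
  printf 'vm.dirty_ratio = %d\n' "$2"
}

dirty_settings 1 5

# To persist and load them as root (deliberately left commented out):
#   dirty_settings 1 5 >> /etc/sysctl.conf && sysctl -p

# Rough thresholds on a 96 GB server (upper bounds, see note above).
ram_mb=$((96 * 1024))
echo "background writeback starts near $((ram_mb * 1 / 100)) MB of dirty data"
echo "writers are throttled near $((ram_mb * 5 / 100)) MB of dirty data"
```

Even at 5%, that is several GB of dirty pages on a machine this size, which is why a lower value may be worth testing.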

I use 4 cores + 8GB for bricks with 24 disks (and they are never CPU-bound)

>    I now need to decide how to configure the 12 x 3TB disks in each
>    server, followed by partitioning / formatting them in the OS.
> 
>    The PERC H710 supports RAID 0,1,5,6,10,50,60. Ideally we'd like to get
>    good performance, maximize storage capacity and still have parity :-)

For performance: RAID10
For maximum storage capacity: RAID5 or RAID6

>    * Stripe Element Size: 64, 128, 256, 512KB, 1MB

Depends on workload. With RAID10 and lots of concurrent clients, I'd tend to
use a 1MB stripe size. Then R/W by one client is likely to be on a different
disk to R/W by another client, and although throughput to individual clients
will be similar to a single disk, the total throughput is maximised.

If your accesses are mostly by a single client, then you may not get enough
readahead to saturate the disks with such a large stripe size; with RAID5/6
your writes may be slow if you can't write a stripe at a time (which may be
possible if you have a battery-backed card).  So for these scenarios
something like 256K may work better.  But you do need to test it.
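
To make "a stripe at a time" concrete, here is a rough sketch of the full-stripe size for each stripe element size on offer. It assumes a 12-disk RAID 6, i.e. 10 data-bearing disks; the function name is illustrative only.

```shell
#!/bin/sh
# Full stripe = stripe element size x number of data-bearing disks.
# $1 = stripe element size in KB, $2 = data-bearing disks
full_stripe_kb() {
  echo $(($1 * $2))
}

data_disks=10   # assumption: 12 disks in RAID 6, two disks' worth of parity

for element_kb in 64 128 256 512 1024; do
  echo "element ${element_kb} KB -> full stripe $(full_stripe_kb "$element_kb" "$data_disks") KB"
done

# mkfs.xfs can be told this geometry with -d su=<element>,sw=<data disks>,
# e.g. su=128k,sw=10 for a 128 KB element on 10 data disks.
```

With a 1MB element the full stripe is 10MB, so a RAID 6 write smaller than that cannot cover a whole stripe and parity has to be updated from existing data unless the controller cache can coalesce writes.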

Finally: you don't mention your network setup.

With 12 SATA disks, you can expect to get 25-100MB/sec *per disk* depending
on how sequential and how large the transfers are.  So your total disk
throughput is potentially 12 times that, i.e.  300-1200MB/sec.  The bottom
end of this range is easily achievable, and is already 2.5 times a 1G link.
At the top end you could saturate a 10G link.

So if you have only 1G networking it's very likely going to be the
bottleneck.
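
The arithmetic behind that estimate, as a quick sketch. The link figures are raw line rates (1 Gb/s = 125 MB/s, 10 Gb/s = 1250 MB/s) with no allowance for protocol overhead, which is an assumption; the function names are illustrative.

```shell
#!/bin/sh
# Aggregate disk throughput in MB/s. $1 = number of disks, $2 = MB/s per disk
aggregate_mb() {
  echo $(($1 * $2))
}

# Throughput as a percentage of a link. $1 = MB/s, $2 = link rate in MB/s
pct_of_link() {
  echo $(($1 * 100 / $2))
}

disks=12
link_1g=125     # MB/s, raw line rate (assumption: no protocol overhead)
link_10g=1250

for per_disk in 25 100; do
  total=$(aggregate_mb "$disks" "$per_disk")
  echo "${per_disk} MB/s per disk -> ${total} MB/s total:" \
       "$(pct_of_link "$total" "$link_1g")% of 1G," \
       "$(pct_of_link "$total" "$link_10g")% of 10G"
done
```

At 25 MB/s per disk the array delivers 300 MB/s, about 240% of a raw 1G link (close to the 2.5 times quoted above once overhead is allowed for). At 100 MB/s per disk it reaches 1200 MB/s, about 96% of a 10G link.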

Regards,

Brian.