[Gluster-users] Test results and Performance Tuning efforts ...

Mon Oct 12 21:54:55 UTC 2015

----- Original Message -----
> From: "Lindsay Mathieson" <lindsay.mathieson at gmail.com>
> To: "gluster-users" <gluster-users at gluster.org>
> Sent: Thursday, October 8, 2015 8:10:09 PM
> Subject: [Gluster-users] Test results and Performance Tuning efforts ...
> 
> 
> 
> Morning, hope the following ramble is OK. I’m just examining the results of some
> extensive (and destructive) testing of Gluster 3.6.4 on some disks I had
> spare. The cluster’s sole purpose is hosting QEMU VMs via Proxmox 3.4.
> 
> 
> 
> Setup: 3 nodes, well spec’d
> 
> - 64 GB RAM
> - VNB & VNG
>   * CPU: E5-2620
> - VNA
>   * CPUs: Dual E5-2660
> - Already in use as a Proxmox and Ceph cluster running 30 Windows VMs
> 
> 
> 
> Gluster bricks:
> 
> - All bricks on ZFS with 4 GB RAM ZIL, 1GB SSD SLOG and 10GB SSD cache
> - LZ4 compression
> - Sync disabled
> 
> 
> 
> Brick 1:
> 
> - 6 VelociRaptors in RAID10 (3 mirrors)
> - High performance
> - Already hosting 8 VMs
> 
> Bricks 2 & 3:
> 
> - Spare external USB 1TB Toshiba drive attached via USB3
> - Crap performance: about 50/100 MB/s R/W
> 
> Overall impressions: pretty good. Installation is easy, and now that I’ve been
> pointed to up-to-date docs and got the hang of the commands, I’m happy with
> the administration; it is vastly simpler than Ceph. The ability to access the
> files on the native filesystem is good for peace of mind and enables some
> interesting benchmark comparisons. I simulated drive failure by killing all
> the Gluster processes on a node and it seemed to cope OK.
> 
> 
> 
> I would like to see better status information, such as “Heal % progress”
> and “Rebalance % progress”.
> 
> 
> 
> NB: Pulling a USB external drive is a *bad* idea. It has no TLER support,
> and doing so killed an entire node, which I had to hard reset. In production
> I would use something like WD Red NAS drives.
> 
> Despite all the abuse I threw at it, I had no problems with split-brain etc.,
> and the integration with Proxmox is excellent. When running write tests I
> was very pleased to see it max out my bonded 2x1Gb connections, something
> Ceph has never been able to do. I consistently got 110+ MB/s raw write
> results inside VMs.
> 
> 
> 
> Currently running 4 VMs off the Gluster datastore with no issues.
> 
> 
> 
> Benchmark results were obtained using CrystalDiskMark inside a Windows 7 VM
> with VirtIO drivers and writeback enabled. I tested a Gluster replica 3 setup,
> replica 1, and direct off the disk (ZFS). Multiple tests were run to get a
> feel for average results.
> 
> 
> 
> Node VNB
> 
> - Replica 3
> - Local Brick: External USB Toshiba drive
> 
>   -----------------------------------------------------------------------
>   CrystalDiskMark 3.0.3 x64 (C) 2007-2013 hiyohiyo
>   Crystal Dew World : http://crystalmark.info/
>   -----------------------------------------------------------------------
>   * MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]
> 
>   Sequential Read          :  738.642 MB/s
>   Sequential Write         :  114.461 MB/s
>   Random Read 512KB        :  720.623 MB/s
>   Random Write 512KB       :  115.084 MB/s
>   Random Read 4KB (QD=1)   :    9.684 MB/s [ 2364.3 IOPS]
>   Random Write 4KB (QD=1)  :    2.511 MB/s [  613.0 IOPS]
>   Random Read 4KB (QD=32)  :   24.264 MB/s [ 5923.7 IOPS]
>   Random Write 4KB (QD=32) :    5.685 MB/s [ 1387.8 IOPS]
> 
>   Test : 1000 MB [C: 70.1% (44.8/63.9 GB)] (x5)
>   Date : 2015/10/09 9:30:37
>   OS : Windows 7 Professional N SP1 [6.1 Build 7601] (x64)
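CrystalDiskMark's 4KB IOPS column follows directly from its MB/s column. With 1 MB = 1,000,000 bytes (as the CrystalDiskMark header states) and assuming "4KB" means 4096 bytes per I/O, IOPS = MB/s x 1,000,000 / 4096; the reported replica 3 figures agree with that to within about 0.01%. The minimal sketch below recomputes them; it also shows why 4KB random results look tiny in MB/s, since 1,000 IOPS at 4 KB is only about 4.1 MB/s. The function names are illustrative, not part of any tool.

```python
# Convert between CrystalDiskMark MB/s and IOPS for fixed-size random I/O.
BLOCK_SIZE = 4096   # assumption: "4KB" in CrystalDiskMark means 4096 bytes
MB = 1_000_000      # CrystalDiskMark: 1 MB/s = 1,000,000 byte/s


def iops_from_mb_s(mb_per_s, block_size=BLOCK_SIZE):
    """Operations per second implied by a throughput figure."""
    return mb_per_s * MB / block_size


def mb_s_from_iops(iops, block_size=BLOCK_SIZE):
    """Throughput implied by an operations-per-second figure."""
    return iops * block_size / MB


# Replica 3 results on node VNB: (reported MB/s, reported IOPS)
results = {
    "Random Read 4KB (QD=1)": (9.684, 2364.3),
    "Random Write 4KB (QD=1)": (2.511, 613.0),
    "Random Read 4KB (QD=32)": (24.264, 5923.7),
    "Random Write 4KB (QD=32)": (5.685, 1387.8),
}

for name, (mb_s, iops) in results.items():
    print(f"{name:<26} reported {iops:7.1f} IOPS, computed {iops_from_mb_s(mb_s):7.1f}")
```

Read the other way, raising 4KB QD=1 writes from about 613 to 1,000 IOPS would lift that row from 2.5 to only about 4.1 MB/s, so IOPS rather than MB/s is the meaningful target for random I/O.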
> 
> 
> 
> 
> 
> Node VNA
> 
> - Replica 1 (so no writing over Ethernet)
> - Local Brick: High performance VelociRaptors in RAID10
> 
>   Sequential Read          :  735.224 MB/s
>   Sequential Write         :  718.203 MB/s
>   Random Read 512KB        :  888.090 MB/s
>   Random Write 512KB       :  453.174 MB/s
>   Random Read 4KB (QD=1)   :   11.808 MB/s [ 2882.9 IOPS]
>   Random Write 4KB (QD=1)  :    4.249 MB/s [ 1037.4 IOPS]
>   Random Read 4KB (QD=32)  :   34.787 MB/s [ 8492.8 IOPS]
>   Random Write 4KB (QD=32) :    5.487 MB/s [ 1339.5 IOPS]
> 
> Node VNA
> 
> - Direct on ZFS (no Gluster)
> 
>   Sequential Read          : 2841.216 MB/s
>   Sequential Write         : 1568.681 MB/s
>   Random Read 512KB        : 1753.746 MB/s
>   Random Write 512KB       : 1219.437 MB/s
>   Random Read 4KB (QD=1)   :   26.852 MB/s [  6555.6 IOPS]
>   Random Write 4KB (QD=1)  :   20.930 MB/s [  5109.8 IOPS]
>   Random Read 4KB (QD=32)  :   58.515 MB/s [ 14286.0 IOPS]
>   Random Write 4KB (QD=32) :   46.303 MB/s [ 11304.3 IOPS]
> 
> Performance:
> 
> Raw read performance is excellent, averaging about 700 MB/s; I’d say the ZFS
> and Gluster caches are working well.
> 
> As mentioned, raw write maxed out at about 110 MB/s, close to the limit of a
> single 1 Gb/s Ethernet link.
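A back-of-envelope check of how close ~110 MB/s is to the wire limit. This is a sketch under stated assumptions: the writes are bottlenecked on one 1 Gb/s link of the bond, frames use a standard 1500-byte MTU, TCP timestamps are on (32-byte TCP header), and Gluster's own RPC overhead is ignored. The constants are standard Ethernet framing sizes; the function name is illustrative.

```python
# Per-frame overhead on the wire for standard (non-jumbo) Ethernet, in bytes
PREAMBLE_AND_SFD = 8
ETH_HEADER = 14
FCS = 4
INTERFRAME_GAP = 12


def tcp_goodput_mb_s(link_bits_per_s, mtu=1500, ip_header=20, tcp_header=32):
    """Best-case TCP payload rate over one link, in MB/s (1 MB = 1,000,000 bytes).

    tcp_header=32 assumes the 20-byte TCP header plus 12 bytes of timestamp options.
    """
    payload = mtu - ip_header - tcp_header  # application bytes carried per frame
    on_wire = mtu + ETH_HEADER + FCS + PREAMBLE_AND_SFD + INTERFRAME_GAP  # bytes per frame slot
    return link_bits_per_s / 8 * payload / on_wire / 1e6


line_rate = 1e9 / 8 / 1e6        # raw 1 Gb/s expressed in MB/s
goodput = tcp_goodput_mb_s(1e9)  # payload ceiling after framing and header overhead
observed = 114.461               # Sequential Write, replica 3 on VNB

print(f"raw line rate : {line_rate:.1f} MB/s")
print(f"TCP goodput   : {goodput:.1f} MB/s")
print(f"observed      : {observed / goodput:.0%} of that ceiling")
```

Under these assumptions the ceiling is about 117.7 MB/s of payload, so the 114.461 MB/s sequential write is roughly 97% of what a single gigabit link can carry.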
> 
> Random I/O is pretty average. It could be that the Toshiba drives drag things
> down, though even when I took them out of the equation it wasn’t much
> improved.
> 
> 
> 
> Direct off the disk was roughly double the replica 1 brick or better in nearly
> all areas, but I don’t find that surprising. I expected a fair amount of
> overhead with a cluster FS, and a 1-brick setup is not real-world usage. I was
> fairly impressed that adding two bricks to go to replica 3 made no real
> difference to the read results, and the write results were obviously limited
> by network speed. If only I could afford 10Gb cards and a switch ...
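To put numbers on the direct-ZFS versus replica 1 comparison, the sketch below divides the two VNA result sets from the post test by test. With these figures the gap ranges from about 1.7x (4KB QD=32 random reads) to about 8.4x (4KB QD=32 random writes), so the small random writes are where Gluster costs the most. The dictionaries hold the posted MB/s values; `speedups` is an illustrative helper name.

```python
# CrystalDiskMark results on node VNA from the post, in MB/s
replica1 = {
    "Sequential Read": 735.224,
    "Sequential Write": 718.203,
    "Random Read 512KB": 888.090,
    "Random Write 512KB": 453.174,
    "Random Read 4KB (QD=1)": 11.808,
    "Random Write 4KB (QD=1)": 4.249,
    "Random Read 4KB (QD=32)": 34.787,
    "Random Write 4KB (QD=32)": 5.487,
}
direct_zfs = {
    "Sequential Read": 2841.216,
    "Sequential Write": 1568.681,
    "Random Read 512KB": 1753.746,
    "Random Write 512KB": 1219.437,
    "Random Read 4KB (QD=1)": 26.852,
    "Random Write 4KB (QD=1)": 20.930,
    "Random Read 4KB (QD=32)": 58.515,
    "Random Write 4KB (QD=32)": 46.303,
}


def speedups(baseline, candidate):
    """Ratio candidate / baseline for every test present in both result sets."""
    return {test: candidate[test] / baseline[test] for test in baseline if test in candidate}


for test, ratio in speedups(replica1, direct_zfs).items():
    print(f"{test:<26}{ratio:5.2f}x")
```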
> 
> 
> 
> I would like to improve the IOPS. These are the current tunables I have set;
> any suggestions for improvements would be much appreciated:

Random I/O has vastly improved with the multi-threaded (MT) epoll introduced in 3.7; try a test on 3.7 with server and client event threads set to 4. If you want to confirm this before you upgrade, run top -H during your testing and look for a hot thread (a single thread with its CPU pegged at 100%). If you see this during your runs on 3.6, then the MT epoll implementation in 3.7 will definitely help you out.

-b
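The hot-thread check above uses top -H interactively. As a scriptable alternative (an assumption, not something either poster used), the sketch below samples per-thread CPU time from Linux's /proc/<pid>/task/<tid>/stat twice and reports threads that kept one CPU busy for most of the interval. It is Linux-only; the PID of a brick (glusterfsd) or FUSE client (glusterfs) process must be supplied by you, for example from pgrep; the function names and the 95% threshold are hypothetical choices.

```python
import os
import sys
import time


def thread_cpu_ticks(pid):
    """Return {tid: utime + stime} in clock ticks for every thread of `pid` (Linux only)."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the thread exited between listdir() and open()/read()
        # Field 2 (comm) is in parentheses and may contain spaces: split after the last ')'.
        fields = stat.rsplit(")", 1)[1].split()
        # fields[0] is field 3 (state), so utime (field 14) and stime (field 15) are at 11 and 12.
        ticks[int(tid)] = int(fields[11]) + int(fields[12])
    return ticks


def busy_fractions(before, after, elapsed_s, clk_tck):
    """Fraction of one CPU each thread used between two samples."""
    return {
        tid: (after[tid] - before[tid]) / (elapsed_s * clk_tck)
        for tid in after
        if tid in before
    }


def hot_threads(pid, interval_s=5.0, threshold=0.95):
    """Threads of `pid` that kept a single CPU at least `threshold` busy, busiest first."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    start = time.monotonic()
    before = thread_cpu_ticks(pid)
    time.sleep(interval_s)
    after = thread_cpu_ticks(pid)
    elapsed = time.monotonic() - start
    busy = busy_fractions(before, after, elapsed, clk_tck)
    hot = [(tid, frac) for tid, frac in busy.items() if frac >= threshold]
    return sorted(hot, key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage while the benchmark runs: python3 hot_threads.py "$(pgrep -o glusterfsd)"
    for tid, frac in hot_threads(int(sys.argv[1])):
        print(f"thread {tid}: {frac:.0%} of one CPU")
```

A thread that shows up here at close to 100% for the whole run is the single-threaded bottleneck described in the reply; several threads sharing the load point elsewhere.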

> 
> 
> 
> 
> 
> Volume Name: datastore1
> Type: Replicate
> Volume ID: 3bda2eee-54de-4540-a556-2f5d045c033a
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: vna.proxmox.softlog:/zfs_vm/datastore1
> Brick2: vnb.proxmox.softlog:/glusterdata/datastore1
> Brick3: vng.proxmox.softlog:/glusterdata/datastore1
> Options Reconfigured:
> performance.io-thread-count: 32
> performance.write-behind-window-size: 32MB
> performance.cache-size: 1GB
> performance.cache-refresh-timeout: 4
> nfs.disable: on
> nfs.addr-namelookup: off
> nfs.enable-ino32: on
> diagnostics.client-log-level: WARNING
> diagnostics.brick-log-level: WARNING
> performance.write-behind: on
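Applied to this volume, the reply's event-thread suggestion amounts to two `gluster volume set` calls on `client.event-threads` and `server.event-threads`, options that arrived with MT epoll in 3.7 and are not available on 3.6.4. The minimal sketch below builds those commands for `datastore1` with the value 4 from the reply; it assumes the `gluster` CLI is on PATH and defaults to a dry run that only prints the commands. The helper names are hypothetical, and the result can be checked afterwards under "Options Reconfigured" in `gluster volume info datastore1`.

```python
import shlex
import subprocess

VOLUME = "datastore1"

# GlusterFS 3.7+ options (multi-threaded epoll); 4 is the value suggested in the reply.
EVENT_THREAD_OPTIONS = {
    "client.event-threads": "4",
    "server.event-threads": "4",
}


def volume_set_commands(volume, options):
    """One `gluster volume set <volume> <key> <value>` argument list per option."""
    return [["gluster", "volume", "set", volume, key, value] for key, value in options.items()]


def apply(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if gluster fails


if __name__ == "__main__":
    apply(volume_set_commands(VOLUME, EVENT_THREAD_OPTIONS))  # dry run: prints only
```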
> 
> 
> 
> Thanks,
> 
> 
> 
> Lindsay
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users