[Gluster-users] Finding performance bottlenecks
Darrell Budic
budic at onholyground.com
Thu May 3 21:24:53 UTC 2018
Tony’s performance sounds significantly subpar compared to my experience. I did some testing with gluster 3.12 and Ovirt 3.9 on my running production cluster when I enabled gfapi; even my "before" numbers are significantly better than what Tony is reporting:
———————————————————
Before using gfapi:
# dd if=/dev/urandom of=test.file bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 90.1843 s, 11.9 MB/s
# echo 3 > /proc/sys/vm/drop_caches
# dd if=test.file of=/dev/null
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 3.94715 s, 272 MB/s
# hdparm -tT /dev/vda
/dev/vda:
Timing cached reads: 17322 MB in 2.00 seconds = 8673.49 MB/sec
Timing buffered disk reads: 996 MB in 3.00 seconds = 331.97 MB/sec
# bonnie++ -d . -s 8G -n 0 -m pre-glapi -f -b -u root
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pre-glapi        8G           196245 30 105331 15           962775 49   1638 34
Latency                          1578ms    1383ms               201ms     301ms
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pre-glapi        8G           155937 27 102899 14          1030285 54   1763 45
Latency                           694ms    1333ms               114ms     229ms
(note, sequential reads seem to have been influenced by caching somewhere…)
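A caveat on the dd write test above, offered as a guess rather than something measured in this thread: on some kernels /dev/urandom is itself slow enough to cap the result, so the 11.9 MB/s may say more about the random number generator than about gluster. A minimal sketch for separating the two is to stage the random data in a file first and then time only the write. The function name write_test and the use of mktemp for the staging file are hypothetical choices for illustration:

```shell
#!/bin/sh
# write_test DEST SIZE_MB
# Hypothetical helper: stage SIZE_MB of random data in a temp file, then
# run the copy to DEST on its own so the random source is not in the timed path.
write_test() {
    dest=$1
    size_mb=$2
    stage=$(mktemp) || return 1

    # Untimed step: generate the incompressible test data.
    if ! dd if=/dev/urandom of="$stage" bs=1M count="$size_mb" 2>/dev/null; then
        rm -f "$stage"
        return 1
    fi

    # Timed step: conv=fsync makes dd flush DEST before it reports a rate.
    dd if="$stage" of="$dest" bs=1M conv=fsync
    rc=$?

    rm -f "$stage"
    return $rc
}
```

Called as "write_test test.file 1024", this mirrors the 1 GB write above. It assumes the staging file lands on a different, faster filesystem than the one under test (tmpfs, ideally). Running "dd if=/dev/urandom of=/dev/null bs=1M count=256" on its own shows how fast the generator is in isolation.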
After switching to gfapi:
# dd if=/dev/urandom of=test.file bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 80.8317 s, 13.3 MB/s
# echo 3 > /proc/sys/vm/drop_caches
# dd if=test.file of=/dev/null
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 3.3473 s, 321 MB/s
# hdparm -tT /dev/vda
/dev/vda:
Timing cached reads: 17112 MB in 2.00 seconds = 8568.86 MB/sec
Timing buffered disk reads: 1406 MB in 3.01 seconds = 467.70 MB/sec
# bonnie++ -d . -s 8G -n 0 -m glapi -f -b -u root
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
glapi            8G           359100 59 185289 24           489575 31   2079 67
Latency                           160ms     355ms             36041us     185ms
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
glapi            8G           341307 57 180546 24           472572 35   2655 61
Latency                           153ms     394ms               101ms     116ms
So, an excellent improvement in write throughput, but the significant improvement in latency is what users noticed most. I've had anecdotal reports of 2x+ performance improvements, with one user remarking that it’s like having dedicated disks :)
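To put rough numbers on that, here is a small sketch that computes the percentage change between the first "before" and first "after" bonnie++ runs quoted above. The helper name pct_change is made up for this example; the figures are copied from the tables, in K/sec:

```shell
#!/bin/sh
# pct_change OLD CUR: print the percentage change from OLD to CUR, one decimal.
pct_change() {
    awk -v old="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (cur - old) / old * 100 }'
}

# First bonnie++ run of each set (pre-glapi vs glapi), K/sec.
echo "block write: $(pct_change 196245 359100)%"
echo "rewrite:     $(pct_change 105331 185289)%"
echo "block read:  $(pct_change 962775 489575)%"
```

On those figures, block writes come out roughly 83% higher and rewrites roughly 76% higher. Block reads come out roughly 49% lower, but the "before" read number looks inflated by caching, as noted under that run, so the read comparison should not be taken at face value.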
This system is on my production cluster, so it’s not getting exclusive disk access, but this VM is not doing anything else itself. The cluster is 3 Xeon E5-2609 v3 @ 1.90GHz servers w/ 64G RAM and SATA2 disks: 2 with 9 spindles each, 1 with 8 slightly faster disks (all spinners). It uses ZFS stripes with lz4 compression and has 10G connectivity to 8 hosts, running gluster 3.12.3 at the moment. The cluster itself has about 70 running VMs in varying states of switching to gfapi use, but my main SQL servers are using their own volumes and not competing for this one. These have not yet had the spectre/meltdown patches applied.
This will be skewed because I forced it to not steal all the RAM on the server (reads will certainly be cached), but it gives an idea of what the storage can do disk-wise, on the volume used above:
# bonnie++ -d . -s 8G -n 0 -m zfs-server -f -b -u root -r 4096
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfs-server       8G           604940 79 510410 87          1393862 99   3164 91
Latency                         99545us     100ms               247us     152ms
Just for fun, from one of the servers, showing base load and this testing (see the attached PastedGraphic-1.png):
——————————————————————————
> From: Vincent Royer <vincent at epicenergy.ca>
> Subject: Re: [Gluster-users] Finding performance bottlenecks
> Date: May 3, 2018 at 1:58:03 PM CDT
> To: tony at hoyle.me.uk
> Cc: gluster-users at gluster.org
>
> It worries me how many threads talk about low performance. I'm about to build out a replica 3 setup and run Ovirt with a bunch of Windows VMs.
>
> Are the issues Tony is experiencing "normal" for Gluster? Does anyone here have a system with windows VMs and have good performance?
>
> Vincent Royer
> 778-825-1057
>
> <http://www.epicenergy.ca/>
> SUSTAINABLE MOBILE ENERGY SOLUTIONS
>
> On Wed, May 2, 2018 at 7:52 AM Tony Hoyle <tony at hoyle.me.uk> wrote:
> On 01/05/2018 02:27, Thing wrote:
> > Hi,
> >
> > So is the KVM or Vmware as the host(s)? I basically have the same setup
> > ie 3 x 1TB "raid1" nodes and VMs, but 1gb networking. I do notice with
> > vmware using NFS disk was pretty slow (40% of a single disk) but this
> > was over 1gb networking which was clearly saturating. Hence I am moving
> > to KVM to use glusterfs hoping for better performance and bonding, it
> > will be interesting to see which host type runs faster.
>
> 1gb will always be the bottleneck in that situation - that's going to
> max out at the speed of a single disk or lower. You need at minimum to
> bond interfaces and preferably go to 10gb to do that.
>
> Our NFS actually ends up faster than local disk because the read speed
> of the raid is faster than the read speed of the local disk.
>
> > Which operating system is gluster on?
>
> Debian Linux. Supermicro motherboards, 24 core i7 with 128GB of RAM on
> the VM hosts.
>
> > Did you do iperf between all nodes?
>
> Yes, around 9.7Gb/s
>
> It doesn't appear to be raw read speed but iowait. Under nfs load with
> multiple VMs I get an iowait of around 0.3%. Under gluster, never less
> than 10% and glusterfsd is often the top of the CPU usage. This causes
> a load average of ~12 compared to 3 over NFS, and absolutely kills VMs
> esp. Windows ones - one machine I set booting and it was still booting
> 30 minutes later!
>
> Tony
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-1.png
Type: image/png
Size: 95175 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20180503/43e47f67/attachment.png>