[Gluster-users] Finding performance bottlenecks

Ben Turner bturner at redhat.com
Mon May 7 14:55:38 UTC 2018


----- Original Message -----
> From: "Darrell Budic" <budic at onholyground.com>
> To: "Vincent Royer" <vincent at epicenergy.ca>, tony at hoyle.me.uk
> Cc: gluster-users at gluster.org
> Sent: Thursday, May 3, 2018 5:24:53 PM
> Subject: Re: [Gluster-users] Finding performance bottlenecks
> 
> Tony's performance sounds significantly subpar compared to my experience. I did
> some testing with gluster 3.12 and oVirt 3.9 on my running production
> cluster when I enabled gfapi, and even my "before" numbers are significantly
> better than what Tony is reporting:
> 
> ———————————————————
> Before using gfapi:
> 
> # dd if=/dev/urandom of=test.file bs=1M count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 90.1843 s, 11.9 MB/s
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=test.file of=/dev/null
> 2097152+0 records in
> 2097152+0 records out
> 1073741824 bytes (1.1 GB) copied, 3.94715 s, 272 MB/s

This is nowhere near what I would expect.  With VMs I am able to saturate a 10G interface if I run enough IOs from enough VMs and use LVM striping (8 files / PVs) inside the VMs.  That's ~1200 MB/sec of aggregate throughput, and each VM will do 200-300+ MB/sec writes and 300-400+ MB/sec reads.
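
Roughly what I mean by striping inside the VM: attach several virtual disks to the guest and build a striped LV across them. The device, VG, and LV names below are just placeholders; adjust the stripe count and stripe size to your own layout:

# pvcreate /dev/vd{b,c,d,e,f,g,h,i}
# vgcreate stripe_vg /dev/vd{b,c,d,e,f,g,h,i}
# lvcreate -i 8 -I 256 -l 100%FREE -n stripe_lv stripe_vg    <- 8 stripes, 256K stripe size
# mkfs.xfs /dev/stripe_vg/stripe_lv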

I have seen this issue before though; once it was resolved by an upgrade of oVirt, another time I fixed the alignment of the RAID / LVM / XFS stack.  There is one instance I haven't figured out yet :/  I want to build it out on a fresh HW stack.  Make sure you have everything aligned in the storage stack, writeback cache enabled on the RAID controller, jumbo frames, the gluster virt (VM) option group set, and a random IO tuned profile.  If you want to tinker with LVM striping inside the VM I have had success with that as well.
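
To be concrete, this is the sort of setup I mean. The volume name, NIC, and RAID geometry below are only examples and need to match your environment, and the rhgs-random-io tuned profile ships with RHGS (throughput-performance is a reasonable fallback on other distros):

# gluster volume set <volname> group virt        <- apply the virt/VM option group
# tuned-adm profile rhgs-random-io               <- random IO tuned profile
# ip link set eth0 mtu 9000                      <- jumbo frames, switch ports must match
# mkfs.xfs -f -i size=512 -d su=256k,sw=10 /dev/vg_bricks/lv_brick    <- su = RAID stripe size, sw = number of data disks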

Also note:

Using urandom will significantly lower perf; it is limited by how fast your CPU can generate random data.  Try /dev/zero or FIO / IOzone / smallfile (https://github.com/bengland2/smallfile) instead, that will eliminate the CPU as a bottleneck.
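
For example, a quick fio run like this takes the CPU out of the picture; the file path, size, and job counts are just examples:

# fio --name=seqwrite --filename=/mnt/test/fio.file --rw=write --bs=1M --size=1G \
      --ioengine=libaio --direct=1 --numjobs=4 --group_reporting
# fio --name=randwrite --filename=/mnt/test/fio.file --rw=randwrite --bs=4k --size=1G \
      --ioengine=libaio --direct=1 --iodepth=32 --group_reporting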

Also remember VMs are a heavy random IO workload, so you need IOPS on your disks to see good perf.  And since gluster doesn't have a metadata server, that metadata lives in xattrs on the files themselves.  This is a bit of a double edged sword: the xattr reads and writes take IOPS as well, and if the backend is not properly aligned this can double or triple the IOPS overhead of the small reads and writes that gluster uses in place of a metadata (MD) server.
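
If you want to see those xattrs for yourself, you can dump them straight off a brick (the brick path here is just an example); on a replica volume you'll see things like trusted.gfid and the trusted.afr.* changelog attributes:

# getfattr -d -m . -e hex /bricks/brick1/vol/images/vm-disk.qcow2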

HTH

-b

> 
> # hdparm -tT /dev/vda
> 
> /dev/vda:
> Timing cached reads: 17322 MB in 2.00 seconds = 8673.49 MB/sec
> Timing buffered disk reads: 996 MB in 3.00 seconds = 331.97 MB/sec
> 
> # bonnie++ -d . -s 8G -n 0 -m pre-glapi -f -b -u root
> 
> Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> pre-glapi 8G 196245 30 105331 15 962775 49 1638 34
> Latency 1578ms 1383ms 201ms 301ms
> 
> Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> pre-glapi 8G 155937 27 102899 14 1030285 54 1763 45
> Latency 694ms 1333ms 114ms 229ms
> 
> (note, sequential reads seem to have been influenced by caching somewhere…)
> 
> After switching to gfapi:
> 
> # dd if=/dev/urandom of=test.file bs=1M count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 80.8317 s, 13.3 MB/s
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=test.file of=/dev/null
> 2097152+0 records in
> 2097152+0 records out
> 1073741824 bytes (1.1 GB) copied, 3.3473 s, 321 MB/s
> 
> # hdparm -tT /dev/vda
> 
> /dev/vda:
> Timing cached reads: 17112 MB in 2.00 seconds = 8568.86 MB/sec
> Timing buffered disk reads: 1406 MB in 3.01 seconds = 467.70 MB/sec
> 
> # bonnie++ -d . -s 8G -n 0 -m glapi -f -b -u root
> 
> Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> glapi 8G 359100 59 185289 24 489575 31 2079 67
> Latency 160ms 355ms 36041us 185ms
> 
> Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> glapi 8G 341307 57 180546 24 472572 35 2655 61
> Latency 153ms 394ms 101ms 116ms
> 
> So, an excellent improvement in write throughput, but the significant improvement
> in latency is what users noticed most. There have been anecdotal reports of 2x+
> performance improvements, with one user remarking that it's like having dedicated
> disks :)
> 
> This system is on my production cluster, so it's not getting exclusive disk
> access, but this VM is not doing anything else itself. The cluster is 3 Xeon
> E5-2609 v3 @ 1.90GHz servers w/ 64GB RAM and SATA2 disks; 2 with 9x spindles
> each, 1 with 8x slightly faster disks (all spinners), using ZFS stripes with
> lz4 compression and 10G connectivity to 8 hosts, running gluster 3.12.3 at
> the moment. The cluster itself has about 70 running VMs in varying states of
> switching to gfapi use, but my main SQL servers are using their own volumes
> and not competing for this one. These have not yet had the Spectre/Meltdown
> patches applied.
> The next result will be skewed because I forced bonnie++ not to steal all the
> RAM on the server (reads will certainly be cached), but it gives an idea of
> what the volume used above can do disk-wise:
> # bonnie++ -d . -s 8G -n 0 -m zfs-server -f -b -u root -r 4096
> Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> zfs-server 8G 604940 79 510410 87 1393862 99 3164 91
> Latency 99545us 100ms 247us 152ms
> 
> Just for fun from one of the servers showing base load and this testing:
> 
> ——————————————————————————
> 
> [content not preserved in the list archive]
> 
> From: Vincent Royer < vincent at epicenergy.ca >
> Subject: Re: [Gluster-users] Finding performance bottlenecks
> Date: May 3, 2018 at 1:58:03 PM CDT
> To: tony at hoyle.me.uk
> Cc: gluster-users at gluster.org
> 
> It worries me how many threads talk about low performance. I'm about to build
> out a replica 3 setup and run oVirt with a bunch of Windows VMs.
> 
> Are the issues Tony is experiencing "normal" for Gluster? Does anyone here
> have a system with Windows VMs and get good performance?
> 
> Vincent Royer
> 778-825-1057
> 
> 
> SUSTAINABLE MOBILE ENERGY SOLUTIONS
> 
> On Wed, May 2, 2018 at 7:52 AM Tony Hoyle < tony at hoyle.me.uk > wrote:
> 
> 
> On 01/05/2018 02:27, Thing wrote:
> > Hi,
> > 
> > So is it KVM or VMware as the host(s)? I basically have the same setup,
> > i.e. 3 x 1TB "raid1" nodes and VMs, but 1Gb networking. I do notice that with
> > VMware using NFS, disk was pretty slow (40% of a single disk), but this
> > was over 1Gb networking which was clearly saturating. Hence I am moving
> > to KVM to use glusterfs, hoping for better performance and bonding; it
> > will be interesting to see which host type runs faster.
> 
> 1Gb will always be the bottleneck in that situation - that's going to
> max out at the speed of a single disk or lower. You need at minimum to
> bond interfaces, and preferably go to 10Gb, to get past that.
> 
> Our NFS actually ends up faster than local disk because the read speed
> of the RAID array is faster than the read speed of a single local disk.
> 
> > Which operating system is gluster on?
> 
> Debian Linux. Supermicro motherboards, 24 core i7 with 128GB of RAM on
> the VM hosts.
> 
> > Did you do iperf between all nodes?
> 
> Yes, around 9.7Gb/s
> 
> It doesn't appear to be raw read speed but iowait. Under NFS load with
> multiple VMs I get an iowait of around 0.3%. Under gluster it's never less
> than 10%, and glusterfsd is often at the top of the CPU usage. This causes
> a load average of ~12 compared to 3 over NFS, and absolutely kills VMs,
> especially Windows ones - one machine I set booting was still booting
> 30 minutes later!
> 
> Tony
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

