<p dir="ltr"><br>
On Jan 5, 2020 03:05, Michael Richardson <hello@mikerichardson.com.au> wrote:<br>
><br>
> Hi all!<br>
><br>
> I'm experimenting with GlusterFS for the first time and have built a simple three-node cluster using AWS 'i3en' type instances. These instances provide raw NVMe devices that are incredibly fast. <br>
><br>
> What I'm finding in these tests is that gluster offers only a fraction of the raw NVMe performance in a 3-replica set (i.e., 3 nodes with 1 brick each). I'm wondering if there is anything I can do to squeeze more performance out. <br>
><br>
> For testing, I'm running fio using a 16GB test file with a 75/25 read/write split. Basically I'm trying to replicate a MySQL database workload, which is what I'd ideally like to host here (though I realise that's probably not practical). <br>
><br>
> My fio test command is: <br>
> $ fio --name=fio-test2 --filename=fio-test \<br>
> --randrepeat=1 \<br>
> --ioengine=libaio \<br>
> --direct=1 \<br>
> --runtime=300 \<br>
> --bs=16k \<br>
> --iodepth=64 \<br>
> --size=16G \<br>
> --readwrite=randrw \<br>
> --rwmixread=75 \<br>
> --group_reporting \<br>
> --numjobs=4<br>
><br>
> When I test this command directly on the nvme disk, I get: <br>
><br>
> READ: bw=313MiB/s (328MB/s), 313MiB/s-313MiB/s (328MB/s-328MB/s), io=47.0GiB (51.5GB), run=156806-156806msec<br>
><br>
> WRITE: bw=105MiB/s (110MB/s), 105MiB/s-105MiB/s (110MB/s-110MB/s), io=16.0GiB (17.2GB), run=156806-156806msec<br>
><br>
> When I install the disk into a gluster 3-replica volume, I get:<br>
><br>
> READ: bw=86.3MiB/s (90.5MB/s), 86.3MiB/s-86.3MiB/s (90.5MB/s-90.5MB/s), io=25.3GiB (27.2GB), run=300002-300002msec<br>
><br>
> WRITE: bw=28.9MiB/s (30.3MB/s), 28.9MiB/s-28.9MiB/s (30.3MB/s-30.3MB/s), io=8676MiB (9098MB), run=300002-300002msec<br>
><br>
> If I do the same but with only 2 replicas, I get the same performance results. I also get roughly the same values when running 'read', 'randread', 'write', and 'randwrite' tests. <br>
><br>
> I'm testing directly on one of the storage nodes, so there are no variables like client/server network performance in the mix. <br>
><br>
> I ran the same test with EBS volumes and saw similar performance drops when offering up the volume through gluster. A "Provisioned IOPS" EBS volume that could deliver 10,000 IOPS directly was getting only about 3,500 IOPS as part of a gluster volume. <br>
><br>
> We're using TLS on the management and volume connections, but I'm not seeing any CPU or memory constraint when using these volumes, so I don't believe that is the bottleneck. Likewise, with TLS turned off I see no change in performance. <br>
><br>
> Does anyone have any suggestions on things I might try to increase performance when using these very fast disks as part of a gluster volume, or is this to be expected when factoring in all the extra work that gluster needs to do when replicating data around volumes? <br>
1. Which Gluster & OS versions are you running?<br>
2. Check the I/O scheduler of the NVMe devices -> it should be none/noop.<br>
3. Apply the DB workload option group: gluster volume set <i>volname</i> group db-workload<br>
[root@ovirt1 ~]# cat /var/lib/glusterd/groups/db-workload<br>
performance.open-behind=on<br>
performance.write-behind=off<br>
performance.stat-prefetch=off<br>
performance.quick-read=off<br>
performance.strict-o-direct=on<br>
performance.read-ahead=off<br>
performance.io-cache=off<br>
performance.readdir-ahead=off<br>
performance.client-io-threads=on<br>
server.event-threads=4<br>
client.event-threads=4<br>
performance.read-after-open=yes</p>
<p dir="ltr">4. Afterwards you can test different values for server/client event-threads (based on the number of CPU cores).</p>
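<p dir="ltr">The steps above, as a minimal shell sketch (the volume name "myvol" and device name "nvme0n1" are placeholders — substitute your own):</p>

```shell
# Check the active I/O scheduler for the NVMe device (placeholder device name);
# the scheduler shown in [brackets] is the one currently in use.
cat /sys/block/nvme0n1/queue/scheduler

# Switch to "none" if it is not already active.
echo none | sudo tee /sys/block/nvme0n1/queue/scheduler

# Apply the db-workload option group to the volume (placeholder volume name).
sudo gluster volume set myvol group db-workload

# Afterwards, experiment with event-thread counts, e.g. matching CPU cores.
sudo gluster volume set myvol server.event-threads 8
sudo gluster volume set myvol client.event-threads 8
```

<p dir="ltr">Note that changing the scheduler via sysfs does not persist across reboots; a udev rule or kernel boot parameter is needed to make it permanent.</p>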
<p dir="ltr">> Thanks very much for your time!! I'll put the two full fio outputs below if anyone wants more details.<br>
><br>
> Mike<br>
><br>
><br>
> - First full fio test, nvme device without gluster<br>
><br>
> fio-test: (groupid=0, jobs=4): err= 0: pid=5636: Sat Jan 4 23:09:18 2020<br>
><br>
> read: IOPS=20.0k, BW=313MiB/s (328MB/s)(47.0GiB/156806msec)<br>
><br>
> slat (usec): min=3, max=6476, avg=88.44, stdev=326.96<br>
><br>
> clat (usec): min=218, max=89292, avg=11141.58, stdev=1871.14<br>
><br>
> lat (usec): min=226, max=89311, avg=11230.16, stdev=1883.88<br>
><br>
> clat percentiles (usec):<br>
><br>
> | 1.00th=[ 3654], 5.00th=[ 8455], 10.00th=[ 9372], 20.00th=[10159],<br>
><br>
> | 30.00th=[10552], 40.00th=[10814], 50.00th=[11076], 60.00th=[11338],<br>
><br>
> | 70.00th=[11731], 80.00th=[12256], 90.00th=[13042], 95.00th=[13960],<br>
><br>
> | 99.00th=[15795], 99.50th=[16581], 99.90th=[19268], 99.95th=[23200],<br>
><br>
> | 99.99th=[36439]<br>
><br>
> bw ( KiB/s): min=75904, max=257120, per=25.00%, avg=80178.59, stdev=9421.58, samples=1252<br>
><br>
> iops : min= 4744, max=16070, avg=5011.15, stdev=588.85, samples=1252<br>
><br>
> write: IOPS=6702, BW=105MiB/s (110MB/s)(16.0GiB/156806msec); 0 zone resets<br>
><br>
> slat (usec): min=4, max=5587, avg=88.52, stdev=325.86<br>
><br>
> clat (usec): min=54, max=29847, avg=4491.18, stdev=1481.06<br>
><br>
> lat (usec): min=63, max=29859, avg=4579.83, stdev=1508.50<br>
><br>
> clat percentiles (usec):<br>
><br>
> | 1.00th=[ 947], 5.00th=[ 1975], 10.00th=[ 2737], 20.00th=[ 3458],<br>
><br>
> | 30.00th=[ 3916], 40.00th=[ 4178], 50.00th=[ 4424], 60.00th=[ 4686],<br>
><br>
> | 70.00th=[ 5014], 80.00th=[ 5473], 90.00th=[ 6259], 95.00th=[ 6980],<br>
><br>
> | 99.00th=[ 8717], 99.50th=[ 9503], 99.90th=[10945], 99.95th=[11600],<br>
><br>
> | 99.99th=[13698]<br>
><br>
> bw ( KiB/s): min=23296, max=86432, per=25.00%, avg=26812.24, stdev=3375.69, samples=1252<br>
><br>
> iops : min= 1456, max= 5402, avg=1675.75, stdev=210.98, samples=1252<br>
><br>
> lat (usec) : 100=0.01%, 250=0.01%, 500=0.06%, 750=0.11%, 1000=0.10%<br>
><br>
> lat (msec) : 2=1.12%, 4=7.69%, 10=28.88%, 20=61.95%, 50=0.06%<br>
><br>
> lat (msec) : 100=0.01%<br>
><br>
> cpu : usr=1.56%, sys=7.85%, ctx=1905114, majf=0, minf=56<br>
><br>
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%<br>
><br>
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%<br>
><br>
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%<br>
><br>
> issued rwts: total=3143262,1051042,0,0 short=0,0,0,0 dropped=0,0,0,0<br>
><br>
> latency : target=0, window=0, percentile=100.00%, depth=64<br>
><br>
><br>
><br>
> Run status group 0 (all jobs):<br>
><br>
> READ: bw=313MiB/s (328MB/s), 313MiB/s-313MiB/s (328MB/s-328MB/s), io=47.0GiB (51.5GB), run=156806-156806msec<br>
><br>
> WRITE: bw=105MiB/s (110MB/s), 105MiB/s-105MiB/s (110MB/s-110MB/s), io=16.0GiB (17.2GB), run=156806-156806msec<br>
><br>
><br>
><br>
> Disk stats (read/write):<br>
><br>
> dm-4: ios=3455484/1154933, merge=0/0, ticks=35815316/4420412, in_queue=40257384, util=100.00%, aggrios=3456894/1155354, aggrmerge=0/0, aggrticks=35806896/4414972, aggrin_queue=40309192, aggrutil=99.99%<br>
><br>
> dm-2: ios=3456894/1155354, merge=0/0, ticks=35806896/4414972, in_queue=40309192, util=99.99%, aggrios=1728447/577677, aggrmerge=0/0, aggrticks=17902352/2207092, aggrin_queue=20122108, aggrutil=100.00%<br>
><br>
> dm-1: ios=3456894/1155354, merge=0/0, ticks=35804704/4414184, in_queue=40244216, util=100.00%, aggrios=3143273/1051086, aggrmerge=313621/104268, aggrticks=32277972/3937619, aggrin_queue=36289488, aggrutil=100.00%<br>
><br>
> nvme0n1: ios=3143273/1051086, merge=313621/104268, ticks=32277972/3937619, in_queue=36289488, util=100.00%<br>
><br>
> dm-0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%<br>
><br>
> - Second full fio test, nvme device as part of a gluster volume<br>
><br>
> fio-test2: (groupid=0, jobs=4): err= 0: pid=5537: Sat Jan 4 23:30:28 2020<br>
><br>
> read: IOPS=5525, BW=86.3MiB/s (90.5MB/s)(25.3GiB/300002msec)<br>
><br>
> slat (nsec): min=1159, max=894687k, avg=9822.60, stdev=990825.87<br>
><br>
> clat (usec): min=963, max=3141.5k, avg=37455.28, stdev=123109.88<br>
><br>
> lat (usec): min=968, max=3141.5k, avg=37465.21, stdev=123121.94<br>
><br>
> clat percentiles (msec):<br>
><br>
> | 1.00th=[ 7], 5.00th=[ 8], 10.00th=[ 8], 20.00th=[ 9],<br>
><br>
> | 30.00th=[ 9], 40.00th=[ 9], 50.00th=[ 10], 60.00th=[ 10],<br>
><br>
> | 70.00th=[ 11], 80.00th=[ 12], 90.00th=[ 48], 95.00th=[ 180],<br>
><br>
> | 99.00th=[ 642], 99.50th=[ 860], 99.90th=[ 1435], 99.95th=[ 1687],<br>
><br>
> | 99.99th=[ 2022]<br>
><br>
> bw ( KiB/s): min= 31, max=93248, per=26.30%, avg=23247.24, stdev=20716.86, samples=2280<br>
><br>
> iops : min= 1, max= 5828, avg=1452.92, stdev=1294.81, samples=2280<br>
><br>
> write: IOPS=1850, BW=28.9MiB/s (30.3MB/s)(8676MiB/300002msec); 0 zone resets<br>
><br>
> slat (usec): min=21, max=1586.3k, avg=2117.71, stdev=23082.86<br>
><br>
> clat (usec): min=20, max=2614.0k, avg=23888.03, stdev=99651.34<br>
><br>
> lat (usec): min=225, max=3141.2k, avg=26006.49, stdev=104758.57<br>
><br>
> clat percentiles (usec):<br>
><br>
> | 1.00th=[ 889], 5.00th=[ 2343], 10.00th=[ 3654],<br>
><br>
> | 20.00th=[ 5276], 30.00th=[ 5997], 40.00th=[ 6456],<br>
><br>
> | 50.00th=[ 6849], 60.00th=[ 7177], 70.00th=[ 7504],<br>
><br>
> | 80.00th=[ 7963], 90.00th=[ 8979], 95.00th=[ 74974],<br>
><br>
> | 99.00th=[ 513803], 99.50th=[ 717226], 99.90th=[1333789],<br>
><br>
> | 99.95th=[1518339], 99.99th=[1803551]<br>
><br>
> bw ( KiB/s): min= 31, max=30240, per=27.05%, avg=8009.39, stdev=6912.26, samples=2217<br>
><br>
> iops : min= 1, max= 1890, avg=500.56, stdev=432.02, samples=2217<br>
><br>
> lat (usec) : 50=0.03%, 100=0.02%, 250=0.01%, 500=0.06%, 750=0.08%<br>
><br>
> lat (usec) : 1000=0.11%<br>
><br>
> lat (msec) : 2=0.66%, 4=1.97%, 10=71.07%, 20=14.47%, 50=2.69%<br>
><br>
> lat (msec) : 100=2.23%, 250=3.21%, 500=1.94%, 750=0.82%, 1000=0.31%<br>
><br>
> cpu : usr=0.59%, sys=1.19%, ctx=1172180, majf=0, minf=56<br>
><br>
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%<br>
><br>
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%<br>
><br>
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%<br>
><br>
> issued rwts: total=1657579,555275,0,0 short=0,0,0,0 dropped=0,0,0,0<br>
><br>
> latency : target=0, window=0, percentile=100.00%, depth=64<br>
><br>
><br>
><br>
> Run status group 0 (all jobs):<br>
><br>
> READ: bw=86.3MiB/s (90.5MB/s), 86.3MiB/s-86.3MiB/s (90.5MB/s-90.5MB/s), io=25.3GiB (27.2GB), run=300002-300002msec<br>
><br>
> WRITE: bw=28.9MiB/s (30.3MB/s), 28.9MiB/s-28.9MiB/s (30.3MB/s-30.3MB/s), io=8676MiB (9098MB), run=300002-300002msec<br>
><br></p>
<p dir="ltr">Best Regards,<br>
Strahil Nikolov</p>