[Gluster-users] performance

Strahil Nikolov hunter86_bg at yahoo.com
Fri Aug 21 05:32:50 UTC 2020



On 20 August 2020 at 3:46:41 GMT+03:00, Computerisms Corporation <bob at computerisms.ca> wrote:
>Hi Strahil,
>
>so over the last two weeks, the system has been relatively stable.  I 
>have powered off both servers at least once, for about 5 minutes each 
>time.  Each server came up, auto-healed what it needed to, so all of that 
>part is working as expected.
>
>will answer things inline and follow with more questions:
>
>>>> Hm...  OK. I guess you can try 7.7 whenever it's possible.
>>>
>>> Acknowledged.
>
>Still on my list.
>> It could also be bad firmware. If you get the opportunity, flash
>> the firmware and bump the OS to the max.
>
>The datacenter says everything was up to date as of installation, and I'm
>not really wanting them to take the servers offline long enough to redo
>all the hardware.
>
>>>>> more number of CPU cycles than needed, increasing the event thread count
>>>>> would enhance the performance of the Red Hat Storage Server." which is
>>>>> why I had it at 8.
>>>> Yeah, but you got only 6 cores and they are not dedicated for gluster
>>>> only. I think that you need to test with lower values.
>
>Figured out my magic number for client/server event threads: it should be 5. 
>I set it to 5, observed no change I could attribute to it, so tried 4, 
>and got the same thing; no visible effect.
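>For the record, these were set with the usual volume-set commands, e.g.
>
>gluster volume set webisms server.event-threads 5
>gluster volume set webisms client.event-threads 5
>
>and then the same again with 4.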
>
>>>>> right now the only suggested parameter I haven't played with is the
>>>>> performance.io-thread-count, which I currently have at 64.
>>> not really sure what would be a reasonable value for my system.
>> I guess you can try to increase it a little bit and check how it goes.
>
>turns out if you try to set this higher than 64, you get an error saying
>64 is the max.
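>For example, something like
>
>gluster volume set webisms performance.io-thread-count 128
>
>gets rejected with a message that 64 is the maximum.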
>
>>>> What I/O scheduler are you using for the SSDs (you can check via
>>>> 'cat /sys/block/sdX/queue/scheduler')?
>>>
>>> # cat /sys/block/vda/queue/scheduler
>>> [mq-deadline] none
>> 
>> Deadline prioritizes reads over writes in a 2:1 ratio (default tuning). You
>> can consider testing 'none' if your SSDs are good.
>
>I did this.  I would say it did have a positive effect, but it was a 
>minimal one.
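>Switched with the usual sysfs write, e.g.
>
>echo none > /sys/block/vda/queue/scheduler
>
>which, as far as I know, does not persist across reboots without a udev
>rule or similar.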
>
>> I see vda, please share details on the infra as this is very
>> important. Virtual disks have their limitations, and if you are on a VM
>> then there might be a chance to increase the CPU count.
>> If you are on a VM, I would recommend using more (in number) and
>> smaller disks in stripe sets (either raid0 via mdadm, or a pure
>> striped LV).
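>> For example (hypothetical device and volume group names, just a sketch):
>>
>> mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/vdb /dev/vdc
>>
>> or an equivalent striped LV:
>>
>> lvcreate -i 2 -I 64 -L 100G -n lv_brick vg_bricks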
>> Also, if you are on a VM -> there is no reason to reorder your I/O
>> requests in the VM just to do it again on the hypervisor. In such a
>> case 'none' can bring better performance, but this varies with the
>> workload.
>
>Hm, this is a good question, one I have been asking the datacenter for a
>while, but they are a little bit slippery about what exactly it is they
>have going on there.  They advertise the servers as metal with a virtual
>layer.  The virtual layer is so you can log into a site and power the
>server down or up, mount an ISO to boot from, access a console, and some
>other nifty things.  You can't any more, but when they first introduced
>the system, you could even access the BIOS of the server.  But apparently,
>and they swear up and down by this, it is a physical server, with real
>dedicated SSDs and real sticks of RAM.  I have found virtio and qemu as
>loaded kernel modules, so certainly there is something virtual involved,
>but other than that and their nifty little tools, it has always acted
>and worked like a metal server to me.

You can use the 'virt-what' binary to find out whether, and what type of, virtualization is in use.
I have a suspicion you are on top of OpenStack (which uses Ceph), so I guess you can try to get more info.
For example, an Openstack instance can have '0x1af4' in '/sys/block/vdX/device/vendor' (replace X with actual device letter).
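For example:
cat /sys/block/vda/device/vendor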
Another check could be:
/usr/lib/udev/scsi_id -g -u -d /dev/vda

You can also try to take a look with smartctl from the smartmontools package:
smartctl -a /dev/vdX

>> All necessary data is in the file attributes on the brick. I doubt
>> you will need to have access times on the brick itself. Another
>> possibility is to use 'relatime'.
>
>remounted all bricks with noatime, no significant difference.
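>The remount was along the lines of
>
>mount -o remount,noatime /path/to/brick/filesystem
>
>for each brick filesystem (the path here is just a placeholder), plus the
>matching change in fstab so it survives a reboot.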
>
>>> cache unless flush-behind is on.  So it seems that is a way to throw RAM
>>> at it?  I put performance.write-behind-window-size: 512MB and
>>> performance.flush-behind: on, and the whole system calmed down pretty
>>> much immediately.  could be just timing, though, will have to see
>>> tomorrow during business hours whether the system stays at a reasonable
>
>Tried increasing this to its max of 1GB, no noticeable change from
>512MB.
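>For the record, those were set with the usual volume-set commands:
>
>gluster volume set webisms performance.write-behind-window-size 1073741824
>gluster volume set webisms performance.flush-behind on
>
>with the window size given in bytes.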
>
>The 2nd server is not acting in line with the first server.  glusterfsd 
>processes are running at 50-80% of a core each, with one brick often 
>going over 200%, whereas they usually stick to 30-45% on the first 
>server.  Apache processes consume as much as 90% of a core, whereas they
>rarely go over 15% on the first server, and they frequently stack up to
>having more than 100 running at once, which drives the load average up to 
>40-60.  It's very much like the first server was before I found the 
>flush-behind setting, but not as bad; at least it isn't going completely
>non-responsive.
>
>Additionally, it is still taking an excessive time to load the first 
>page of most sites.  I am guessing I need to increase read speeds to fix
>this, so I have played with 
>performance.io-cache/cache-max-file-size (slight positive change), 
>read-ahead/read-ahead-page-count (negative change until the page count was
>set to its max of 16, then no noticeable difference), and 
>rda-cache-limit/rda-request-size (minimal positive effect).  I still have
>RAM to spare, so it would be nice if I could use it to improve things on
>the read side, but I have found no magic bullet like flush-behind was.
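>For reference, the read-side knobs above were set along these lines, with
>the final values matching the volume info at the bottom:
>
>gluster volume set webisms performance.cache-max-file-size 5MB
>gluster volume set webisms performance.read-ahead-page-count 16
>gluster volume set webisms performance.rda-cache-limit 1GB
>gluster volume set webisms performance.rda-request-size 128KB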
>
>I found a good number of additional options to try and have been going a
>little crazy with them; I will post them at the bottom.  I found a post
>that suggested mount options are also important:
>
>https://lists.gluster.org/pipermail/gluster-users/2018-September/034937.html
>
>I confirmed these are in the man pages, so I tried unmounting and 
>re-mounting with the -o option to include them, thusly:
>
>mount -t glusterfs moogle:webisms /Computerisms/ -o 
>negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5
>
>But I don't think they are working:
>
>/# mount | grep glus
>moogle:webisms on /Computerisms type fuse.glusterfs 
>(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
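>If I understand the fuse mount helper correctly, these options should end
>up as command-line flags on the glusterfs client process (something like
>--negative-timeout=10 and --attribute-timeout=30), so
>
>ps ax | grep glusterfs
>
>might be a better place to check than the mount output.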
>
>I would be grateful for any other suggestions anyone can think of.
>
>root at moogle:/# gluster v info
>
>Volume Name: webisms
>Type: Distributed-Replicate
>Volume ID: 261901e7-60b4-4760-897d-0163beed356e
>Status: Started
>Snapshot Count: 0
>Number of Bricks: 2 x (2 + 1) = 6
>Transport-type: tcp
>Bricks:
>Brick1: mooglian:/var/GlusterBrick/replset-0/webisms-replset-0
>Brick2: moogle:/var/GlusterBrick/replset-0/webisms-replset-0
>Brick3: moogle:/var/GlusterBrick/replset-0-arb/webisms-replset-0-arb 
>(arbiter)
>Brick4: moogle:/var/GlusterBrick/replset-1/webisms-replset-1
>Brick5: mooglian:/var/GlusterBrick/replset-1/webisms-replset-1
>Brick6: mooglian:/var/GlusterBrick/replset-1-arb/webisms-replset-1-arb 
>(arbiter)
>Options Reconfigured:
>performance.rda-cache-limit: 1GB
>performance.client-io-threads: off
>nfs.disable: on
>storage.fips-mode-rchecksum: off
>transport.address-family: inet
>performance.stat-prefetch: on
>network.inode-lru-limit: 200000
>performance.write-behind-window-size: 1073741824
>performance.readdir-ahead: on
>performance.io-thread-count: 64
>performance.cache-size: 12GB
>server.event-threads: 4
>client.event-threads: 4
>performance.nl-cache-timeout: 600
>auth.allow: xxxxxx
>performance.open-behind: off
>performance.quick-read: off
>cluster.lookup-optimize: off
>cluster.rebal-throttle: lazy
>features.cache-invalidation: on
>features.cache-invalidation-timeout: 600
>performance.cache-invalidation: on
>performance.md-cache-timeout: 600
>performance.flush-behind: on
>cluster.read-hash-mode: 0
>performance.strict-o-direct: on
>cluster.readdir-optimize: on
>cluster.lookup-unhashed: off
>performance.cache-refresh-timeout: 30
>performance.enable-least-priority: off
>cluster.choose-local: on
>performance.rda-request-size: 128KB
>performance.read-ahead: on
>performance.read-ahead-page-count: 16
>performance.cache-max-file-size: 5MB
>performance.io-cache: on

