[Gluster-users] performance

Tue Aug 4 22:51:59 UTC 2020

On 4 August 2020 22:47:44 GMT+03:00, Computerisms Corporation <bob at computerisms.ca> wrote:
>Hi Strahil, thanks for your response.
>
>>>
>>> I have compiled gluster 7.6 from sources on both servers.
>> 
>> There is a 7.7 version which fixes some stuff. Why do you have to
>> compile it from source?
>
>Because I have often found with other stuff in the past that compiling from
>source makes a bunch of problems go away.  Software generally works the
>way the developers expect it to if you use the sources, so they are
>better able to help if required.  So now I generally compile most of my
>center-piece software and use packages for all the supporting stuff.

Hm...  OK. I guess you can try 7.7 whenever it's possible.

>> 
>>> Servers are 6core/3.4Ghz with 32 GB RAM, no swap, and SSD and gigabit
>>> network connections.  They are running debian, and are being used as
>>> redundant web servers.  There are some 3 million files on the Gluster
>>> Storage averaging 130KB/file.
>> 
>> This type of workload is called 'metadata-intensive'.
>
>does this mean the metadata-cache group file would be a good one to 
>enable?  will try.
>
>waited 10 minutes, no change that I can see.
>
>> There are some recommendations for this type of workload:
>> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/small_file_performance_enhancements
>> 
>> Keep an eye on the section that mentions dirty-ratio = 5 &
>> dirty-background-ratio = 2.
>
>I have actually read that whole manual, and specifically that page 
>several times.  And also this one:
>
>https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.1/html/administration_guide/small_file_performance_enhancements
>
>Perhaps I am not understanding it correctly.  I tried these suggestions
>before and it got worse, not better.  So I have been operating under the
>assumption that maybe these guidelines are not appropriate for newer
>versions.

Actually, the settings have not changed much, so they should work for you.
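
For reference, the two dirty-page settings from that page map to the kernel sysctls vm.dirty_ratio and vm.dirty_background_ratio. A minimal sketch for checking what is currently in effect, assuming a Linux host; the helper name show_dirty_settings and its optional directory argument are made up for illustration:

```shell
# Print the current dirty-page thresholds.
show_dirty_settings() {
    # $1: directory holding the knobs (defaults to /proc/sys/vm)
    dir=${1:-/proc/sys/vm}
    for k in dirty_ratio dirty_background_ratio; do
        if [ -r "$dir/$k" ]; then
            printf 'vm.%s = %s\n' "$k" "$(cat "$dir/$k")"
        fi
    done
}

show_dirty_settings

# To apply the suggested values at runtime (as root, not persistent):
#   sysctl -w vm.dirty_ratio=5 vm.dirty_background_ratio=2
```

Values set with sysctl -w are lost on reboot, so it is easy to try them and back out again.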

>But will try again, adjusting the dirty ratios.
>
>Load average went from around 15 to 35 in about 2-3 minutes, but 20
>minutes later, it is back down to 20.  It may be having a minimal
>positive impact on cpu, though: I haven't seen the main glusterfs go over
>200% since I changed this, and the brick processes are hovering just
>below 50%, where they were consistently above 50% before.  Might just be
>time of day with the system not as busy.
>
>after watching for 30 minutes, load average is fluctuating between 10 
>and 30, but cpu idle appears marginally better on average than it was.
>
>>> Interestingly, mostly because it is not something I have ever
>>> experienced before, software interrupts sit between 1 and 5 on each
>>> core, but the last core is usually sitting around 20.  Have never
>>> encountered a high load average where the si number was ever
>>> significant.  I have googled the crap out of that (as well as gluster
>>> performance in general), there are nearly limitless posts about what it
>>> is, but have yet to see one thing to explain what to do about it.

Is this happening on all nodes?
I had a similar situation caused by a bad NIC (si in top was way high), but the chance of a bad NIC on all servers is very low.
You can still patch the OS and firmware during your next maintenance.
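
To see which softirq class is behind the high si on that last core, the per-CPU counters in /proc/softirqs can be compared. A minimal sketch, assuming a Linux host; the helper name softirq_per_cpu is made up for illustration, and NET_RX is only one plausible class to look at first:

```shell
# Print "cpuN count" for one softirq class.
softirq_per_cpu() {
    # $1: softirq name (e.g. NET_RX); stdin: contents of /proc/softirqs
    awk -v name="$1:" '$1 == name {
        for (i = 2; i <= NF; i++) printf "cpu%d %s\n", i - 2, $i
    }'
}

if [ -r /proc/softirqs ]; then
    softirq_per_cpu NET_RX < /proc/softirqs
fi
```

The counters are cumulative since boot, so run it twice a few seconds apart and compare the differences. If one CPU accounts for nearly all of the growth, /proc/interrupts shows which IRQ lines that CPU is servicing.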

>> There is an explanation about that in the link I provided above:
>> 
>> Configuring a higher event threads value than the available processing
>> units could again cause context switches on these threads. As a result
>> reducing the number deduced from the previous step to a number that is
>> less than the available processing units is recommended.
>
>Okay, again, have played with these numbers before and it did not pan
>out as expected.  If I understand it correctly, I have 3 brick processes
>(glusterfsd), so the "deduced" number should be 3, and I should set it
>lower than that, so 2.  But it also says "If a specific thread consumes
>more number of CPU cycles than needed, increasing the event thread count
>would enhance the performance of the Red Hat Storage Server."  which is
>why I had it at 8.

Yeah, but you have only 6 cores and they are not dedicated to gluster. I think that you need to test with lower values.

>but will set it to 2 now.  load average is at 17 to start, waiting a 
>while to see what happens.
>
>so 15 minutes later, load average is currently 12, but is fluctuating 
>between 10 and 20, have seen no significant change in cpu usage or 
>anything else in top.
>
>now try also changing server.outstanding-rpc-limit to 256 and wait.
>
>15 minutes later; load has been above 30 but is currently back down to 
>12.  no significant change in cpu.  try increasing to 512 and wait.
>
>15 minutes later, load average is 50.  no significant difference in cpu.
>Software interrupts remain around where they were.  wa from top remains
>about where it was.  not sure why load average is climbing so high.
>changing rpc-limit to 128.
>
>ugh.  10 minutes later, load average just popped over 100.  resetting
>rpc-limit.
>
>now trying cluster.lookup-optimize on, lazy rebalancing (probably a bad
>idea on the live system, but how much worse can it get?)  Ya, bad idea,
>80 hours estimated to complete, load is over 50 and server is crawling.
>
>disabling rebalance and turning lookup-optimize off, for now.
>
>right now the only suggested parameter I haven't played with is the 
>performance.io-thread-count, which I currently have at 64.

I think that, as you have SSDs only, you might see some results by changing this one.

>sigh.  an hour later load average is 80 and climbing.  apache processes
>are numbering in the hundreds and I am constantly having to restart it.
>this brings load average down to 5, but as apache processes climb and
>are held open load average gets up to over 100 again within 3-4 minutes,
>and system starts going non-responsive.  rinse and repeat.
>
>so followed all the recommendations, maybe the dirty settings had a 
>small positive impact, but overall system is most definitely worse for 
>having made the changes.
>
>I have returned the configs back to how they were except the dirty
>settings and the metadata-cache group.  increased performance.cache-size
>to 16GB for now, because that is the one thing that seems to help when I
>"tune" (aka make worse) the system.  have had to restart apache a couple
>dozen times or more, but after another 30 minutes or so system has
>pretty much settled back to how it was before I started.  cpu is like I
>originally stated, all 6 cores maxed out most of the time, software
>interrupts still have all cpus running around 5 with the last one
>consistently sitting around 20-25.  Disk is busy but not usually maxed
>out.  RAM is about half used.  network load peaks at about 1/3 capacity.
>load average is between 10 and 20.  sites are responding, but sluggish.
>
>so am I not reading these recommendations and following the instructions
>correctly?  am I not waiting long enough after each implementation,
>should I be making 1 change per day instead of thinking 15 minutes
>should be enough for the system to catch up?  I have read the full red
>hat documentation and the significant majority of the gluster docs,
>maybe I am missing something else there?  should these settings have had
>a different effect than they did?
>
>For what it's worth, I am running ext4 as my underlying fs and I have
>read a few times that XFS might have been a better choice.  But that is
>not a trivial experiment to make at this time with the system in
>production.  It's one thing (and still a bad thing to be sure) to
>semi-bork the system for an hour or two while I play with
>configurations, but would take a day or so offline to reformat and
>restore the data.

XFS should bring better performance, but if the issue is not in the FS, it won't make a difference.
What I/O scheduler are you using for the SSDs? You can check via 'cat /sys/block/sdX/queue/scheduler'.
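
The active scheduler is the entry shown in square brackets in that file. A small sketch that prints it for every block device at once, assuming a Linux host with sysfs mounted; the helper name active_scheduler is made up for illustration:

```shell
# Extract the bracketed (active) entry from a scheduler line.
active_scheduler() {
    # $1: contents of a queue/scheduler file, e.g. "noop deadline [cfq]"
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```

A device whose file has no bracketed entry is printed with an empty value.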
>> 
>> As 'storage.fips-mode-rchecksum' is using sha256, you can try to disable
>> it - which should use the less cpu intensive md5. Yet, I have never
>> played with that option ...
>
>Done.  no significant difference that I can see.
>
>> Check the RH page about the tunings and try different values for the
>> event threads.
>
>in the past I have tried 2, 4, 8, 16, and 32.  Playing with just those I
>never noticed that any of them made any difference.  Though I might have
>some different options now than I did then, so might try these again
>throughout the day...

Are you talking about server or client event threads (or both)?
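
Both are separate volume options (server.event-threads and client.event-threads). A sketch for checking and changing them together, assuming a volume called myvol (a placeholder, not a name from this thread); the run and set_event_threads helpers are hypothetical, and run only prints each command unless APPLY=1 is set, which seems prudent on a production system:

```shell
VOLNAME=${VOLNAME:-myvol}   # placeholder volume name, replace with your own

# Print each command; execute it only when APPLY=1 is set.
run() {
    printf '+ %s\n' "$*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Set the same event thread count on the server (brick) and client side.
set_event_threads() {
    # $1: thread count
    run gluster volume set "$VOLNAME" server.event-threads "$1"
    run gluster volume set "$VOLNAME" client.event-threads "$1"
}

# Show the current values first.
run gluster volume get "$VOLNAME" server.event-threads
run gluster volume get "$VOLNAME" client.event-threads
```

For example, 'APPLY=1 set_event_threads 2' would apply the value of 2 discussed above to both sides.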

>Thanks again for your time Strahil, if you have any more thoughts would
>love to hear them.

Can you check if you use 'noatime' for the bricks? It won't have any effect on the CPU side, but it might help with the I/O.
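
A quick way to check is to look at the mount options in /proc/mounts. A minimal sketch, assuming the bricks sit on ext4 or xfs filesystems; the helper name mount_has_noatime is made up for illustration:

```shell
# Succeed if the option list contains noatime.
mount_has_noatime() {
    # $1: comma-separated mount options, e.g. "rw,noatime,data=ordered"
    case ",$1," in
        *,noatime,*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/mounts fields: device mountpoint fstype options dump pass
if [ -r /proc/mounts ]; then
    while read -r dev mnt fstype opts rest; do
        case "$fstype" in
            ext4|xfs) ;;
            *) continue ;;
        esac
        if mount_has_noatime "$opts"; then flag=yes; else flag=no; fi
        printf '%s (%s): noatime=%s\n' "$mnt" "$fstype" "$flag"
    done < /proc/mounts
fi
```

Look for the brick mount points in the output. Note that relatime, the kernel default, already avoids most atime writes, so the gain from noatime may be modest.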

I see that your indicator for high load is loadavg, but have you actually checked how many processes are in 'R' or 'D' state?
Some monitoring checks can raise loadavg artificially.
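
On Linux, loadavg counts tasks that are runnable (R) as well as tasks in uninterruptible sleep (D), so a pile of D-state processes waiting on I/O raises it without using CPU. A minimal sketch for counting both, assuming the procps version of ps; the helper name count_states is made up for illustration:

```shell
# Count processes in R and D state.
count_states() {
    # stdin: one process state per line, as printed by `ps -eo state=`
    awk '{
        s = substr($1, 1, 1)
        if (s == "R") r++
        else if (s == "D") d++
    }
    END { printf "R=%d D=%d\n", r, d }'
}

ps -eo state= | count_states

# List the processes that are currently in D state.
ps -eo state,pid,comm | awk '$1 ~ /^D/'
```

Sampling this a few times while the load climbs should show whether the number is driven by CPU contention (R) or by processes blocked on I/O (D).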

Also, are you using software mirroring (either mdadm or striped/mirrored LVs)?
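
Both can be checked from the shell. A minimal sketch, assuming a Linux host; the helper name md_arrays is made up for illustration, and the lvs call normally needs root:

```shell
# Print "name level" for each active md array.
md_arrays() {
    # stdin: contents of /proc/mdstat
    awk '$2 == ":" && $3 == "active" {
        # Skip a status flag such as "(auto-read-only)" if present.
        level = ($4 ~ /^\(/) ? $5 : $4
        print $1, level
    }'
}

if [ -r /proc/mdstat ]; then
    md_arrays < /proc/mdstat
fi

# For LVM, the segment type shows whether an LV is linear, striped or mirrored.
if command -v lvs >/dev/null 2>&1; then
    lvs -o lv_name,vg_name,segtype 2>/dev/null || true
fi
```

No output from the first part means there are no active md arrays.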

>> 
>> 
>> Best Regards,
>> Strahil Nikolov
>> 