[Gluster-users] performance
Strahil Nikolov
hunter86_bg at yahoo.com
Wed Aug 5 21:07:53 UTC 2020
On 5 August 2020 at 4:53:34 GMT+03:00, Computerisms Corporation <bob at computerisms.ca> wrote:
>Hi Strahil,
>
>thanks again for sticking with me on this.
>> Hm... OK. I guess you can try 7.7 whenever it's possible.
>
>Acknowledged.
>
>>> Perhaps I am not understanding it correctly. I tried these
>>> suggestions before and it got worse, not better. So I have been
>>> operating under the assumption that maybe these guidelines are not
>>> appropriate for newer versions.
>>
>> Actually, the settings have not changed much, so they should work
>> for you.
>
>Okay, then maybe I am doing something incorrectly, or not
>understanding some fundamental piece of things that I should be.
To be honest, the documentation seems pretty useless to me.
>>>>> Interestingly, mostly because it is not something I have ever
>>>>> experienced before, software interrupts sit between 1 and 5 on
>>>>> each core, but the last core is usually sitting around 20. Have
>>>>> never encountered a high load average where the si number was
>>>>> ever significant. I have googled the crap out of that (as well
>>>>> as gluster performance in general), there are nearly limitless
>>>>> posts about what it is, but have yet to see one thing to explain
>>>>> what to do about it.
>>
>> Is this happening on all nodes?
>> I had a similar situation caused by a bad NIC (si in top was way
>> high), but the chance of a bad NIC on all servers is very low.
>> You can still patch the OS + firmware during your next maintenance.
>
>Yes, but it's not to the same extreme. The other node is currently
>not actually serving anything to the internet, so right now its only
>function is replicated gluster and databases. On the 2nd node there
>is also one busy core, the first one in this case as opposed to the
>last one on the main node, but it sits between 10 and 15 instead of
>20 and 25, and the remaining cores will be between 0 and 2 instead
>of 1 and 5.
>I have no evidence of any bad hardware, and these servers were both
>commissioned only within the last couple of months. But I will still
>poke around on this path.
It could also be bad firmware. If you get the opportunity, flash the firmware and bring the OS fully up to date.
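If you want to narrow down where the softirq time is going before that maintenance window, a few standard tools are usually enough. A quick sketch (the interface name eth0 is only a placeholder, adjust for your system):

# grep -E 'NET_RX|NET_TX' /proc/softirqs    # per-core network softirq counters
# ethtool -S eth0 | grep -iE 'err|drop'     # NIC error/drop counters (names vary by driver)
# ethtool -i eth0                           # driver and firmware versions
# mpstat -P ALL 1                           # %soft per core over time (sysstat package)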
>>> more number of CPU cycles than needed, increasing the event thread
>>> count would enhance the performance of the Red Hat Storage
>>> Server." which is why I had it at 8.
>>
>> Yeah, but you only have 6 cores and they are not dedicated to
>> gluster alone. I think you need to test with lower values.
>
>Okay, I will change these values a few times over the next couple of
>hours and see what happens.
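For reference, these are just two volume options, so they are easy to step up or down between tests ('gvol' below is only a placeholder for your volume name, and 4 is just a starting point on a 6-core box):

# gluster volume set gvol client.event-threads 4
# gluster volume set gvol server.event-threads 4
# gluster volume get gvol all | grep event-thread    # confirm the values in effect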
>
>>> right now the only suggested parameter I haven't played with is the
>>> performance.io-thread-count, which I currently have at 64.
>>
>> I think that, as you have only SSDs, you might see some results by
>> changing this one.
>
>Okay, will also modify this incrementally. Do you think it can go
>higher? I think I got this number from a thread on this list, but I
>am not really sure what would be a reasonable value for my system.
I guess you can try increasing it a little and check how it goes.
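Whichever direction you try, it is a single option, so it is easy to move it one step at a time and watch the load in between ('gvol' and the value 32 below are just placeholders):

# gluster volume set gvol performance.io-thread-count 32
# gluster volume get gvol performance.io-thread-count    # confirm what is in effect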
>>>
>>> For what it's worth, I am running ext4 as my underlying fs and I
>>> have read a few times that XFS might have been a better choice.
>>> But that is not a trivial experiment to make at this time with the
>>> system in production. It's one thing (and still a bad thing to be
>>> sure) to semi-bork the system for an hour or two while I play with
>>> configurations, but would take a day or so offline to reformat and
>>> restore the data.
>>
>> XFS should bring better performance, but if the issue is not in the
>> FS, it won't make a difference...
>> What I/O scheduler are you using for the SSDs? You can check via
>> 'cat /sys/block/sdX/queue/scheduler'.
>
># cat /sys/block/vda/queue/scheduler
>[mq-deadline] none
Deadline prioritizes reads over writes in a 2:1 ratio (with the default tunings). You can consider testing 'none' if your SSDs are good.
I see vda, so please share details about the infrastructure, as this is very important. Virtual disks have their limitations, and if you are on a VM there might be a chance to increase the CPU count.
If you are on a VM, I would recommend using more (in number) and smaller disks in stripe sets (either raid0 via mdadm, or a pure striped LV).
Also, if you are on a VM, there is no reason to reorder your I/O requests inside the VM just to do it again on the hypervisor. In that case 'none' can bring better performance, but this varies with the workload.
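A minimal sketch of both suggestions, assuming vda plus two spare virtual disks vdb/vdc (all placeholders). The scheduler change is non-destructive and takes effect immediately; the udev rule just makes it survive a reboot:

# echo none > /sys/block/vda/queue/scheduler
# cat /sys/block/vda/queue/scheduler                # should now show [none]
# echo 'ACTION=="add|change", KERNEL=="vd[a-z]", ATTR{queue/scheduler}="none"' > /etc/udev/rules.d/60-io-scheduler.rules
# mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/vdb /dev/vdc    # raid0 stripe set example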
>>> in the past I have tried 2, 4, 8, 16, and 32. Playing with just
>>> those I never noticed that any of them made any difference. Though
>>> I might have some different options now than I did then, so might
>>> try these again throughout the day...
>>
>> Are you talking about server or client event threads (or both)?
>
>It never occurred to me to set them to different values. So far when
>I set one I set the other to the same value.
Yeah, this makes sense.
>>
>>> Thanks again for your time Strahil, if you have any more thoughts
>>> I would love to hear them.
>>
>> Can you check whether you use 'noatime' for the bricks? It won't
>> have any effect on the CPU side, but it might help with the I/O.
>
>I checked into this, and I have nodiratime set, but not noatime. From
>what I can gather, it should provide nearly the same benefit
>performance-wise while leaving the atime attribute on the files.
>Never know, I may decide I want those at some point in the future.
All the necessary data is in the file attributes on the brick; I doubt you will need access times on the brick itself. Another possibility is to use 'relatime'.
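If you ever decide to switch, it is just a mount option on the brick filesystem. For example, in /etc/fstab (device and mount point are placeholders):

/dev/vdb1  /data/brick1  ext4  defaults,noatime  0 2

followed by a live remount:

# mount -o remount,noatime /data/brick1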
>> I see that your indicator for high load is loadavg, but have you
>> actually checked how many processes are in 'R' or 'D' state?
>> Some monitoring checks can raise loadavg artificially.
>
>Occasionally a batch of processes will be in R state, and I see the D
>state show up from time to time, but mostly everything is S.
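A quick way to tally them the next time the load spikes (R = runnable, D = uninterruptible I/O wait):

# ps -eo state= | sort | uniq -c                 # count processes per state
# ps -eo state,pid,cmd | awk '$1 ~ /^[RD]/'      # list the R/D ones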
>
>> Also, are you using software mirroring (either mdadm or
>> striped/mirrored LVs)?
>
>No, single disk. And I opted not to put gluster on a thin LVM, as I
>don't see myself using LVM snapshots in this scenario.
>
>So, we just moved into a quieter time of the day, but maybe I just
>stumbled onto something. I was trying to figure out if/how I could
>throw more RAM at the problem. The gluster docs say write-behind is
>not a cache unless flush-behind is on. So it seems that is a way to
>throw RAM at it? I set performance.write-behind-window-size: 512MB
>and performance.flush-behind: on, and the whole system calmed down
>pretty much immediately. Could be just timing, though; I will have
>to see tomorrow during business hours whether the system stays at a
>reasonable load.
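For anyone following the thread, those two are plain volume options ('gvol' is a placeholder for the volume name); keep an eye on memory use with a window that large:

# gluster volume set gvol performance.flush-behind on
# gluster volume set gvol performance.write-behind-window-size 512MB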
>
>I will still test the other options you suggested tonight, though;
>this is probably too good to be true.
>
>Can't thank you enough for your input, Strahil, your help is truly
>appreciated!
More information about the Gluster-users
mailing list