[Gluster-users] performance
Strahil Nikolov
hunter86_bg at yahoo.com
Wed Aug 5 21:07:53 UTC 2020
On 5 August 2020 at 4:53:34 GMT+03:00, Computerisms Corporation <bob at computerisms.ca> wrote:
>Hi Strahil,
>
>thanks again for sticking with me on this.
>> Hm... OK. I guess you can try 7.7 whenever it's possible.
>
>Acknowledged.
>
>>> Perhaps I am not understanding it correctly. I tried these
>>> suggestions before and it got worse, not better. So I have been
>>> operating under the assumption that maybe these guidelines are not
>>> appropriate for newer versions.
>>
>> Actually, the settings have not changed much, so they should work
>> for you.
>
>Okay, then maybe I am doing something incorrectly, or not
>understanding some fundamental piece of things that I should be.
To be honest, the documentation seems pretty useless to me.
>>>>> Interestingly, mostly because it is not something I have ever
>>>>> experienced before, software interrupts sit between 1 and 5 on
>>>>> each core, but the last core is usually sitting around 20. Have
>>>>> never encountered a high load average where the si number was
>>>>> ever significant. I have googled the crap out of that (as well
>>>>> as gluster performance in general), there are nearly limitless
>>>>> posts about what it is, but have yet to see one thing to explain
>>>>> what to do about it.
>>
>> Is this happening on all nodes?
>> I had a similar situation caused by a bad NIC (si in top was way
>> high), but the chance of a bad NIC on all servers is very low.
>> You can still patch the OS + firmware during your next maintenance.
>
>Yes, but it's not to the same extreme. The other node is currently
>not actually serving anything to the internet, so right now its only
>function is replicated gluster and databases. On the 2nd node there
>is also one busy core, the first one in this case as opposed to the
>last one on the main node, but it sits between 10 and 15 instead of
>20 and 25, and the remaining cores will be between 0 and 2 instead
>of 1 and 5.
>I have no evidence of any bad hardware, and these servers were both
>commissioned only within the last couple of months. But I will still
>poke around on this path.
It could also be bad firmware. If you get the opportunity, flash the firmware and bring the OS fully up to date.
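If you want to narrow down where the softirq time is going before that maintenance window, a few standard tools are usually enough. A quick sketch (the interface name eth0 is only a placeholder, adjust for your system):

# grep -E 'NET_RX|NET_TX' /proc/softirqs    # per-core network softirq counters
# ethtool -S eth0 | grep -iE 'err|drop'     # NIC error/drop counters (names vary by driver)
# ethtool -i eth0                           # driver and firmware versions
# mpstat -P ALL 1                           # %soft per core over time (sysstat package)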
>>> more number of CPU cycles than needed, increasing the event thread
>>> count would enhance the performance of the Red Hat Storage
>>> Server." which is why I had it at 8.
>>
>> Yeah, but you only have 6 cores and they are not dedicated to
>> gluster alone. I think you need to test with lower values.
>
>Okay, I will change these values a few times over the next couple of
>hours and see what happens.
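For reference, these are just two volume options, so they are easy to step up or down between tests ('gvol' below is only a placeholder for your volume name, and 4 is just a starting point on a 6-core box):

# gluster volume set gvol client.event-threads 4
# gluster volume set gvol server.event-threads 4
# gluster volume get gvol all | grep event-thread    # confirm the values in effect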
>
>>> right now the only suggested parameter I haven't played with is the
>>> performance.io-thread-count, which I currently have at 64.
>>
>> I think that, as you have only SSDs, you might see some results by
>> changing this one.
>
>Okay, will also modify this incrementally. Do you think it can go
>higher? I think I got this number from a thread on this list, but I
>am not really sure what would be a reasonable value for my system.
I guess you can try increasing it a little and check how it goes.
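Whichever direction you try, it is a single option, so it is easy to move it one step at a time and watch the load in between ('gvol' and the value 32 below are just placeholders):

# gluster volume set gvol performance.io-thread-count 32
# gluster volume get gvol performance.io-thread-count    # confirm what is in effect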
>>>
>>> For what it's worth, I am running ext4 as my underlying fs and I
>>> have read a few times that XFS might have been a better choice.
>>> But that is not a trivial experiment to make at this time with the
>>> system in production. It's one thing (and still a bad thing to be
>>> sure) to semi-bork the system for an hour or two while I play with
>>> configurations, but would take a day or so offline to reformat and
>>> restore the data.
>>
>> XFS should bring better performance, but if the issue is not in the
>> FS, it won't make a difference...
>> What I/O scheduler are you using for the SSDs? You can check via
>> 'cat /sys/block/sdX/queue/scheduler'.
>
># cat /sys/block/vda/queue/scheduler
>[mq-deadline] none
Deadline prioritizes reads over writes in a 2:1 ratio (with the default tunings). You can consider testing 'none' if your SSDs are good.
I see vda, so please share details about the infrastructure, as this is very important. Virtual disks have their limitations, and if you are on a VM there might be a chance to increase the CPU count.
If you are on a VM, I would recommend using more (in number) and smaller disks in stripe sets (either raid0 via mdadm, or a pure striped LV).
Also, if you are on a VM, there is no reason to reorder your I/O requests inside the VM just to do it again on the hypervisor. In that case 'none' can bring better performance, but this varies with the workload.
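A minimal sketch of both suggestions, assuming vda plus two spare virtual disks vdb/vdc (all placeholders). The scheduler change is non-destructive and takes effect immediately; the udev rule just makes it survive a reboot:

# echo none > /sys/block/vda/queue/scheduler
# cat /sys/block/vda/queue/scheduler                # should now show [none]
# echo 'ACTION=="add|change", KERNEL=="vd[a-z]", ATTR{queue/scheduler}="none"' > /etc/udev/rules.d/60-io-scheduler.rules
# mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/vdb /dev/vdc    # raid0 stripe set example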
>>> in the past I have tried 2, 4, 8, 16, and 32. Playing with just
>>> those I never noticed that any of them made any difference. Though
>>> I might have some different options now than I did then, so might
>>> try these again throughout the day...
>>
>> Are you talking about server or client event threads (or both)?
>
>It never occurred to me to set them to different values. So far when
>I set one I set the other to the same value.
Yeah, this makes sense.
>>
>>> Thanks again for your time Strahil, if you have any more thoughts
>>> I would love to hear them.
>>
>> Can you check whether you use 'noatime' for the bricks? It won't
>> have any effect on the CPU side, but it might help with the I/O.
>
>I checked into this, and I have nodiratime set, but not noatime. From
>what I can gather, it should provide nearly the same benefit
>performance-wise while leaving the atime attribute on the files.
>Never know, I may decide I want those at some point in the future.
All the necessary data is in the file attributes on the brick; I doubt you will need access times on the brick itself. Another possibility is to use 'relatime'.
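If you ever decide to switch, it is just a mount option on the brick filesystem. For example, in /etc/fstab (device and mount point are placeholders):

/dev/vdb1  /data/brick1  ext4  defaults,noatime  0 2

followed by a live remount:

# mount -o remount,noatime /data/brick1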
>> I see that your indicator for high load is loadavg, but have you
>> actually checked how many processes are in 'R' or 'D' state?
>> Some monitoring checks can raise loadavg artificially.
>
>Occasionally a batch of processes will be in R state, and I see the D
>state show up from time to time, but mostly everything is S.
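A quick way to tally them the next time the load spikes (R = runnable, D = uninterruptible I/O wait):

# ps -eo state= | sort | uniq -c                 # count processes per state
# ps -eo state,pid,cmd | awk '$1 ~ /^[RD]/'      # list the R/D ones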
>
>> Also, are you using software mirroring (either mdadm or
>> striped/mirrored LVs)?
>
>No, single disk. And I opted not to put gluster on a thin LVM, as I
>don't see myself using LVM snapshots in this scenario.
>
>So, we just moved into a quieter time of the day, but maybe I just
>stumbled onto something. I was trying to figure out if/how I could
>throw more RAM at the problem. The gluster docs say write-behind is
>not a cache unless flush-behind is on. So it seems that is a way to
>throw RAM at it? I set performance.write-behind-window-size: 512MB
>and performance.flush-behind: on, and the whole system calmed down
>pretty much immediately. Could be just timing, though; I will have
>to see tomorrow during business hours whether the system stays at a
>reasonable load.
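For anyone following the thread, those two are plain volume options ('gvol' is a placeholder for the volume name); keep an eye on memory use with a window that large:

# gluster volume set gvol performance.flush-behind on
# gluster volume set gvol performance.write-behind-window-size 512MB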
>
>I will still test the other options you suggested tonight, though;
>this is probably too good to be true.
>
>Can't thank you enough for your input, Strahil, your help is truly
>appreciated!
More information about the Gluster-users
mailing list