Optimization

nullplan
Member
Posts: 1790
Joined: Wed Aug 30, 2017 8:24 am

Re: Optimization

Post by nullplan »

The point is, when you put it all together, that the Amiga was a very different machine from modern PCs and cache friendliness was not a consideration. So choosing to go for linked lists was a very different decision there. These days, self-referential data structures are just one cache miss after the next.

That said, using dynamic arrays all over the place is disastrous for other reasons (like memory fragmentation). As I said in the beginning, there is no silver bullet. There are many choices, and there are pros and cons to all of them. If your array goes beyond the cache size, it too becomes one cache miss after the next, especially if you access it randomly. And allocating an array when the key space is large and sparsely populated is wasteful.
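
To make the trade-off concrete, here is a minimal sketch in C. The names `node`, `sum_list`, and `sum_array` are made up for illustration and do not come from anyone's code in this thread. Both functions compute the same sum; the difference is that the list walk has to load each `next` pointer before it knows where the next element lives, while the array walk touches consecutive addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical singly linked list node; each node may live anywhere in memory. */
struct node {
    int64_t value;
    struct node *next;
};

/* Pointer chasing: the address of the next element is only known after
 * the current node has been loaded, so a cache miss stalls the whole walk. */
int64_t sum_list(const struct node *head)
{
    int64_t total = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        total += n->value;
    return total;
}

/* Contiguous storage: element addresses are computable up front, so
 * sequential access is friendly to caches and hardware prefetchers. */
int64_t sum_array(const int64_t *values, size_t count)
{
    int64_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

Neither version is free: once the array outgrows the cache or is indexed randomly, it misses too, so this only illustrates the sequential case.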
Carpe diem!
kzinti
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: Optimization

Post by kzinti »

I am sure this discussion is very helpful to the OP.
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Optimization

Post by nexos »

kzinti wrote: I am sure this discussion is very helpful to the OP.
You must admit that the OP had a rather, er, I'll be diplomatic and say overly-broad, question. That's just asking for someone to go off-topic.

I'll admit I'm enjoying the discussion about the Amiga's cache more than the OP's question, as the answer to that could be found with knowledge of CS fundamentals.
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
devc1
Member
Posts: 439
Joined: Fri Feb 11, 2022 4:55 am
Location: behind the keyboard

Re: Optimization

Post by devc1 »

So, optimization at the micro level (CPU cache, registers, smaller and aligned data) will be helpful for all kinds of tasks. Judging by some simple tests I made, it can make code four times faster or even more.
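
As a small illustration of the "smaller and aligned data" point, here is a sketch in C with made-up struct names. The exact sizes depend on the ABI, so the numbers in the comments are what a typical 64-bit target gives, not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with fields in a careless order. On a typical
 * 64-bit ABI the compiler pads after `tag` and after `flag` so that
 * `id` stays 8-byte aligned (commonly 24 bytes in total). */
struct record_padded {
    uint8_t  tag;
    uint64_t id;
    uint8_t  flag;
};

/* Same fields, largest first: the two single bytes share one padded
 * tail (commonly 16 bytes in total). */
struct record_compact {
    uint64_t id;
    uint8_t  tag;
    uint8_t  flag;
};

/* Helpers so the sizes can be printed or checked at run time. */
size_t padded_size(void)  { return sizeof(struct record_padded); }
size_t compact_size(void) { return sizeof(struct record_compact); }
```

With the compact layout more records fit in each cache line, which is where the gain would come from when scanning a large array of them.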

How about things such as task switching: is there some instruction that gives a faster register save/restore in long mode?

FXSAVE saves the x87 FPU, MMX, and SSE state.

What about XSAVE, XSAVES, and XSAVEOPT? I guess they save the AVX state, but do they also save the general-purpose registers?

In 64-bit mode, should I save the registers one by one? Isn't there a better way?
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Optimization

Post by nexos »

devc1 wrote: So, optimization at the micro level (CPU cache, registers, smaller and aligned data) will be helpful for all kinds of tasks. Judging by some simple tests I made, it can make code four times faster or even more.
No. Micro-optimization is always optional. Beneficial, yes, but not necessary.
devc1 wrote: How about things such as task switching: is there some instruction that gives a faster register save/restore in long mode?
No. Just several good ole' pushes.
devc1 wrote: FXSAVE saves the x87 FPU, MMX, and SSE state.
Yes.
devc1 wrote: What about XSAVE, XSAVES, and XSAVEOPT? I guess they save the AVX state, but do they also save the general-purpose registers?
Read the Intel manuals ;)
devc1 wrote: In 64-bit mode, should I save the registers one by one? Isn't there a better way?
It's the only way.
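
For the XSAVE question, a short sketch may help while reading the manuals. The XSAVE family saves extended state components (x87, SSE, AVX, and so on, selected by a bitmap); it does not touch the general-purpose registers. The function below is hypothetical and assumes a GCC or Clang toolchain with its `<cpuid.h>` header. It asks CPUID how large the standard-format XSAVE area has to be for the components currently enabled in XCR0.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header; an assumption about the toolchain */
#endif

/* Returns the XSAVE area size in bytes for the state components currently
 * enabled in XCR0, or 0 if XSAVE is unavailable (or this is not x86). */
uint32_t xsave_area_size(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* CPUID.01H:ECX bit 26 = XSAVE supported, bit 27 = enabled by the OS
     * (a kernel sets CR4.OSXSAVE itself before relying on this). */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 26)) || !(ecx & (1u << 27)))
        return 0;

    /* CPUID.0DH, sub-leaf 0: EBX = bytes needed for the enabled components. */
    if (__get_cpuid_max(0, NULL) < 0x0D)
        return 0;
    __cpuid_count(0x0D, 0, eax, ebx, ecx, edx);
    return ebx;
#else
    return 0;
#endif
}
```

The result is at least 576 bytes when XSAVE is available (the 512-byte legacy FXSAVE region plus a 64-byte header), and the buffer passed to XSAVE must be 64-byte aligned. XSAVES uses a compacted format whose size is reported by a different CPUID sub-leaf, so treat this as covering XSAVE and XSAVEOPT only.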
Octocontrabass
Member
Posts: 5563
Joined: Mon Mar 25, 2013 7:01 pm

Re: Optimization

Post by Octocontrabass »

nexos wrote: No. Just several good ole' pushes.
There are some microarchitectures where using SUB and several MOV will be faster than several PUSH if the increased code size doesn't cause additional cache misses. So, there is another way, but it's probably not a better way.
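
A rough sketch of what the two variants build, written as a C layout with the assembly in comments. The struct name, the register order, and the constant names are assumptions chosen for illustration; a real kernel picks its own frame layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for the 15 general-purpose registers saved on a
 * task switch (RSP is handled separately). Lowest address first. */
struct saved_regs {
    uint64_t r15, r14, r13, r12, r11, r10, r9, r8;
    uint64_t rdi, rsi, rbp, rbx, rdx, rcx, rax;
};

/* The PUSH variant builds this frame one slot at a time:
 *     push rax ; push rcx ; ... ; push r15
 * The SUB + MOV variant reserves it once and fills fixed offsets:
 *     sub rsp, FRAME_SIZE
 *     mov [rsp + OFF_RAX], rax
 *     mov [rsp + OFF_RCX], rcx
 *     ...
 * Both end with RSP pointing at the same struct saved_regs. */
enum {
    FRAME_SIZE = sizeof(struct saved_regs),
    OFF_RAX    = offsetof(struct saved_regs, rax),
    OFF_RCX    = offsetof(struct saved_regs, rcx),
    OFF_R15    = offsetof(struct saved_regs, r15)
};
```

Each MOV with a displacement encodes to more bytes than the matching PUSH, which is the code-size cost mentioned above; whether the trade pays off is something to measure on the target CPU.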
eekee
Member
Posts: 891
Joined: Mon May 22, 2017 5:56 am
Location: Kerbin
Discord: eekee

Re: Optimization

Post by eekee »

Octocontrabass wrote:
nexos wrote: No. Just several good ole' pushes.
There are some microarchitectures where using SUB and several MOV will be faster than several PUSH if the increased code size doesn't cause additional cache misses. So, there is another way, but it's probably not a better way.
I recall people preferring LEA over SUB for adjusting ESP, but it has been a long time since I read that and I'm not sure why. LEA doesn't change the flags and lets you put the result in a different register, though the latter isn't relevant here. Address computations use a separate unit, so LEA may have improved parallelism on some older processors. Maybe it's obsolete, maybe not.
Kaph — a modular OS intended to be easy and fun to administer and code for.
"May wisdom, fun, and the greater good shine forth in all your work." — Leo Brodie
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Optimization

Post by nexos »

Octocontrabass wrote:
nexos wrote: No. Just several good ole' pushes.
There are some microarchitectures where using SUB and several MOV will be faster than several PUSH if the increased code size doesn't cause additional cache misses. So, there is another way, but it's probably not a better way.
I think I recall seeing that method in TempleOS now that you mention it.