Optimization

Post by **nullplan** » Fri Oct 21, 2022 11:59 pm

The point is that, taken all together, the Amiga was a very different machine from a modern PC, and cache friendliness was not a consideration. So choosing linked lists was a very different decision there. These days, self-referential data structures are just one cache miss after the next.

That said, using dynamic arrays all over the place is disastrous for other reasons (like memory fragmentation). As I said in the beginning, there is no silver bullet. There are many choices, and there are pros and cons to all of them. If your array goes beyond the cache size, it too becomes one cache miss after the next, especially if you access it randomly. And allocating an array when the key space is large and sparsely populated is wasteful.
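To make the cache argument concrete, here is a minimal C sketch that sums the same values twice: once from a contiguous array and once by following `next` pointers. The names `struct node`, `build_list`, `sum_list` and `sum_array` are hypothetical. The array walk reads memory sequentially, so neighbouring elements share cache lines and the hardware prefetcher can run ahead. The list walk cannot know where the next node is until it has loaded the current one, so when the nodes are scattered across the heap, every step can be a separate cache miss. The sketch only illustrates the two access patterns. It does not measure anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A singly linked list node: every element lives in its own heap
   allocation and points at the next one. */
struct node {
    long value;
    struct node *next;
};

/* Sequential access: element i + 1 is stored right after element i. */
long sum_array(const long *values, size_t count)
{
    assert(values != NULL || count == 0);
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Pointer chasing: the next address is only known after the
   current node has been loaded from memory. */
long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        total += n->value;
    return total;
}

/* Release every node of a list. */
void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}

/* Copy an array into a freshly allocated list with the same order.
   Returns NULL if count is 0 or an allocation fails. */
struct node *build_list(const long *values, size_t count)
{
    struct node *head = NULL;
    /* Prepend from the back so the list order matches the array. */
    for (size_t i = count; i > 0; i--) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL) {
            free_list(head);
            return NULL;
        }
        n->value = values[i - 1];
        n->next = head;
        head = n;
    }
    return head;
}
```

Given the same values (for example `{3, 1, 4, 1, 5, 9, 2, 6}`), both sums return 31. The difference only shows up in timing once the data outgrows the cache, and an array that large, accessed randomly, loses most of its advantage as well.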

Post by **kzinti** » Sat Oct 22, 2022 2:04 am

I am sure this discussion is very helpful to the OP.

Post by **nexos** » Sat Oct 22, 2022 7:03 am

kzinti wrote: I am sure this discussion is very helpful to the OP.

You must admit that the OP had a rather, er, I'll be diplomatic and say overly-broad, question. That's just asking for someone to go off-topic.

I'll admit I'm enjoying the discussion about the Amiga's cache more than the OP's question, since the answer to that question can be worked out from knowledge of CS fundamentals.

Post by **devc1** » Sat Oct 22, 2022 1:31 pm

So, optimization at the micro level (CPU cache, registers, smaller and aligned data) will be helpful for all kinds of tasks. Judging by some simple tests I made, it can quadruple performance or do even better.

How about things such as task switching? Can you get a faster register save/restore in long mode with some instruction?

The fxsave instruction saves the MMX, FPU and SSE state.

What about xsave, xsaves and xsaveopt? I guess they save the AVX state, but do they also save the general-purpose registers?

Should I save the registers one by one in 64-bit mode? Isn't there a better way?

Post by **nexos** » Sun Oct 23, 2022 3:20 pm

devc1 wrote: So, optimization at the micro level (CPU cache, registers, smaller and aligned data) will be helpful for all kinds of tasks. Judging by some simple tests I made, it can quadruple performance or do even better.

No. Micro-optimization is always optional. Beneficial, yes, but not necessary.
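As a small, concrete case of the "smaller and aligned data" kind of micro-optimization, here is a C sketch showing how ordering struct fields from largest to smallest alignment removes padding. The struct and function names are hypothetical. On x86-64 with the System V ABI, `struct padded` takes 24 bytes (7 bytes of padding after `a` and 7 bytes of tail padding after `c`), while `struct reordered` holds the same fields in 16 bytes, so more of them fit in each cache line. Exact sizes depend on the ABI, so the code reports them with `sizeof` instead of hard-coding them, and the 64-byte cache line is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Declaration order forces padding after each char so that b stays
   aligned, and so that every element of an array stays aligned. */
struct padded {
    char a;
    uint64_t b;
    char c;
};

/* Same fields, largest alignment first: the two chars share the
   tail instead of each dragging its own padding along. */
struct reordered {
    uint64_t b;
    char a;
    char c;
};

/* Reordering can only remove padding here, never add it. */
static_assert(sizeof(struct reordered) <= sizeof(struct padded),
              "reordering should not make the struct larger");

/* Whole elements per cache line, assuming a 64-byte line. */
size_t per_cache_line(size_t element_size)
{
    return 64 / element_size;
}

void print_layouts(void)
{
    printf("padded:    %zu bytes, %zu per line\n",
           sizeof(struct padded), per_cache_line(sizeof(struct padded)));
    printf("reordered: %zu bytes, %zu per line\n",
           sizeof(struct reordered), per_cache_line(sizeof(struct reordered)));
}
```

Whether such a change matters depends on how many of these objects are touched in a hot loop, which is nexos' point: it can help, but it is rarely what makes a design work.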

devc1 wrote: How about things such as task switching? Can you get a faster register save/restore in long mode with some instruction?

No. Just several good ole' pushes.

devc1 wrote: The fxsave instruction saves the MMX, FPU and SSE state.

Yes.

devc1 wrote: What about xsave, xsaves and xsaveopt? I guess they save the AVX state, but do they also save the general-purpose registers?

Read the Intel manuals
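For reference on the FXSAVE part: the instruction stores the x87 FPU, MMX and SSE state (in 64-bit mode, XMM0 to XMM15 plus MXCSR) into a 512-byte area that must be 16-byte aligned, and it does not save the general-purpose registers. XSAVE and its variants likewise save only the state components selected by XCR0 and the EDX:EAX mask (x87, SSE, AVX and later extensions), not RAX through R15, so those still have to be saved separately. Their area must be 64-byte aligned, and its size depends on the enabled features (CPUID leaf 0Dh reports it). The C sketch below assumes GCC or Clang on x86-64. `struct fpu_area`, `save_fpu_state` and `restore_fpu_state` are hypothetical names, and a kernel would typically keep one such area per task. On other targets the functions do nothing useful and `save_fpu_state` returns 0.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-task save area: FXSAVE writes exactly 512 bytes
   and raises #GP if the address is not 16-byte aligned. */
struct fpu_area {
    alignas(16) uint8_t bytes[512];
};

static_assert(sizeof(struct fpu_area) == 512, "FXSAVE area is 512 bytes");
static_assert(alignof(struct fpu_area) == 16, "FXSAVE needs 16-byte alignment");

/* Save x87/MMX/SSE state into *area. Returns 1 if FXSAVE ran, 0 on
   targets this sketch does not cover. The general-purpose registers
   are NOT part of this state. (Clang also defines __GNUC__.) */
int save_fpu_state(struct fpu_area *area)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile("fxsave64 %0" : "=m"(*area));
    return 1;
#else
    memset(area->bytes, 0, sizeof area->bytes);
    return 0;
#endif
}

/* Reload state previously written by save_fpu_state. */
void restore_fpu_state(const struct fpu_area *area)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile("fxrstor64 %0" : : "m"(*area));
#else
    (void)area;
#endif
}
```

Saving only when a task actually touches the FPU/SSE registers, instead of on every switch, is a common refinement, but the registers RAX through R15 are handled separately either way.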

devc1 wrote: Should I save the registers one by one in 64-bit mode? Isn't there a better way?

It's the only way.

Post by **Octocontrabass** » Sun Oct 23, 2022 4:39 pm

nexos wrote: No. Just several good ole' pushes.

There are some microarchitectures where using SUB and several MOV will be faster than several PUSH if the increased code size doesn't cause additional cache misses. So, there is another way, but it's probably not a better way.
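Whichever way the registers are stored, the stack ends up holding one 8-byte slot per general-purpose register. Here is a C sketch of that frame, assuming a hypothetical switch stub that pushes RAX first and R15 last. The push order and the names `struct saved_regs` and `frame_address` are illustrative, not taken from any particular kernel. Since the stack grows downward, the last register pushed sits at the lowest address and therefore comes first in the struct. A SUB-then-MOV version yields the identical layout as long as each MOV targets the offset the struct implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame left by a stub that does
       push rax, push rbx, ..., push r14, push r15
   RSP itself is not pushed: the frame's own address is the saved RSP.
   r15 is pushed last, so it is at the lowest address (first member). */
struct saved_regs {
    uint64_t r15, r14, r13, r12, r11, r10, r9, r8;
    uint64_t rbp, rdi, rsi, rdx, rcx, rbx, rax;
};

/* 15 registers x 8 bytes, no padding between uint64_t members. */
static_assert(sizeof(struct saved_regs) == 15 * 8, "one slot per register");

/* The SUB/MOV variant reserves the same 120 bytes up front
   (sub rsp, 120) and then stores each register at its offset,
   e.g. mov [rsp + 112], rax ... mov [rsp + 0], r15. */
static_assert(offsetof(struct saved_regs, rax) == 112, "rax pushed first");
static_assert(offsetof(struct saved_regs, r15) == 0, "r15 pushed last");

/* Where the frame starts, given RSP before the first push:
   each push moves RSP down by 8 bytes. */
uintptr_t frame_address(uintptr_t rsp_before)
{
    return rsp_before - sizeof(struct saved_regs);
}
```

Describing the frame once in C like this lets the scheduler read or modify a suspended task's registers without caring which of the two store sequences the stub uses.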

Post by **eekee** » Mon Oct 24, 2022 7:10 am

Octocontrabass wrote:
nexos wrote: No. Just several good ole' pushes.
There are some microarchitectures where using SUB and several MOV will be faster than several PUSH if the increased code size doesn't cause additional cache misses. So, there is another way, but it's probably not a better way.

I recall people preferring LEA over SUB for adjusting ESP, but it's been a long time since I read that and I'm not sure why. LEA doesn't change the flags and lets you put the result in a different register, although the latter isn't really relevant here. Address computations use a separate unit, so it may have allowed more parallelism on some older processors. Maybe that advice is obsolete, maybe not.

Post by **nexos** » Mon Oct 24, 2022 11:27 am

Octocontrabass wrote:
nexos wrote: No. Just several good ole' pushes.
There are some microarchitectures where using SUB and several MOV will be faster than several PUSH if the increased code size doesn't cause additional cache misses. So, there is another way, but it's probably not a better way.

I think I recall seeing that method in TempleOS now that you mention it.

OSDev.org