Optimization

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to see if your question is answered in the wiki first! When in doubt, post here.
devc1
Member
Posts: 439
Joined: Fri Feb 11, 2022 4:55 am
Location: behind the keyboard

Re: Optimization

Post by devc1 »

I am using linked lists for heaps and threads.
Octocontrabass
Member
Posts: 5563
Joined: Mon Mar 25, 2013 7:01 pm

Re: Optimization

Post by Octocontrabass »

Why do you need to search for allocated items in your heap?
devc1
Member
Posts: 439
Joined: Fri Feb 11, 2022 4:55 am
Location: behind the keyboard

Re: Optimization

Post by devc1 »

You need to maintain a list of heap entries so that the free function will only free a particular area of memory. And the allocate function will then search for free memory in these heap entries?
Octocontrabass
Member
Posts: 5563
Joined: Mon Mar 25, 2013 7:01 pm

Re: Optimization

Post by Octocontrabass »

Why does the free function need to search the bitmap?
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Optimization

Post by nexos »

devc1 wrote:I may create multiple bitmaps to fit the list in a page where I can benefit from the TLB cache, and I make sure that the list is page aligned.
What does the TLB have to do with this?
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
nullplan
Member
Posts: 1790
Joined: Wed Aug 30, 2017 8:24 am

Re: Optimization

Post by nullplan »

devc1 wrote:I guess the quicksort algorithm will be useful when creating linked lists
Let me stop you there. No. Quicksort is horrible for lists. Mergesort is better suited for lists. However...
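Merge sort suits lists because it only ever needs sequential access, never indexing. A minimal sketch in C (the `node` type here is hypothetical, not something from the thread):

```c
#include <stddef.h>

/* Hypothetical node type for illustration. */
typedef struct node {
    int key;
    struct node *next;
} node;

/* Split the list in half using slow/fast pointers. */
static node *split(node *head) {
    node *slow = head, *fast = head->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    node *second = slow->next;
    slow->next = NULL;
    return second;
}

/* Merge two sorted lists by relinking nodes; no extra allocation. */
static node *merge(node *a, node *b) {
    node dummy = {0, NULL}, *tail = &dummy;
    while (a && b) {
        if (a->key <= b->key) { tail->next = a; a = a->next; }
        else                  { tail->next = b; b = b->next; }
        tail = tail->next;
    }
    tail->next = a ? a : b;
    return dummy.next;
}

/* O(n log n) worst case, unlike quicksort on lists. */
node *list_mergesort(node *head) {
    if (!head || !head->next) return head;
    node *second = split(head);
    return merge(list_mergesort(head), list_mergesort(second));
}
```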
devc1 wrote:with an ID for each item, so you can use this algorithm to get the Item from the ID faster ?
What you have described there in a roundabout way is a map. You want a structure in which it is easy to look up an item by its ID. A sorted list will not help you here, as you still need to iterate over the preceding elements to find out where each successor is. A better solution would be a hash map, if you know the maximum size the map can ever be, or else you don't need to resize it too often (that is the slowest operation in a hash map). Otherwise, a tree map can be used (i.e. a self-balancing binary tree data structure, like a red-black tree or an Andersson tree).

Another difference is that certain types of hash maps are very cache friendly, whereas all self-referential data structures risk a cache miss on every dereference. And cache misses these days are about on par with NVMe accesses.

These are all Computer Science fundamentals. I suggest you learn a lot more about these topics before moving on. Just search for "data structures", and you should be inundated with stuff to read. This is fairly important, because most work in a kernel depends on good data structures. Indeed choosing the right data structure for the job can be the most important optimization you can make. And that does mean you actually need to learn about complexity theory while you are studying data structures.
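As one hedged sketch of the fixed-capacity hash map option nullplan mentions — a flat array with open addressing, which is also the cache-friendly variant (all names and the hash constant are illustrative, not a real API):

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_CAP  1024u              /* power of two, fixed at build time */
#define EMPTY_ID UINT64_MAX         /* sentinel: this slot is unused */

typedef struct {
    uint64_t id;
    void    *item;
} slot;

typedef struct {
    slot slots[MAP_CAP];            /* one flat array: good cache behavior */
} idmap;

static void idmap_init(idmap *m) {
    for (size_t i = 0; i < MAP_CAP; i++) m->slots[i].id = EMPTY_ID;
}

static size_t idmap_hash(uint64_t id) {
    /* Fibonacci-style multiplicative hash, masked to the table size. */
    return (size_t)(id * 0x9E3779B97F4A7C15ull) & (MAP_CAP - 1);
}

/* Linear probing: on a collision, try the next slot.  Returns 0 on
 * success, -1 if the map is full. */
static int idmap_put(idmap *m, uint64_t id, void *item) {
    for (size_t i = 0, h = idmap_hash(id); i < MAP_CAP; i++) {
        slot *s = &m->slots[(h + i) & (MAP_CAP - 1)];
        if (s->id == EMPTY_ID || s->id == id) {
            s->id = id;
            s->item = item;
            return 0;
        }
    }
    return -1;
}

static void *idmap_get(const idmap *m, uint64_t id) {
    for (size_t i = 0, h = idmap_hash(id); i < MAP_CAP; i++) {
        const slot *s = &m->slots[(h + i) & (MAP_CAP - 1)];
        if (s->id == id)       return s->item;
        if (s->id == EMPTY_ID) return NULL;   /* not present */
    }
    return NULL;
}
```

Lookups touch consecutive slots in one array, so probes usually stay within a cache line or two, in contrast to a tree map where each step is a pointer dereference.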
Carpe diem!
devc1
Member
Posts: 439
Joined: Fri Feb 11, 2022 4:55 am
Location: behind the keyboard

Re: Optimization

Post by devc1 »

What does the TLB have to do with this?
If you read unaligned data across a page boundary, for example a quadword at 0x1FFF, the CPU will need two lookups in the TLB or page tables instead of one. I tested that and found accesses that straddle a page boundary to be about 4 times slower.
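The straddling case can be demonstrated with a buffer spanning a page boundary (a sketch only; the 4x slowdown is devc1's measurement and is not reproduced by this code):

```c
#include <stdint.h>
#include <stdlib.h>   /* aligned_alloc, for page-aligned test buffers */
#include <string.h>

#define PAGE 4096

/* Read a 64-bit value at an arbitrary byte offset.  When offset % PAGE
 * falls in 0xFF9..0xFFF the 8-byte load spans two pages, so the CPU
 * needs two address translations (two TLB entries) instead of one. */
static uint64_t read_u64(const uint8_t *buf, size_t offset) {
    uint64_t v;
    memcpy(&v, buf + offset, sizeof v);   /* portable unaligned read */
    return v;
}
```

A read at offset `PAGE - 1` of a page-aligned buffer is the worst case: one byte comes from the first page and seven from the second.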
Why does the free function need to search the bitmap?
In order to allocate a heap entry structure in a linked list, instead of iterating over each entry to check if it's present (which accesses memory, which makes the function slower), I can just keep a bitmap that marks each present item and use "bsf" (BitScanForward) to scan for a free slot. It made my linked list implementation more than 20 times faster.

A list will be something like this:

Code:

typedef struct list {
    UINT64 Bitmap;
    Entry entries[64];
    struct list* Next;
} list;
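With GCC/Clang builtins, the "bsf" scan over such a block might look like this (a sketch; it assumes a set bit means the slot is occupied, which may be the inverse of devc1's convention, and it uses `uint64_t` in place of `UINT64`):

```c
#include <stdint.h>

typedef struct Entry { uint64_t data; } Entry;   /* placeholder payload */

typedef struct list {
    uint64_t Bitmap;        /* bit i set => entries[i] is in use */
    Entry entries[64];
    struct list *Next;
} list;

/* Returns the index of a free slot in this block, or -1 if full.
 * __builtin_ctzll (count trailing zeros) compiles down to BSF/TZCNT
 * on x86-64 -- the "BitScanForward" devc1 refers to. */
static int find_free_slot(const list *l) {
    uint64_t free_bits = ~l->Bitmap;
    if (free_bits == 0)
        return -1;          /* all 64 slots taken; caller moves to Next */
    return __builtin_ctzll(free_bits);
}
```

One register-wide scan replaces up to 64 memory touches, which is where the speedup comes from.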
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Optimization

Post by nexos »

devc1 wrote:If you read unaligned data across a page boundary, for example a quadword at 0x1FFF, the CPU will need two lookups in the TLB or page tables instead of one. I tested that and found accesses that straddle a page boundary to be about 4 times slower.
OK, but I still don't know what that has to do with anything. When allocating a page for the bitmaps, the pages have to be aligned. There's no other option.

As for general optimization, making good use of the CPU's caches is very important in micro-optimization.

And no, I'm not talking about the registers, I'm talking about a block of SRAM inside the CPU that is much faster to access than the system's DRAM. The CPU tries to keep the most useful information inside the cache. But programs / OSes can help the CPU a lot with this.
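The difference nexos is pointing at comes down to data layout; a hedged sketch contrasting the two access patterns (illustrative names, not from the thread):

```c
#include <stddef.h>
#include <stdint.h>

/* Contiguous layout: loads are sequential, so the hardware prefetcher
 * can stream whole 64-byte cache lines ahead of the loop. */
static uint64_t sum_array(const uint64_t *a, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer chasing: the address of the next load is unknown until the
 * current load completes, so a cache miss stalls every step. */
struct chain {
    uint64_t value;
    struct chain *next;
};

static uint64_t sum_chain(const struct chain *c) {
    uint64_t s = 0;
    for (; c; c = c->next)
        s += c->value;
    return s;
}
```

Both loops compute the same thing; only the memory layout differs, which is exactly what determines how much help the cache can give.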
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
devc1
Member
Posts: 439
Joined: Fri Feb 11, 2022 4:55 am
Location: behind the keyboard

Re: Optimization

Post by devc1 »

The L1 cache is almost as fast as two register accesses; each cache line in the L1 and L2 is 64 bytes in size.
Unfortunately, accessing linked lists would not benefit from any cache, and I think I don't need to explain this.

Will the prefetch instruction help with anything?
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Optimization

Post by nexos »

devc1 wrote:Unfortunately, accessing linked lists would not benefit from any cache, and I think I don't need to explain this.
Exactly why you shouldn't be using a linked list.
devc1 wrote:The L1 cache is almost as fast as two register accesses; each cache line in the L1 and L2 is 64 bytes in size.
What are you trying to say by this?
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
devc1
Member
Posts: 439
Joined: Fri Feb 11, 2022 4:55 am
Location: behind the keyboard

Re: Optimization

Post by devc1 »

So what should I use instead of a linked list?

A binary tree ?
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany
Contact:

Re: Optimization

Post by Solar »

Just as a note, AmigaOS Exec went the exact opposite way -- "everything" was a linked list. Kernel objects were basically all derived from "list node": tasks, devices, memory hunks. All were handled by the identical list handling code, which the kernel exposed to users as well so they could use it for their own data. As opposed to more efficient data structures, this kept the code handling those data structures in the CPU cache, because it was the same code for "everything".
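The "everything derives from list node" pattern Solar describes is what's now usually called an intrusive list; a minimal sketch (the names are hypothetical, not the actual Exec API):

```c
#include <stddef.h>

/* Every kernel object embeds the node, so one set of list routines
 * works for tasks, devices, memory hunks, ... */
struct node {
    struct node *next;
    struct node *prev;
};

struct list {
    struct node head;       /* sentinel: list is empty when it points to itself */
};

static void list_init(struct list *l) {
    l->head.next = l->head.prev = &l->head;
}

static void list_add_tail(struct list *l, struct node *n) {
    n->prev = l->head.prev;
    n->next = &l->head;
    l->head.prev->next = n;
    l->head.prev = n;
}

/* Example object: a "task" is just its node plus its own fields.
 * Keeping the node first lets a node pointer be cast back to the task. */
struct task {
    struct node link;
    int priority;
};
```

One copy of `list_add_tail` serves every object type, which is the cache argument Solar makes: the shared list code stays hot even though the data structure itself is pointer-based.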
Every good solution is obvious once you've found it.
iansjack
Member
Posts: 4703
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: Optimization

Post by iansjack »

I didn’t think the 68000 had any cache memory. Did the Amiga provide an external cache?
thewrongchristian
Member
Posts: 426
Joined: Tue Apr 03, 2018 2:44 am

Re: Optimization

Post by thewrongchristian »

iansjack wrote:I didn’t think the 68000 had any cache memory. Did the Amiga provide an external cache?
The 68020 had an instruction cache. I think that was available in the Amiga 1200.
iansjack
Member
Posts: 4703
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: Optimization

Post by iansjack »

The 68EC020 in the Amiga 1200 did have a 256 byte instruction cache that wasn’t present in the original 68000 used in earlier Amigas.