Optimization

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to see if your question is answered in the wiki first! When in doubt, post here.
devc1
Member
Posts: 439
Joined: Fri Feb 11, 2022 4:55 am
Location: behind the keyboard

Re: Optimization

Post by devc1 »

I am using linked lists for heaps and threads.
Octocontrabass
Member
Posts: 5563
Joined: Mon Mar 25, 2013 7:01 pm

Re: Optimization

Post by Octocontrabass »

Why do you need to search for allocated items in your heap?
devc1
Member
Posts: 439
Joined: Fri Feb 11, 2022 4:55 am
Location: behind the keyboard

Re: Optimization

Post by devc1 »

You need to maintain a list of heap entries so that the free function will only free a particular area of memory. And the allocate function will then search for free memory in these heap entries?
Octocontrabass
Member
Posts: 5563
Joined: Mon Mar 25, 2013 7:01 pm

Re: Optimization

Post by Octocontrabass »

Why does the free function need to search the bitmap?
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Optimization

Post by nexos »

devc1 wrote:I may create multiple bitmaps to fit the list in a page where I can benefit from the TLB cache, and I make sure that the list is page aligned.
What does the TLB have to do with this?
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
nullplan
Member
Posts: 1790
Joined: Wed Aug 30, 2017 8:24 am

Re: Optimization

Post by nullplan »

devc1 wrote:I guess the quicksort algorithm will be useful when creating linked lists
Let me stop you there. No. Quicksort is horrible for lists. Mergesort is better suited for lists. However...
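Merge sort suits lists because it only ever needs sequential access, never indexing. A minimal sketch in C (the `node` type here is hypothetical, not something from the thread):

```c
#include <stddef.h>

/* Hypothetical node type for illustration. */
typedef struct node {
    int key;
    struct node *next;
} node;

/* Split the list in half using slow/fast pointers. */
static node *split(node *head) {
    node *slow = head, *fast = head->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    node *second = slow->next;
    slow->next = NULL;
    return second;
}

/* Merge two sorted lists by relinking nodes; no extra allocation. */
static node *merge(node *a, node *b) {
    node dummy = {0, NULL}, *tail = &dummy;
    while (a && b) {
        if (a->key <= b->key) { tail->next = a; a = a->next; }
        else                  { tail->next = b; b = b->next; }
        tail = tail->next;
    }
    tail->next = a ? a : b;
    return dummy.next;
}

/* O(n log n) worst case, unlike quicksort on lists. */
node *list_mergesort(node *head) {
    if (!head || !head->next) return head;
    node *second = split(head);
    return merge(list_mergesort(head), list_mergesort(second));
}
```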
devc1 wrote:with an ID for each item, so you can use this algorithm to get the Item from the ID faster ?
What you have described there in a roundabout way is a map. You want a structure in which it is easy to look up an item by its ID. A sorted list will not help you here, as you still need to iterate over the preceding elements to find out where each successor is. A better solution would be a hash map, if you know the maximum size the map can ever be, or else you don't need to resize it too often (that is the slowest operation in a hash map). Otherwise, a tree map can be used (i.e. a self-balancing binary tree data structure, like a red-black tree or an Andersson tree).

Another difference is that certain types of hash maps are very cache friendly, whereas all self-referential data structures risk a cache miss on every dereference. And cache misses these days are about on par with NVMe accesses.

These are all Computer Science fundamentals. I suggest you learn a lot more about these topics before moving on. Just search for "data structures", and you should be inundated with stuff to read. This is fairly important, because most work in a kernel depends on good data structures. Indeed choosing the right data structure for the job can be the most important optimization you can make. And that does mean you actually need to learn about complexity theory while you are studying data structures.
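As one hedged sketch of the fixed-capacity hash map option nullplan mentions — a flat array with open addressing, which is also the cache-friendly variant (all names and the hash constant are illustrative, not a real API):

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_CAP  1024u              /* power of two, fixed at build time */
#define EMPTY_ID UINT64_MAX         /* sentinel: this slot is unused */

typedef struct {
    uint64_t id;
    void    *item;
} slot;

typedef struct {
    slot slots[MAP_CAP];            /* one flat array: good cache behavior */
} idmap;

static void idmap_init(idmap *m) {
    for (size_t i = 0; i < MAP_CAP; i++) m->slots[i].id = EMPTY_ID;
}

static size_t idmap_hash(uint64_t id) {
    /* Fibonacci-style multiplicative hash, masked to the table size. */
    return (size_t)(id * 0x9E3779B97F4A7C15ull) & (MAP_CAP - 1);
}

/* Linear probing: on a collision, try the next slot.  Returns 0 on
 * success, -1 if the map is full. */
static int idmap_put(idmap *m, uint64_t id, void *item) {
    for (size_t i = 0, h = idmap_hash(id); i < MAP_CAP; i++) {
        slot *s = &m->slots[(h + i) & (MAP_CAP - 1)];
        if (s->id == EMPTY_ID || s->id == id) {
            s->id = id;
            s->item = item;
            return 0;
        }
    }
    return -1;
}

static void *idmap_get(const idmap *m, uint64_t id) {
    for (size_t i = 0, h = idmap_hash(id); i < MAP_CAP; i++) {
        const slot *s = &m->slots[(h + i) & (MAP_CAP - 1)];
        if (s->id == id)       return s->item;
        if (s->id == EMPTY_ID) return NULL;   /* not present */
    }
    return NULL;
}
```

Lookups touch consecutive slots in one array, so probes usually stay within a cache line or two, in contrast to a tree map where each step is a pointer dereference.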
Carpe diem!
devc1
Member
Posts: 439
Joined: Fri Feb 11, 2022 4:55 am
Location: behind the keyboard

Re: Optimization

Post by devc1 »

What does the TLB have to do with this?
If you read unaligned data across a page boundary, for example a quadword at 0x1FFF, the CPU will need two lookups in the TLB or page tables instead of one. I tested that and found accesses that straddle a page boundary to be about 4 times slower.
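The straddling case can be demonstrated with a buffer spanning a page boundary (a sketch only; the 4x slowdown is devc1's measurement and is not reproduced by this code):

```c
#include <stdint.h>
#include <stdlib.h>   /* aligned_alloc, for page-aligned test buffers */
#include <string.h>

#define PAGE 4096

/* Read a 64-bit value at an arbitrary byte offset.  When offset % PAGE
 * falls in 0xFF9..0xFFF the 8-byte load spans two pages, so the CPU
 * needs two address translations (two TLB entries) instead of one. */
static uint64_t read_u64(const uint8_t *buf, size_t offset) {
    uint64_t v;
    memcpy(&v, buf + offset, sizeof v);   /* portable unaligned read */
    return v;
}
```

A read at offset `PAGE - 1` of a page-aligned buffer is the worst case: one byte comes from the first page and seven from the second.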
Why does the free function need to search the bitmap?
In order to allocate a heap entry structure in a linked list, instead of iterating over each entry to check if it's present (which accesses memory, which makes the function slower), I can just keep a bitmap that marks each present item and use "bsf" (BitScanForward) to scan for a free slot. It made my linked list implementation more than 20 times faster.

A list will be something like this:

Code:

typedef struct list {
    UINT64 Bitmap;
    Entry entries[64];
    struct list* Next;
} list;
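With GCC/Clang builtins, the "bsf" scan over such a block might look like this (a sketch; it assumes a set bit means the slot is occupied, which may be the inverse of devc1's convention, and it uses `uint64_t` in place of `UINT64`):

```c
#include <stdint.h>

typedef struct Entry { uint64_t data; } Entry;   /* placeholder payload */

typedef struct list {
    uint64_t Bitmap;        /* bit i set => entries[i] is in use */
    Entry entries[64];
    struct list *Next;
} list;

/* Returns the index of a free slot in this block, or -1 if full.
 * __builtin_ctzll (count trailing zeros) compiles down to BSF/TZCNT
 * on x86-64 -- the "BitScanForward" devc1 refers to. */
static int find_free_slot(const list *l) {
    uint64_t free_bits = ~l->Bitmap;
    if (free_bits == 0)
        return -1;          /* all 64 slots taken; caller moves to Next */
    return __builtin_ctzll(free_bits);
}
```

One register-wide scan replaces up to 64 memory touches, which is where the speedup comes from.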
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Optimization

Post by nexos »

devc1 wrote:If you read unaligned data across a page boundary, for example a quadword at 0x1FFF, the CPU will need two lookups in the TLB or page tables instead of one. I tested that and found accesses that straddle a page boundary to be about 4 times slower.
OK, but I still don't know what that has to do with anything. When allocating a page for the bitmaps, the pages have to be aligned. There's no other option.

As for general optimization, making good use of the CPU's caches is very important in micro-optimization.

And no, I'm not talking about the registers, I'm talking about a block of SRAM inside the CPU that is much faster to access than the system's DRAM. The CPU tries to keep the most useful information inside the cache. But programs / OSes can help the CPU a lot with this.
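The difference nexos is pointing at comes down to data layout; a hedged sketch contrasting the two access patterns (illustrative names, not from the thread):

```c
#include <stddef.h>
#include <stdint.h>

/* Contiguous layout: loads are sequential, so the hardware prefetcher
 * can stream whole 64-byte cache lines ahead of the loop. */
static uint64_t sum_array(const uint64_t *a, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer chasing: the address of the next load is unknown until the
 * current load completes, so a cache miss stalls every step. */
struct chain {
    uint64_t value;
    struct chain *next;
};

static uint64_t sum_chain(const struct chain *c) {
    uint64_t s = 0;
    for (; c; c = c->next)
        s += c->value;
    return s;
}
```

Both loops compute the same thing; only the memory layout differs, which is exactly what determines how much help the cache can give.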
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
devc1
Member
Posts: 439
Joined: Fri Feb 11, 2022 4:55 am
Location: behind the keyboard

Re: Optimization

Post by devc1 »

The L1 cache is almost as fast as two register accesses; each cache line in the L1 and L2 is 64 bytes in size.
Unfortunately, accessing linked lists would not benefit from any cache, and I think I don't need to explain this.

Will the prefetch instruction help with anything?
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Optimization

Post by nexos »

devc1 wrote:Unfortunately, accessing linked lists would not benefit from any cache, and I think I don't need to explain this.
Exactly why you shouldn't be using a linked list.
devc1 wrote:The L1 cache is almost as fast as two register accesses; each cache line in the L1 and L2 is 64 bytes in size.
What are you trying to say by this?
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
devc1
Member
Posts: 439
Joined: Fri Feb 11, 2022 4:55 am
Location: behind the keyboard

Re: Optimization

Post by devc1 »

So what should I use instead of a linked list?

A binary tree ?
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany
Contact:

Re: Optimization

Post by Solar »

Just as a note, AmigaOS Exec went the exact opposite way -- "everything" was a linked list. Kernel objects were basically all derived from "list node": tasks, devices, memory hunks. All were handled by the identical list handling code, which the kernel exposed to users as well so they could use it for their own data. As opposed to more efficient data structures, this kept the code handling those data structures in the CPU cache, because it was the same code for "everything".
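The "everything derives from list node" pattern Solar describes is what's now usually called an intrusive list; a minimal sketch (the names are hypothetical, not the actual Exec API):

```c
#include <stddef.h>

/* Every kernel object embeds the node, so one set of list routines
 * works for tasks, devices, memory hunks, ... */
struct node {
    struct node *next;
    struct node *prev;
};

struct list {
    struct node head;       /* sentinel: list is empty when it points to itself */
};

static void list_init(struct list *l) {
    l->head.next = l->head.prev = &l->head;
}

static void list_add_tail(struct list *l, struct node *n) {
    n->prev = l->head.prev;
    n->next = &l->head;
    l->head.prev->next = n;
    l->head.prev = n;
}

/* Example object: a "task" is just its node plus its own fields.
 * Keeping the node first lets a node pointer be cast back to the task. */
struct task {
    struct node link;
    int priority;
};
```

One copy of `list_add_tail` serves every object type, which is the cache argument Solar makes: the shared list code stays hot even though the data structure itself is pointer-based.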
Every good solution is obvious once you've found it.
iansjack
Member
Posts: 4703
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: Optimization

Post by iansjack »

I didn’t think the 68000 had any cache memory. Did the Amiga provide an external cache?
thewrongchristian
Member
Posts: 426
Joined: Tue Apr 03, 2018 2:44 am

Re: Optimization

Post by thewrongchristian »

iansjack wrote:I didn’t think the 68000 had any cache memory. Did the Amiga provide an external cache?
The 68020 had an instruction cache. I think that was available in the Amiga 1200.
iansjack
Member
Posts: 4703
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: Optimization

Post by iansjack »

The 68EC020 in the Amiga 1200 did have a 256 byte instruction cache that wasn’t present in the original 68000 used in earlier Amigas.