Optimization
Re: Optimization
I am using linked lists for heaps and threads.
Re: Optimization
Why do you need to search for allocated items in your heap?
Re: Optimization
You need to maintain a list of heap entries so that the free function only frees a particular area of memory, and the allocate function searches for free memory in these heap entries?
Re: Optimization
Why does the free function need to search the bitmap?
Re: Optimization
devc1 wrote: I may create multiple bitmaps to fit the list in a page where I can benefit from the TLB cache, and I make sure that the list is page aligned.
What does the TLB have to do with this?
Re: Optimization
devc1 wrote: I guess the quicksort algorithm will be useful when creating linked lists
Let me stop you there. No. Quicksort is horrible for lists. Mergesort is better suited for lists. However...
devc1 wrote: with an ID for each item, so you can use this algorithm to get the Item from the ID faster?
What you have described there in a roundabout way is a map. You want a structure in which it is easy to look up an item by its ID. A sorted list will not help you here, as you still need to iterate over the preceding elements to find out where each successor is. A better solution would be a hash map, if you know the maximum size the map can ever be, or else don't need to resize it too often (resizing is the slowest operation on a hash map). Otherwise, a tree map can be used (i.e. a self-balancing binary search tree, such as a red-black tree or an AA tree).
Another difference is that certain types of hash maps are very cache friendly, whereas all self-referential data structures risk a cache miss on every dereference, and a cache miss these days costs hundreds of cycles.
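To make the idea concrete, here is a minimal sketch of a fixed-size open-addressing hash map keyed by ID. All names and sizes here are illustrative, not from any particular kernel:
Code: Select all
#include <stdint.h>
#include <stddef.h>

#define MAP_SLOTS 1024                    /* power of two, fixed capacity */

typedef struct {
    uint64_t id;                          /* 0 means "empty slot" */
    void    *item;
} MapSlot;

static MapSlot map[MAP_SLOTS];

static size_t map_hash(uint64_t id)
{
    /* Fibonacci hashing: multiply and keep the top 10 bits. */
    return (size_t)((id * 0x9E3779B97F4A7C15ULL) >> 54);
}

/* Linear probing: scan forward from the hashed slot until we hit
 * the matching ID or an empty slot. The probe sequence is
 * sequential in memory, which is what makes it cache friendly. */
static void *map_lookup(uint64_t id)
{
    size_t i = map_hash(id);
    while (map[i].id != 0) {
        if (map[i].id == id)
            return map[i].item;
        i = (i + 1) & (MAP_SLOTS - 1);
    }
    return NULL;                          /* not present */
}

/* Assumes id != 0 and the map never fills up completely. */
static void map_insert(uint64_t id, void *item)
{
    size_t i = map_hash(id);
    while (map[i].id != 0 && map[i].id != id)
        i = (i + 1) & (MAP_SLOTS - 1);
    map[i].id = id;
    map[i].item = item;
}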
These are all Computer Science fundamentals. I suggest you learn a lot more about these topics before moving on. Just search for "data structures", and you should be inundated with stuff to read. This is fairly important, because most work in a kernel depends on good data structures. Indeed, choosing the right data structure for the job can be the most important optimization you can make. And that does mean you actually need to learn about complexity theory while you are studying data structures.
Carpe diem!
Re: Optimization
What does the TLB have to do with this?
If you read unaligned data that crosses a page boundary, for example a quadword at 0x1FFF, the CPU needs two TLB or page-table lookups instead of one. I tested that and found roughly 4x slower performance when accessing data that straddles a page boundary.
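To illustrate the scenario (a minimal sketch; the buffer layout is made up, and the 4x figure is from my own test above):
Code: Select all
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Two adjacent pages, so a read near the end of the first one
     * can straddle the page boundary. aligned_alloc is C11. */
    unsigned char *buf = aligned_alloc(4096, 2 * 4096);
    memset(buf, 0, 2 * 4096);

    uint64_t v;
    /* This 8-byte load covers bytes 0xFFF..0x1006: one byte in the
     * first page and seven in the second, so the CPU must translate
     * two virtual pages (two TLB lookups) for a single access. */
    memcpy(&v, buf + 0xFFF, sizeof v);

    free(buf);
    return (int)v;
}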
Why does the free function need to search the bitmap?
In order to allocate a heap entry structure in a linked list, instead of iterating over each entry to check whether it is present (which touches memory and makes the function slower), I can just keep a bitmap that represents each present item and use BSF (BitScanForward) to scan for a free slot. It made my linked list implementation more than 20 times faster. (A sketch of the allocation path follows the structure below.)
A list will look something like this:
Code: Select all
typedef struct list {
    UINT64 Bitmap;              /* one bit per slot, 1 = slot free */
    Entry entries[64];
    struct list* Next;          /* next block of 64 entries */
} list;
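And the allocation path might look like this (a sketch assuming a set bit means "slot free" and using GCC/Clang's __builtin_ctzll, which compiles to a BSF/TZCNT; Entry and list are as above):
Code: Select all
/* Find a free slot with one bit scan per block instead of
 * iterating over all 64 entries. */
Entry* AllocateEntry(list* l)
{
    for (; l != NULL; l = l->Next) {
        if (l->Bitmap == 0)
            continue;                      /* this block is full */
        /* Index of the lowest set bit = first free slot. */
        unsigned slot = (unsigned)__builtin_ctzll(l->Bitmap);
        l->Bitmap &= ~(1ULL << slot);      /* mark the slot as used */
        return &l->entries[slot];
    }
    return NULL;                           /* every block is full */
}

/* Freeing is just setting the bit again; no searching required. */
void FreeEntry(list* l, Entry* e)
{
    unsigned slot = (unsigned)(e - l->entries);
    l->Bitmap |= 1ULL << slot;
}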
Re: Optimization
devc1 wrote: If you read unaligned data that crosses a page boundary, for example a quadword at 0x1FFF, the CPU needs two TLB or page-table lookups instead of one. I tested that and found roughly 4x slower performance when accessing data that straddles a page boundary.
OK, but I still don't know what that has to do with anything. When allocating a page for the bitmaps, the pages have to be aligned. There's no other option.
As for general optimization, making good use of the CPU's caches is very important in micro-optimization.
And no, I'm not talking about the registers, I'm talking about a block of SRAM inside the CPU that is much faster to access than the system's DRAM. The CPU tries to keep the most useful information inside the cache. But programs / OSes can help the CPU a lot with this.
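For example, here is a sketch of the difference between a cache-friendly layout and a pointer-chasing one (illustrative code, not from any real kernel):
Code: Select all
#include <stddef.h>

/* Cache friendly: a contiguous array. The hardware prefetcher can
 * stream the next cache lines in ahead of the loop. */
long sum_array(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer chasing: each node can live anywhere in memory, so every
 * step risks a cache miss that stalls the whole loop. */
struct node { int value; struct node *next; };

long sum_list(const struct node *n)
{
    long s = 0;
    for (; n != NULL; n = n->next)
        s += n->value;
    return s;
}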
Re: Optimization
The L1 cache is almost as fast as a couple of register accesses, and each cache line in the L1 and L2 is 64 bytes in size.
Unfortunately, accessing linked lists does not benefit much from any cache, and I don't think I need to explain why.
Will the prefetch instruction help with this?
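For reference, here is a sketch of how a prefetch hint might be applied to list traversal, using GCC/Clang's __builtin_prefetch (whether it actually helps depends on how much work is done per node):
Code: Select all
struct node { int value; struct node *next; };

long sum_list_prefetch(const struct node *n)
{
    long s = 0;
    for (; n != NULL; n = n->next) {
        /* Hint the CPU to start loading the next node while we are
         * still working on the current one. */
        if (n->next != NULL)
            __builtin_prefetch(n->next, 0 /* read */, 1 /* low temporal locality */);
        s += n->value;
    }
    return s;
}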
Re: Optimization
devc1 wrote: Unfortunately, accessing linked lists does not benefit much from any cache, and I don't think I need to explain why.
Exactly why you shouldn't be using a linked list.
devc1 wrote: The L1 cache is almost as fast as a couple of register accesses, and each cache line in the L1 and L2 is 64 bytes in size.
What are you trying to say by this?
Re: Optimization
So what do I use instead of a linked list?
A binary tree?
Re: Optimization
Just as a note, AmigaOS Exec went the exact other way -- "everything" was a linked list. Kernel objects were basically all derived from "list node". Tasks, devices, memory hunks. All handled by the identical list handling code, which the kernel exposed to users as well so they could use that for their own data. As opposed to more efficient data structures, this held the code handling those data structures in the CPU cache, because it was the same code for "everything".
Every good solution is obvious once you've found it.
Re: Optimization
I didn’t think the 68000 had any cache memory. Did the Amiga provide an external cache?
Re: Optimization
iansjack wrote: I didn’t think the 68000 had any cache memory. Did the Amiga provide an external cache?
The 68020 had an instruction cache. I think that was available with the Amiga 1200.
Re: Optimization
The 68EC020 in the Amiga 1200 did have a 256 byte instruction cache that wasn’t present in the original 68000 used in earlier Amigas.