How would you design a CPU, from a software point of view?

Post by **Brendan** » Sat Nov 17, 2007 4:33 pm

Hi,

@Axilmar: Some notes:

- I'm curious why special purpose registers and addresses are only 32-bit
- there's no need for a "greater than" flag if you've got "less than" and "equals" flags (if something is not less than and not equal, then it's greater).
- I'm not sure how you'd implement something like "copy on write" with your memory management (no "read only" flag in the page itself)
- to me, it looks like the segmentation is going to be an extreme performance problem - for every memory access the CPU will need to search through the Segment Table and then search through the Access Table. Even if this is cached in the CPU it'll still be slow (especially if the number of entries in the table/s is more than the number of entries in the special cache).
- I don't know why you're using 2 separate tables (Access Table and Segment Table) instead of having one table with all the information.
- you don't specify what happens if different segments overlap (e.g. one segment that says an area is "read only" and another segment that says the same area is "write only").
- Atomic Execution is going to have severe contention/scalability problems and fairness problems in many-CPU systems. For example, imagine one CPU continually doing atomic loads that prevent any other CPU from progressing.
- the interrupt handling won't work in some cases. For example, if doing "call foo" causes a page fault (not-present page at SR) then the CPU will go into a "continuous page fault loop" (or triple fault?). In this case, if any unprivileged code can change SR (or continually push return addresses onto the return stack) then any unprivileged code would be able to cause the OS to lock up completely (or crash).
- I couldn't find any instructions for zero-extended loads (e.g. "movzx eax,byte [foo]"), even though you've got lots of instructions for sign-extended loads (e.g. LDVB, LDB, LDXB, LDRB, etc). I'm not sure if this is deliberate or not (a sign-extended load could be followed by an additional AND instruction to make it work like a zero-extended load). See the note.
- you're missing MOD instructions for signed and unsigned integers, and it's more efficient for DIV and DIVI to return both the quotient and remainder (in some cases you can do the division itself once rather than doing DIV and MOD, especially if you're trying to divide large integers - e.g. dividing a 256-bit integer by a 64-bit integer). See the note.
- there's no "ADC" or "SBB" instructions, which makes operating on 128-bit or larger integers slow (e.g. you can't do "ADD R0, R2; ADC R1, R3"). Doing it manually (e.g. with "JMPC C") is messy and causes branch misprediction problems. See the note.
- there's no way for software to identify which version of the CPU they're running on and no way for software to determine if features you add in future are present
- there's no details on which conditions cause which exceptions (e.g. debug exception, security exception, etc) and I'm not sure which instructions are privileged instructions (e.g. can unprivileged code use the "MOVSR" instruction?)
- there's no details for the Interrupt Table (entry format, etc)
- there's no cache management instructions (e.g. CLFLUSH), no mention of a TLB for paging data structures, and no details of whether or not the CPU maintains consistency between RAM and the TLB (if any) and the special cache (used for the Access Table and Segment Table). I'm guessing the CPU shouldn't maintain consistency for the TLB and special cache, to reduce the number of (frequently unnecessary) checks that the CPU needs to do (and improve performance), but this would require instructions for manually invalidating these caches when paging and/or the Segment Table and/or Access Table are changed.
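
To make the ADC/SBB point concrete, here's a rough C sketch of what a multi-word addition has to do. The function name `add_with_carry` and the little-endian word layout are just assumptions for illustration (nothing here comes from Axilmar's spec). With an ADC instruction each loop iteration would be a single add; without one, the carry has to be rebuilt by hand from comparisons (or from a conditional branch such as the "JMPC C" approach).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst = a + b for n-word integers stored with the
   least significant word first. Returns the final carry out (0 or 1). */
unsigned add_with_carry(uint64_t *dst, const uint64_t *a,
                        const uint64_t *b, size_t n)
{
    unsigned carry = 0;

    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t partial = a[i] + b[i];   /* the plain "ADD" */
        unsigned c1 = partial < a[i];     /* did a[i] + b[i] wrap around? */
        uint64_t sum = partial + carry;   /* fold in the previous carry */
        unsigned c2 = sum < partial;      /* did adding the carry wrap? */

        dst[i] = sum;
        carry = c1 | c2;                  /* at most one of the two is set */
    }
    return carry;
}
```

The comparisons here are branch-free, but they still cost several extra operations per word compared to one ADC.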

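Note: for the points marked "See the note", my problem with a strict RISC approach is that you need several instructions to do the same work as one CISC-style instruction, which can make performance worse (more instruction fetches, worse trace cache efficiency, and more branch mispredictions when the replacement sequence needs a branch).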
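
As a rough illustration of two of those points (zero-extended loads and DIV returning both quotient and remainder), here's a minimal C sketch. The function names are made up for this example, and it uses 32-bit words with a 64-bit intermediate so that it stays portable C; the 256-bit by 64-bit case mentioned above has the same shape with wider words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: emulate a zero-extended byte load using a
   sign-extended load followed by an AND (two steps instead of one). */
uint32_t load_byte_zero_extended(const int8_t *p)
{
    int32_t sign_extended = *p;               /* what a sign-extending load gives */
    return (uint32_t)sign_extended & 0xFFu;   /* extra AND clears the upper bits */
}

/* Hypothetical helper: divide an n-word integer (least significant
   32-bit word first) by a 32-bit divisor, in place. Returns the remainder.
   Every step needs both the quotient and the remainder of the same
   division, which is why one instruction returning both is useful. */
uint32_t divide_by_word(uint32_t *words, size_t n, uint32_t divisor)
{
    uint64_t remainder = 0;

    assert(divisor != 0);
    for (size_t i = n; i-- > 0; ) {
        uint64_t chunk = (remainder << 32) | words[i];

        words[i] = (uint32_t)(chunk / divisor);  /* quotient word */
        remainder = chunk % divisor;             /* carried into the next word */
    }
    return (uint32_t)remainder;
}
```

A compiler targeting a CPU with separate DIV and MOD instructions may have to do the division twice per word in that loop.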

Cheers,

Brendan

Post by **earlz** » Sat Nov 17, 2007 11:19 pm

I am too tired to think up everything right now (I have given quite a bit of thought to this in the past though)

but one thing that'd be extra nifty...
The cpu instructions being micro-code, and the OS able to make different micro-code instructions for each code segment...probably be too expensive to do, but anyway...

I just think it'd be nifty to have a syscall instruction implemented by the OS...
also, it makes an emulator's job a lot easier...of course, a base instruction set (only a few made of micro-code) would be needed as something read-only

and instructions to change the endian and byte order...

hmmm...also, TONS of registers...like 512 would be a nice even number...but to keep instruction sizes small, use something like (iirc it's called) banking so that you can only access 8 or 16 registers at a time...or like double banks so you could access r0-r7 and r248-r255 at the same time (but in the instruction they'd be referred to as 0-7 and 8-15)
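
Something like this is roughly what the double-bank idea could look like, written as C just to pin down the mapping. All of the names (`struct cpu`, `bank_lo`, `bank_hi`, `physical_index`) are invented for this sketch, and it assumes two windows of 8 registers each selected by a 4-bit register field in the instruction.

```c
#include <assert.h>
#include <stdint.h>

#define PHYSICAL_REGS 512   /* total registers in the (hypothetical) CPU */
#define WINDOW_SIZE   8     /* registers visible through each bank */

struct cpu {
    uint64_t regs[PHYSICAL_REGS];
    unsigned bank_lo;   /* first physical register seen as fields 0-7 */
    unsigned bank_hi;   /* first physical register seen as fields 8-15 */
};

/* Translate the 4-bit register field of an instruction into an index
   into the physical register file. */
unsigned physical_index(const struct cpu *c, unsigned field)
{
    assert(field < 2 * WINDOW_SIZE);

    unsigned base = field < WINDOW_SIZE ? c->bank_lo : c->bank_hi;
    unsigned index = base + (field % WINDOW_SIZE);

    assert(index < PHYSICAL_REGS);
    return index;
}
```

With `bank_lo = 0` and `bank_hi = 248`, fields 0-7 hit r0-r7 and fields 8-15 hit r248-r255, so the instruction encoding only ever needs 4 bits per register.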

a directly editable eIP would be nifty also...maybe have it be one of those 512 registers or something so that conditional movs and such would be better, and you could very easily do relative stuff

possibly register "hooks" or something so that when you write to a register it writes to a spot in memory(which you can select, or you can have it so it doesn't for speed) and then you could have hooks on them by using paging... could also make it so 512 registers don't have to be saved on task switches if the last 256 or something like that is actually just memory

a ring1 mode that drivers can use effectively, allowing them to do most everything, but so that somehow on a double fault, things are restored to a "safe" state(?) (I don't know how to explain it...)

yea...I'm sure that makes less sense than I think, so I'm going to bed...