Re: OSDev's dream CPU
Posted: Sat May 05, 2012 7:59 pm
Hi,
Rudster816 wrote:
> Brendan wrote:
> > If you remove things just because they aren't strictly necessary, you're going to end up sitting naked on the floor with a sharp stick scratching text into your skin (because paper wasn't strictly necessary).
>
> You can't just come up with some clever oversimplification of a statement and think that adds to the discussion, can you?

I did, which is proof that I can.
If you remove everything that isn't strictly necessary, what are you really going to be left with? I've used CPUs with only 3 general purpose registers, no caches, no floating point, no SIMD and no MMU, so obviously none of that is necessary.
Are predicates for every instruction necessary? Probably not, but does that prevent me from wanting it (even if it's only because I like the idea and 80x86 doesn't support it)?
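Per-instruction predication (as in 32-bit ARM) means every instruction carries a condition field and is simply discarded when the condition fails, so short if/else sequences can run without branches. The sketch below is a hypothetical toy interpreter, not any real ISA's encoding; the names `cond_t`, `insn_t`, `run` and `predicated_max` are made up for illustration, and the flags are simplified to "equal" and "greater than".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical condition codes: every instruction carries one. */
typedef enum { COND_AL, COND_EQ, COND_NE, COND_GT, COND_LE } cond_t;
typedef enum { OP_CMP, OP_MOV } opcode_t;

typedef struct {
    cond_t cond;   /* predicate: the instruction only takes effect if this holds */
    opcode_t op;
    int rd, rn;    /* register numbers; CMP compares rd with rn */
} insn_t;

typedef struct {
    int64_t r[4];
    int z, gt;     /* simplified flags: "equal" and "signed greater than" */
} cpu_t;

static int cond_holds(const cpu_t *c, cond_t cond) {
    switch (cond) {
    case COND_AL: return 1;
    case COND_EQ: return c->z;
    case COND_NE: return !c->z;
    case COND_GT: return c->gt;
    case COND_LE: return !c->gt;
    }
    return 0;
}

void run(cpu_t *c, const insn_t *prog, int n) {
    for (int i = 0; i < n; i++) {
        const insn_t *in = &prog[i];
        assert(in->rd >= 0 && in->rd < 4 && in->rn >= 0 && in->rn < 4);
        if (!cond_holds(c, in->cond))
            continue;   /* predicated off: behaves like a NOP, no branch taken */
        switch (in->op) {
        case OP_CMP:
            c->z  = c->r[in->rd] == c->r[in->rn];
            c->gt = c->r[in->rd] >  c->r[in->rn];
            break;
        case OP_MOV:
            c->r[in->rd] = c->r[in->rn];
            break;
        }
    }
}

/* r0 = max(r0, r1) with no branch:  CMP r0, r1 ; MOVLE r0, r1 */
int64_t predicated_max(int64_t a, int64_t b) {
    cpu_t c = { .r = { a, b } };
    const insn_t prog[] = {
        { COND_AL, OP_CMP, 0, 1 },
        { COND_LE, OP_MOV, 0, 1 },
    };
    run(&c, prog, 2);
    return c.r[0];
}
```

Both instructions are always fetched; for `predicated_max(9, 2)` the MOVLE is just dropped, so neither input takes a different control-flow path.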
Rudster816 wrote:
> Register renaming doesn't prevent stack spillages, and doesn't come free.

Not sure where I've claimed that register renaming prevents stack spillages or is free...
For CISC with register renaming, 16 registers are probably overkill. If you're doing the load/store thing with simple instructions and a simple pipeline then you might want several hundred registers in the hope of failing less badly.
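A simplified sketch of the renaming idea, assuming a hypothetical core with 16 architectural and 64 physical registers: each write to an architectural register is given a fresh physical register, so two writes to the same architectural register no longer conflict. It also shows the costs raised in the quote, since the extra physical registers have to exist and renaming stalls (here `rename_dst` returns -1) when the free list runs dry. Releasing physical registers at retirement and recovery after mispredicts are left out; all names are illustrative.

```c
#include <assert.h>

#define ARCH_REGS 16   /* registers visible to software (assumed) */
#define PHYS_REGS 64   /* registers actually present in the core (assumed) */

typedef struct {
    int map[ARCH_REGS];        /* register alias table: arch reg -> phys reg */
    int free_list[PHYS_REGS];  /* stack of unallocated physical registers */
    int free_count;
} renamer_t;

void renamer_init(renamer_t *rt) {
    /* Arch reg i starts out in phys reg i; the remaining ones are free. */
    for (int i = 0; i < ARCH_REGS; i++)
        rt->map[i] = i;
    rt->free_count = 0;
    for (int p = PHYS_REGS - 1; p >= ARCH_REGS; p--)
        rt->free_list[rt->free_count++] = p;
}

/* A source operand just reads the current mapping. */
int rename_src(const renamer_t *rt, int arch) {
    assert(arch >= 0 && arch < ARCH_REGS);
    return rt->map[arch];
}

/* Every destination gets a new physical register, removing WAR/WAW hazards.
 * Returns -1 when none is free: the front end would have to stall. */
int rename_dst(renamer_t *rt, int arch) {
    assert(arch >= 0 && arch < ARCH_REGS);
    if (rt->free_count == 0)
        return -1;
    int p = rt->free_list[--rt->free_count];
    rt->map[arch] = p;
    return p;
}
```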
Rudster816 wrote:
> Brendan wrote:
> > Most code (that can't be done with SIMD) is either integer only, mostly integer with a few registers for floating point, or mostly floating point with a few registers used for integers. The overhead of saving/restoring extra registers during context switches isn't worth it, so if you insist on separate registers for floating point then forget about floating point completely (and just use SIMD instead).
>
> Context switches don't always mean that you have to save the state in memory. ARM has separate register banks for Kernel/User modes, which are switched automatically at probably little to no extra cost (other than the cost of duplicating the registers).

That wouldn't help for task switches. It would get in the way of a kernel API (where I like passing values in registers). It might help a little for IRQs, but that depends on the kernel's design.
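A toy model of the task-switch argument, assuming a hypothetical core with a complete register bank per privilege level as described in the quote (real 32-bit ARM banks only some registers per mode, such as SP and LR, plus R8 to R12 for FIQ). Entering the kernel only selects the other bank, but a task switch replaces one user task's registers with another's, so the user bank still has to go through memory. The names `core_t`, `set_mode` and `task_switch` are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NREGS 16

typedef enum { CPU_USER, CPU_KERNEL } cpu_mode;

/* Hypothetical core: one full register bank per privilege level. */
typedef struct {
    uint64_t bank[2][NREGS];
    cpu_mode mode;
    unsigned long mem_words;   /* register words copied to/from memory */
} core_t;

typedef struct { uint64_t regs[NREGS]; } task_t;

/* Kernel entry/exit (syscall, IRQ): just select the other bank, no memory traffic. */
void set_mode(core_t *c, cpu_mode m) { c->mode = m; }

/* Task A -> task B, done from kernel mode: both tasks live in the user bank,
 * so A's registers must be saved and B's loaded regardless of banking. */
void task_switch(core_t *c, task_t *from, const task_t *to) {
    assert(c->mode == CPU_KERNEL);
    memcpy(from->regs, c->bank[CPU_USER], sizeof from->regs);
    memcpy(c->bank[CPU_USER], to->regs, sizeof to->regs);
    c->mem_words += 2 * NREGS;
}
```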
Rudster816 wrote:
> When you overlap Integer/FPU registers you run into significant microarchitectural challenges. With separate register files you don't need to connect the integer file to the FPU execution units. If they are the same register file, you do have to connect them, and with clock speeds for desktop chips topping out at 4 GHz, floor planning is a significant concern. You'll want SIMD registers anyway, and aliasing all three types (Int/FP/SIMD) would be a waste (extra bits for SIMD to store Int/FP values), so why not just alias SIMD/FPU registers?

You think there are separate integer and FPU execution units, and the FPU execution unit isn't used for things like integer multiply/divide? How about conversions between integer and floating point? Don't forget SIMD is both integer and floating point too.
I'm also not sure how aliasing SIMD and FPU registers would make sense (assuming SIMD can do scalar floating point anyway).
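To make the "SIMD can do scalar floating point anyway" assumption concrete: in a design like SSE2, a scalar double is just lane 0 of a 128-bit register, and the same register can also be treated as integers. The sketch below is a hypothetical C model of that aliasing (a union, not real intrinsics); `add_sd` follows the legacy SSE2 ADDSD rule of leaving the upper lane unchanged, while `add_pd` and `add_epi64` are packed floating-point and integer adds on the same register type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit SIMD register, viewed as 2 doubles or 2 64-bit integers. */
typedef union {
    double   f64[2];
    uint64_t u64[2];
} vreg_t;

/* Read one floating-point lane. */
double lane(const vreg_t *r, int i) {
    assert(i == 0 || i == 1);
    return r->f64[i];
}

/* Scalar add, modelled on legacy SSE2 ADDSD: lane 0 only, lane 1 of dst kept. */
void add_sd(vreg_t *dst, const vreg_t *src) {
    dst->f64[0] += src->f64[0];
}

/* Packed floating-point add over both lanes. */
void add_pd(vreg_t *dst, const vreg_t *src) {
    dst->f64[0] += src->f64[0];
    dst->f64[1] += src->f64[1];
}

/* Packed 64-bit integer add on the same register type (wraps modulo 2^64). */
void add_epi64(vreg_t *dst, const vreg_t *src) {
    dst->u64[0] += src->u64[0];
    dst->u64[1] += src->u64[1];
}
```

In this model there is no separate scalar FPU register to alias: scalar, packed floating-point and packed integer operations all work on the same `vreg_t`.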
Rudster816 wrote:
> Brendan wrote:
> > 1.5 MiB was extremely high in 1982. In 1992 it was still very high. In 2002 it was just high. In 2012, I've got 12 GiB of RAM and couldn't care less. Will you be releasing the first version of my CPU before I have 64 GiB of RAM (and a hover-car)?
>
> Why make something cost more than it should?

You're right - to avoid making TLB misses more expensive than they should be, maybe we should have even larger pages, page tables and page directories, so that we can remove the whole page directory pointer table layer too. That'd reduce TLB miss costs by about 50%.
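A rough way to compare such options: on a TLB miss, a hardware page walk reads one entry per table level, so the level count is a first-order proxy for miss cost (ignoring paging-structure caches). The hypothetical helper below assumes every level translates the same number of bits. With x86-64's 4 KiB pages and 512-entry tables (9 bits per level), 48-bit addresses need 4 levels and 57-bit addresses would need 5; with 256 KiB pages and 256 KiB tables of 8-byte entries (15 bits per level, an assumed entry size), 63 translated bits fit in 3 levels and 48 bits in 2.

```c
#include <assert.h>

/* Page-walk depth when the page offset takes `page_bits` bits and every table
 * level translates `bits_per_level` bits (a simplifying assumption). Each level
 * is roughly one dependent memory read on a TLB miss. */
int walk_levels(int va_bits, int page_bits, int bits_per_level) {
    assert(va_bits > page_bits && bits_per_level > 0);
    int index_bits = va_bits - page_bits;
    return (index_bits + bits_per_level - 1) / bits_per_level;   /* round up */
}
```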
Rudster816 wrote:
> Just because I have 12GB of RAM doesn't mean I want to waste a bunch of it on paging structures. You're also cornering yourself into an environment where RAM is plentiful (PCs and servers). Even just one step down (tablets/smartphones), 1.5MB is looking much, much bigger. It also means that each of those structures requires 512KB of contiguous physical memory, which might create some issues with OSes because of physical memory fragmentation.

I really don't care about tablets/smartphones (they're just disposable toys to me), although most smartphones will have 1 GiB of RAM soon and 1.5 MiB would only be 0.15% of RAM anyway.
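The percentages check out (using binary units, 1 GiB = 1024 MiB):

$$\frac{1.5\ \text{MiB}}{1\ \text{GiB}} = \frac{1.5}{1024} \approx 0.146\,\% \approx 0.15\,\%, \qquad \frac{1.5\ \text{MiB}}{12\ \text{GiB}} = \frac{1.5}{12\,288} \approx 0.012\,\%$$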
Maybe I could increase page size to reduce the size of other structures. How about 256 KiB pages, 256 KiB page tables, 256 KiB page directories and 256 KiB page directory pointer tables, with only 2 "CR3" registers?
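One reading of this proposal, assuming 8-byte table entries: a 256 KiB page gives an 18-bit offset, and a 256 KiB table holds 2^15 entries, so each of the three table levels translates 15 bits. That covers 18 + 3 × 15 = 63 bits, and the single remaining top bit could choose between the two "CR3" registers, giving a full 64-bit virtual address with a 3-level walk. The entry size and the use of bit 63 are assumptions, and `split_va` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 8-byte entries, so a 256 KiB table holds 2^15 entries. */
#define PAGE_BITS  18   /* 256 KiB page -> 18-bit offset */
#define LEVEL_BITS 15   /* 256 KiB table / 8-byte entry = 32768 entries */

static_assert(PAGE_BITS + 3 * LEVEL_BITS + 1 == 64,
              "offset + 3 table levels + 1 CR3-select bit must cover 64 bits");

typedef struct {
    unsigned cr3_sel;   /* bit 63: which of the two "CR3" roots (assumed) */
    unsigned pdpt_idx;  /* bits 62..48 */
    unsigned pd_idx;    /* bits 47..33 */
    unsigned pt_idx;    /* bits 32..18 */
    uint64_t offset;    /* bits 17..0  */
} va_parts;

va_parts split_va(uint64_t va) {
    const uint64_t idx_mask = (UINT64_C(1) << LEVEL_BITS) - 1;
    va_parts p;
    p.offset   = va & ((UINT64_C(1) << PAGE_BITS) - 1);
    p.pt_idx   = (unsigned)((va >> PAGE_BITS) & idx_mask);
    p.pd_idx   = (unsigned)((va >> (PAGE_BITS + LEVEL_BITS)) & idx_mask);
    p.pdpt_idx = (unsigned)((va >> (PAGE_BITS + 2 * LEVEL_BITS)) & idx_mask);
    p.cr3_sel  = (unsigned)(va >> (PAGE_BITS + 3 * LEVEL_BITS));
    return p;
}
```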
Rudster816 wrote:
> Brendan wrote:
> > I didn't say that physical addresses would be 64-bit (I assumed an "implementation defined physical address size" like 80x86). It only costs 16 bits more per TLB entry than CPUs were using a decade ago.
>
> x64 virtual addresses aren't capped at 48 bits at the architectural level, so your own argument for physical addresses satisfies mine for non-64-bit virtual addresses. Canonical form addresses are a perfect solution to the future-proofing problem.

When Intel decides to increase the virtual address space size they're going to have to redesign paging. I just hope they don't slap a "PML5" on top of the existing mess to get 57-bit virtual addresses (and make TLB miss costs even worse).
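For reference, the canonical-form rule on x86-64 requires every bit above the implemented virtual address width to be a copy of the highest implemented bit (bits 63 to 47 must all match for 48-bit addresses). The hypothetical helper below checks that rule for a given width. The same address can be non-canonical at 48 bits and canonical at 57 bits, which is the sense in which the address format is future-proof, even though the paging structures still have to change to translate the extra bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Canonical for a `va_bits`-wide implementation: bits 63..(va_bits-1)
 * are either all zeros or all ones. */
bool is_canonical(uint64_t va, unsigned va_bits) {
    assert(va_bits >= 2 && va_bits <= 64);
    unsigned upper_bits = 65 - va_bits;              /* width of bits 63..va_bits-1 */
    uint64_t upper = va >> (va_bits - 1);
    uint64_t all_ones = (UINT64_C(1) << upper_bits) - 1;
    return upper == 0 || upper == all_ones;
}
```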
Rudster816 wrote:
> Brendan wrote:
> > The benefit is that I'll never need to make architectural changes when 48-bit (or whatever you think is today's lucky number) starts becoming a problem (e.g. probably within the next 10 years for large servers). Basically, it's future-proof.
>
> There is no reason for current CPUs to support something that won't be needed for 10 years at an extremely high cost. VAs greater than 48 bits won't make sense on 99.99% of machines for quite some time, and the cost to up it is low. It's not like it's 1980 and we have to decide if we want to store the year as two BCDs or use a more future-proof scheme.

There's no reason for a CPU that won't exist for 10 years to support something that won't be needed for 10 years? Somehow I suspect there's a flaw in your logic.
Cheers,
Brendan