Re: True cross-platform development
Posted: Thu Apr 13, 2017 4:18 am
Hi,
SpyderTL wrote:
And since the x86 and ARM processors, AFAIK, can't copy from one immediate memory address to another immediate memory address in a single instruction, and since that's pretty much the only thing an OICS processor can do, there are a few scenarios where an OISC processor may even be faster per clock cycle.

dozniak wrote:
This means _each_ OISC instruction performs _two_ memory accesses _each_ time it is run. This is _extremely_ slow and probably beats all other performance considerations.

At the lowest level I'd count it as 4 memory accesses (instruction fetch, reading 2 values, then writing the result of the subtraction), which will cripple performance due to cache bandwidth limitations. At a slightly higher level (e.g. MMU) I'd count it as 3 memory accesses, which will cripple performance if any kind of "secure multi-process" is attempted.
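To make that counting concrete, here's a minimal sketch of one SUBLEQ step in C (everything here is made up for illustration - the machine is assumed to be a flat array of 64-bit words, word-addressed, with "ip" counted in words):

Code:
#include <stdint.h>

/* One SUBLEQ step: "subleq a, b, c" means mem[b] -= mem[a], then jump
 * to c if the result is <= 0, otherwise fall through to the next
 * instruction (3 words further on). */
static uint64_t step(int64_t *mem, uint64_t ip)
{
    int64_t a = mem[ip];                /* instruction fetch:        */
    int64_t b = mem[ip + 1];            /* all three operand words   */
    int64_t c = mem[ip + 2];

    int64_t result = mem[b] - mem[a];   /* 2 data reads              */
    mem[b] = result;                    /* 1 data write              */

    /* Every instruction ends with this test, so every instruction is
     * a potential branch. */
    return (result <= 0) ? (uint64_t)c : ip + 3;
}

The three operand words are what gets lumped together as "instruction fetch" above; the subtraction itself adds the other two reads and the write.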
It also means that every instruction needs three addresses (address of each operand plus address to jump to), which makes the instructions huge (24 bytes with 64-bit addresses) and cripples performance due to instruction fetch consuming too much bandwidth from cache (that's already being pounded to oblivion by 4 memory accesses per instruction).
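Spelled out, a flat 64-bit encoding ends up looking something like this (a sketch; the struct and field names are made up, and there's no opcode field because there's only one instruction):

Code:
#include <assert.h>
#include <stdint.h>

/* Sketch of a flat 64-bit SUBLEQ instruction: three absolute addresses. */
struct subleq_insn {
    uint64_t a;   /* address of the first operand                    */
    uint64_t b;   /* address of the second operand (and destination) */
    uint64_t c;   /* branch target used when the result is <= 0      */
};

/* 24 bytes per instruction, versus 1 to 15 bytes for x86 and 4 bytes
 * for a typical fixed-width RISC encoding. */
static_assert(sizeof(struct subleq_insn) == 24, "3 x 8-byte addresses");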
The lack of things like "indexed addressing modes" means that you have to rely on self modifying code for extremely basic things (e.g. "x = array[y];"). This will cripple performance because you don't get much benefit from splitting cache into "L1 instruction cache" and "L1 data cache". It will also cripple performance by ruining speculative execution. It will also cripple performance by making higher-level things (e.g. memory mapped executables, multiple CPUs running the same code, etc) impossible or impractical.
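Here's a sketch of what that workaround looks like on the emulated machine from the earlier sketch (all the names are made up, and on real hardware each numbered step is itself a chain of SUBLEQ instructions):

Code:
#include <stdint.h>

/* "x = array[y]" with no indexed addressing: compute the element's
 * address, then patch it into the operand field of a later
 * instruction before that instruction executes. */
void patch_indexed_load(int64_t *mem, uint64_t array_base,
                        uint64_t y_addr, uint64_t load_insn)
{
    /* 1. Compute the element's address (array_base + y). Even this
     *    addition is several SUBLEQ instructions, because there is
     *    no "add", only subtract. */
    int64_t elem_addr = (int64_t)array_base + mem[y_addr];

    /* 2. Self-modify: write that address into the first operand slot
     *    of the upcoming "load" instruction (word index load_insn).
     *    This is a store into the instruction stream, which is what
     *    defeats split L1 caches and speculative execution. */
    mem[load_insn] = elem_addr;

    /* 3. When the patched instruction runs, it subtracts
     *    mem[elem_addr] from a zeroed "x", giving -array[y]; another
     *    subtraction fixes the sign. */
}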
"Lack of registers" cripples performance by making various tricks (register renaming) impractical. Combined with self modifying code, it also means that the CPU can't effectively determine what an instruction depends on, which cripples performance by ruining "out-of-order execution".
"Every instruction is a potential branch" means that the CPU has to update the instruction pointer and compare that against the branch target just to determine if it's a real branch or not; then (if it is a real branch) there's no way to do static branch prediction effectively (e.g. you can't have simple rules, like "backward branches predicted taken, forward branches predicted not taken"). This means a heavy reliance on branch target buffers for branch prediction, and will cripple performance due to branch mispredictions ruining instruction pre-fetch. A practical CPU will spend almost all of its time stalled (waiting for data from RAM or cache) and doing nothing.
On top of all of this; there's "one instruction". This means that things that are extremely trivial in digital electronics (e.g. bitwise operations like AND, OR and XOR, shifting, etc) become many slow instructions, and more complex things (multiplication, division, modulo) become a massive disaster; which all cripples performance. It also means that there's no SIMD, which cripples performance for anything that involves processing a lot of data (e.g. "pixel pounding").
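To give a rough idea of the cost, here's what "multiply" decomposes into when subtract-and-branch-if-<=0 is the only primitive (a C sketch that assumes b >= 0; every line of the loop body is at least one SUBLEQ instruction, and the loop runs b times):

Code:
#include <stdint.h>

/* Multiplication by repeated addition, where "addition" is itself two
 * subtractions (x + y == x - (0 - y)), and the only test available is
 * "is it <= 0?". */
static int64_t subleq_style_multiply(int64_t a, int64_t b)
{
    int64_t result = 0;

    while (!(b <= 0)) {               /* the only test SUBLEQ has */
        int64_t neg_a = 0 - a;        /* first subtraction        */
        result = result - neg_a;      /* second subtraction: += a */
        b = b - 1;                    /* third subtraction        */
    }
    return result;
}

A shift-and-add version avoids looping b times, but shifts don't exist either (see the next sketch), so it still costs a chain of subtractions and branches per bit.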
Finally; there's "only one data type" (e.g. no support for anything that isn't a 64-bit integer). For smaller (8-bit, 16-bit, 32-bit) integers programmers will mostly just use 64-bit integers where they can, which will cripple performance by wasting a huge amount of RAM and exacerbating the already severe "cache bandwidth limitations" problem. When programmers can't do this (file formats, network packets, etc) they'll have to resort to "shifting and masking", where both shifting and masking are extremely slow, which will cripple performance. For floating point; welcome to a realm of nightmares, please kiss any hope of acceptable "FLOPS" goodbye.
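To give a feel for how slow "shifting and masking" gets, here's what a plain logical right shift turns into when there's no shifter and no AND (a C sketch; the power-of-two values would come from a constant table in real SUBLEQ code, and each loop iteration is several SUBLEQ instructions):

Code:
#include <stdint.h>

/* Logical right shift built from subtract-and-compare only: peel bits
 * off from the top using powers of two, then rebuild the shifted
 * result. 64 iterations to extract one field from a 64-bit word. */
static uint64_t shift_right(uint64_t x, unsigned amount)
{
    uint64_t result = 0;

    for (int bit = 63; bit >= 0; bit--) {
        uint64_t power = 1ULL << bit;   /* constant table entry in SUBLEQ */

        if (x >= power) {               /* really "subtract and test"     */
            x -= power;                 /* clear that bit                 */
            if (bit >= (int)amount)
                result += 1ULL << (bit - amount);   /* another table entry */
        }
    }
    return result;
}

Masking is the same bit-peeling game: extracting an 8-bit field just means peeling all the bits and only keeping the ones you want.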
For all of these reasons; if someone like Intel spent billions of dollars trying to make the fastest SUBLEQ CPU possible, they wouldn't even be able to beat an 80486 (from almost 3 decades ago) with ten times the power consumption (and a hundred times the price).
All of the above should've been obvious to anyone actually interested in CPU design. None of this was obvious to Geri, but that's fine (being clueless is perfectly natural). However, people who aren't clueless pointed out almost all of these problems 4 years ago and Geri was too stupid to listen to anyone; and since then Geri would have had to discover half of these massive performance problems first-hand, many times over, during the last ~4 years (including "Oh, alpha blending is far too slow because it involves multiplication" recently), and now Geri is ignoring evidence that he created himself.
Mostly; multiple severe and unsolvable performance disasters are a massive problem, but they are nothing compared to Geri's progression from "clueless" to "ignorant" to "delusional". When you add reckless incompetence as a software developer on top of that, you get a recipe for decades of pure pointless failure.
Cheers,
Brendan