Geri's platform

Geri · Post by **Geri** » Mon Jul 15, 2013 4:13 pm

hi

i am designing a urisc platform. this is a free-time project, i don't want to spend a lot of time on it, and i don't want to create huge expectations about it. i just want to share the concept and the idea.

also, i will use this thread as a blog to report where i am in the development, so it will be very easy to laugh after i fail

concept:

*************************************************************************************
scalable 64-bit urisc platform to reach multiprocess operating system with minimal possible transistor count per core
*************************************************************************************

1. a subleq derivative will be okay. i will use a 4-component subleq "instruction". i thought it over, and the original 3-component version will not be enough for a multithreaded smp operating system. i will use the last parameter to point to eip (yes, i thought about it for a day, and it turned out that a multitasking, multithreaded smp kernel can be implemented this way). it is done in a very twisted way, but it will work.

2. no relative addressing. it turned out that static addressing will be okay. also, NO delta for the software running.

3. no registers at all.

4. io will be implemented over negative memory addresses.

5. smp

(the concept is designed: 100%)
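
A minimal sketch of what point 4 could look like in an emulator, using the classic three-operand subleq form. The post does not specify the port map, so everything named here is an assumption: the `PORT_OUT` address (-1), the `machine` struct, and the convention that a negative `eip` halts. Only this one port is handled; any other negative address is outside the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define PORT_OUT (-1)  /* hypothetical: address -1 is a character output port */

struct machine {
    int64_t *mem;      /* ordinary RAM, addresses 0 and up */
    char out[64];      /* bytes "sent" to the output port */
    size_t out_len;
};

/* Execute one subleq instruction; a b operand equal to PORT_OUT is I/O. */
static int64_t step_with_io(struct machine *m, int64_t eip)
{
    int64_t a = m->mem[eip];
    int64_t b = m->mem[eip + 1];
    int64_t c = m->mem[eip + 2];

    if (b == PORT_OUT) {                 /* write to the device, not to RAM */
        if (m->out_len < sizeof m->out)
            m->out[m->out_len++] = (char)m->mem[a];
        return eip + 3;
    }
    m->mem[b] -= m->mem[a];
    return (m->mem[b] <= 0) ? c : eip + 3;
}

/* Run until eip becomes negative (assumed halt convention). */
static void run_with_io(struct machine *m, int64_t eip)
{
    while (eip >= 0)
        eip = step_with_io(m, eip);
}
```

With this layout the program needs no special I/O opcode: storing to a negative address is the I/O operation.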

**********
c99 compiler:
**********
because nobody wants to write braindead assembly code. i investigated whether there is already a usable compiler for subleq. it turned out that there isn't (no, oleg's compiler isn't one: no true memory handling (no real data types), integer only, overcomplicated compiler).
so i started to write a new c99 compiler from scratch. i am around 5% of the way through. it will support 64, 32, 16, and 8 bit integers (truly using these sizes when loading and storing into ram), and 64-bit fixed point (the runtime will have float_to_fixedpoint conversion and vice versa; float and double will both be defined as fixed64). long and int are 64 bit.
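
A minimal sketch of the float to fixed-point conversion mentioned above. The post does not say how the 64 bits are split, so the Q32.32 layout (32 integer bits, 32 fraction bits) is an assumption, as is the name `fixedpoint_to_float`; only `float_to_fixedpoint` and `fixed64` come from the post. Values outside the representable range are not handled.

```c
#include <stdint.h>

typedef int64_t fixed64;           /* assumed layout: Q32.32 */

#define FIXED64_ONE 4294967296.0   /* 2^32, the scale factor for Q32.32 */

/* Convert a floating-point value to fixed point, truncating toward zero. */
static fixed64 float_to_fixedpoint(double x)
{
    return (fixed64)(x * FIXED64_ONE);
}

/* Convert back; hypothetical name for the "vice versa" direction. */
static double fixedpoint_to_float(fixed64 f)
{
    return (double)f / FIXED64_ONE;
}
```

Addition and subtraction of two such values are plain 64-bit integer operations, which fits a machine whose only primitive is subtraction; multiplication and division need a wider intermediate result and are left out here.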

*****
kernel:
*****
a small hybrid kernel with smp and multithreading support, able to write to its own file system, or to a very simple existing one that can be handled in very few clock cycles (if somebody knows one, tell me)

**************
operating system:
**************
very simple gui with some simple programs, like calculator, paint
and hopefully self-hosting due to the c compiler

apis:
-fopen & friends
-tcp (networking)
-opengl
-interaction with the os to display windows, etc

******
the cpu:
******
easy to emulate, easy to design (i have a cpu designed friend who wants to implement it on an fpga)
(we previously wanted a risc, but that was still too overcomplicated and slow. actually, subleq is almost as fast as a risc architecture!)

******************************
benefits from the architecture:
******************************

-easy scaling, starting from a home-made, self-soldered transistor cpu (subleq can easily be implemented from transistors by a person at home, by hand)
-easy to implement on an fpga, in vhdl, and/or in other automatic circuit design tools
-easy to add 8-16 cores even with minimal designing
-minimal power consumption
-implementations can even run *IN THEORY* at n*100 ghz due to the extremely low number of transistors (assuming it is not manufactured with cmos processes). OR implementations can contain tens of thousands of monolithic cpu cores (sadly, above a few thousand transistors it is necessary to use cmos, whose drawback is a limit of a few ghz)
-easy to emulate
-no tlb, no virtualmemory, no legacy things
-easy io, disk polling, no hardware acceleration needed for interrupts, syscalls, etc. io will work the same way everywhere, which means that the same os can be used everywhere, unlike on the ultimately crap arm, where every system needs a different loader and operating system

******************************
plans with this architecture:
******************************

-i am sick of the idiotic x86, which wastes 99.995% of its power consumption, and of the gpus, which cannot be programmed normally, because the "world's largest semiconductor companies" can't even realize that asymmetric systems are a dead end. so this can scale from the simplest architectures to supercomputers with easy C programming, without requiring any special programming dialects.

-if it is successful and popular (chance: 0.01%), maybe the hardware and the software can be sold, and i will be rich like bill gates (0.000001%)

-subleq is like a giant hardware obfuscator: programs written for it are hard to crack

checkpoints:
-design the architecture in brain: done
-make the c compiler compile very simple test programs: in progress
-implement the io to at least display debug strings
-make the c compiler more mature
-implement disk, timer, the fixedfunct interrupt
-make a boot loader that is able to load the kernel from the disk
-make the c compiler even more mature
-implement file system in kernel
-implement keyboard
-implement fixed point
-implement graphics and text output
-implement loadable shared object support
-implement threading
-implement smp in the virtual machine
-implement opengl/opengles
-finalize the c compiler and be sure that its finally self-hosting
-implement networking
-implement some tools, like file browser, minimal web browser, ftp, paint, notepad
-get everyone to use this, and become a millionaire

NickJohnson · Post by **NickJohnson** » Mon Jul 15, 2013 10:04 pm

Geri wrote:1. a subleq derivative will be okay. i will use a 4-component subleq "instruction". i thought it over, and the original 3-component version will not be enough for a multithreaded smp operating system. i will use the last parameter to point to eip (yes, i thought about it for a day, and it turned out that a multitasking, multithreaded smp kernel can be implemented this way). it is done in a very twisted way, but it will work.

2. no relative addressing. it turned out that static addressing will be okay. also, NO delta for the software running.

3. no registers at all.

It sounds like you're trying to build the "simplest possible architecture" by just constructing the simplest Turing-complete architecture you can. For basically any purpose other than having fewer transistors this is a bad idea. A non-pipelined microprocessor without a cache or anything fancy would still be extremely small and far more effective, by basically any metric: time to execute programs, total power consumption per program, speed of program execution per transistor on die, memory usage, etc. You'd have better luck programming a single shader on your GPU.

You also say you would be able to clock the chip at hundreds of GHz since it is so small. That might make sense (I'm sure there are physics-y problems with doing it too, but I'm not that sort of guy), except that the processor needs an instruction stream from somewhere and some way to access memory. Even with an SRAM-based cache or memory on chip, there is no way you could do instruction fetches and memory operations at hundreds of GHz. There is a reason that modern superscalar processors dedicate most of their transistors to cache.

dozniak · Post by **dozniak** » Mon Jul 15, 2013 10:09 pm

Geri wrote:i have a cpu designed friend

This sounds scary. With cpus being able to design humans we are entering a really new era here. Maybe this is something you should focus on instead.

Geri · Post by **Geri** » Tue Jul 16, 2013 6:36 am

NickJohnson wrote: It sounds like you're trying to build the "simplest possible architecture" by just constructing the simplest Turing-complete architecture you can.

correct.

NickJohnson wrote: For basically any purpose other than having fewer transistors this is a bad idea. A non-pipelined microprocessor without a cache or anything fancy would still be extremely small and far more effective, by basically any metric: time to execute programs, total power consumption per program, speed of program execution per transistor on die, memory usage, etc. You'd have better luck programming a single shader on your GPU.

-it can have cache, if somebody wants to add one, but it's not necessary. it costs a few x*1000 transistors (according to the cpu designer friend; he wants to add cache too).

-it can be designed to be superscalar. if somebody wants to make it superscalar, that's an additional x*1000 transistors (this pushes the architecture up to 3 pipelines, according to my quick calculation)

-however, even if you add these features, it's still much smaller than arm or x86, and it still uses *much* fewer watts.

-the time to execute a program is not that long. subleq is surprisingly fast when compared to other risc architectures.

-it's far better than gpu, risc, x86 when we calculate the total performance/watt of the architecture. it also scales from the simplest architectures to supercomputers.

NickJohnson wrote: You also say you would be able to clock the chip at hundreds of GHz since it is so small. That might make sense (I'm sure there are physics-y problems with doing it too, but I'm not that sort of guy), except that the processor needs an instruction stream from somewhere and some way to access memory. Even with an SRAM-based cache or memory on chip, there is no way you could do instruction fetches and memory operations at hundreds of GHz. There is a reason that modern superscalar processors dedicate most of their transistors to cache.

that's right. however, this is just a theory. in fact, it is already possible to run single (and a few) transistors above 600 ghz, but you need to cache the instructions too. if you can stay below 2-5000 transistors (including a minimal l1 cache), maybe that can run above 600 ghz *in theory*. however, nobody has designed a cpu with such a high clock yet, and neither will we, but if somebody ever tries to make one, he can simply use my architecture to do it, and he will not need to make special toolchains; he will see immediately whether it boots or not. also, a transistor that runs above 600 ghz is 1 millionth of a meter in size, so you can still build multiple cores like this. the power consumption and heat are not known yet.

Geri · Post by **Geri** » Tue Jul 16, 2013 6:39 am

dozniak wrote:
Geri wrote:i have a cpu designed friend
This sounds scary. With cpus being able to design humans we are entering a really new era here. Maybe this is something you should focus on instead.

designer :P
yes, this is what i want to focus on: to build a platform which can easily be manufactured, so that people can easily use my system on it. with this, you can basically build a handheld device with ~20 milliwatts of power consumption (+ the screen and the ram), so it's very efficient.

bluemoon · Post by **bluemoon** » Tue Jul 16, 2013 7:12 am

Geri wrote:-it's far better than gpu, risc, x86 when we calculate the total performance/watt of the architecture. it also scales from the simplest architectures to supercomputers.

You ignored the fact that the instruction count of a normal application is a hundred times higher on OISC than on x86 or ARM, since you need to emulate other instructions with sequences of subleq. Also, as noted above, dependency stalls and memory bandwidth will be the bottleneck. This is part of the reason the world moved toward SIMD and large caches: to do many things at once instead of inducing huge execution dependencies.
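
To make the instruction-count point concrete, here is a minimal sketch of how a single `b += a` is commonly built from three classic subleq instructions and a scratch cell that holds zero. The cell addresses `Z`, `A`, `B`, the function name, and the "negative eip halts" convention are assumptions for illustration, not part of Geri's specification.

```c
#include <stdint.h>

enum { Z = 9, A = 10, B = 11 };   /* hypothetical data cell addresses */

/* Compute x + y using only subleq: three instructions for one addition. */
static int64_t add_via_subleq(int64_t x, int64_t y)
{
    int64_t mem[12] = {
        A, Z, 3,     /* Z -= A  ->  Z = -x                      */
        Z, B, 6,     /* B -= Z  ->  B = y + x                   */
        Z, Z, -1,    /* Z -= Z  ->  Z = 0, then jump to -1      */
        0, x, y      /* cells Z, A, B                           */
    };
    int64_t eip = 0;

    while (eip >= 0) {           /* negative eip: assumed halt convention */
        int64_t a = mem[eip], b = mem[eip + 1], c = mem[eip + 2];
        mem[b] -= mem[a];
        eip = (mem[b] <= 0) ? c : eip + 3;
    }
    return mem[B];
}
```

Each branch target equals the address of the next instruction, so the result of the comparison does not matter here. An operation that is one instruction on x86 or ARM costs three instruction fetches plus the memory traffic of every operand.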

Furthermore, you seem to have the misunderstanding that higher GHz or instruction execution speed (MIPS) means performance. No, with modern CPUs the bottleneck is not instruction execution, but how to eliminate stalls (OOO execution) and bus waiting (hyper-threading).

Anyway, OISC is a good topic for a hobby developer, but far from comparable to commercial CPUs in all aspects.

Geri · Post by **Geri** » Tue Jul 16, 2013 7:22 am

it's only 2-20 times slower (in generic algorithms, it's about 3x-6x), so it's not a big deal. it will turn out how efficient it is. actually, since i can have tens of thousands of cores in theory, i am not concerned about that.

bluemoon · Post by **bluemoon** » Tue Jul 16, 2013 7:55 am

I respect your design choice and agree it's a good hobby project.

But my advice is not to make such claims just because "in theory it should be", without mentioning which theory or providing any facts or research stats to back up such claims.

Geri wrote:i can have tens of thousands of cores in theory

For tens of thousands of cores, distribution overhead becomes very significant, and the application domain is "many parallel tasks", like what a GPU does, which is different from general computing, like what a CPU does.

Geri · Post by **Geri** » Tue Jul 16, 2013 7:57 am

bluemoon wrote:I respect your design choice and agree it's a good hobby project.
But my advice is not to make such claims just because "in theory it should be", without mentioning which theory or providing any facts or research stats to back up such claims.

i totally agree, this is why i say "in theory" in such cases.
after i am able to run the first complex code compiled with the compiler in the virtual machine, i will know much more.

NickJohnson · Post by **NickJohnson** » Tue Jul 16, 2013 9:38 am

I think you also may be underestimating how large your instructions will have to be. You say you don't have any registers, but by having no registers and accessing memory directly without relative addressing, you've basically just made all of memory into registers. That means every instruction is going to need three full memory addresses, which is going to take up a lot of wires and a lot of transistors.
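
As a rough illustration of the size argument, assuming every operand is a full 64-bit address as the platform description suggests, the per-instruction cost can be read off two struct layouts. The struct names and the field name `d` for the fourth component are placeholders, not names from the thread.

```c
#include <stddef.h>
#include <stdint.h>

struct subleq3 { int64_t a, b, c; };      /* classic form: 3 x 8 = 24 bytes */
struct subleq4 { int64_t a, b, c, d; };   /* 4-component form: 4 x 8 = 32 bytes */

/* Bytes of instruction memory needed for n instructions of the 4-component form. */
static size_t code_bytes4(size_t n)
{
    return n * sizeof(struct subleq4);
}
```

For comparison, a fixed-width RISC instruction is typically 4 bytes, so each 4-component instruction occupies as much memory as eight of those before counting how many subleq instructions one ordinary operation expands into.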

Microprocessors don't get more efficient by making them more theoretically pure; they get more efficient by being able to accommodate the usage patterns of normal programs while cutting as many corners as possible (e.g. taking advantage of spatial and temporal locality by using relative addressing and caches). You make common things more efficient by making uncommon things less efficient. This is the common law in all of systems programming and information theory.

Geri · Post by **Geri** » Tue Jul 16, 2013 9:44 am

we already calculated the transistor demand; it's more efficient than our previous architecture (which we weren't even able to implement, because it used too many transistors even though it used a very, very tiny risc instruction set)

zeitue · Post by **zeitue** » Tue Jul 23, 2013 7:59 pm

Your architecture sounds quite interesting
a few questions about this

You claim your CPU is easy to emulate. What method do you plan to use to emulate it (dynamic translation, virtualization, JIT, or some kind of mix of methods)?
Would this be usable in a bytecode virtual machine as the CPU of a process virtual machine?
How does this architecture perform as far as speed and power in comparison to x86, ARM, PowerPC, ...?

Geri · Post by **Geri** » Thu Aug 01, 2013 2:38 pm

zeitue wrote:You claim your CPU is easy to emulate. What method do you plan to use to emulate it (dynamic translation, virtualization, JIT, or some kind of mix of methods)?

since subleq does not have multiple opcodes, there is no need for dynamic translation or jit, since the code can be executed directly on the cpu/gpu using the cpu's/gpu's own instruction set. maybe 2-5 clocks per op is possible. i am not sure yet, since i have not tried it. basically, executing a subleq instruction is:

mem[b]-=mem[a];
if(mem[b]<=0) eip=c;

(there is no need to dynamically compile x86 code from this, since it's already native)
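
The two lines above can be fleshed out into a small interpreter loop. This is only a sketch of the classic three-operand form: it ignores the fourth component of the platform's instruction (its exact semantics are not given in the thread), and the function names, the consecutive `a, b, c` word layout, the "negative eip halts" convention and the step limit are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* One classic subleq step over a flat array of 64-bit words.
   Assumed layout: each instruction is three consecutive words a, b, c. */
static int64_t subleq_step(int64_t *mem, int64_t eip)
{
    int64_t a = mem[eip];
    int64_t b = mem[eip + 1];
    int64_t c = mem[eip + 2];

    mem[b] -= mem[a];
    return (mem[b] <= 0) ? c : eip + 3;   /* branch, or fall through */
}

/* Run until eip goes negative (assumed halt convention) or the step
   limit is reached. Returns the number of instructions executed. */
static size_t subleq_run(int64_t *mem, int64_t eip, size_t max_steps)
{
    size_t steps = 0;

    while (eip >= 0 && steps < max_steps) {
        eip = subleq_step(mem, eip);
        steps++;
    }
    return steps;
}
```

The loop body has no opcode decode at all, which is the basis of the claim that no translation layer is needed. A real emulator would still need bounds checks on `a`, `b` and `eip`.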

zeitue wrote:Would this be usable in a bytecode virtual machine as the CPU of a process virtual machine?

this will be usable in any kind of virtual machine or emulator, including a full system emulator, or as a user-mode-only bytecode that runs, for example, in game software to process your scripts.

also, the compiler will be able to create non-os binaries that do not require an operating system under them. the operating system's kernel will be compiled with it, and it will be able to compile itself too, so it will hopefully be self-hosting.

i will release the specifications of the platform and the virtual machine required to run it.

zeitue wrote:How does this architecture perform as far as speed and power in comparison to x86, ARM, PowerPC, ...?

-since this architecture has large "instructions", and sometimes requires a lot of operations, it's approx. 40-95% slower than a modern x86, arm, mips, or powerpc core at the same clock speed (depending on the algorithm). however, the alu and fpu of a modern 64-bit x86 or arm system require around 100-200 million transistors per core, while the alu of this architecture requires only a few hundred transistors.

-when counting 1 billion transistors for cache, we have room for ~5000 smp-capable cores with current manufacturing technologies.

-when emulating, it's much faster than emulating arm, x86 or mips.

-when we target extremely low power consumption, such as mobiles and tetris devices, any kind of extreme-low-end chinese cpu manufacturer can build it for the same price as their current fixed-function processors, at the same speed. it's also easy to implement in hardware

Combuster · Post by **Combuster** » Thu Aug 01, 2013 4:17 pm

Geri wrote:mem[b]-=mem[a];
if(mem[b]<=0) eip=c;

In that formulation, it doesn't even look turing-complete. You can't access memory unit x if the number isn't hardcoded into the machine.

dozniak · Post by **dozniak** » Thu Aug 01, 2013 5:00 pm

Combuster wrote:
Geri wrote:mem[b]-=mem[a];
if(mem[b]<=0) eip=c;
In that formulation, it doesn't even look turing-complete. You can't access memory unit x if the number isn't hardcoded into the machine.

It is.
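
The usual way around the hardcoded-address objection in subleq is self-modifying code: the program writes a computed address into the operand field of a later instruction before executing it. The sketch below shows that general technique with classic three-operand subleq; it is not taken from Geri's design, and the cell addresses, the table contents and the function name are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { Z = 18, P = 19, OUT = 20, TABLE = 21 };  /* hypothetical cell addresses */

/* Read TABLE[index] through a pointer held in memory cell P. */
static int64_t load_indirect(int64_t index)
{
    int64_t mem[24] = {
        /*  0 */ P, Z, 3,       /* Z = -pointer                                   */
        /*  3 */ Z, 9, 6,       /* mem[9] = pointer: patches the `a` field below  */
        /*  6 */ Z, Z, 9,       /* Z = 0                                          */
        /*  9 */ 0, Z, 12,      /* `a` is patched at run time: Z = -mem[pointer]  */
        /* 12 */ Z, OUT, 15,    /* OUT = mem[pointer]                             */
        /* 15 */ Z, Z, -1,      /* Z = 0, then jump to -1 (assumed halt)          */
        /* 18 */ 0,             /* Z: scratch cell                                */
        /* 19 */ 0,             /* P: pointer, filled in below                    */
        /* 20 */ 0,             /* OUT: result                                    */
        /* 21 */ 111, 222, 333  /* TABLE                                          */
    };
    int64_t eip = 0;

    assert(index >= 0 && index < 3);
    mem[P] = TABLE + index;     /* the address is data, not part of the code */

    while (eip >= 0) {
        int64_t a = mem[eip], b = mem[eip + 1], c = mem[eip + 2];
        mem[b] -= mem[a];
        eip = (mem[b] <= 0) ? c : eip + 3;
    }
    return mem[OUT];
}
```

The instruction stream is identical for every `index`; only the data cell `P` changes, so the accessed address does not have to be known when the program is written.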
