Architecture for video-game system

AndrewAPrice · Post by **AndrewAPrice** » Thu May 22, 2008 10:06 pm

I'm building a video game system, comparable in quality/performance to the early 3DO/mid-SNES era, with a few modernisations (IP communication, optional HDTV support).

It won't actually be a physical system (if anyone wants to develop one then that would be great); instead it will be an emulated console that behaves as if it were a real console.

I'm going to design the CPU, graphics chip, audio chip etc and treat it as if it was real hardware.

I will need to get a compiler/assembler working to target the console, which would be an easier task if I chose a well-known architecture, since there would already be a working toolchain.

So I was wondering what the best architecture to go with would be (ease of emulation vs flexibility). I'm looking at ARM9E (it should be well documented, given the number of Nintendo DS emulation projects).

The graphics, audio, network, and input chip specifications I'll design myself after I have a working architecture.

EDIT: I could go with x86, since I might be able to achieve an optimised emulator by running some of the code directly on the CPU.

AndrewAPrice · Post by **AndrewAPrice** » Thu May 22, 2008 11:59 pm

I've decided to go with my own CPU. I'm not sure what the memory addressing should be (I'll probably go with 32 bits) or how much memory. The CPU will have one large 1 kilobit register which is subdivided into 2x 256 bit down to 64x 8-bit registers. There's not much you can do with a 1kb register besides move it in/out of memory (might be useful for transferring large blocks of data).

Does anybody have advice for designing the CPU? I'm not going to put in paging or anything, or I might, depending on whether I work out a better method for a purely game system. I still need to decide whether to support position-independent code (probably not).

JamesM · Post by **JamesM** » Fri May 23, 2008 1:26 am

MessiahAndrw wrote:I've decided to go with my own CPU. I'm not sure what the memory addressing should be (I'll probably go with 32 bits) or how much memory. The CPU will have one large 1 kilobit register which is subdivided into 2x 256 bit down to 64x 8-bit registers. There's not much you can do with a 1kb register besides move it in/out of memory (might be useful for transferring large blocks of data).

Does anybody have advice for designing the CPU? I'm not going to put in paging or anything, or I might, depending on whether I work out a better method for a purely game system. I still need to decide whether to support position-independent code (probably not).

Hi,

I'm not sure exactly what you're asking. Do you want to design a CPU or a processor architecture?

The latter is not particularly difficult (to design and write an emulator for), but if you wish to do the former, that is, come up with a CPU design on paper that could be transformed into physical hardware, it's a very big deal.

There are all sorts of things to consider, including but not limited to: pipelining, out of order execution, branch prediction, caching, MMU facilities (you'd also have to design the TLB lookup mechanism), I/O (must bypass cache)...

If the latter, you're restricted to a subset, including but not limited to: RISC vs. CISC, fixed instruction size vs. variable, register set, ABI...

If you're designing an arch purely for emulation, then as mentioned in another thread in which I happened to be a participant by Schol-R-LEA, the size of the register file makes no difference whatsoever, as subject register accesses would transform into accesses into the target's abstract register bank (ARB), which is held in RAM.
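
To illustrate the point that an emulated register file is just memory, here is a minimal Python sketch. The class name `RegisterFile`, the little-endian layout, and the (width, index) addressing of sub-registers are assumptions made for the example, not part of any design in this thread.

```python
class RegisterFile:
    """Hypothetical 1024-bit register bank held in ordinary emulator memory."""

    SIZE_BYTES = 128  # 1024 bits

    def __init__(self):
        self.data = bytearray(self.SIZE_BYTES)

    def _span(self, width_bits, index):
        # Translate (width, index) into a byte range inside the bank.
        if width_bits <= 0 or width_bits % 8 or index < 0:
            raise ValueError("unsupported sub-register")
        nbytes = width_bits // 8
        start = index * nbytes
        if start + nbytes > self.SIZE_BYTES:
            raise ValueError("sub-register out of range")
        return start, nbytes

    def read(self, width_bits, index):
        """Read the index-th sub-register of the given width."""
        start, nbytes = self._span(width_bits, index)
        return int.from_bytes(self.data[start:start + nbytes], "little")

    def write(self, width_bits, index, value):
        """Write a value, truncated to the sub-register width."""
        start, nbytes = self._span(width_bits, index)
        mask = (1 << width_bits) - 1
        self.data[start:start + nbytes] = (value & mask).to_bytes(nbytes, "little")
```

Every register access becomes a slice of a byte array, so the cost in the emulator does not depend on how many registers the emulated architecture claims to have.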

Personally, given the latter option, I'd go with a RISC system, addressing 16 bits of memory (do you need more? I don't think the SNES had more) with a fixed instruction size of 32 bits. Possibly 32-bit addressing instead - depends if you need it. It adds a bit of complication.

I wouldn't go for an MMU, as, as Dex will readily point out, most (older) games consoles don't have an MMU anyway and are single tasking.

PIC (or more precisely PC-relative branching) shouldn't be an issue at all for emulation, as you don't emulate pipeline effects.

ABI-wise, you have to decide what a function invocation is going to look like. Are parameters passed on the stack? Are they passed in registers? I'd go with the stack, because, as mentioned previously, in an emulated environment register stores necessarily write through to memory, and storing parameters on the stack makes getting backtraces easier. (Think x86 vs. ARM.)

I'm sure there are more things that could be pointed out by other people - designing a CPU is not a job for the light hearted, or if I may be so bold, not a job for people who haven't taken a hardware/CPU course! (However, designing an architecture alone isn't so bad).

Cheers,

James

AndrewAPrice · Post by **AndrewAPrice** » Fri May 23, 2008 1:31 am

I'm not interested in actually building it physically, just creating a CPU that I can emulate.

So far it's going to be little-endian, and it's going to feature a stack (which will be a pointer to memory that increases/decreases as you push/pop the stack). I'm designing the basic instruction set now.
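
A rough Python sketch of such a stack, under assumptions the post does not state: 32-bit slots, a stack that grows downward, and a flat `bytearray` as memory. The class name `Stack` is hypothetical.

```python
import struct


class Stack:
    """Hypothetical little-endian stack; assumed to grow toward lower addresses."""

    def __init__(self, memory, top):
        self.memory = memory  # flat emulated RAM
        self.sp = top         # stack pointer starts just past the last slot

    def push32(self, value):
        # Move the pointer down, then store the value least significant byte first.
        self.sp -= 4
        struct.pack_into("<I", self.memory, self.sp, value & 0xFFFFFFFF)

    def pop32(self):
        # Load the value at the pointer, then move the pointer back up.
        (value,) = struct.unpack_from("<I", self.memory, self.sp)
        self.sp += 4
        return value
```

After `push32(0xAABBCCDD)` the byte at the stack pointer is `0xDD`, which is what little-endian storage means in practice.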

My CPU has these basic instructions so far:

- copy value from memory to register
- copy value from register to register
- copy value from register to memory

Each instruction has an 8, 16, 32, 64, 128, 256, 512, and 1024-bit variation, although all memory addresses are 32-bit.

Instructions I still have to write are:
- add another register to a register
- subtract another register from a register
- multiply a register by another register
- divide a register by another register (the remainder is stored in the register it was divided by)

Each one will need 8, 16, 32, 64, 128, 256, 512, 1024-bit equivalents, including signed/unsigned (for multiply and divide) and special floating point variations

Other instructions I need will include:
- clear register to 0
- bit shift a register left by a value
- bit shift a register right by a value
- perform a bitwise and
- perform a bitwise not
- perform a bitwise or
- perform a bitwise xor
- convert up/down between bit-sizes
- convert float to integer
- convert integer to float

Then I have to think about conditions, jumping, memory protection, and IO.

Ahh.. it's a lot of work

But kind of fun as well

JackScott · Post by **JackScott** » Fri May 23, 2008 2:23 am

If I ever find myself fuming at how bad x86 is, I just go and design myself a processor architecture. Then I realise that they didn't actually do such a terrible job.

That said, it gets a lot easier when you don't have rings, paging, forced segmentation, or any of the other things that make the x86 architecture such a terrible thing to work with.

For instance, the 6502 architecture is blindingly simple. As far as designing a SNES-like game machine is concerned, that would be something to consider... taking that processor. In fact, it was used in the NES and the Atari 2600. An upgraded 16-bit version (the 65C816 from memory) was used in the SNES. All three consoles had custom-designed video and sound hardware (though the 2600 was basically bit-bashing a DAC).

JamesM · Post by **JamesM** » Fri May 23, 2008 2:44 am

Or, use the z80 system, for which there are already emulators.

How is your instruction set encoded? You must be using variable length instructions to be able to encode such huge numbers, right?

Another thing to remember is signed/unsigned shifts. Signed shifts sign-extend, unsigned ones do not. (These are also referred to as arithmetic/logical shifts, hence the common instructions "sra" and "srl": shift right arithmetic and shift right logical.)
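
The difference can be sketched in Python as follows. The function names follow the "sra"/"srl" mnemonics; the default 32-bit width and the assumption that the shift amount is non-negative are choices made for the example.

```python
def srl(value, amount, width=32):
    """Shift right logical: vacated high bits are filled with zeros."""
    mask = (1 << width) - 1
    return (value & mask) >> amount


def sra(value, amount, width=32):
    """Shift right arithmetic: vacated high bits copy the sign bit."""
    mask = (1 << width) - 1
    value &= mask
    result = value >> amount
    if value >> (width - 1):
        # Sign bit was set: fill the top `amount` bits with ones.
        result |= mask & ~(mask >> amount)
    return result
```

For example, shifting `0x80000000` right by 4 gives `0x08000000` with `srl` and `0xF8000000` with `sra`.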

I also notice you're using floating point. All I can say here is... arrrghhhhh!!. That is all.

Cheers,

James

Combuster · Post by **Combuster** » Fri May 23, 2008 3:33 am

There's nothing wrong really with a 1k register file. Actually, it could at points get you a significant boost in GFlops.

What you might want to consider as a nice addition to that is if you want to divide that wide-register into vectorized parts, and add each separately. (instead of adding one 256 bit int, add 16 shorts in one instruction)
As for floats, handling those in variable sizes is just plain ugly. I'd rather go with either 32 or 64 bit floats and use said vectorisation method.
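
A minimal Python sketch of the vectorised add described above, treating a wide register as independent lanes. The function name `vector_add` and its parameters are hypothetical; each lane wraps around on overflow without carrying into its neighbour.

```python
def vector_add(a, b, total_bits=256, lane_bits=16):
    """Add two wide registers lane by lane (default: 16 shorts in 256 bits)."""
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, total_bits, lane_bits):
        lane_a = (a >> shift) & lane_mask
        lane_b = (b >> shift) & lane_mask
        # Mask the sum so a carry never leaks into the next lane.
        result |= ((lane_a + lane_b) & lane_mask) << shift
    return result
```

With these defaults, `vector_add(0x0001FFFF, 0x00010001)` yields `0x00020000`, whereas a single 256-bit integer add would yield `0x00030000` because the carry propagates.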

but on average I get the idea that the resulting CPU is more suited to do GPU and DSP work rather than standard game logic (where you won't need that huge register size).

also, performing some operations on 1024-bit registers seem pointless as you don't have enough space left for a second register to add or otherwise work with...
and do you keep a special register for the stack?

Then you have the five shifts: shl, shr, asr, rol, ror (shift, arithmetic shift, rotate)

not to forget: interrupt handling - how do you plan to do that?

About encoding:
1024 bits = 2^10. You can use 3 bits to give any operation width between 4 and 1024. Another 3 bits for the vector width, and 8 bits that determines the offset (in 4-bit granularity)
so for two-operand instructions you get 2x8 + 2x3 = 22 bits. Pack that into 32-bit opcodes and you have 10 bits to determine opcode. Looks enough to me
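
One possible way to pack those fields into a 32-bit word, sketched in Python. The field order (opcode in the top 10 bits, then operation width, vector width, and the two offsets) and the names `encode`/`decode` are assumptions; the post only fixes the field sizes.

```python
# (name, bit width) from most significant to least significant; order is assumed.
FIELDS = (
    ("opcode", 10),
    ("op_width", 3),
    ("vec_width", 3),
    ("dst_offset", 8),
    ("src_offset", 8),
)


def encode(**values):
    """Pack the named fields into one 32-bit instruction word."""
    word = 0
    for name, bits in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
        word = (word << bits) | value
    return word


def decode(word):
    """Unpack a 32-bit instruction word into a dict of fields."""
    out = {}
    for name, bits in reversed(FIELDS):
        out[name] = word & ((1 << bits) - 1)
        word >>= bits
    return out
```

The field widths sum to 10 + 3 + 3 + 8 + 8 = 32 bits, so every instruction is a fixed-size word.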

AndrewAPrice · Post by **AndrewAPrice** » Fri May 23, 2008 4:06 am

Combuster wrote:There's nothing wrong really with a 1k register file. Actually, it could at points get you a significant boost in GFlops.

The main reason is simply copying data. But then I realised you'd need a register to store where the data should go!

Combuster wrote:What you might want to consider as a nice addition to that is if you want to divide that wide-register into vectorized parts, and add each separately. (instead of adding one 256 bit int, add 16 shorts in one instruction)

Hmm.. parallel processing would extend my already massive instruction set (already up to 89 and I haven't finished the basic arithmetic instructions).

Combuster wrote:but on average I get the idea that the resulting CPU is more suited to do GPU and DSP work rather than standard game logic (where you won't need that huge register size).

I liked the idea of the entire register being one constant piece of memory that can be operated on at any bit-size. Nearly all instructions >64-bit probably won't be used, but someone is bound to find a way to exploit it to do things the CPU was never meant to do.

Combuster wrote:also, performing some operations on 1024-bit registers seem pointless as you don't have enough space left for a second register to add or otherwise work with...
and do you keep a special register for the stack?

Yeah

I could add a second 1024-bit register in the future, but I can see some design problems (I will still be limited to only 64 bit registers due to the format of address registers).

The special registers (stack pointer, instruction pointer) are stored separately in their own dedicated 32-bit registers.

Combuster wrote:not to forget: interrupt handling - how do you plan to do that?

A table of addresses to jump to when each interrupt fires. Devices, user code, or the CPU can fire an interrupt.
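
A minimal Python sketch of such a table, with several assumptions the post does not make: the table lives at address 0 (`VECTOR_BASE`), each entry is a 32-bit little-endian address, and the CPU pushes the return address onto a downward-growing stack before jumping. The class name `Cpu` is hypothetical.

```python
import struct

VECTOR_BASE = 0x0000  # assumed location of the interrupt table in memory


class Cpu:
    """Hypothetical CPU state: program counter, stack pointer, flat memory."""

    def __init__(self, memory):
        self.memory = memory
        self.pc = 0
        self.sp = len(memory)

    def raise_interrupt(self, number):
        # Save the return address so the handler can resume the interrupted code.
        self.sp -= 4
        struct.pack_into("<I", self.memory, self.sp, self.pc)
        # Look up the handler address in the table and jump to it.
        (handler,) = struct.unpack_from("<I", self.memory, VECTOR_BASE + 4 * number)
        self.pc = handler
```

Devices, user code, and the CPU itself would all go through the same `raise_interrupt` path, differing only in the interrupt number they pass.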

EDIT: I'm changing my system to not have different variations for 16, 32, ... 1024 bit. Currently over 100 instructions and it's mostly for variations on mov, add, sub, etc..

Dex · Post by **Dex** » Fri May 23, 2008 8:05 am

I had the same idea as you many moons ago, but did not get round to working on it yet. But I think an ARM9 would be the place I would go.
So if I get you right, you want to code a modern CHIP8
http://en.wikipedia.org/wiki/CHIP-8
See also work by tonyMac
http://dex.7.forumer.com/viewtopic.php?t=272

AndrewAPrice · Post by **AndrewAPrice** » Fri May 23, 2008 8:39 am

Dex wrote:I had the same idea as you many moons ago, but did not get round to working on it yet. But I think an ARM9 would be the place I would go.
So if I get you right, you want to code a modern CHIP8

Exactly, but also because it's fun. Sadly there are only a few games written for the CHIP8 since it is a very limiting system.

My CPU design is nearly complete (a few minor things to sort out, like interrupts) and I've reduced the instruction set (whether the operation is 8, 16, 32 etc bit depends on the register used).

My register layout is slightly more complicated now. There is the 1k register that is recursively divided down into 16 bit registers, and now there is also a 32 bit register that is divided down into 8 bits. This overcomes a few problems: you can work with 8 bit registers (it was impractical to divide the 1k register into 128x 8 bit registers), and you can put stuff into the 1k register while having the 32 bit register for storing addresses in (e.g. when copying data).

I've realised I need to write my own assembler!

I was wondering if a kind of BIOS/Firmware/OS (whatever you want to call it) like a lot of consoles have today would be needed.
Pros:
- I could put the networking and file system code in the firmware to make it easier for end-programmers. If a new FS or network protocol is implemented only the firmware needs to be updated.
Cons:
- Takes away from the raw beauty of doing everything yourself. Each game can decide by itself how to handle memory layouts, network access, etc (common routines could be provided in a library).

Osbios · Post by **Osbios** » Fri May 23, 2008 8:51 am

I have started this project some time ago. https://sourceforge.net/projects/icaf/

Currently I don't work on it. But it's already usable. There are Windows binaries, and the source also works on Linux (Unix?).

Note: There is no copy/paste and the most important module for programs (the memory block) is still waiting for my developer fingers...

I think that's where you have to begin with CPU development.

iammisc · Post by **iammisc** » Fri May 23, 2008 11:13 am

You said that you're making new variations on instructions for specifying the bit sizes, but why don't you come up with a standard way to specify sizes instead of making new opcodes? For example, you could use 3 bits to specify the size. Then you wouldn't need to add new opcodes.

Let's say the add instruction has opcode 1. Then the instruction

add i64 %r1, %r2

could still use the same opcode as

add i32 %r1, %r2

but the size field will have different values.
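
A small Python sketch of how one opcode could serve every width. The mapping of the 3-bit size field to 8, 16, ..., 1024 bits and the names `width_from_size_field` and `execute` are assumptions for illustration; only "add has opcode 1" comes from the example above.

```python
OPCODE_ADD = 1  # as in the example above


def width_from_size_field(size):
    """Map a 3-bit size field (0..7) to 8, 16, ..., 1024 bits (assumed mapping)."""
    if not 0 <= size <= 7:
        raise ValueError("size field must fit in 3 bits")
    return 8 << size


def execute(opcode, size, a, b):
    """Run one two-operand instruction at the width selected by the size field."""
    mask = (1 << width_from_size_field(size)) - 1
    if opcode == OPCODE_ADD:
        return (a + b) & mask  # wrap around at the selected width
    raise ValueError("unknown opcode")
```

Under this mapping `add i32` and `add i64` share `OPCODE_ADD` and differ only in the size field (2 versus 3), so the overflow point moves but the decoder stays the same.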

Dex · Post by **Dex** » Fri May 23, 2008 12:48 pm

As far as assemblers go, you could try something similar to how Fasm was used to make FasmArm, just by changing one or two .inc files
http://arm.flatassembler.net/

PS: @MessiahAndrw, if you're looking for a test-bed platform, you're more than welcome to use DexOS.

AndrewAPrice · Post by **AndrewAPrice** » Fri May 23, 2008 6:12 pm

iammisc wrote:Let's say the add instruction has opcode 1. Then the instruction

add i64 %r1, %r2

could still use the same opcode as

add i32 %r1, %r2

but the size field will have different values.

I have gotten around that problem with a single opcode. The data copied depends on the destination register. If you copy a larger register to a smaller one, the end gets trimmed. If you copy a smaller register to a larger all bits to the left are zeroed.
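
The copy rule can be sketched in Python like this. The function name `copy_register` is hypothetical, and the sketch assumes that "the end gets trimmed" means the high-order bits are dropped.

```python
def copy_register(src_value, src_bits, dst_bits):
    """Return the value the destination holds after copying from the source.

    Larger to smaller: high-order bits are dropped (assumed reading of "trimmed").
    Smaller to larger: the extra high-order bits are zero, not sign-extended.
    """
    src_value &= (1 << src_bits) - 1          # source holds only src_bits
    return src_value & ((1 << dst_bits) - 1)  # destination keeps only dst_bits
```

So copying a 32-bit `0x12345678` into a 16-bit register leaves `0x5678`, and copying an 8-bit `0xFF` into a 32-bit register leaves `0x000000FF`.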

kmcguire · Post by **kmcguire** » Fri May 23, 2008 8:03 pm

MessiahAndrw wrote: I've realized I need to write my own assembler! I was wondering if a kind of BIOS/Firmware/OS (whatever you want to call it) like a lot of consoles have today would be needed.
Pros:
- I could put the networking and file system code in the firmware to make it easier for end-programmers. If a new FS or network protocol is implemented only the firmware needs to be updated.
Cons:
- Takes away from the raw beauty of doing everything yourself. Each game can decide by itself how to handle memory layouts, network access, etc (common routines could be provided in a library).

You would _need_ some type of firmware that could boot the system into a ready mode by loading a starting screen for the user. The loading could be done from the network or a storage device (flash, platter).

The inclusion of library-like routines for common things such as network access (protocols) seems like something you could just _completely_ ignore right now and come back to later if need be, since you just need to get the system designed. But if you did include them, I suppose you would mainly need to ask whether you have enough room in the EEPROM (firmware memory) for that section of the library, and, if so, whether the library includes the sections most useful to developers.