OSDev.org

Posted: **Thu Aug 20, 2015 10:48 am**

Lately I have been working on a lightweight replacement for the LLVM compiler infrastructure. Specifically, I want to create a bytecode specification that defines a set of abstract operations, so that a compiler can generate a single output for different architectures. I'm currently working on a translator from this bytecode into x86_64 assembly. To clarify my own thinking, I decided to document it, so these documents are the first rough parts of the specification.

I request you to look through that. Architecture.md is recommended to be read first. Perhaps you'll spot some bad design decisions.

PS. These files are actually written in Markdown, but the forum software doesn't like the extension ".md" (wat?).

Edit: Small edit in the MEMCPY data table.

Posted: **Thu Aug 20, 2015 2:06 pm**

Looks good. Is there a particular reason you decided to use a register machine over a stack machine for your bytecode? I personally found it easier to generate bytecode for a stack machine from an AST. It's also easy to turn a stack into SSA. I'm interested to hear the other side.
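As a rough illustration of the "stack into SSA" remark, here is a minimal sketch that walks straight-line stack code with a symbolic operand stack and gives every pushed value a fresh name. The opcode names (`push`, `load`, `add`, `mul`, `store`) and the tuple layout are assumptions made up for this example, not part of the Essentia specification. Code with branches would additionally need phi nodes where stacks merge, which this sketch does not handle.

```python
def stack_to_ssa(code):
    """Convert straight-line stack code into (dest, op, args) tuples.

    Hypothetical stack opcodes: ("push", const), ("load", name),
    ("add",), ("mul",), ("store", name).
    """
    stack = []    # symbolic operand stack: holds value names, not values
    out = []      # emitted three-address instructions
    counter = 0

    def fresh():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    for ins in code:
        op = ins[0]
        if op == "push":
            t = fresh()
            out.append((t, "const", (ins[1],)))
            stack.append(t)
        elif op == "load":
            t = fresh()
            out.append((t, "load", (ins[1],)))
            stack.append(t)
        elif op in ("add", "mul"):
            rhs = stack.pop()
            lhs = stack.pop()
            t = fresh()
            out.append((t, op, (lhs, rhs)))
            stack.append(t)
        elif op == "store":
            out.append((None, "store", (ins[1], stack.pop())))
        else:
            raise ValueError(f"opcode not covered by this sketch: {op}")
    return out


# x = (a + 2) * b
ssa = stack_to_ssa([("load", "a"), ("push", 2), ("add",),
                    ("load", "b"), ("mul",), ("store", "x")])
```

Each temporary is assigned exactly once, which is the property SSA needs for straight-line code.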

Posted: **Thu Aug 20, 2015 3:22 pm**

MessiahAndrw wrote:Is there a particular reason you decided to use a register machine over a stack machine for your bytecode?

To be honest, there was no particular reason I chose this model; it just seemed sane and was already used by high-level languages.

MessiahAndrw wrote:It's also easy to turn a stack into SSA.

I think optimization will be done by scanning the code of each procedure in reverse order. Useful variables (those used to compute other useful variables, passed to other procedures as arguments, or returned to the caller) will be marked; operations on useless variables (those not marked earlier in the scan) will be removed (or replaced with NOPs).
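The reverse scan described above could look roughly like this. It is only a sketch for straight-line code: the `(dest, op, sources)` tuple layout and the set of side-effecting opcodes are assumptions for illustration, and a procedure with loops or branches would need the marking repeated until it stops changing.

```python
SIDE_EFFECTS = {"call", "store"}   # assumed opcodes that must never be dropped


def eliminate_dead_code(proc, returned):
    """Replace operations on useless variables with NOPs.

    proc: list of (dest, op, sources) tuples in execution order.
    returned: names of variables returned to the caller.
    """
    useful = set(returned)
    result = []
    for dest, op, srcs in reversed(proc):
        if dest in useful or op in SIDE_EFFECTS:
            useful.discard(dest)   # this definition satisfies the later use
            useful.update(srcs)    # its inputs become useful in turn
            result.append((dest, op, srcs))
        else:
            result.append((None, "nop", ()))
    result.reverse()
    return result


proc = [
    ("a", "const", ()),
    ("b", "const", ()),
    ("c", "add", ("a", "b")),
    ("d", "mul", ("a", "a")),   # d is never used or returned
]
optimized = eliminate_dead_code(proc, returned=["c"])
```

Here only the `mul` producing `d` is replaced, because `c` is returned and `a` and `b` feed into it.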

Posted: **Fri Aug 21, 2015 3:49 am**

Roman wrote:I request you to look through that. Architecture.md is recommended to be read first. Perhaps you'll spot some bad design decisions.

A note about the English: maybe it would be more correct to ask rather than request something?

About the bytecode: I see simplicity as a very serious advantage of a bytecode. There's still too much complexity in your variant. I recommend to leave only one conditional jump. Also I recommend to remove call and return. And for some time it would be great to remove floating point and signed math. So, it will be just 5-6 words for the whole bytecode body. Or is it possible to get down to 3-4?

Posted: **Fri Aug 21, 2015 8:25 am**

"The CALL instruction also preserves CPU registers, according to the target calling convention."

I think I would remove this statement, entirely. Or at least, replace it with the defined, cross-platform behavior.

I don't think you want your register variable behavior to be different between different platforms.

Also, how does CALL_SYS work? And, how do you reference external labels/addresses/methods/etc?

Nice work so far, though.

Posted: **Fri Aug 21, 2015 11:21 am**

SpyderTL wrote:I think I would remove this statement, entirely. Or at least, replace it with the defined, cross-platform behavior.

I don't think you want your register variable behavior to be different between different platforms.

Why? A compiler is entirely agnostic about the target CPU's registers, "preserving CPU registers" means "your variables will not be affected".

SpyderTL wrote:Also, how does CALL_SYS work?

The same way CALL does, but it performs a system call (for example, SYSCALL on AMD64 or INT 0x80 on Linux/i386). The destination field is the index of the variable that contains the system call number.
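A translator might lower CALL_SYS along these lines. This is only a sketch: the function name, the target strings, and the idea that the variable's location is already known as an operand string are all assumptions. What is not an assumption is that Linux expects the system call number in `rax` before `syscall` on x86-64 and in `eax` before `int 0x80` on i386. Argument passing is left out.

```python
def lower_call_sys(target, number_location):
    """Return NASM-style lines for a system call.

    number_location: operand string naming where the variable holding
    the system call number currently lives (hypothetical, e.g. "[rbp-8]").
    """
    if target == "amd64":
        return [f"mov rax, {number_location}", "syscall"]
    if target == "linux-i386":
        return [f"mov eax, {number_location}", "int 0x80"]
    raise ValueError(f"target not covered by this sketch: {target}")
```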

SpyderTL wrote:And, how do you reference external labels/addresses/methods/etc?

I will add a new instruction, IMPORT, which will emit something like "extern symname". symname will be looked up by its index in the symbol table.
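A sketch of how that lowering might work, assuming the symbol table is a plain list of names and IMPORT carries a single index operand (both are guesses, since the instruction is not specified yet):

```python
def lower_import(symbol_table, index):
    """Emit an assembler 'extern' directive for the symbol at the given index."""
    if not 0 <= index < len(symbol_table):
        raise IndexError(f"symbol index out of range: {index}")
    return f"extern {symbol_table[index]}"


symbols = ["memcpy", "puts"]          # hypothetical symbol table
line = lower_import(symbols, 1)       # "extern puts"
```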

Posted: **Sat Aug 22, 2015 6:13 am**

embryo2 wrote:I recommend to leave only one conditional jump.

How? Wouldn't that remove advantages of architectures, that have different kinds of conditional jumps?

embryo2 wrote:Also I recommend to remove call and return.

How could that even work?

embryo2 wrote:And for some time it would be great to remove floating point and signed math.

I am still thinking about this. Differentiating different kinds of math could be done on the variable type level.
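For illustration, selecting the kind of math from the variable type could be a simple table lookup in the translator. The type names (`u64`, `i64`, `f64`) and opcode names are assumptions for this sketch; the x86-64 mnemonics are the real ones (`add` serves both signed and unsigned integers, while multiplication and division have separate signed forms, and `addsd`/`mulsd`/`divsd` operate on scalar doubles).

```python
# (bytecode operation, variable type) -> x86-64 mnemonic
MNEMONIC = {
    ("add", "u64"): "add",  ("add", "i64"): "add",  ("add", "f64"): "addsd",
    ("mul", "u64"): "mul",  ("mul", "i64"): "imul", ("mul", "f64"): "mulsd",
    ("div", "u64"): "div",  ("div", "i64"): "idiv", ("div", "f64"): "divsd",
}


def select_mnemonic(op, var_type):
    """Pick the machine instruction for one typed bytecode operation."""
    try:
        return MNEMONIC[(op, var_type)]
    except KeyError:
        raise ValueError(f"no mapping for {op} on {var_type}") from None
```

With this approach the bytecode needs only one DIV opcode instead of three, at the cost of the translator having to know every variable's type.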

Posted: **Sat Aug 22, 2015 5:04 pm**

Roman wrote:A compiler is entirely agnostic about the target CPU's registers, "preserving CPU registers" means "your variables will not be affected"

This is why I recommend removing it. "According to the target calling convention" made me think that some registers would be preserved, and some may not.

Maybe replace the statement with "all variable values are preserved during method calls" to clarify that the platform calling conventions do not have any effect on the behavior of any variables, even ones stored in registers.

Posted: **Sun Aug 23, 2015 4:56 am**

Roman wrote:
embryo2 wrote:I recommend to leave only one conditional jump.
How? Wouldn't that remove advantages of architectures, that have different kinds of conditional jumps?

It wouldn't. It is the compiler that actually emits the other kinds of jumps, so a smart compiler could produce the same machine code whether your bytecode has one conditional jump or any other number of them.
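One way to read this: the bytecode keeps a single "jump if non-zero" instruction plus comparisons that produce a flag variable, and the translator fuses a comparison with the jump that follows it into the matching native jump. The sketch below assumes made-up opcodes (`cmp_lt`, `cmp_eq`, `cmp_gt`, `jnz`), signed comparisons, operands that are already register names, and that a fused flag variable is not read again later.

```python
JCC = {"cmp_lt": "jl", "cmp_eq": "je", "cmp_gt": "jg"}   # signed x86 jumps


def lower(code):
    """Lower hypothetical bytecode to x86 lines, fusing compare + jnz pairs."""
    asm = []
    i = 0
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if ins[0] in JCC and nxt and nxt[0] == "jnz" and nxt[1] == ins[1]:
            # ("cmp_lt", flag, a, b) followed by ("jnz", flag, label)
            _, _flag, a, b = ins
            asm += [f"cmp {a}, {b}", f"{JCC[ins[0]]} {nxt[2]}"]
            i += 2
        elif ins[0] == "jnz":
            # A lone jnz on an arbitrary variable: test it against zero.
            asm += [f"test {ins[1]}, {ins[1]}", f"jnz {ins[2]}"]
            i += 1
        else:
            raise ValueError(f"instruction not covered by this sketch: {ins[0]}")
    return asm


fused = lower([("cmp_lt", "c", "rax", "rbx"), ("jnz", "c", "loop")])
plain = lower([("jnz", "rcx", "done")])
```

The fused pair becomes `cmp rax, rbx` followed by `jl loop`, the same code a dedicated "jump if less" bytecode instruction would have produced.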

Roman wrote:
embryo2 wrote:Also I recommend to remove call and return.
How could that even work?

Removal of those commands will help you understand them better. Your question tells me that you just don't understand how call and return work. But how will you do a compiler without such understanding?

Roman wrote:
embryo2 wrote:And for some time it would be great to remove floating point and signed math.
I am still thinking about this. Differentiating different kinds of math could be done on the variable type level.

Yes, there are ways of representing things differently, but above all, a large number of instructions just distracts your attention, and as a result you'll get a mediocre solution.

Posted: **Sun Aug 23, 2015 7:39 am**

I'm envious of people of (roughly) my age doing such interesting and complicated things (examples: Roman, 0fb1d8, omarrx024, et al.), while my poor and dumb head is still stuck in physical memory management.

Posted: **Sun Aug 23, 2015 8:41 am**

embryo2 wrote:Removal of those commands will help you understand them better. Your question tells me that you just don't understand how call and return work. But how will you do a compiler without such understanding?

I understand how these commands work (at least on x86). On x86, CALL pushes the return address onto the stack, then loads (R/E)IP with the target address. RET pops the return address from the stack and loads (R/E)IP with it. Are you suggesting me to store the return address in a different way?
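The mechanism described here can be modelled in a few lines. This is a toy interpreter with invented opcodes (`emit`, `call`, `ret`, `halt`), where `pc` plays the role of (R/E)IP and a Python list plays the role of the stack; it is meant only to show the push/jump and pop/jump pairing, not any real instruction encoding.

```python
def run(program):
    """Execute a toy program and return the list of emitted values."""
    pc = 0        # stands in for the instruction pointer
    stack = []    # holds return addresses
    trace = []
    while True:
        op, arg = program[pc]
        if op == "call":
            stack.append(pc + 1)   # push the address of the next instruction
            pc = arg               # then jump to the target
        elif op == "ret":
            pc = stack.pop()       # pop the return address and jump to it
        elif op == "emit":
            trace.append(arg)
            pc += 1
        elif op == "halt":
            return trace
        else:
            raise ValueError(f"unknown opcode: {op}")


program = [
    ("emit", "a"),     # 0
    ("call", 4),       # 1: call the procedure at index 4
    ("emit", "c"),     # 2: execution resumes here after ret
    ("halt", None),    # 3
    ("emit", "b"),     # 4: procedure body
    ("ret", None),     # 5
]
trace = run(program)
```

Seen this way, `call` and `ret` are just a push or pop combined with a jump, which is why a bytecode could in principle express them with more primitive operations.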

Posted: **Mon Aug 24, 2015 5:19 am**

Roman wrote:Are you suggesting me to store the return address in a different way?

I suggest a VM design decision. The bytecode is useless without a VM or compiler, so you have to design a way of running your bytecode, and that includes implementing the call algorithm. The exact behavior of the code is up to you; the most important thing here is the ability to implement the details yourself.

Of course, it is possible to describe the algorithm, but do you want to implement my algorithm?

Essentia. First rough specifications.
