Essentia. First rough specifications.
Lately I have been working on a lightweight replacement for the LLVM compiler infrastructure. Specifically, I want to create a bytecode specification that defines a set of abstract operations, so that a compiler can generate a single output for different architectures. I'm currently working on a translator from this bytecode into x86_64 assembly. To clarify my own thinking I decided to document it, so these documents are the first rough parts of the specification.
I request you to look through that. Architecture.md is recommended to be read first. Perhaps you'll spot some bad design decisions.
PS. These files are actually written in Markdown, but the forum software doesn't like the extension ".md" (wat?).
Edit: Small edit in the MEMCPY data table.
Attachments:
- Instruction Set.txt (4.2 KiB)
- Architecture.txt (2.14 KiB)
"If you don't fail at least 90 percent of the time, you're not aiming high enough."
- Alan Kay
- AndrewAPrice
Re: Essentia. First rough specifications.
Looks good. Is there a particular reason you decided to use a register machine over a stack machine for your bytecode? I personally found a stack machine easier to generate bytecode from an AST. It's also easy to turn a stack into SSA. I'm interested to hear from the other.
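To illustrate the point about generating stack bytecode from an AST, here is a minimal sketch. The node classes, the opcode names (PUSH, ADD, SUB, MUL) and the helper functions are hypothetical and are not taken from the Essentia documents; the idea is only that a post-order walk of the tree yields stack code directly.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]

# Hypothetical opcode names, chosen only for this illustration.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def emit_stack_code(node):
    """Post-order walk: emit both operands, then the operator."""
    if isinstance(node, Num):
        return [("PUSH", node.value)]
    code = emit_stack_code(node.left) + emit_stack_code(node.right)
    code.append((OPCODES[node.op],))
    return code


def run_stack_code(code):
    """Tiny evaluator used only to check the generated code."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
            continue
        b = stack.pop()
        a = stack.pop()
        if instr[0] == "ADD":
            stack.append(a + b)
        elif instr[0] == "SUB":
            stack.append(a - b)
        elif instr[0] == "MUL":
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {instr[0]}")
    return stack.pop()


# (1 + 2) * 3
tree = BinOp("*", BinOp("+", Num(1), Num(2)), Num(3))
code = emit_stack_code(tree)
```

No register or temporary allocation is needed in the generator, which is the convenience being described; a register-based bytecode has to name a destination for every intermediate result.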
My OS is Perception.
Re: Essentia. First rough specifications.
MessiahAndrw wrote: Is there a particular reason you decided to use a register machine over a stack machine for your bytecode?
To be honest, there was no particular reason why I chose this model; it just seemed sane and was already used by high-level languages.
MessiahAndrw wrote: It's also easy to turn a stack into SSA.
I think optimization will be done by scanning the code of each procedure in reverse order. Useful variables (those used in the computation of other useful variables, passed to other procedures as arguments, or returned to the caller) will be marked, and operations on useless variables (those not marked earlier in the scan) will be removed (or replaced with NOPs).
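A minimal sketch of that reverse scan, under some loud assumptions: the instruction tuples `(op, dest, sources)` and the opcode names are hypothetical, and the procedure is straight-line code with no jumps (branches would need a proper liveness analysis over the control-flow graph).

```python
# Hypothetical set of operations that must be kept regardless of their result.
SIDE_EFFECT_OPS = {"RET", "CALL"}


def eliminate_dead_code(proc):
    """Scan a straight-line procedure backwards and NOP out useless operations.

    proc is a list of (op, dest, sources) tuples; dest may be None.
    """
    useful = set()
    result = []
    for op, dest, sources in reversed(proc):
        if op in SIDE_EFFECT_OPS or dest in useful:
            # This instruction defines dest, so earlier writes to dest are
            # not needed on behalf of the uses seen so far.
            useful.discard(dest)
            # Everything it reads becomes useful.
            useful.update(sources)
            result.append((op, dest, sources))
        else:
            result.append(("NOP", None, ()))
    result.reverse()
    return result


proc = [
    ("CONST", "a", ()),
    ("CONST", "b", ()),
    ("ADD", "c", ("a", "b")),
    ("ADD", "d", ("a", "a")),  # d is never read afterwards
    ("RET", None, ("c",)),
]
optimized = eliminate_dead_code(proc)
```

Here only the computation of `d` is replaced with a NOP, because nothing that reaches the RET depends on it.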
Re: Essentia. First rough specifications.
Roman wrote: I request you to look through that. Architecture.md is recommended to be read first. Perhaps you'll spot some bad design decisions.
A bit about English: maybe it would be more correct to ask rather than to request something?
About bytecode. I see the simplicity as a very serious advantage of a bytecode. There's still too much complexity in your variant. I recommend to leave only one conditional jump. Also I recommend to remove call and return. And for some time it would be great to remove floating point and signed math. So, it will be just 5-6 words for the whole bytecode body. Or is it possible to get down to 3-4?
My previous account (embryo) was accidentally deleted, so I have no choice but to use a new one. But maybe it was a good lesson about software reliability.
Re: Essentia. First rough specifications.
Quote: "The CALL instruction also preserves CPU registers, according to the target calling convention."
I think I would remove this statement entirely, or at least replace it with the defined, cross-platform behavior.
I don't think you want your register variable behavior to be different between different platforms.
Also, how does CALL_SYS work? And, how do you reference external labels/addresses/methods/etc?
Nice work so far, though.
Project: OZone
Source: GitHub
Current Task: LIB/OBJ file support
"The more they overthink the plumbing, the easier it is to stop up the drain." - Montgomery Scott
Re: Essentia. First rough specifications.
SpyderTL wrote: I think I would remove this statement, entirely. Or at least, replace it with the defined, cross-platform behavior. I don't think you want your register variable behavior to be different between different platforms.
Why? A compiler is entirely agnostic about the target CPU's registers; "preserving CPU registers" means "your variables will not be affected".
SpyderTL wrote: Also, how does CALL_SYS work?
The same way CALL does, except that it performs a system call (for example, SYSCALL on AMD64 or INT 0x80 on Linux/i386). The destination field is the index of the variable that contains the system call number.
SpyderTL wrote: And, how do you reference external labels/addresses/methods/etc?
I will add a new instruction, IMPORT, which will emit something like "extern symname". symname will be looked up by its index in the symbol table.
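A rough sketch of how a translator might lower these two instructions to assembly text. Everything about the encoding here is an assumption for illustration: the `(op, operand)` tuples, the `translate` helper, and in particular the idea that variable `i` lives in a frame-pointer-relative stack slot. Argument passing for the system call is left out entirely.

```python
def translate(instr, symbol_table, target="x86_64"):
    """Lower one hypothetical (op, operand) bytecode instruction to asm lines."""
    op, operand = instr
    if op == "IMPORT":
        # operand is an index into the symbol table.
        return [f"extern {symbol_table[operand]}"]
    if op == "CALL_SYS":
        # operand is the index of the variable holding the system call number.
        # Assumption: variable i is stored in the i-th stack slot of the frame.
        if target == "x86_64":
            return [f"mov rax, [rbp - {8 * (operand + 1)}]", "syscall"]
        if target == "i386":
            return [f"mov eax, [ebp - {4 * (operand + 1)}]", "int 0x80"]
        raise ValueError(f"unsupported target: {target}")
    raise ValueError(f"unsupported instruction: {op}")


symbols = ["puts", "malloc"]
import_asm = translate(("IMPORT", 1), symbols)
syscall_asm_64 = translate(("CALL_SYS", 0), symbols, target="x86_64")
syscall_asm_32 = translate(("CALL_SYS", 2), symbols, target="i386")
```

The same bytecode instruction produces different text per target, while the bytecode itself only ever refers to variable and symbol indices.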
Re: Essentia. First rough specifications.
embryo2 wrote: I recommend to leave only one conditional jump.
How? Wouldn't that remove the advantages of architectures that have different kinds of conditional jumps?
embryo2 wrote: Also I recommend to remove call and return.
How could that even work?
embryo2 wrote: And for some time it would be great to remove floating point and signed math.
I am still thinking about this. Differentiating between kinds of math could be done at the variable type level.
Re: Essentia. First rough specifications.
Roman wrote: A compiler is entirely agnostic about the target CPU's registers; "preserving CPU registers" means "your variables will not be affected".
This is why I recommend removing it. "According to the target calling convention" made me think that some registers would be preserved, and some may not.
Maybe replace the statement with "all variable values are preserved during method calls" to clarify that the platform calling conventions do not have any effect on the behavior of any variables, even ones stored in registers.
Re: Essentia. First rough specifications.
embryo2 wrote: I recommend to leave only one conditional jump.
Roman wrote: How? Wouldn't that remove advantages of architectures, that have different kinds of conditional jumps?
It wouldn't. It is the compiler that actually emits the other kinds of jumps, and a smart compiler can produce the same machine code whether your bytecode has one conditional jump or any other number of them.
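One way to picture this is a peephole step in the translator. The sketch below assumes a hypothetical bytecode with comparison operations (LT, EQ, GT) that write a boolean into a variable and a single conditional jump, JNZ; none of these names come from the Essentia documents. It also assumes the comparison result is used only by the jump that follows it, and that operands are already in registers.

```python
# Hypothetical mapping from bytecode comparisons to x86 signed jumps.
CONDITION_JUMPS = {"LT": "jl", "EQ": "je", "GT": "jg"}


def lower(code):
    """Translate a list of bytecode tuples to x86-style asm lines.

    Supported shapes (for this sketch only):
      (cmp_op, dest, a, b) immediately followed by ("JNZ", dest, label)
      ("JNZ", var, label) on its own
    """
    asm = []
    i = 0
    while i < len(code):
        instr = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        fused = (
            instr[0] in CONDITION_JUMPS
            and nxt is not None
            and nxt[0] == "JNZ"
            and nxt[1] == instr[1]
        )
        if fused:
            _, _, a, b = instr
            asm.append(f"cmp {a}, {b}")
            asm.append(f"{CONDITION_JUMPS[instr[0]]} {nxt[2]}")
            i += 2
        elif instr[0] == "JNZ":
            asm.append(f"cmp {instr[1]}, 0")
            asm.append(f"jne {instr[2]}")
            i += 1
        else:
            raise ValueError(f"shape not handled in this sketch: {instr}")
    return asm


fused_asm = lower([("LT", "t0", "rax", "rbx"), ("JNZ", "t0", "loop")])
plain_asm = lower([("JNZ", "rcx", "done")])
```

The bytecode has a single jump instruction, yet the output still uses `jl` where the pattern allows it.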
embryo2 wrote: Also I recommend to remove call and return.
Roman wrote: How could that even work?
Removal of those commands will help you understand them better. Your question tells me that you just don't understand how call and return work. But how will you do a compiler without such understanding?
embryo2 wrote: And for some time it would be great to remove floating point and signed math.
Roman wrote: I am still thinking about this. Differentiating different kinds of math could be done on the variable type level.
Yes, there are ways of representing things differently, but above all a large number of instructions just distracts your attention, and as a result you'll get a mediocre solution.
Re: Essentia. First rough specifications.
I'm envious of people of (roughly) my age doing such interesting and complicated things (examples: Roman, 0fb1d8, omarrx024, et al.), while my poor and dumb head is still stuck in physical memory management.
Re: Essentia. First rough specifications.
embryo2 wrote: Removal of those commands will help you understand them better. Your question tells me that you just don't understand how call and return work. But how will you do a compiler without such understanding?
I understand how these commands work (at least on x86). On x86, CALL pushes the return address onto the stack, then loads (R/E)IP with the target address. RET pops the return address from the stack and loads (R/E)IP with it. Are you suggesting that I store the return address in a different way?
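The x86 behaviour described above can be sketched as a toy interpreter. The instruction names and the `run_vm` helper are hypothetical; the point is only that CALL and RET reduce to "push the next instruction index and jump" and "pop an index and jump".

```python
def run_vm(program):
    """Execute a list of (op, arg) tuples and return the PRINT trace."""
    ip = 0        # instruction pointer, plays the role of (R/E)IP
    stack = []    # return-address stack
    trace = []
    while True:
        op, arg = program[ip]
        if op == "CALL":
            stack.append(ip + 1)  # push the return address
            ip = arg              # load the instruction pointer with the target
        elif op == "RET":
            ip = stack.pop()      # pop the return address into the pointer
        elif op == "PRINT":
            trace.append(arg)
            ip += 1
        elif op == "HALT":
            return trace
        else:
            raise ValueError(f"unknown opcode {op}")


program = [
    ("PRINT", "main: before"),  # 0
    ("CALL", 4),                # 1
    ("PRINT", "main: after"),   # 2
    ("HALT", None),             # 3
    ("PRINT", "callee"),        # 4
    ("RET", None),              # 5
]
trace = run_vm(program)
```

Seen this way, a bytecode without dedicated CALL and RET would need some other way to save an instruction index and jump to a saved one, which is the design decision under discussion.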
Re: Essentia. First rough specifications.
Roman wrote: Are you suggesting that I store the return address in a different way?
I am suggesting a VM design decision. The bytecode is useless without a VM or a compiler, so you have to design a way of running your bytecode, and that includes an implementation of the call algorithm. The details of the exact code behavior are up to you; the most important thing here is being able to implement those details yourself.
Of course, it is possible to describe the algorithm, but do you want to implement my algorithm?