Essentia. First rough specifications.

Roman
Member
Posts: 568
Joined: Thu Mar 27, 2014 3:57 am
Location: Moscow, Russia

Essentia. First rough specifications.

Post by Roman »

Lately I have been working on a lightweight replacement for the LLVM compiler infrastructure. Specifically, I want to create a bytecode specification that defines a set of abstract operations, so that a compiler can generate a single output for different architectures. I'm currently working on a translator from this bytecode to x86_64 assembly. To organize my thoughts, I decided to document it, so these documents are the first rough parts of the specification.

I request you to look through that. Architecture.md is recommended to be read first. Perhaps you'll spot some bad design decisions.

PS. These files are actually written in Markdown, but the forum software doesn't like the extension ".md" (wat?).

Edit: Small edit in the MEMCPY data table.
Attachments
Instruction Set.txt
(4.2 KiB) Downloaded 138 times
Architecture.txt
(2.14 KiB) Downloaded 126 times
"If you don't fail at least 90 percent of the time, you're not aiming high enough."
- Alan Kay
AndrewAPrice
Member
Posts: 2299
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: Essentia. First rough specifications.

Post by AndrewAPrice »

Looks good. Is there a particular reason you decided to use a register machine over a stack machine for your bytecode? I personally found it easier to generate bytecode for a stack machine from an AST. It's also easy to turn a stack into SSA. I'm interested to hear the other side.
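To illustrate the point about stack machines, here is a minimal Python sketch that is not taken from the Essentia documents. The tuple-based AST, the `push`/`add`/`mul` opcodes, and the names `emit_stack` and `stack_to_ssa` are all assumptions made for the example. A post-order walk of the AST yields stack code directly, and replaying that code with a symbolic stack gives every intermediate result its own name, which is SSA form for straight-line code.

```python
# Hypothetical AST: ("op", left, right) tuples; leaves are variable names or ints.
def emit_stack(node, out):
    """Post-order walk: emit both operands first, then the operator."""
    if isinstance(node, tuple):
        op, left, right = node
        emit_stack(left, out)
        emit_stack(right, out)
        out.append((op,))
    else:
        out.append(("push", node))
    return out


def stack_to_ssa(code):
    """Replay the stack symbolically; each result gets a fresh temporary."""
    stack, ssa = [], []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            right = stack.pop()
            left = stack.pop()
            name = f"t{len(ssa)}"          # assigned exactly once
            ssa.append((name, instr[0], left, right))
            stack.append(name)
    return ssa


ast = ("add", "a", ("mul", "b", 2))        # a + b * 2
code = emit_stack(ast, [])
ssa = stack_to_ssa(code)
```

Branches and loops would need phi nodes at join points, which this sketch does not attempt.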
My OS is Perception.

Re: Essentia. First rough specifications.

Post by Roman »

MessiahAndrw wrote:Is there a particular reason you decided to use a register machine over a stack machine for your bytecode?
To be honest, there was no particular reason why I chose this model; it just seemed sane and was already used by high-level languages.
MessiahAndrw wrote:It's also easy to turn a stack into SSA.
I think optimization will be done by scanning the code of each procedure in reverse order. Useful variables (those used to compute other useful variables, passed to other procedures as arguments, or returned to the caller) will be marked, and operations on useless variables (those not yet marked when the scan reaches them) will be removed (or replaced with NOPs).
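The reverse scan Roman describes can be sketched in Python roughly as follows. This is only an illustration under assumptions: the `(op, dest, sources)` tuple form, the opcode names, and `eliminate_dead_code` are hypothetical and do not come from the Essentia specification. The sketch also unmarks a variable once its defining instruction is reached, so that an earlier overwritten assignment to the same variable is dropped; that detail is an addition, not part of the original description.

```python
# Hypothetical instruction form: (op, dest, sources). Not from the Essentia spec.
ALWAYS_KEPT = {"RET", "CALL"}   # their operands escape the procedure


def eliminate_dead_code(proc):
    """Scan backward, keep instructions whose result is useful, NOP the rest."""
    useful = set()
    result = []
    for op, dest, srcs in reversed(proc):
        if op in ALWAYS_KEPT or dest in useful:
            useful.discard(dest)    # earlier writes to dest are now irrelevant
            useful.update(srcs)     # everything this instruction reads is useful
            result.append((op, dest, srcs))
        else:
            result.append(("NOP", None, ()))
    result.reverse()
    return result


proc = [
    ("CONST", "a", ()),
    ("CONST", "b", ()),
    ("ADD", "c", ("a", "b")),
    ("MUL", "d", ("a", "a")),   # d is never read afterwards
    ("RET", None, ("c",)),
]
optimized = eliminate_dead_code(proc)
```

A single backward pass like this is only sound for straight-line code; with branches or loops the marking would have to be repeated until it stops changing.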
embryo2
Member
Posts: 397
Joined: Wed Jun 03, 2015 5:03 am

Re: Essentia. First rough specifications.

Post by embryo2 »

Roman wrote:I request you to look through that. Architecture.md is recommended to be read first. Perhaps you'll spot some bad design decisions.
A bit about English: maybe it would be more correct to ask rather than to request something?

About bytecode. I see the simplicity as a very serious advantage of a bytecode. There's still too much complexity in your variant. I recommend to leave only one conditional jump. Also I recommend to remove call and return. And for some time it would be great to remove floating point and signed math. So, it will be just 5-6 words for the whole bytecode body. Or is it possible to get down to 3-4?
My previous account (embryo) was accidentally deleted, so I have no chance but to use something new. But may be it was a good lesson about software reliability :)
SpyderTL
Member
Posts: 1074
Joined: Sun Sep 19, 2010 10:05 pm

Re: Essentia. First rough specifications.

Post by SpyderTL »

The CALL instruction also preserves CPU registers, according to the target calling convention.
I think I would remove this statement, entirely. Or at least, replace it with the defined, cross-platform behavior.

I don't think you want your register variable behavior to be different between different platforms.

Also, how does CALL_SYS work? And, how do you reference external labels/addresses/methods/etc?

Nice work so far, though.
Project: OZone
Source: GitHub
Current Task: LIB/OBJ file support
"The more they overthink the plumbing, the easier it is to stop up the drain." - Montgomery Scott

Re: Essentia. First rough specifications.

Post by Roman »

SpyderTL wrote:I think I would remove this statement, entirely. Or at least, replace it with the defined, cross-platform behavior.

I don't think you want your register variable behavior to be different between different platforms.
Why? A compiler is entirely agnostic about the target CPU's registers, "preserving CPU registers" means "your variables will not be affected".
SpyderTL wrote:Also, how does CALL_SYS work?
It works the same way CALL does, but it performs a system call (for example, SYSCALL on AMD64 or INT 0x80 on Linux/i386). The destination field is the index of the variable that contains the system call number.
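A rough sketch of how a translator might lower CALL_SYS per target, written in Python. Only the two trap instructions come from the post above; the target keys, the `var_locations` mapping from variable index to a register or memory operand, and the function name `lower_call_sys` are assumptions for illustration. Placing the system call number in `rax`/`eax` follows the Linux convention on those two targets.

```python
# Target name -> (register holding the syscall number, trap instruction).
# The target keys are made up for this sketch.
TARGETS = {
    "amd64": ("rax", "syscall"),
    "linux-i386": ("eax", "int 0x80"),
}


def lower_call_sys(target, dest_index, var_locations):
    """Emit assembly lines for CALL_SYS; dest_index names the number variable."""
    number_reg, trap = TARGETS[target]
    location = var_locations[dest_index]
    lines = []
    if location != number_reg:          # skip the move if already in place
        lines.append(f"mov {number_reg}, {location}")
    lines.append(trap)
    return lines


lines_amd64 = lower_call_sys("amd64", 3, {3: "rbx"})
lines_i386 = lower_call_sys("linux-i386", 0, {0: "eax"})
```

Argument passing is left out here, since the thread does not say how CALL_SYS arguments are encoded.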
SpyderTL wrote:And, how do you reference external labels/addresses/methods/etc?
I will add a new instruction, IMPORT, which will emit something like "extern symname". symname will be looked up by its index in the symbol table.

Re: Essentia. First rough specifications.

Post by Roman »

embryo2 wrote:I recommend to leave only one conditional jump.
How? Wouldn't that remove advantages of architectures, that have different kinds of conditional jumps?
embryo2 wrote:Also I recommend to remove call and return.
How could that even work?
embryo2 wrote:And for some time it would be great to remove floating point and signed math.
I am still thinking about this. Differentiating different kinds of math could be done on the variable type level.

Re: Essentia. First rough specifications.

Post by SpyderTL »

Roman wrote:A compiler is entirely agnostic about the target CPU's registers, "preserving CPU registers" means "your variables will not be affected"
This is why I recommend removing it. "According to the target calling convention" made me think that some registers would be preserved, and some may not.

Maybe replace the statement with "all variable values are preserved during method calls" to clarify that the platform calling conventions do not have any effect on the behavior of any variables, even ones stored in registers.

Re: Essentia. First rough specifications.

Post by embryo2 »

Roman wrote:
embryo2 wrote:I recommend to leave only one conditional jump.
How? Wouldn't that remove advantages of architectures, that have different kinds of conditional jumps?
It wouldn't. It is the compiler that actually emits the other kinds of jumps, and a smart compiler could produce identical native code whether your bytecode has one conditional jump or any other number of them.
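One way to picture this claim is a small pattern-matching step in the translator, sketched here in Python. Everything in it is hypothetical: the `CMP_*` and `JUMP_IF` opcodes, the `(op, dest, left, right)` layout, and the assumption that operands are already register names. The bytecode has a single conditional jump, yet the translator still emits `jl`, `je`, and so on when a comparison feeds the jump directly.

```python
# Hypothetical bytecode: ("CMP_xx", dest, left, right) and ("JUMP_IF", flag, label).
FUSED_JUMPS = {"CMP_EQ": "je", "CMP_NE": "jne", "CMP_LT": "jl", "CMP_GE": "jge"}


def lower(code):
    """Translate to x86-style assembly, fusing compare + JUMP_IF pairs."""
    asm = []
    i = 0
    while i < len(code):
        instr = code[i]
        following = code[i + 1] if i + 1 < len(code) else None
        if (instr[0] in FUSED_JUMPS and following is not None
                and following[0] == "JUMP_IF" and following[1] == instr[1]):
            _, _, left, right = instr
            asm.append(f"cmp {left}, {right}")
            asm.append(f"{FUSED_JUMPS[instr[0]]} {following[2]}")
            i += 2                      # consumed the compare and the jump
        elif instr[0] == "JUMP_IF":
            _, flag, label = instr
            asm.append(f"test {flag}, {flag}")
            asm.append(f"jnz {label}")  # generic fallback: jump if non-zero
            i += 1
        else:
            raise NotImplementedError(instr[0])
    return asm


fused = lower([("CMP_LT", "t0", "rax", "rbx"), ("JUMP_IF", "t0", "loop")])
plain = lower([("JUMP_IF", "rcx", "done")])
```

The fusion is only valid when the comparison result is not used anywhere else, and `jl`/`jge` assume signed operands; a real translator would have to check both.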
Roman wrote:
embryo2 wrote:Also I recommend to remove call and return.
How could that even work?
Removal of those commands will help you understand them better. Your question tells me that you just don't understand how call and return work. But how will you do a compiler without such understanding?
Roman wrote:
embryo2 wrote:And for some time it would be great to remove floating point and signed math.
I am still thinking about this. Differentiating different kinds of math could be done on the variable type level.
Yes, there are ways of representing things differently, but first of all a large number of instructions just distracts your attention, and as a result you'll get a mediocre solution.
Muazzam
Member
Posts: 543
Joined: Mon Jun 16, 2014 5:59 am
Location: Shahpur, Layyah, Pakistan

Re: Essentia. First rough specifications.

Post by Muazzam »

I'm envious of people of (roughly) my age doing such interesting and complicated things (examples: Roman, 0fb1d8, omarrx024, et al.), while my poor and dumb head is still stuck in physical memory management. :(

Re: Essentia. First rough specifications.

Post by Roman »

embryo2 wrote:Removal of those commands will help you understand them better. Your question tells me that you just don't understand how call and return work. But how will you do a compiler without such understanding?
I understand how these commands work (at least on x86). On x86, CALL pushes the return address onto the stack, then loads (R/E)IP with the target address. RET pops the return address from the stack and loads (R/E)IP with it. Are you suggesting me to store the return address in a different way?
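The x86 behaviour Roman describes can be modelled in a few lines of Python. This is a toy model for illustration only: the `Machine` class and its fields are made up, and the stack is a Python list rather than memory addressed through a stack pointer.

```python
class Machine:
    """Toy model of x86 near CALL/RET: only the instruction pointer and a stack."""

    def __init__(self, ip=0):
        self.ip = ip
        self.stack = []

    def call(self, target, instruction_length):
        # The return address is the instruction that follows the CALL.
        self.stack.append(self.ip + instruction_length)
        self.ip = target

    def ret(self):
        # Pop the saved return address back into the instruction pointer.
        self.ip = self.stack.pop()


m = Machine(ip=100)
m.call(500, instruction_length=5)   # a near CALL rel32 is 5 bytes long
ip_inside_callee = m.ip
saved_return_address = m.stack[-1]
m.ret()
ip_after_return = m.ip
```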

Re: Essentia. First rough specifications.

Post by embryo2 »

Roman wrote:Are you suggesting me to store the return address in a different way?
I am suggesting a VM design decision. The bytecode is useless without a VM or a compiler, so you have to design a way of running your bytecode, and that includes implementing the call algorithm. The exact code behavior is up to you; the most important thing here is the ability to implement the details yourself.

Of course, it is possible to describe the algorithm, but do you want to implement my algorithm?