compiled or interpreted

sandras · Post by **sandras** » Mon Apr 30, 2012 11:18 am

For a long time I was considering interpreted user land for my OS and thought, that I'm going to go that way probably. But nearing the point where the ways part to the compiled or interpreted land, I find myself not so sure about all this.

For example, I know, this is far ahead of time, but I would like my OS to be able to run on as small machines as possible, as well, as big ones. My concern is the small ones. How mush of overhead, in terms of kernel size, would the interpreter / JIT compiler add compared to usual memory management, scheduling, etc.? Lua is a lightweight language and it's interpreter is 172kb. And some people's kernel here don't pass 50kb.

Also I was thinking, that in case I go for interpreted user land, I can have appropriate memory management. I could have a pool of memory blocks, that way, removing external memory fragmentation. If there should be allocated more memory tha the size of memory block, there simply would be a link to the next block at the end of the first one. This would all be managed by the kernel, not user land. See, you could tune the size of blocks at compile time, and the maximum wasted amount of space for a memory allocation would be sizeof(memory_block) - 1. This is possible, because I think, I could make the interpreted executable's position independent. But this would require the executable's to be purely interpreted, not JIT compiled. Also, bloc allocation with stack could be really quick.

Also, I think, that it is not possible to make the interpreted code run faster than (JIT) compiled one. But what about size.. I could make a CISC'y VM, which would save executable's size. At some point, when the system grows, the size savings on user land interpreted executable's, could outweigh the overhead of the interpreter. The thing I don't like about this, that I would not like to have a complex VM, and it would also further increase the size of the kernel. On the other hand, I imagine, the VM would be just a giant switch table, and then a bunch of relatively simple functions.

Then there's caching. I suppose, the small interpreted executable's would not hurt cache performance. and there's only one compiled executable - the kernel and I think most of the time, most of it would be cached. But there are bad things. The allocation of memory in blocks. I would like to ask, what is the average size of a memory allocation in traditional systems? Maybe that would be the best size for a memory block, if it's not too big, or maybe a half or a quarter of it. Again, the memory block size is tune-able, and that is the good thing about it.

Then there's portability. Some parts of memory management of systems with compiled user land are not portable. I suppose JIT is not as portable as simply an interpreter too, as it is architecture specific.

Interpretation and JIT also offers to have memory protection even on MMU'less machines, which is nice, I think. Not even so much because of protection, but because the system works the same way regardless of the machine having MMU or protection rings.

The there's the point, that I think, switching tasks would be fast. Is it true, that having tasks switch frequently, makes it appear, as if the system is more responsive? Cus that would be more feasible with tasks, that can switch quickly.

Are these last six points worth making it purely interpreted? I know it's about preference, but state your opinion. You know, is it completely stupid, or you just would not want these trade-off's.

As you may see, I'm interested in what benefits I can get from purely interpreted user land, more than about JIT. When thinking about how I would make the kernel interact with user land, interpretation seemed natural to me. That's why I thought of those six points.

bluemoon · Post by **bluemoon** » Mon Apr 30, 2012 11:57 am

Sandras wrote:For a long time I was considering interpreted user land for my OS and thought, that I'm going to go that way probably. But nearing the point where the ways part to the compiled or interpreted land, I find myself not so sure about all this.

Just curious, why not have the interpreter on userland as well?
if you want to protect the system by only let user to execute scripts, it is the wrong way to do it. Unless the script provide no useful function at all.

Sandras wrote: For example, I know, this is far ahead of time, but I would like my OS to be able to run on as small machines as possible, as well, as big ones. My concern is the small ones. How mush of overhead, in terms of kernel size, would the interpreter / JIT compiler add compared to usual memory management, scheduling, etc.? Lua is a lightweight language and it's interpreter is 172kb. And some people's kernel here don't pass 50kb.

The line is clear. You either limit the target machine as 8086 which has very little ram, or you aim at 386/486 which has a few megs, or 586+ with more than hundreds.
So, for machine prior to 386+ it's too slow to run script, and with 586+ you really don't need to worry on 172Kb usage.

Sandras wrote: Also I was thinking, that in case I go for interpreted user land, I can have appropriate memory management. I could have a pool of memory blocks, that way, removing external memory fragmentation. If there should be allocated more memory tha the size of memory block, there simply would be a link to the next block at the end of the first one. This would all be managed by the kernel, not user land. See, you could tune the size of blocks at compile time, and the maximum wasted amount of space for a memory allocation would be sizeof(memory_block) - 1. This is possible, because I think, I could make the interpreted executable's position independent. But this would require the executable's to be purely interpreted, not JIT compiled. Also, bloc allocation with stack could be really quick.

Either you or me misunderstood JIT. AFAIK there are two kind of interpreter:
1. run each line of source (eg. shell scripts)
2. JIT the source into intermediate code at once(hence slower startup time), and interpret that intermediate code(so don't need to parse the source again). JIT also make further optimization possible by refactoring the object tree.

Sandras wrote: Also, I think, that it is not possible to make the interpreted code run faster than (JIT) compiled one. But what about size.. I could make a CISC'y VM, which would save executable's size. At some point, when the system grows, the size savings on user land interpreted executable's, could outweigh the overhead of the interpreter. The thing I don't like about this, that I would not like to have a complex VM, and it would also further increase the size of the kernel. On the other hand, I imagine, the VM would be just a giant switch table, and then a bunch of relatively simple functions.

Are you comparing with "JIT to native code"? Now that's the difference for interpreter and compiler: JIT compile to object code and interpret it, or JIT compile to native code and run directly. JIT literally means "do it on run".

NickJohnson · Post by **NickJohnson** » Mon Apr 30, 2012 12:01 pm

bluemoon wrote:
Sandras wrote: Also I was thinking, that in case I go for interpreted user land, I can have appropriate memory management. I could have a pool of memory blocks, that way, removing external memory fragmentation. If there should be allocated more memory tha the size of memory block, there simply would be a link to the next block at the end of the first one. This would all be managed by the kernel, not user land. See, you could tune the size of blocks at compile time, and the maximum wasted amount of space for a memory allocation would be sizeof(memory_block) - 1. This is possible, because I think, I could make the interpreted executable's position independent. But this would require the executable's to be purely interpreted, not JIT compiled. Also, bloc allocation with stack could be really quick.
Either you or me misunderstood JIT. AFAIK there are two kind of interpreter:
1. run each line of source (eg. shell scripts)
2. JIT the source into intermediate code at once(hence slower startup time), and interpret that intermediate code(so don't need to parse the source again). JIT also make further optimization possible by refactoring the object tree.

JIT usually refers to converting the source all the way into native code at runtime. What you're describing is a bytecode interpreter.

gravaera · Post by **gravaera** » Mon Apr 30, 2012 12:12 pm

Yo:

I am being 100% brazenfaced when I say I didn't read a single line of the OP's post.

My judgement on this matter is: Go compiled. Compile everything. Thank me later

--That is all
gravaera

Rudster816 · Post by **Rudster816** » Mon Apr 30, 2012 12:31 pm

There is quite a severe performance penalty associated with purely interpreting something versus 'JITing' it. You only ever have to JIT a section of code one time, and then any penalty introduced by the compiler will be gone. When you interpret something it must be run through the interpreter every time its executed, which for general purpose programming is terrible. A program that is very CPU intense could run orders of magnitude slower on your OS versus others because it will have the overhead of the interpreter. Even the fastest interpreter in the world will introduce completely unacceptable overhead's for certain applications, so I don't really see them being viable for doing a 'managed userland'.

http://luajit.org/performance_x86.html

That page clearly shows that interpreting Lua can be over 100 times slower than JITing it for certain applications.

That project looks really promising for you. Poking around a bit, I think it should be fairly simple to port, and it looks like it was tailor made for you.

It combines high flexibility with high performance and an unmatched low memory footprint: less than 125K for the VM plus less than 85K for the JIT compiler (on x86).

Plus it can target x86\x64, ARM, MIPS, and PPC. It's even under a non restrictive (MIT) license.

NickJohnson wrote: JIT usually refers to converting the source all the way into native code at runtime. What you're describing is a bytecode interpreter.

Both native implementations of Java and .NET are bytecode, but they are both described by their owners (Oracle\Microsoft) as being JIT'd. I think the accepted definition of JIT would be that whatever is being run is converted from its "native" format (whatever it may be, source\bytecode\etc) into the running machines native language (e.g. x86 Machine Code).

Brendan · Post by **Brendan** » Mon Apr 30, 2012 1:16 pm

Hi,

I think this is more of a software distribution issue. If people distribute native binaries, then the binary can't be tuned/optimised for all the different CPUs at the same time and will only have generic optimisations. It's also impossible to have portable binaries - e.g. if you've got a 64-bit 80x86 executable then you won't be able to run it on 32-bit 80x86 or 64-bit ARM or...

One alternative is distributing source code. This fails for most commercial software where companies spend huge amounts of $$ to create the software and hope to get at least some of that cash back. The "compile it yourself from source" idea isn't very nice for normal end users either. The "automated compile during install" is easier for end users, but it's still a horrible idea because over time libraries, tools, etc change and you end up with a big mess to cope with that - "autoconf", layers of scripts, tools, dependencies, etc (trust me on this, I'm using Gentoo).

In my opinion; the best alternative is to distribute byte-code. It avoids all of the problems with distributing binaries and (almost) all of the problems with distributing source code. This mean that software developers compile their source code to a strict/standardised byte-code and distribute a "big blob of byte-code" to end users.

So, what happens with the "big blob of byte-code" after that? You could use an interpreter or JIT to execute it, or you could compile it to native code when it's being installed. You could even support several methods on the same architecture; or use different methods on different architectures. I'd probably compile it to native code when it's being installed because compiling byte-code should be relatively fast (all the parsing and error checking has already been done, and probably a lot of the high-level optimisations too) and it's better for start-up times, performance in general and memory usage (e.g. no interpreter, JIT or virtual machine in memory while the application is being executed).

However, an interpreter is easier to write, and it's easy to add powerful debugging features; so during early development (e.g. while I'm trying to get the "source code to byte-code compiler" stable) I'd probably just hack together a simple byte-code interpreter/debugger (and then do a decent "byte-code to native" compiler later).

Cheers,

Brendan

NickJohnson · Post by **NickJohnson** » Mon Apr 30, 2012 6:45 pm

Rudster816 wrote:
NickJohnson wrote: JIT usually refers to converting the source all the way into native code at runtime. What you're describing is a bytecode interpreter.
Both native implementations of Java and .NET are bytecode, but they are both described by their owners (Oracle\Microsoft) as being JIT'd. I think the accepted definition of JIT would be that whatever is being run is converted from its "native" format (whatever it may be, source\bytecode\etc) into the running machines native language (e.g. x86 Machine Code).

Sorry, my point was to emphasize that JIT usually implies native code is being run at the end. i.e. (source -> native at runtime) or (bytecode -> native at runtime) count as JIT, but (source -> bytecode at runtime) is a type of bytecode interpreter IMO, because otherwise you'd call things like the Python interpreter JIT.

AndrewAPrice · Post by **AndrewAPrice** » Mon Apr 30, 2012 7:19 pm

You could do a combination of both if you wish. All of the user applications and drivers in my OS are written in byte code. There should be no way for a user application to run user-specified native code on the CPU.

My motivation for using byte code may be different to yours. I envisioned a system where IPC is cheap and free - nothing more than sharing interfaces that represent entry points into other processes, where objects can be shared between drivers, servers, and user applications. To achieve this goal I had to eliminate the MMU so that all processes exist in the same user space. But, that meant that native code running on the CPU could access any memory in any process, so to avoid that I chose a high-level 'safe' bytecode approach - similar to that taken by Java and .Net. My operating system 'kernel' is merely a virtual machine for the bytecode to run on.

For the sake of drivers there are also 'unsafe' opcodes (such as port I/O and pointers) but these are only allowed in drivers and any function containing unsafe opcodes MUST be marked as 'unsafe'. (This is the only exploit in the system I know of that could be used to run code natively on it.)

An advantage of this bytecode approach is that my 'kernel' (VM) also compiles as a standard Windows application - so I can run my OS's executables in Windows - in the same way that you can run Java programs in Windows. (I just have to be sure to catch things like file access and display management and pass them through Windows instead of trying to do hardware I/O.)

Anyway, at the moment I haven't developed a JIT compiler and so all of my applications and drivers are interpreted. Surprisingly, interpreting bytecode isn't as slow as I first thought it would be.

As a quick test I performed early on in development, I wrote some code that quick sorted a large array of random data. It took about 2 seconds running natively on my CPU, but about 20 seconds to sort the same data using the same algorithm in bytecode (including bounds testing on every array access). The same ray casting technique that draws at 1,200 fps natively draws at around 100 fps in bytecode.

From that I've made the assumption that interpreting bytecode is approximately 10x slower than running it natively - but it's dangerous to make an assumption from so few tests like I did because it depends on many factors - I'm sure some operations (such as dereferencing an object) may be 20x slower (type checking, bounds checking, etc) while some (such as adding two integers) may be only 5x slower.

But I have been working on the assumption that by interpreting all of my user applications and drivers, my 3GHz CPU is effectively equivalent to a 300MHz CPU - which is more than enough for all of my GUI applications and simple games to be as responsive as ever. That is why JIT compiling my bytecode is kind of low priority for me right now. I guess I would want a functional JIT compiler when I move on to more extensive processing - such as real-time MPEG decoding - but that can wait!

The other downside of a safe bytecode approach - you can't re-use all of the great C/C++ libraries out there! That means there will be no GCC, no VIM, no freetype2, etc running on my OS for a long time to come (solution).

sandras · Post by **sandras** » Tue May 01, 2012 2:34 am

First of all, thanks for the replies, everyone.

One thing I don't like, is code duplication. And that's why I did not like compiled user land - think kmalloc + user space malloc. Since interpretation is not an option, in the long run, at least, one would look into JIT compilation. But code duplication wise JIT compilation is worse, than compiled user land - you compile to byte code, you JIT, but then there will inevitably be some assembler, which you'll have to assemble, and most likely C, as you don't want to write your JIT compiler in assembler, I think.

Another argument, though I could have made it in the first post, is that there are more languages in the bytecode user land. This is also a form of duplication.

In general, JIT brings complexity, IMO, which is also not what I want.

Having read your posts and given more thought to it the last day, I think the right trade-off's for me are in the compiled user land model.

gravaera · Post by **gravaera** » Tue May 01, 2012 6:59 am

Smart man! I can already see great promise in your project...revolutionary!

sandras · Post by **sandras** » Tue May 01, 2012 7:11 am

gravaera wrote:Smart man! I can already see great promise in your project...revolutionary!

Yeah...right! : D

AndrewAPrice · Post by **AndrewAPrice** » Tue May 01, 2012 7:57 am

Sandras wrote:One thing I don't like, is code duplication. And that's why I did not like compiled user land - think kmalloc + user space malloc.

I see. I kind of get ease this burden by loading reference libraries just once in memory - but my reasoning for doing so is different (static variables in shared libraries is the basis of IPC in my OS). On the other hand, I also inline small functions for performance reasons (inlining properties, SetPixel functions, array access functions, etc makes it easier to optimise) which increases code duplication.

sandras · Post by **sandras** » Tue May 01, 2012 12:41 pm

MessiahAndrw wrote:
Sandras wrote:One thing I don't like, is code duplication. And that's why I did not like compiled user land - think kmalloc + user space malloc.
On the other hand, I also inline small functions for performance reasons (inlining properties, SetPixel functions, array access functions, etc makes it easier to optimise) which increases code duplication.

Maybe I should put it differently. I should have called it component duplication or something like that. It's the thought, that "I have too many compilers", that bugs me, not that several lines will be repeated. The JIT compiler and C compiler meant to compile the JIT compiler may not have a single line in common. I suppose I want one component for one job. And I do believe in inlining for performance. I think there are even cases, where the function does so little, that pushing and popping not only slow down, but increase size, so it's a lose/lose situation, even if you use the function in many places.

AndrewAPrice · Post by **AndrewAPrice** » Tue May 01, 2012 1:11 pm

Makes more sense. I also agree with you with duplicating components - but I haven't found a solution to avoid it.

As an example, imagine the representation of a program in memory in source code level, bytecode level, and runtime level. At all levels, there are 'namespaces' made up of 'classes' which are made up of 'variables' and 'methods'.

Now, when I compile my code, I have a bunch of structs in my compiler for representing that a 'namespace' is made up for 'classes' which are made up of 'functions'. 'Functions' contain abstract syntax trees. This generates an assembly file.

As my assembler parses this, it too has a 'namespace' struct that contains 'classes' which contains 'functions'. 'Functions' this time contain a binary blob of bytecode. This generates a binary file.

When the kernel/runtime loads the binary file in, it expands it into it's own 'namespace' struct that contains 'classes' which contain 'functions'. This still contains bytecode (machine code later) but references to types (unique for the entire system) are hardcoded into the bytecode. This gets executed.

This feels like a lot of duplication - because I'm coding the structs and classes that represent the same basic layout of a program 3 times. I thought about merging this - but decided against it because each one is too unique:
* The compiler is high level using std::unordered_map, std::string, etc.
* The assembler stores things in terms of indices in a string table and type table.
* The runtime uses basic arrays and linked lists (keeping the kernel code as small as possible so no STL), stores platform dependent information about which registers are used in functions, byte sizes of classes, types are given unique indices shared across the entire system, etc.
* The number of unique primitives - namespaces, classes, types, structs, functions, properties, methods, and variables is actually small enough to maintain separately.

The other thing that makes component duplication harder is the fact that I want my compiler and assembler to be ran from inside of my OS one day, this creates a catch 22 situation:
* I cannot develop the assembler in an assembly language that does not yet have an assembler.
* If I develop the tools in C++ (as I am doing now), I will need to rewrite the tools as I don't allow native user code to run in my OS.

I haven't decided of the approach I'm going to take but I have two options:
* Develop two assemblers. A C++ one for now, and once basic functionality is implemented, use it to develop another assembler. Develop the high level compiler in this assembly language. (But lexical parsing in assembly? /shudder )

* Develop the assembler in C++ using only features that are common in both C++ and my language to enable easy source-to-source translation later. Develop the high level compiler in the same manner. Once the high level compiler is functional, do a direct source to source translation between the two languages. Continue development of the assembler and high level compiler in my high level language.

* Embed the C++ assembler into the kernel. (I thought about this too.... In fact I thought about embedding the high-level compiler into the kernel so you can execute user-specified strings of code at runtime.)

I'm aiming towards the second option since I like the idea of self hosting tools.

AndrewAPrice · Post by **AndrewAPrice** » Tue May 01, 2012 1:45 pm

berkus wrote:Enter LLVM.

Where different frontends (C, Lua, Fortran, whatever) compile into the same bytecode representation, from which a single optimizing translator can generate code for nearly any architecture.

But that's less fun than DIY.

OSDev.org

compiled or interpreted

compiled or interpreted

Re: compiled or interpreted

Re: compiled or interpreted

Re: compiled or interpreted

Re: compiled or interpreted

Re: compiled or interpreted

Re: compiled or interpreted

Re: compiled or interpreted

Re: compiled or interpreted

Re: compiled or interpreted

Re: compiled or interpreted

Re: compiled or interpreted

Re: compiled or interpreted

Re: compiled or interpreted

Re: compiled or interpreted