Hi there!
Well an idea popped up in my head last week. Would it be possible to scan the machine code of all and every binary loaded by the kernel (not assembly code and not some intermediate code, real machine code), to verify that it doesn't use any priviledged instructions, doesn't modify certain registers, doesn't jump to foreign program code, doesn't dereference pointers to other programs, etc...
So in the end it would be possible to guarantee the safety of a given binary. Then it would be possible to run all programs in one address space, in ring0 and it would certainly allow for very fast message passing.
Singularity uses this too, but with an intermediate language...I would like to implement this same behaviour with normal machine code, to allow safety-checked legacy programs to run (surely there will have to be certain runtime checks too, so the binaries need to be modified a bit...).
Hm....well, what do you think about this?
Verification of machine code
- singularity
- Posts: 5
- Joined: Wed Jul 11, 2007 7:04 am
- Location: Austria
- Contact:
Interesting, but incredibly difficult, which is probably one of the main reasons Singularity attempts to guarantee type safety from an intermediate language rather than machine code. Some of the things you suggest (preventing use of privileged instructions, doesn't modify certain registers - not entirely sure what you mean here but I guess you mean things like cr0 and cr3) are provided for by the protection mechanism of the i386 processor. The rest can basically be summed up as: data accesses by a process can only be made to preset parts of the image labeled data and can also not overflow any particular section within that region that they are writing to, and instruction flow is limited to those parts defined as methods/functions, with strictly defined rules as to how to transition between different parts of code.
This is where the difficulty comes in. Certainly, you can limit data accesses to a .data segment, a .bss segment, a heap and a stack, but what if you want to allow self-modifying code? Also, how do you prevent overflowing an array, as it is difficult to determine, from machine code, what is an array and how long it is. Code is a little easier: don't allow jumps to anything outside a process' .text section. This is fine for jumps of the type 'jmp rel8/16/32' or 'jmp ptr...' but what happens when you do something like 'jmp [edx]'? This is quite common in C++ code using vtables. How do you know beforehand what the value of [edx] is going to be?
One of the main reasons for CIL (what Singularity uses as its intermediate code) was that it provided a code that could be verified and yet converted to machine code quickly, as the difficulties of doing it with machine code were all too apparent.
The other issue you have to consider with the "one address space for all processes" method is that it is unrealistic on 32bit machines (the designers of Singularity admit this) as you could conceivably end up sharing 4gb between all your processes and your kernel. Long mode provides much more space, so it is feasible, but you are immediately ruling out a large proportion of the computer systems out there.
Regards,
John.
This is where the difficulty comes in. Certainly, you can limit data accesses to a .data segment, a .bss segment, a heap and a stack, but what if you want to allow self-modifying code? Also, how do you prevent overflowing an array, as it is difficult to determine, from machine code, what is an array and how long it is. Code is a little easier: don't allow jumps to anything outside a process' .text section. This is fine for jumps of the type 'jmp rel8/16/32' or 'jmp ptr...' but what happens when you do something like 'jmp [edx]'? This is quite common in C++ code using vtables. How do you know beforehand what the value of [edx] is going to be?
One of the main reasons for CIL (what Singularity uses as its intermediate code) was that it provided a code that could be verified and yet converted to machine code quickly, as the difficulties of doing it with machine code were all too apparent.
The other issue you have to consider with the "one address space for all processes" method is that it is unrealistic on 32bit machines (the designers of Singularity admit this) as you could conceivably end up sharing 4gb between all your processes and your kernel. Long mode provides much more space, so it is feasible, but you are immediately ruling out a large proportion of the computer systems out there.
Regards,
John.
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
- Contact:
Being able to verify code depends a lot more on metadata accompanying the code than on the format of the code itself. Bartok (the CIL-to-x86 compiler used by Singularity) verifies CIL, but they plan to have it emit "typed assembly language" in the future that can be verified by an even smaller and simpler verifier, thus removing Bartok from the trusted part of the system. Typed assembly is just machine code along with a proof that the verifier can cross-check against the code itself.
You won't be able to verify a legacy binary unless it comes with some metadata that constitutes a proof of its type safety. This sort of verification really has to be built into the entire system from the beginning. You'll need to use hardware isolation (CPU privilege levels & the MMU) for legacy code (unless your idea of legacy is something like .NET assemblies or Java .class files).
You won't be able to verify a legacy binary unless it comes with some metadata that constitutes a proof of its type safety. This sort of verification really has to be built into the entire system from the beginning. You'll need to use hardware isolation (CPU privilege levels & the MMU) for legacy code (unless your idea of legacy is something like .NET assemblies or Java .class files).
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
- singularity
- Posts: 5
- Joined: Wed Jul 11, 2007 7:04 am
- Location: Austria
- Contact:
Yeah, I know, it would be very difficult...
I've read a few papers about machine code verification and then this idea simply popped up
Jep, I meant registers like cr0 and cr3...and instructions like lidt, ltr, etc.
Oh well, thank you very much for you sharing your opinions with me! I'd really love to discuss a bit more, but I'm very tired at the moment and my head's a real mess now
cheers
I've read a few papers about machine code verification and then this idea simply popped up
Jep, I meant registers like cr0 and cr3...and instructions like lidt, ltr, etc.
Yes, that was the basic idea...The rest can basically be summed up as: data accesses by a process can only be made to preset parts of the image labeled data and can also not overflow any particular section within that region that they are writing to, and instruction flow is limited to those parts defined as methods/functions, with strictly defined rules as to how to transition between different parts of code.
Yes, older systems would not be supported... (in my opinion, if I would allow hardware protection domains, there would be not much point in verifying machine code...)The other issue you have to consider with the "one address space for all processes" method is that it is unrealistic on 32bit machines (the designers of Singularity admit this) as you could conceivably end up sharing 4gb between all your processes and your kernel. Long mode provides much more space, so it is feasible, but you are immediately ruling out a large proportion of the computer systems out there.
This is proof carrying code, right?Typed assembly is just machine code along with a proof that the verifier can cross-check against the code itself.
Oh well, thank you very much for you sharing your opinions with me! I'd really love to discuss a bit more, but I'm very tired at the moment and my head's a real mess now
cheers
Hi,
A quick look on sourceforge revealed that Go! Was superceded by http://sourceforge.net/projects/odin-os/. Looks like that has been taken down now, though.
Cheers,
Adam
A quick look on sourceforge revealed that Go! Was superceded by http://sourceforge.net/projects/odin-os/. Looks like that has been taken down now, though.
Cheers,
Adam
- singularity
- Posts: 5
- Joined: Wed Jul 11, 2007 7:04 am
- Location: Austria
- Contact: