OSDev.org

Posted: **Wed Dec 07, 2005 9:46 am**

Hi, folks!

Recently I decided to rewrite my hobby kernel from scratch. This time the kernel runs at address 0xC0100000. I used the GDT trick explained in the OSFAQ.
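For readers who haven't met the GDT trick, the arithmetic behind it can be sketched roughly as below. This is only an illustration, not the code from the attached kernel: it assumes the kernel is physically loaded at 0x00100000 (an assumption, the post only gives the 0xC0100000 link address), and the names `KERNEL_PHYS_BASE`, `GDT_TRICK_BASE` and `linear_address` are made up for the example.

```c
#include <stdint.h>

/* Link address from the post; the load address is an ASSUMPTION for this sketch. */
#define KERNEL_VIRT_BASE 0xC0100000u
#define KERNEL_PHYS_BASE 0x00100000u

/* Segment base chosen so that (base + virtual address) wraps around 32 bits
   and lands on the physical load address. Works out to 0x40000000 here. */
#define GDT_TRICK_BASE ((uint32_t)(KERNEL_PHYS_BASE - KERNEL_VIRT_BASE))

/* With paging disabled, linear address = segment base + offset,
   truncated to 32 bits (unsigned arithmetic wraps modulo 2^32). */
static uint32_t linear_address(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}
```

With these assumed values, an access to 0xC0100000 through a segment whose base is `GDT_TRICK_BASE` reaches 0x00100000, which is why the kernel can run at its high link address before paging is turned on.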

By the way, I've seen that there is no sample code showing how to set up a kernel using this solution. :-[ So I wrote a very minimal but also very well commented and easy to understand kernel for newbies. I called it HigherHalfWithGDT and attached it to this message.

I invite you to download it and tell me if you like it. I would be happy if you decided to include my sample kernel into the OSFAQ...

I hope you'll accept my request.

Posted: **Wed Dec 07, 2005 9:57 am**

falconfx wrote: I invite you to download it and tell me if you like it. I would be happy if you decided to include my sample kernel into the OSFAQ...

You can add it yourself, the OSFAQ is open for addition by everybody.

Posted: **Wed Dec 07, 2005 2:43 pm**

Nice,

However, the location of the kernel in memory is of NO real importance.

I could be in lowhalf, high half, high low half :p and so on....
It does not really matter where the kernel is located.

Use a location that best suits your personal taste.

Posted: **Wed Dec 07, 2005 3:57 pm**

Well actually it does matter, it's usually a good idea to have it mapped high up (or at least it seems that way after I asked and got given a million reasons).

Posted: **Wed Dec 07, 2005 4:12 pm**

There are a few of those "million" at http://www.osdev.org/osfaq2/index.php/HigherHalfKernel.

Posted: **Wed Dec 07, 2005 4:59 pm**

bogdanontanu wrote: I could be in lowhalf, high half, high low half :p and so on....

You have three halves? Could you do that trick on my computer's memory? Repeatedly?

Posted: **Wed Dec 07, 2005 5:42 pm**

I say clarification (example code being one way) is a good thing. Also, did the original post not have the sample attached?

Posted: **Wed Dec 07, 2005 10:29 pm**

None of the reasons listed in the OS-FAQ for using a "higher half" kernel seem overly compelling (or even relevant) to me. I'd still be inclined to agree with bogdanontanu and say that it really doesn't matter.

1. Easier to set up VM86 processes since the region below 1MB is userspace.

In which way is this supposed to make VM86 easier?

2. More generically, user applications are not dependent on how much memory is kernel space (your app can be linked for 0x400000 regardless of whether kernel is at 0xC0000000, 0x80000000 or 0xE0000000 ...), which makes ABIs nicer.

That's not really true, in my opinion. Applications are no more and no less dependent on the kernel's memory space whether it's at the top or at the bottom.

Simply put, if the kernel's at the top, apps can't be in the top. If it's at the bottom, apps can't be in the bottom. These restrictions extend to both systems. Apps should be position independent, anyway, imo.

3. If your OS also supports 64-bits, 32-bit applications will be able to use the full 32-bit address space in the 64-bit version.

This seems reasonable... except that if a 32-bit app needs to access any of the kernel it will need to be mapped into its address space, and therefore will take away from the full 32-bit address space just like before (yes, yes, callgates and other slow mechanisms can get rid of this, but...). Sure, it's possible to shift a lot outside of the 32-bit address space, but feasibly, I don't think you can get rid of it all.

Ideally these applications can simply be recompiled to take full advantage of a 64-bit address space anyway (I realize that's a cheesy response, but it's true...)

4. 'mnemonic' invalid pointers such as 0xcafebabe, 0xdeadbeef, 0xdeadc0de, etc. can be used.

Oh well... I can use NULL. 0xdeadbeef is not an advantage...

--Jeff

Posted: **Thu Dec 08, 2005 12:17 am**

Hi,

carbonBased wrote: In which way is this supposed to make VM86 easier?

Because virtual 8086 tasks must use the lowest 1 MB of linear memory, so if you plan on having several "DOS boxes" or something you'll be screwed if the lowest 1 MB is mapped the same into every address space (e.g. treated the same as kernel space). There are ways around this, but in general leaving the lowest N MB for user applications would make it easier to allow any user-level processes to contain virtual 8086 code.
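To see why virtual 8086 code is tied to the bottom of the address space, it may help to look at how such a task forms an address. The sketch below shows the standard segment * 16 + offset calculation; the names `vm86_linear` and `VM86_MAX_LINEAR` are invented for the example.

```c
#include <stdint.h>

/* Address formation used by virtual 8086 (and real mode) code:
   linear = segment * 16 + offset, with 16-bit segment and offset. */
static uint32_t vm86_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Highest address such a task can form (0xFFFF:0xFFFF),
   i.e. just under 1 MB + 64 KB. */
#define VM86_MAX_LINEAR 0x10FFEFu
```

Whatever segment and offset the task picks, the result never goes above `VM86_MAX_LINEAR`, so anything the task needs has to be mapped in that low region of its own address space.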

Some people only use virtual 8086 for "thunking" (a way to run crusty BIOS functions). In this case it could be easier to build the virtual 8086 code into the kernel (and therefore put the kernel at the beginning of address spaces). IMHO the only sane excuse for this sort of hack is video code (VBE functions).

2. More generically, user applications are not dependent on how much memory is kernel space (your app can be linked for 0x400000 regardless of whether kernel is at 0xC0000000, 0x80000000 or 0xE0000000 ...), which makes ABIs nicer.

carbonBased wrote: That's not really true, in my opinion. Applications are no more and no less dependent on the kernel's memory space whether it's at the top or at the bottom.

Simply put, if the kernel's at the top, apps can't be in the top. If it's at the bottom, apps can't be in the bottom. These restrictions extend to both systems.

So if the kernel is from 0x00000000 to 0x3FFFFFFF you'd have application code starting at 0x40000000 with application data above the application's code. Then, if the kernel happens to be from 0x00000000 to 0x1FFFFFFF, would you have the application at 0x40000000 with the application's data above this and wasted space from 0x20000000 to 0x3FFFFFFF, or would you have a fragmented system with data above the application's code and below it? How about if the kernel grew and used from 0x00000000 to 0x7FFFFFFF - would you still be able to run applications that are designed to be at 0x40000000?

carbonBased wrote: Apps should be position independent, anyway, imo.

IMHO the 80x86 CPU is not designed for 32 bit position independent code (at least not without using segmentation), and therefore all attempts at making it run 32 bit position independent code are ugly hacks. I don't like ugly hacks, and I don't agree that all applications should be ugly hacks (even if your compiler hides these hacks, they still exist).

3. If your OS also supports 64-bits, 32-bit applications will be able to use the full 32-bit address space in the 64-bit version.

carbonBased wrote: This seems reasonable... except that if a 32-bit app needs to access any of the kernel it will need to be mapped into its address space, and therefore will take away from the full 32-bit address space just like before (yes, yes, callgates and other slow mechanisms can get rid of this, but...). Sure, it's possible to shift a lot outside of the 32-bit address space, but feasibly, I don't think you can get rid of it all.

If the OS is designed well it's easy to remove all of the kernel from the 32 bit part of the address space. Problems come from a badly designed ABI - e.g. allowing (or expecting) applications to access kernel data directly would cause problems (but then you've got problems with any design change that affects the structure or location of this data, and the kernel can't safely modify the data in a multi-CPU environment).

Let's consider this from the opposite perspective - is there any benefit from having the kernel in the lower part of the address space?

The only benefit I can think of is that it makes it easy for people to pretend they are writing an OS (e.g. slapping a "hello world" kernel on top of GRUB and then having a party to celebrate). Maybe I've missed something...

Cheers,

Brendan

Posted: **Thu Dec 08, 2005 12:29 am**

An additional one would be that if you have your kernel mapped high and then it gets bigger then you'd get the following type of scenario:

[tt]|-----|
| OS  |
|-----|
|     |
|     |
|     |
|-----|
| App |
|-----|[/tt]
changing to
[tt]|-----|
|     |
| OS  |
|     |
|-----|
|     |
|-----|
| App |
|-----|[/tt]

Note that the app hasn't changed position, while the OS calls are handled by whatever mechanism you choose and this will know the locations of the code in the kernel for the new call anyway.

On the other hand, if the kernel is mapped below the apps and gets bigger then it might cross the address that apps use as a base address, in which case you'll have problems staying compatible. The only time the upper-half kernel would encounter this type of problem is if it got big enough that there wasn't enough space left for the app at all, in which case you have issues anyway.
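The compatibility argument can be put into a few lines of C. This is only a sketch: `APP_BASE` uses the 0x40000000 figure mentioned earlier in the thread as a hypothetical fixed link address, and `app_base_usable` is a made-up helper, not part of any real loader.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed link address for applications (the 0x40000000
   figure used earlier in the thread). */
#define APP_BASE 0x40000000u

/* True if an application linked at APP_BASE starts outside the kernel's
   address range [kernel_start, kernel_end] (both ends inclusive). */
static bool app_base_usable(uint32_t kernel_start, uint32_t kernel_end)
{
    return APP_BASE < kernel_start || APP_BASE > kernel_end;
}
```

With a kernel at the bottom, growing it from 0x00000000-0x3FFFFFFF to 0x00000000-0x7FFFFFFF makes the check fail. With a kernel at the top, growing it from 0xC0000000-0xFFFFFFFF down to 0x80000000-0xFFFFFFFF leaves the check passing, which matches the diagrams above.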

Posted: **Thu Dec 08, 2005 5:39 am**

Brendan wrote:
carbonBased wrote: Apps should be position independent, anyway, imo.
IMHO the 80x86 CPU is not designed for 32 bit position independent code (at least not without using segmentation), and therefore all attempts at making it run 32 bit position independent code are ugly hacks. I don't like ugly hacks, and I don't agree that all applications should be ugly hacks (even if your compiler hides these hacks, they still exist).

"Position independent apps" does not necessarily mean real position independent code - relocatable binaries do the job just as well. I mean, the kernel space's size and base are unlikely to change at runtime, are they?

The NT kernel uses a similar trick, for example: It can be loaded with a special cmdline-switch (IIRC it was something like '/3GB' ), then the kernel image is relocated to 0xC0000000 instead of the default base of 0x80000000. No position independent code at runtime needed either (OK, in this case the kernel is relocated, not the app...)

cheers Joe

Posted: **Thu Dec 08, 2005 5:45 am**

Hi,

JoeKayzA wrote: "Position independent apps" does not necessarily mean real position independent code - relocatable binaries do the job just as well. I mean, the kernel space's size and base are unlikely to change at runtime, are they?

Ok, so what is the difference between "position independent code" and "relocatable binaries"?

JoeKayzA wrote:The NT kernel uses a similar trick, for example: It can be loaded with a special cmdline-switch (IIRC it was something like '/3GB' ), then the kernel image is relocated to 0xC0000000 instead of the default base of 0x80000000. No position independent code at runtime needed either (OK, in this case the kernel is relocated, not the app...)

The NT kernel probably sets CS base to <whatever> and possibly even uses a CS segment override prefix to access kernel data. How could this be done for applications (without segmentation or position independent code)?

Cheers,

Brendan

Posted: **Thu Dec 08, 2005 5:58 am**

Brendan wrote: Ok, so what is the difference between "position independent code" and "relocatable binaries"?

Note that I've not yet written a binary loader or dynamic linker, but I always saw a distinction between these two:

In a relocatable binary, you fix up all instructions which use absolute addresses (global data, absolute jumps/calls...) at load time. So you need a table that points to all instructions in a binary which need to be fixed. The _resulting_ code is no longer position independent, but it is able to run at your chosen base address.
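As a rough illustration of that load-time fix-up pass (a toy sketch, not any real executable format): assume the table is simply an array of offsets into the image, each pointing at a 32-bit absolute address that was computed for the link-time base. The function name and parameters are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy load-time relocation. Each entry of reloc_offsets is the offset,
   within the image, of a 32-bit absolute address that was computed for
   link_base. Every such field is adjusted so that it is valid for
   load_base instead. */
static void apply_relocations(uint8_t *image,
                              const uint32_t *reloc_offsets,
                              size_t reloc_count,
                              uint32_t link_base,
                              uint32_t load_base)
{
    uint32_t delta = load_base - link_base; /* wraps modulo 2^32 if negative */

    for (size_t i = 0; i < reloc_count; i++) {
        uint32_t value;
        /* memcpy because the field may sit at an unaligned offset */
        memcpy(&value, image + reloc_offsets[i], sizeof value);
        value += delta;
        memcpy(image + reloc_offsets[i], &value, sizeof value);
    }
}
```

Note that the pass writes into the code itself, so after it has run the in-memory image no longer matches the file on disk.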

Real position independent code (PIC) is built in a way that it references all global data through a table (in ELF it is called the GOT, IIRC); the code itself never uses absolute addresses. To relocate the code, you just need to fix the table entries; the code itself can remain unchanged, but it is less efficient at runtime due to indirect references and the like. The advantage is that the code can be mapped into multiple address spaces simultaneously, even at different base addresses, and it still remains runnable.
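The table idea can be mimicked in plain C. This is a toy model only, not how a compiler actually emits PIC: `struct toy_got`, `read_counter` and `relocate_got` are hypothetical names, and the "table" here has a single slot.

```c
/* A toy "GOT": one slot per global, holding that global's current address. */
struct toy_got {
    int *counter_addr;
};

/* The "position independent" code: it reaches the global only through
   the table, so the function itself never needs to be patched. */
static int read_counter(const struct toy_got *got)
{
    return *got->counter_addr; /* the extra indirection is the runtime cost */
}

/* Relocating means rewriting table slots and nothing else. */
static void relocate_got(struct toy_got *got, int *new_counter_addr)
{
    got->counter_addr = new_counter_addr;
}
```

Since only the table differs per address space, the pages holding `read_counter` could in principle be shared unchanged between processes that map them at different addresses.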

I hope I didn't get this entirely wrong...

cheers Joe

Posted: **Thu Dec 08, 2005 6:07 am**

JoeKayzA wrote: Note that I've not yet written a binary loader or dynamic linker, but I always saw a distinction between these two:

In a relocatable binary, you fix up all instructions which use absolute addresses (global data, absolute jumps/calls...) at load time. So you need a table that points to all instructions in a binary which need to be fixed. The _resulting_ code is no longer position independent, but it is able to run at your chosen base address.

Real position independent code (PIC) is built in a way that it references all global data through a table (in ELF it is called the GOT, IIRC); the code itself never uses absolute addresses. To relocate the code, you just need to fix the table entries; the code itself can remain unchanged, but it is less efficient at runtime due to indirect references and the like. The advantage is that the code can be mapped into multiple address spaces simultaneously, even at different base addresses, and it still remains runnable.

I hope I didn't get this entirely wrong...

What you say is entirely true, i.e. I can confirm that's what I know to be true too. However, in ELF the PLT is also used to link procedures at runtime. Aside from that slight twist, it's 100% accurate.

Posted: **Thu Dec 08, 2005 6:59 am**

Hi,

JoeKayzA wrote: In a relocatable binary, you fix up all instructions which use absolute addresses (global data, absolute jumps/calls...) at load time. So you need a table that points to all instructions in a binary which need to be fixed. The _resulting_ code is no longer position independent, but it is able to run at your chosen base address.

Real position independent code (PIC) is built in a way that it references all global data through a table (in ELF it is called the GOT, IIRC); the code itself never uses absolute addresses. To relocate the code, you just need to fix the table entries; the code itself can remain unchanged, but it is less efficient at runtime due to indirect references and the like. The advantage is that the code can be mapped into multiple address spaces simultaneously, even at different base addresses, and it still remains runnable.

This makes sense, and you're right - they are different.

The problem with the second method (PIC) is that every call or jump becomes a slower indirect call/jump, and that the function table (GOT) wastes cache space and the instruction fetch logic in the CPU (branch prediction, instruction pre-fetch, etc) may not be able to handle it as efficiently.

The first method (relocatable binary) can be worse. For normal executables (and PIC) you can memory map the file to avoid loading all of it, and then recycle the same physical pages if it's started more than once (i.e. map the same pages into several address spaces and use "copy on write"). If you're running out of RAM you can find an unmodified page and free it (without sending it to swap first) and then load it again from the original executable file on disk.

For relocatable binaries you have to use swap space instead of the original executable file because the binary was modified after it was loaded. If the executable was at the same address in all address spaces you could still recycle the same physical pages, and I guess you could avoid loading all of it before the code is run (but it wouldn't be so easy - e.g. you'd need to take care of instructions like a "jmp somewhere" that cross page boundaries).
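The page-boundary problem is easy to state in code. A minimal sketch, assuming 4 KiB pages and 32-bit address fields (`PAGE_BYTES` and `fixup_crosses_page` are names invented here): a loader that relocates pages lazily, one at a time, would need to detect fields like this and handle both pages together.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BYTES 4096u /* assumed page size: 4 KiB */

/* True if a 32-bit field starting at the given image offset spans two
   pages, i.e. its last byte lies in the page after its first byte. */
static bool fixup_crosses_page(uint32_t offset)
{
    return (offset % PAGE_BYTES) > PAGE_BYTES - sizeof(uint32_t);
}
```

A field starting at offset 4092 still fits in the first page, while one starting at 4093, 4094 or 4095 spills into the next page and cannot be patched by touching a single page.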

I still think it's easier (and better for performance) to just use "fixed location" binaries and put the kernel at the top of the address space.

Cheers,

Brendan

I wrote a simple HigherHalf kernel...
