I'm working on memory management code for a project. I came up with (and answered) an interesting question:
The Linux kernel is mapped into each process's virtual address space, usually occupying the range from 3GB to 4GB (this is configurable; some people use a 2GB/2GB split). Why?
The reason is performance. It's possible to give each process its own full 4GB virtual address space; but when a process makes a system call, the CPU would have to switch address spaces. Because this is slow (the TLB must be flushed, etc.), most OS kernels (Linux, Windows, etc.) map themselves into each process's address space, avoiding a switch at each system call.
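To make this concrete, here's a minimal sketch of how a kernel can share its upper-1GB mappings with every process, assuming 32-bit x86 with two-level paging and a 3GB/1GB split. All names (kernel_page_dir, alloc_page, create_process_page_dir) are hypothetical, not from any particular kernel:

[code]
#include <stdint.h>
#include <string.h>

#define ENTRIES_PER_DIR  1024
#define KERNEL_PDE_START 768   /* PDE 768 maps linear 3GB (4MB per PDE) */

extern uint32_t kernel_page_dir[ENTRIES_PER_DIR]; /* master directory */
extern uint32_t *alloc_page(void);                /* returns a zeroed 4KB page */

/* Build a page directory for a new process: the user half (0-3GB)
 * starts empty, while the kernel half (3GB-4GB) points at the SAME
 * page tables as the master directory, so kernel code and data are
 * visible in every address space. */
uint32_t *create_process_page_dir(void)
{
    uint32_t *pd = alloc_page();

    memset(pd, 0, KERNEL_PDE_START * sizeof(uint32_t));
    memcpy(&pd[KERNEL_PDE_START],
           &kernel_page_dir[KERNEL_PDE_START],
           (ENTRIES_PER_DIR - KERNEL_PDE_START) * sizeof(uint32_t));
    return pd;
}
[/code]

Because only the page directory entries are copied and the kernel's page tables themselves are shared, the kernel sees the same mappings no matter which process's CR3 is loaded, so a system call never needs a CR3 reload.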
Why kernels map themselves into process address spaces
Re: Why kernels map themselves into process address spaces
And the switch would happen at every IRQ too (think about the PIT) ...
And you can't give every program the full 4GB if you don't use hardware task switching (and, as a result, can't use task gates in the IDT).
But as a system with a switch on every interrupt would just crawl, I don't think it is a big loss either ...
Perhaps you might want to write something other than a monolithic kernel, so that you can reduce the amount of virtual memory reserved for the kernel?
Re: Why kernels map themselves into process address spaces
__matt__ wrote: The reason is performance. It's possible to give each process its own full 4GB virtual address space;
You're still only half-right here. Yes, performance is an important reason, but it is also true that, at least on x86, you need at least 4 kernel structures mapped into memory at all times. The GDT and the IDT must be in memory in order to do many things, like catch interrupts. Moreover, in order to do the hardware task switching that would be necessary, you need two TSSes, again both in memory. So, no, you can't really give each task a 4GB address space.
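A rough illustration of that point, assuming 32-bit protected mode: the base address held in IDTR (and GDTR) is a linear address, so the CPU translates it through whatever page tables are live at the moment an interrupt fires. The idt array and load_idt function below are hypothetical names:

[code]
#include <stdint.h>

/* Descriptor-table register image for lidt/lgdt (x86 layout). */
struct dtr {
    uint16_t limit;
    uint32_t base;   /* LINEAR address: must be mapped in the current
                        address space when an interrupt arrives */
} __attribute__((packed));

extern uint64_t idt[256];   /* 256 8-byte gate descriptors */

void load_idt(void)
{
    struct dtr idtr = { sizeof(idt) - 1, (uint32_t)idt };
    __asm__ volatile("lidt %0" : : "m"(idtr));
}
[/code]

If a task's page tables don't map the page holding idt, the next interrupt can't even reach a handler; the CPU ends up in a double or triple fault.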
Re: Why kernels map themselves into process address spaces
Hi,
rexlunae wrote: You're still only half-right here. Yes, performance is an important reason, but it is also true that, at least on x86, you need at least 4 kernel structures mapped into memory at all times. The GDT and the IDT must be in memory in order to do many things, like catch interrupts. Moreover, in order to do the hardware task switching that would be necessary, you need two TSSes, again both in memory. So, no, you can't really give each task a 4GB address space.
Software task switching has similar problems - basically, when you change CR3 you need code that's at the same linear address in both address spaces.
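A tiny sketch of that constraint, assuming flat 32-bit paging (switch_address_space is a hypothetical name):

[code]
#include <stdint.h>

/* Execution continues at the next instruction's linear address the
 * moment CR3 is written, so the page containing this code must be
 * mapped at the same linear address in BOTH sets of page tables, or
 * the very next instruction fetch faults. */
static inline void switch_address_space(uint32_t new_cr3)
{
    __asm__ volatile("mov %0, %%cr3" : : "r"(new_cr3) : "memory");
    /* first instruction fetched via the NEW page tables */
}
[/code]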
Kinda makes the whole idea of exo-kernels a bit of a joke doesn't it?
Cheers,
Brendan
Re: Why kernels map themselves into process address spaces
__matt__ wrote: The Linux kernel is mapped into each process's virtual address space, usually occupying the range from 3GB to 4GB (this is configurable; some people use a 2GB/2GB split). Why?
Linux has a 4G/4G patch which gives nearly the full 4GB to both user mode and kernel mode. The top 16MB of the address space is common to the kernel address space and the user address space.
http://lkml.org/lkml/2003/7/8/246
I'm not sure why it needs 16MB; it should take less than 4KB of code for this "trampoline" and less than 4KB of data for a GDT, an IDT, a single TSS and a few extra variables to hold the k_EIP, k_ESP, k_EFLAGS and k_CR3 to use when switching to the kernel.
Say an interrupt occurs in user mode: the CPU enters the "trampoline kernel". The trampoline then switches to the kernel's CR3 and saves the user-mode EIP, ESP, EFLAGS and CR3 on the k_ESP "kernel personality stack".
It also saves the initial value of k_ESP itself.
The "trampoline kernel" then does an IRET to begin executing in the "kernel personality address space", using the k_EIP, k_ESP, k_EFLAGS and k_CR3 held in the trampoline.
To return to user mode, another interrupt/syscall is needed to mimic the IRET. It uses the EIP, ESP, EFLAGS and CR3 that were pushed onto the "kernel personality stack" to return to the user address space. The value of k_ESP is copied off the stack and stored in the trampoline for the next interrupt/syscall. This allows each task/thread to have a different stack in the "personality address space".
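Here's a very rough C-level sketch of that entry path (in reality it would be a short assembly stub). Everything in it is hypothetical: the k_* variables live in the shared 16MB trampoline region, which is mapped at the same linear address in both the user and "kernel personality" address spaces, and the trampoline's own stack is assumed to be in that region too:

[code]
#include <stdint.h>

/* Per-CPU trampoline state in the shared 16MB region. */
static uint32_t k_eip, k_esp, k_eflags, k_cr3;

static inline uint32_t read_cr3(void)
{
    uint32_t v;
    __asm__ volatile("mov %%cr3, %0" : "=r"(v));
    return v;
}

static inline void write_cr3(uint32_t v)
{
    __asm__ volatile("mov %0, %%cr3" : : "r"(v) : "memory");
}

/* Called from the interrupt stub while still on the user CR3;
 * user_eip/user_esp/user_eflags were captured from the IRET frame. */
void trampoline_enter(uint32_t user_eip, uint32_t user_esp,
                      uint32_t user_eflags)
{
    uint32_t user_cr3 = read_cr3();

    write_cr3(k_cr3);  /* now in the kernel personality address space */

    /* Push the saved user state, plus the initial k_ESP itself, onto
     * the kernel personality stack, as described above. */
    uint32_t *ksp = (uint32_t *)k_esp;
    *--ksp = k_esp;
    *--ksp = user_cr3;
    *--ksp = user_eflags;
    *--ksp = user_esp;
    *--ksp = user_eip;

    /* The assembly stub would now IRET to k_eip with ESP = ksp and
     * EFLAGS = k_eflags; the matching exit path pops these five
     * values to get back to user mode. */
}
[/code]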
I hope I haven't confused you too much. It gets a bit confusing talking about a "kernel" in an address space (the "personality") when you've also got a tiny kernel in every address space (the "trampoline").
Interestingly, you can think of it as a single-server microkernel, a bit like L4Linux, except L4Linux is more complicated and uses an "active object model", which has a thread in the personality simulating the kernel half.
The above 4G/4G trampoline uses a "passive object model": the thread migrates from one address space to another. Many microkernels (such as Mach) have something similar, usually called "migrating threads" or LRPC (lightweight remote procedure call).
The trampoline idea can be expanded with a few modifications to allow a thread to make a "lightweight remote procedure call" to one of many servers.
In alt.os.development, KVP mentioned the above trampoline idea and I suggested a way of making it into a multi-server microkernel.
http://tinyurl.com/6gb5b
You might also want to do a search for "Linux: 4G/4G Overhead" or even the paper on L4Linux (since it is similar in some ways) to get an idea of the slowdown.
Re: Why kernels map themselves into process address spaces
rexlunae wrote: You're still only half-right here. Yes, performance is an important reason, but it is also true that, at least on x86, you need at least 4 kernel structures mapped into memory at all times. The GDT and the IDT must be in memory in order to do many things, like catch interrupts. Moreover, in order to do the hardware task switching that would be necessary, you need two TSSes, again both in memory. So, no, you can't really give each task a 4GB address space.
Must the GDT and IDT be mapped into *each task's* address space? Why? Doesn't the CPU switch address spaces when servicing an interrupt?
Re: Why kernels map themselves into process address spaces
Hi,
__matt__ wrote: Must the GDT and IDT be mapped into *each task's* address space? Why? Doesn't the CPU switch address spaces when servicing an interrupt?
Short answer: no, you don't have to have the GDT and IDT in every address space (but insanity may be required).
When an IRQ occurs, the CPU looks up the IRQ's entry in the IDT. If this IDT entry is an interrupt gate or trap gate, it loads CS and EIP from the IDT entry (where the descriptor for CS comes from the GDT, or possibly an LDT), without switching address spaces.
If the IDT entry is a task gate, the CPU will load a task descriptor from the GDT, followed by the TSS it points to (which would require more GDT and/or LDT descriptors to be present, and may involve switching address spaces).
Also, when the CPU goes from CPL=3 to CPL=0, it needs to access the GDT (or LDT) and a TSS (the SS0:ESP0 fields).
If you don't want the IDT in every address space, you could use polling (i.e. don't use IRQs at all). If you don't want the GDT (and at least one TSS) in every address space, you can run all code at CPL=0.
Therefore it would be possible to write an OS without mapping the GDT and IDT (and at least one TSS) in each address space, but you'd have huge amounts of interrupt latency and no protection at all.
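For contrast with the task-gate case, here's a hedged sketch of installing an ordinary 32-bit interrupt gate, the kind the CPU follows without an address-space switch (set_intr_gate and its arguments are hypothetical names, but the descriptor layout is the standard x86 one; assumes a 32-bit target):

[code]
#include <stdint.h>

/* One 8-byte IDT gate descriptor (32-bit interrupt gate). */
struct idt_entry {
    uint16_t offset_low;   /* handler EIP, bits 0-15 */
    uint16_t selector;     /* code segment selector in the GDT */
    uint8_t  zero;
    uint8_t  type_attr;    /* 0x8E = present, DPL=0, 32-bit interrupt gate */
    uint16_t offset_high;  /* handler EIP, bits 16-31 */
} __attribute__((packed));

static struct idt_entry idt[256];

void set_intr_gate(int vec, void (*handler)(void), uint16_t kernel_cs)
{
    uint32_t off = (uint32_t)handler;

    idt[vec] = (struct idt_entry){
        .offset_low  = (uint16_t)(off & 0xFFFF),
        .selector    = kernel_cs,
        .zero        = 0,
        .type_attr   = 0x8E,
        .offset_high = (uint16_t)(off >> 16),
    };
}
[/code]

On a CPL=3 to CPL=0 transition through such a gate, the CPU also fetches SS0:ESP0 from the current TSS, which is why the TSS (and the GDT entry describing it) has to be reachable as well.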
Cheers,
Brendan
Re: Why kernels map themselves into process address spaces
Brendan wrote: Therefore it would be possible to write an OS without mapping the GDT and IDT (and at least one TSS) in each address space, but you'd have huge amounts of interrupt latency and no protection at all.
You would also lose the ability to do preemptive multitasking, as a consequence of not being able to enable interrupts. For most OSes, that alone is a big enough problem to prevent this way of doing things.