kernel address space

zebanilinux · Post by **zebanilinux** » Thu Jul 17, 2008 5:20 pm

hi all,

Does the kernel access physical memory directly?

Let's say in C we have the code:

void *address = (void *)0xabcd1234;

If we use this address in kernel mode, does it directly access that physical address?

bewing · Post by **bewing** » Thu Jul 17, 2008 5:36 pm

My kernel does, but most kernels do not. It depends on the kernel.
Most kernels run in virtual memory. I personally think that is a mistake.

Adek336 · Post by **Adek336** » Thu Jul 17, 2008 6:45 pm

Why do you think so?

01000101 · Post by **01000101** » Thu Jul 17, 2008 7:05 pm

I too use physical memory access.
I will withhold my comments on virtual memory.

bewing · Post by **bewing** » Thu Jul 17, 2008 7:16 pm

Adek336 wrote:Why do you think so?

One of the biggest reasons why the modern CPUs run as fast as they do is their caches. When you mess up a cache, the machine slows down a lot. The smaller the cache, the less it takes to mess it up. The smallest cache you have is the TLB. When you switch from usermode to kernelmode to handle a syscall, if the kernel uses virtual memory, you will toast the entire TLB (or at least a significant fraction) in the process of handling the syscall. And it is completely unnecessary -- it is not the tiniest bit difficult to write a kernel that knows how to live inside the restrictions of physical memory. If you turn vmem off immediately "on receipt" of a syscall, then all the usermode cached vmem stuff never gets dumped out of the TLB.

Unfortunately, you don't have a choice on a 64bit system.

proxy · Post by **proxy** » Thu Jul 17, 2008 8:07 pm

bewing wrote:One of the biggest reasons why the modern CPUs run as fast as they do is their caches. When you mess up a cache, the machine slows down a lot. The smaller the cache, the less it takes to mess it up. The smallest cache you have is the TLB. When you switch from usermode to kernelmode to handle a syscall, if the kernel uses virtual memory, you will toast the entire TLB (or at least a significant fraction) in the process of handling the syscall. And it is completely unnecessary -- it is not the tiniest bit difficult to write a kernel that knows how to live inside the restrictions of physical memory. If you turn vmem off immediately "on receipt" of a syscall, then all the usermode cached vmem stuff never gets dumped out of the TLB.

This is why the Global bit exists. Basically the entries marked global will not get flushed on CR3 write. This makes the User->Sys->User transition a lot less expensive.

proxy

Brendan · Post by **Brendan** » Fri Jul 18, 2008 12:24 am

Hi,

bewing wrote:The smallest cache you have is the TLB.

It's small, but for a typical CPU with 8192 TLB entries and 4 KB pages, those entries cover 32 MB of linear address space, which is much more than a "little" 2 MB L2 data cache will cover...
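The arithmetic behind that 32 MB figure is just entries times page size. A minimal sketch in C, where `tlb_reach_bytes` is a hypothetical helper name and the 8192-entry count is taken from the post above rather than from any particular CPU's datasheet:

```c
#include <assert.h>
#include <stdint.h>

/* Linear address space covered by a fully populated TLB, in bytes.
 * Hypothetical helper for illustration: reach = entries * page size. */
static uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_size)
{
    /* x86 page sizes are powers of two (4 KiB, 2 MiB, 4 MiB, 1 GiB). */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return entries * page_size;
}
```

With 8192 entries and 4096-byte pages this gives 33554432 bytes, i.e. 32 MB. Real TLB sizes vary a lot between CPU models, so the entry count is the number to check before relying on the result.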

bewing wrote:When you switch from usermode to kernelmode to handle a syscall, if the kernel uses virtual memory, you will toast the entire TLB (or at least a significant fraction) in the process of handling the syscall.

No. If you switch from user-mode to kernel-mode then the kernel's code might use several TLB entries (but those TLB entries may have already been present in the TLB). If the kernel accesses many MB of data (which IMHO is extremely unlikely with a sane kernel) it would end up getting rid of all the "least recently used" user-mode TLB entries; or, if the kernel does a task switch the user-mode TLB entries will be flushed.

However, if the kernel disables paging all TLB entries will be flushed (including entries for "global" pages), regardless of whether or not a task switch is done. This would cause far worse performance problems than leaving paging enabled - a simple/fast kernel API function would cause a huge number of TLB misses to occur after it returns to user-mode.

bewing wrote:And it is completely unnecessary -- it is not the tiniest bit difficult to write a kernel that knows how to live inside the restrictions of physical memory.

For a toy kernel like Linux, I agree. When you start doing NUMA optimizations (or trying to do fault tolerance for faulty RAM) you can't assume that any specific area in the physical address space will be suitable, and something common like "the kernel's code starts at 0x00100000 in the physical address space" becomes far too restrictive.

bewing wrote:If you turn vmem off immediately "on receipt" of a syscall, then all the usermode cached vmem stuff never gets dumped out of the TLB.

Hehe - from Intel's manual, section 10.9. Invalidating the Translation Lookaside Buffers (TLBs):
"The following operations invalidate all TLB entries, irrespective of the setting of the G flag:
* Asserting or de-asserting the FLUSH# pin.
* (Pentium 4, Intel Xeon, and P6 family processors only.) Writing to an MTRR (with a WRMSR instruction).
* Writing to control register CR0 to modify the PG or PE flag.
* (Pentium 4, Intel Xeon, and P6 family processors only.) Writing to control register CR4 to modify the PSE, PGE or PAE flag."

If you turn vmem off immediately "on receipt" of a syscall (which involves writing to control register CR0 to modify the PG flag), then you'll be completely flushing all TLB entries for every syscall.
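The rules quoted from the manual can be written down as a small user-space model. This is only a sketch: it does not touch real control registers, the function names are made up for illustration, and the CR4 rule applies to the processor families named in the quote. The bit positions (CR0.PE = bit 0, CR0.PG = bit 31, CR4.PSE = bit 4, CR4.PAE = bit 5, CR4.PGE = bit 7) are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in CR0 and CR4 (IA-32). */
#define CR0_PE  (1u << 0)   /* protection enable */
#define CR0_PG  (1u << 31)  /* paging enable */
#define CR4_PSE (1u << 4)   /* page size extensions */
#define CR4_PAE (1u << 5)   /* physical address extension */
#define CR4_PGE (1u << 7)   /* global pages enable */

/* True if a CR0 write changes PG or PE, which invalidates every TLB
 * entry, global or not. */
static bool cr0_write_flushes_all(uint32_t old_cr0, uint32_t new_cr0)
{
    return ((old_cr0 ^ new_cr0) & (CR0_PG | CR0_PE)) != 0;
}

/* True if a CR4 write changes PSE, PGE or PAE (P6 family, Pentium 4
 * and Xeon, per the manual text quoted above). */
static bool cr4_write_flushes_all(uint32_t old_cr4, uint32_t new_cr4)
{
    return ((old_cr4 ^ new_cr4) & (CR4_PSE | CR4_PGE | CR4_PAE)) != 0;
}

/* Models "turn vmem off on syscall entry, back on at exit" and counts
 * the full TLB flushes that one syscall would cost. */
static int flushes_per_syscall_if_paging_toggled(uint32_t cr0)
{
    assert(cr0 & CR0_PG);                /* user mode ran with paging on */
    uint32_t paging_off = cr0 & ~CR0_PG;
    int flushes = 0;
    flushes += cr0_write_flushes_all(cr0, paging_off);  /* on entry */
    flushes += cr0_write_flushes_all(paging_off, cr0);  /* on exit  */
    return flushes;
}
```

Under this model every syscall costs two complete flushes, one when PG is cleared and one when it is set again, which is the opposite of what the scheme was meant to achieve.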

proxy wrote:This is why the Global bit exists. Basically the entries marked global will not get flushed on CR3 write. This makes the User->Sys->User transition a lot less expensive.

Um, no - the global bit makes address space switches less expensive (e.g. process switches), not privilege level switches (CPL=3 -> CPL=0 -> CPL=3).
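To make the distinction concrete, here is a toy TLB model in C. It is a sketch under simplifying assumptions: the `tlb_entry` struct and function names are invented for illustration, CR4.PGE is assumed to be set, and replacement policy is ignored. A CR3 write drops only non-global entries, a change to CR0.PG drops everything, and a plain privilege level switch has no function here because by itself it invalidates nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One cached translation; only the flags that matter for flushing. */
struct tlb_entry {
    bool valid;
    bool global;   /* G bit set in the page table entry */
};

/* MOV to CR3 (address space switch): invalidates every entry that is
 * not marked global. Assumes CR4.PGE = 1. */
static void tlb_on_cr3_write(struct tlb_entry *tlb, size_t n)
{
    assert(tlb != NULL);
    for (size_t i = 0; i < n; i++)
        if (!tlb[i].global)
            tlb[i].valid = false;
}

/* Changing CR0.PG: invalidates every entry regardless of the G bit. */
static void tlb_on_cr0_pg_change(struct tlb_entry *tlb, size_t n)
{
    assert(tlb != NULL);
    for (size_t i = 0; i < n; i++)
        tlb[i].valid = false;
}

/* Number of translations still cached. */
static size_t tlb_count_valid(const struct tlb_entry *tlb, size_t n)
{
    assert(tlb != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (tlb[i].valid)
            count++;
    return count;
}
```

Marking the kernel's pages global therefore helps across process switches, while a syscall that stays in the same address space never reaches either flush path.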

Cheers,

Brendan

bewing · Post by **bewing** » Fri Jul 18, 2008 1:52 am

Brendan wrote: Hehe - from Intel's manual, section 10.9. Invalidating the Translation Lookaside Buffers (TLBs):

Huh. Well, that's incredibly stupid of Intel. I hadn't noticed that detail. *sigh*

Brendan · Post by **Brendan** » Fri Jul 18, 2008 2:22 am

Hi,

bewing wrote:
Brendan wrote:Hehe - from Intel's manual, section 10.9. Invalidating the Translation Lookaside Buffers (TLBs):
Huh. Well, that's incredibly stupid of Intel. I hadn't noticed that detail. *sigh*

Sorry - I hope this doesn't ruin your kernel's design...

Cheers,

Brendan

OSDev.org
