Paging confusion, issues

johnsa · Post by **johnsa** » Thu Apr 02, 2009 3:38 am

Hey all,

OK, I've got a few questions around paging, and also a funny issue where I'm not quite sure what's causing it.. first off, the questions:

1) Assuming I have a machine with 2gig of memory installed, let's say running 32bit... Some devices may still map memory into a higher range than is physically available.. i.e. a video mem buffer at 3gig.. I've noticed this with the VESA LFB. So my assumption is .. that when setting up paging I need to map the full address space (NOT just what is physically available) to allow these mem-mapped devices/areas to be accessible, correct?

2) Following on from 1... I assume that no device / BIOS setup etc can make assumptions as to the OS being 32bit or 64bit... in which case ALL devices and mem-mapped IO and config space and so on (for the near future) MUST all be below 4gig to allow a 32bit OS to correctly set up paging? - i.e. is it safe to assume that everything over 4gig is 100% free and the OS could do with it as it pleases?

3) To start things off in my 64bit kernel I did the following:
The boot loader identity maps the first 2 MB, just enough to switch to the kernel stub in 64bit long mode so that it can set up full paging and handle re-mapping itself to a higher-half VMA etc. I'm fairly happy with that model, so to start off with I've set up the paging in the kernel to be 100% identity mapped, just so I can start testing a few other bits of code/support stuff until I do the higher-half setup.. The first issue is related to the questions above: if I have less than 4gig of physical mem, should I still go ahead and MAP the FULL 4gig address space (to support devices/mem-mapped areas etc higher than 2gig)? Obviously if the machine has more than 4gig physical I just map it all?

4) For some reason the above seems to work perfectly in QEMU, but in Bochs it dies and reboots (I assume a triple fault). I've narrowed the problem down to the following:
The boot loader maps 2 MB and switches to the kernel stub, which maps the full 4gig space.. As a test I try to write a byte to, say, 3gig and read it back.. in QEMU perfect, in Bochs ANY read/write at or above exactly 2 MB dies.. so it seems as if the reload of CR3 with the updated paging structures isn't taking effect..
Do I need to perhaps somehow force the TLBs to reload? (I didn't think it was needed.. even though I've enabled the global pages option, because none of the other pages were present/created yet.. and the one and only 2 MB page which has the G flag set doesn't change.) Here are the various code bits (initial stuff from the higher-half tutorial):

Code:

; In the boot loader..

	;--------------------------------------------------------------------------------
	; Basic Identity Map Paging for 1st 2Meg, just enough to get us there.
	;--------------------------------------------------------------------------------
	xor bx,bx
	mov es,bx
	cld
	mov di,0c000h ;c000 = PML4
	mov ax,0d00fh
	stosw
	xor ax,ax
	mov cx,07ffh
	rep stosw
	mov ax,0e00fh ;d000 = PDPT
	stosw
	xor ax,ax
	mov cx,07ffh
	rep stosw
	mov ax,018fh  ;e000 = PDE
	stosw
	xor ax,ax
	mov cx,07ffh
	rep stosw

	;--------------------------------------------------------------------------------
	; Switch To 64bit Long Mode and Jump To Kernel Stub.
	;--------------------------------------------------------------------------------
	mov dl,7								; Display progress.
	mov si,offset ker_str3
	call print_str
	
	; Enable physical-address extensions (PAE)
	mov eax,cr4
	or eax,0a0h								; Enable PAE and PGE bits. 
	mov cr4,eax

	; Point CR3 at PML4
	mov eax,0000c008h 						; Bit 3 set for write-thru caching.
	mov cr3,eax
   
	; Enable long mode (set EFER.LME=1)
	mov ecx,0c0000080h         				; EFER MSR number.
	RDMSR
	or eax,00000101h   						; LME and SYSCALL/SYSRET.
	WRMSR

	; Enable paging AND protected mode to activate long mode (set CR0.PG=1)
    mov eax,cr0            					; Read CR0
    or eax,80000001h						; Enable Paging and Protection.
    mov cr0,eax            					; Write CR0
	
	dw 0ea66h								; Far jump to 64bit kernel stub, reload CS and
	dd 100000h								; flush instruction cache.
	db 08h,00h




; In the kernel stub code..
	; Configure Paging.
	mov edi,0000e000h
	xor rax,rax
	mov ecx,512*4
	xor rdx,rdx
fillPDE0:
	mov rax,rdx
	or rax,18fh
	mov [edi],rax
	add rdx,(1024*2000)
	add edi,8
	dec ecx
	jnz short fillPDE0
	
	mov edi,0000d000h
	mov eax,0e00fh
	mov [edi],eax
	mov eax,0f00fh
	mov [edi+8],eax
	mov eax,1000fh
	mov [edi+16],eax
	mov eax,1100fh
	mov [edi+24],eax
		
	; Point CR3 at PML4
	mov rax,0000c008h 									; Bit 3 set for write-thru caching.
	mov cr3,rax

	mov edi,(1024*1024)*2              ;DIES in bochs.. works in QEMU ... -1 byte and it works (so its the 2Mb barrier from initial paging setup?)
	mov al,'F'
	mov [edi],al
	xor eax,eax
	mov al,[edi]

The GDT stuff is all correct, the stack is set up and working.. the kernel is loaded at 0x100000 and linked to work there.. all seems to be right... anyone got any ideas?
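
For reference, the way a virtual address selects entries in tables like the ones above can be sketched in C. This is a host-side illustration rather than kernel code; the struct and function names are made up for the example, and it assumes 4-level long-mode paging with 2 MB pages, as the stub uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: table indices for a long-mode virtual address
   that is mapped by a 2 MB page (PML4 -> PDPT -> PD, no page-table level). */
struct walk2m {
    unsigned pml4;    /* bits 47..39 */
    unsigned pdpt;    /* bits 38..30 */
    unsigned pd;      /* bits 29..21 */
    uint32_t offset;  /* bits 20..0, offset inside the 2 MB page */
};

static struct walk2m split_vaddr(uint64_t va)
{
    struct walk2m w;

    /* This sketch only handles lower-half addresses below 2^47. */
    assert(va < (1ULL << 47));

    w.pml4   = (unsigned)((va >> 39) & 0x1FF);
    w.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    w.pd     = (unsigned)((va >> 21) & 0x1FF);
    w.offset = (uint32_t)(va & 0x1FFFFF);
    return w;
}
```

With the stub's layout, `split_vaddr(0xC0000000)` (3gig) gives PDPT index 3, which is the entry written at `[edi+24]` pointing at 0x11000, and PD index 0. The failing address 0x200000 gives PDPT index 0 and PD index 1, i.e. the second entry of the directory at 0xe000.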

AJ · Post by AJ » Thu Apr 02, 2009 4:09 am

Hi,

johnsa wrote: 1) ...So my assumption is .. that when setting up paging I need to map the full address space (NOT just what is physically available) to allow these mem-mapped devices/areas to be accessible, correct?

I wouldn't do it like that, but it depends on design somewhat. Generally with paging, you only page memory in as it is needed. Same goes for devices - you detect them via ACPI / PCI / whatever and then page in the memory-mapped IO space once the driver is loaded. The answer could also vary depending on whether your drivers are in user or kernel space - if you have user space drivers, you cannot map the memory until you know which process the device belongs to.
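
As a rough illustration of paging in a device range only once it is needed, the C sketch below works out which 2 MB page-directory entries cover a framebuffer and fills just those. It is a host-side model: the page directory is a plain array, `map_mmio_2m` is a hypothetical name, and the mapping is marked uncached purely for simplicity (a real driver might prefer write-combining for a framebuffer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_2M  0x200000ULL
#define PTE_P    (1ULL << 0)   /* present */
#define PTE_RW   (1ULL << 1)   /* writable */
#define PTE_PWT  (1ULL << 3)   /* write-through */
#define PTE_PCD  (1ULL << 4)   /* cache disable */
#define PTE_PS   (1ULL << 7)   /* 2 MB page when set in a PDE */

/* Identity-map [phys, phys+len) into `pd`, the page directory covering the
   1 GB region that contains the range. Returns the number of 2 MB entries
   written, or 0 if the range crosses a 1 GB boundary (not handled here). */
static size_t map_mmio_2m(uint64_t pd[512], uint64_t phys, uint64_t len)
{
    assert(len > 0 && phys + len - 1 >= phys);      /* non-empty, no wrap */

    uint64_t first = phys & ~(PAGE_2M - 1);               /* round down */
    uint64_t last  = (phys + len - 1) & ~(PAGE_2M - 1);   /* last page  */
    if ((first >> 30) != (last >> 30))
        return 0;

    uint64_t count = ((last - first) >> 21) + 1;
    for (uint64_t i = 0; i < count; i++) {
        uint64_t p = first + i * PAGE_2M;
        pd[(p >> 21) & 0x1FF] = p | PTE_P | PTE_RW | PTE_PWT | PTE_PCD | PTE_PS;
    }
    return (size_t)count;
}
```

For an 8 MB LFB at 0xE0000000 this touches only four entries (indices 256 to 259) of the fourth page directory and leaves everything else not present.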

in which case ALL devices and mem-mapped IO and config space and so on (for the near future) MUST all be below 4gig to allow a 32bit OS to correctly set up paging?

Generally true, I believe. But:

i.e. is it safe to assume that everything over 4gig is 100% free and the OS could do with it as it pleases?

Nononono. It's not safe to assume anything about memory layout. Only trust the system memory map. IIRC it was Brendan who posted a very odd memory map layout in the past from real hardware - again, you can't assume anything.
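
To make "only trust the system memory map" concrete, here is a small C sketch that asks the map, rather than guessing, how much usable RAM sits at or above 4gig. The entry fields follow the usual INT 15h E820 layout (base, length, type, with type 1 meaning usable RAM); the function name is hypothetical and the struct is not packed, so it is for illustration and not for reading the BIOS buffer directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One memory-map entry. The BIOS returns these as 20 packed bytes;
   this unpacked struct is only a convenient in-memory form. */
struct e820_entry {
    uint64_t base;
    uint64_t length;
    uint32_t type;      /* 1 = usable RAM; anything else is not free memory */
};

#define E820_USABLE 1u
#define FOUR_GB     0x100000000ULL

/* Sum the usable RAM that lies at or above 4 GB, going by the map alone. */
static uint64_t usable_above_4g(const struct e820_entry *map, size_t count)
{
    assert(map != NULL || count == 0);

    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (map[i].type != E820_USABLE || map[i].length == 0)
            continue;
        uint64_t start = map[i].base;
        uint64_t end   = start + map[i].length;   /* exclusive end */
        if (end <= FOUR_GB)
            continue;                             /* entirely below 4 GB */
        if (start < FOUR_GB)
            start = FOUR_GB;                      /* clip a straddling entry */
        total += end - start;
    }
    return total;
}
```

Reserved or otherwise non-usable entries above 4gig are simply skipped, which is the point: the space up there is only free where the map says so.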

if I have less than 4gig of physical mem, I should still go ahead and MAP the FULL 4gig address space? (to support devices/mem-mapped etc higher than 2gig).. obviously if the machine has more than 4gig physical I just map it all?

No - page in as you need memory. If you are following a conventional model, each task will have its own memory space - you wouldn't map the entire physical memory space in to each page directory, would you?

I'll look at point 4 if I get a chance later and no one else gets there first.

Cheers,
Adam

johnsa · Post by **johnsa** » Thu Apr 02, 2009 5:09 am

Ok, but in the case of 2.. if something reported by e820 WAS above 4gig (for some odd reason).. a 32bit OS that didn't support PAE / PSE for example wouldn't be able to use that stuff anyway? - this doesn't really affect me as I'm not even bothering with 32bit.. going straight for 64bit, but the principle somehow seems fishy..

my example for the 2gig+ thing was my VESA LFB... under Bochs the VM is configured with 64 MB RAM... but the LFB is reported to be at 0xe0000000 (about 3.5gig)... same sort of thing happens on real h/w too.
So in this case when the kernel starts up / sets up paging should it know how big that LFB is (say 8 MB) and map that space back down ... or if you went with identity mapping for whatever reason you'd have a big hole in your paging structures.. say 2gig of physical mem.. fully identity mapped, and then possibly a 1 gig hole and then an 8 MB slot available again.. in this case the PDEs etc would all still exist and be marked as not present anyway.. so it wouldn't make the paging structures any smaller? I'm opting to use 2 MB pages and use NO 4 KB pages under 64bit if possible..
What I'm actually planning is to have paging enabled and initially configured to suit, but I'm not going to do a user/kernel space split and I'm not ever going to touch CR3 again.. I was thinking of writing the necessary code so that when an app is loaded it rebases the code dynamically depending on where it's loaded.. yes it's fiddly, tricky and will take some load time to do it.. but then if you have multiple tasks running the switches will incur no TLB misses and reloads etc.. plus the entire paging structure can be done with 2 MB pages only, keeping the paging structure size +- 32 KB or smaller... so it's a lot tighter than having possibly a couple of megs' worth of page entries ... just thinking out loud
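
The size estimate can be checked with a little arithmetic, sketched here in C. It assumes 4-level paging, a contiguous mapping starting at address 0, and that every paging structure is one 4 KB table; `paging_bytes` is a made-up name for the example.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BYTES 4096ULL   /* each paging structure is one 4 KB table */

/* Bytes of paging structures needed to map `gib` GB contiguously from 0.
   use_2m != 0 stops at the page directory (2 MB pages); otherwise the
   page tables for 4 KB pages are counted as well. */
static uint64_t paging_bytes(uint64_t gib, int use_2m)
{
    assert(gib > 0 && gib <= 512);               /* keep it within one PDPT */

    uint64_t pml4s = 1;
    uint64_t pdpts = 1;                          /* one PDPT covers 512 GB */
    uint64_t pds   = gib;                        /* one PD per 1 GB */
    uint64_t pts   = use_2m ? 0 : gib * 512;     /* one PT per 2 MB */
    return (pml4s + pdpts + pds + pts) * TABLE_BYTES;
}
```

For 4gig, `paging_bytes(4, 1)` comes to 24 KB (one PML4, one PDPT, four page directories), while `paging_bytes(4, 0)` adds 2048 page tables for roughly 8 MB more. Not-present holes don't shrink the 2 MB figure, since the directories have to exist either way.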

JAAman · Post by **JAAman** » Thu Apr 02, 2009 10:48 am

johnsa wrote: Ok, but in the case of 2.. if something reported by e820 WAS above 4gig (for some odd reason).. a 32bit OS that didn't support PAE / PSE for example wouldn't be able to use that stuff anyway?

correct -- but all mainstream OSs have been using (or capable of using) PAE/PSE for ~15 years now (or rather i should say the workstation/server editions of them have -- which are the only ones going to be using PCI-X or one of the other extended PCI variants, which can use addresses > 4GB and which have never been (and now never will be) used in more common hardware)

my example for the 2gig+ thing was my VESA LFB... under Bochs the VM is configured with 64 MB RAM... but the LFB is reported to be at 0xe0000000 (about 3.5gig)... same sort of thing happens on real h/w too.

of course -- the video card has no way of knowing how much memory you have, so it's intentionally set as high as possible to avoid conflicting with RAM (lots of other hardware does the same thing)

So in this case when the kernel starts up / sets up paging should it know how big that LFB is (say 8 MB) and map that space back down ... or if you went with identity mapping for whatever reason you'd have a big hole in your paging structures.. say 2gig of physical mem.. fully identity mapped, and then possibly a 1 gig hole and then an 8 MB slot available again.. in this case the PDEs etc would all still exist and be marked as not present anyway.. so it wouldn't make the paging structures any smaller? I'm opting to use 2 MB pages and use NO 4 KB pages under 64bit if possible..

well, first, 8MB is awfully small for video memory... but yes, i guess that would be technically correct... but you really don't want the paging structures to be smaller anyway -- since that means you are limited to using only the amount of memory you have as physical RAM (ordinarily you have a lot more memory than that, and ordinarily, most of your 'memory' doesn't even exist in physical RAM -- see below)

What I'm actually planning is to have paging enabled and initially configured to suit, but I'm not going to do a user/kernel space split and I'm not ever going to touch CR3 again..

well, doing this is possible, but...

I was thinking of writing the necessary code so that when an app is loaded it rebases the code dynamically depending on where it's loaded.. yes it's fiddly, tricky and will take some load time to do it..

this is actually not the biggest problem:
in that case, you lose almost all the advantages of using paging in the first place...

the first reason to use it, is to protect processes so that they cannot access each others memory (either from bugs, or from intent)

another reason is to dynamically allocate memory without worrying about the complexities necessary to deal with memory fragmentation (it's all really simple with paging)

another reason is to use more memory than what you have actually installed on the system -- this comes in multiple flavors including virtual-memory paging, and memory mapped files (neither of these very valuable methods is possible under your proposed system) -- under typical usage, most of your memory doesn't need to be in actual RAM most of the time, and is generally located on disk, saving RAM to use for other things -- like more processes, or more data, or disk caching, etc

the only reason you can still use, is to prevent processes from messing with kernel space code and data

but then if you have multiple tasks running the switches will incur no TLB misses and reloads etc..

you will still get TLB misses, and the performance cost of those TLB flushes (on CR3 change) is rather insignificant compared to anything that would be required to replicate the advantages of using paging properly -- besides, your kernel space (the part that doesn't change between tasks) doesn't need to be flushed at all

plus the entire paging structure can be done with 2 MB pages only, keeping the paging structure size +- 32 KB or smaller... so it's a lot tighter than having possibly a couple of megs' worth of page entries ... just thinking out loud

well, if you are using only 2MB pages, you don't get to use all the TLBs, so there will be even more TLB misses, and the size of the paging structure is not really an issue when you have a minimum of 4GB of space (for 32bit processes -- it's much larger if you are going 64bit)
-- remember, these structures don't all have to exist in physical RAM (and some parts can be shared by all processes); only the portion that is actually being used needs to exist in physical memory

johnsa · Post by **johnsa** » Thu Apr 02, 2009 11:30 am

Ok, thanks for that info... perhaps someone can go back to my original post/question.. still struggling with it.. as it works perfectly under QEMU but not Bochs.. seems as if the 2 MB limit is still stuck from the initial identity mapped paging setup...
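
One thing worth checking in the kernel stub from the first post: the fill loop steps the physical address with `add rdx,(1024*2000)`, which is 0x1F4000, whereas 2 MB is `1024*2048` = 0x200000. In a 2 MB page-directory entry, bits 13 to 20 are reserved and must be zero, so with that step every entry after the first carries reserved bits. An emulator that enforces reserved bits would raise a page fault on the first access at or above 2 MB (a triple fault if no handler is installed yet), while one that doesn't would appear to work. Whether this fully explains the Bochs/QEMU difference is an assumption, not something confirmed in the thread. The C sketch below replays the loop's arithmetic on the host; the helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PDE_FLAGS    0x18FULL      /* flags the stub uses: P, RW, US, PWT, PS, G */
#define PDE_2M_RSVD  0x1FE000ULL   /* bits 13..20: reserved in a 2 MB PDE */

/* Value the stub's loop would store in entry `index` for a given step. */
static uint64_t pde_for(unsigned index, uint64_t step)
{
    assert(index < 512 * 4);       /* the stub fills four page directories */
    return ((uint64_t)index * step) | PDE_FLAGS;
}

/* Non-zero if the entry has any of the reserved address bits set. */
static int pde_has_reserved_bits(uint64_t pde)
{
    return (pde & PDE_2M_RSVD) != 0;
}
```

With a step of `1024*2000`, entry 0 is clean but entry 1 is 0x1F418F, which has reserved bits set; with a step of 0x200000, entry 1 is 0x20018F and no entry trips the check. That matches the "-1 byte and it works" symptom, since only the first 2 MB page is valid.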

OSDev.org
