OSDev.org

kohlrak · Post by **kohlrak** » Tue Mar 05, 2013 4:13 pm

I'm trying to avoid v86, but it's apparent that if i want over 1MB available for use (later i want to try to implement some sort of audio driver, and loading raw audio files for that driver will probably require more than 1MB for anything longer than 5 seconds, also i want to allow importation of raw RGB16 images as well), i'm going to need to implement virtual 8086 mode (turns out EDD's dword loading pointer is seg:off). The problem is, a lot of the information out there already makes assumptions that I'm not only already doing certain things (already implementing task switching, etc), but that I plan to implement certain other things as well (enable paging, etc). So, I am going to state the current state of my kernel and the basic plans for it.

Current State:
0: My kernel gets loaded to 0x7E0:0x0000 by a bootloader that does nothing more than use EDD bios functions to call the usb drive (emulated hard drive).
1: A20 Enabled
2: GDT with 2 entries (code and data) with access to all possible 32bit memory is loaded
3: VBE mode set
4: jumped to pmode
5: IDT is loaded with custom interrupts
6: pic is remapped
7: write-back enabled in MTRR for all addresses except 0xC0000000 and up, which have write-combine enabled
8: A few ISRs are implemented for basic graphics functions (double buffering, char-mapped images, etc)
9: Some testing code to prove everything i have so far works
A: Entirely in x86 FASM
B: RAM locations marked "conventional" on the wiki's memory map are the only altered locations (except wherever the LFB is hiding)

Plans:
-Primarily single threaded application (other threads will exist, but they are merely to handle keyboard, clock, etc and will preserve regs with pusha and popa)
-Both graphical and "console" modes
-Load and save functions based on disk sectors (LBA format)
-int 0x31 (kernel use only intended) will be given a pointer to copy a 16bit block of code to address 0x0070:0x0000 to be run by v86
-a "user program" (still ring 0) will be loaded into high memory then executed
-overwritable exception handlers (default is green screen of death and capslock light turned on)
-all pentium 4 instruction sets enabled (FPU, etc: I'm assuming my userbase is using at least a pentium 4 [probably later considering HD emulation USB is required to boot])

Goal:
I want to keep this kernel as simple as possible. This helps me (and others reading the source) learn and understand with minimal hair-ripping. This kernel is meant to be a simple learning tool to learn the bare basics from the ground up (the "user program" will be basic beginner lessons for learning x86 assembly assuming it to be the first programming language, but as lessons continue, it will eventually move into the kernel's source code [mostly for information, not really for writing a whole tutorial on OS development]).

I'm not too worried about the speed of the thing, so if hardware task switching (which i know nothing about at this point, aside from this mystical structure called a TSS which i don't fully understand) is simpler and gets us to 16bit real mode and back, I'm all for it. I assume that I will have to remap the IRQs back to ISR0 and up, but i'm not sure if i should keep Automatic EOI on and stuff like that or not. I just want to get to real mode and back to have the drive read, written to, video switch, etc without rewriting my kernel from scratch and going bald.

So how do i get that int 0x31 as painlessly as possible? (I can do the memcopy easily, obviously, i just need a way to get to real mode and back.)

Brendan · Post by **Brendan** » Tue Mar 05, 2013 10:53 pm

Hi,

kohlrak wrote:7: write-back enabled in MTRR for all addresses except 0xC0000000 and up, which have write-combine enabled

That's a bad idea. The MTRRs should reflect what should/shouldn't be cached (and how), and therefore depend on the specific system's physical address space map. Making everything above 0xC0000000 "write-combine" will break the local APIC and most PCI devices that use memory mapped IO (ethernet, sound, USB controllers, etc). Making everything below 0xC0000000 "write-back" (even things that are not RAM) can also cause things to break.

Basically there's only 4 sane choices:

don't touch MTRRs at all (easiest)
make video display memory (and nothing else) "write-combining" without changing any other MTRRs
make video display memory (and nothing else) "write-combining", including changing other MTRRs without increasing the caching any specific area (e.g. if something is "write-through" then you can make it "uncached" or "write-combining" but can't make it "write-back" without risk)
get a full/complete physical memory map and setup the MTRRs according to what each area listed in the physical memory map says. Note: The information returned by "int 0x15, eax=0xE820" is complete enough to avoid risk, but isn't enough for best performance as you don't know which "system" areas can be cached and have to assume "uncached" to be safe.

kohlrak wrote:So how do i get that int 0x31 as painlessly as possible? (I can do the memcopy easily, obviously, i just need a way to get to real mode and back.)

To me it looks like you only really need one function that switches to protected mode, copies data, then switches back to real mode. That will allow you to continue with your current code, access more memory (by copying it to/from the 1 MiB that real mode can access) and continue with your current goals.

Cheers,

Brendan

kohlrak · Post by **kohlrak** » Wed Mar 06, 2013 12:06 am

Brendan wrote:Hi,

kohlrak wrote:7: write-back enabled in MTRR for all addresses except 0xC0000000 and up, which have write-combine enabled
That's a bad idea. The MTRRs should reflect what should/shouldn't be cached (and how), and therefore depend on the specific system's physical address space map. Making everything above 0xC0000000 "write-combine" will break the local APIC and most PCI devices that use memory mapped IO (ethernet, sound, USB controllers, etc). Making everything below 0xC0000000 "write-back" (even things that are not RAM) can also cause things to break.

Only a major concern if i'm actually using that hardware (though i admit that i am using some of it, i'm using bios for everything other than vesa right now, but i figure i will have to mess with that later). The problem i'm having with treating it properly is the mask. How do i make a mask on the fly that covers the whole linear frame buffer without including some other sections when the address is somewhere in 0xC0000000? It'd be easy if i just pointed to the end instead of using that mask thing, but unfortunately i have to use the mask register to state the end. Though if you know some tricks, please tell me, because it was my best solution that i've come up with after spending a day or two on it.

Basically there's only 4 sane choices:
don't touch MTRRs at all (easiest)

But i do like the speed boost i get from it, because flipping to that page will be something my kernel does very, very often.

make video display memory (and nothing else) "write-combining" without changing any other MTRRs

Which is where my problem is.

make video display memory (and nothing else) "write-combining", including changing other MTRRs without increasing the caching any specific area (e.g. if something is "write-through" then you can make it "uncached" or "write-combining" but can't make it "write-back" without risk)

My problem is I don't know how to mask the whole LFB area without also ending up masking parts of other areas. Even the AMD manual states that the size should be a power of 2, and for good reason: if it's not, it's not going to work very well.

get a full/complete physical memory map and setup the MTRRs according to what each area listed in the physical memory map says. Note: The information returned by "int 0x15, eax=0xE820" is complete enough to avoid risk, but isn't enough for best performance as you don't know which "system" areas can be cached and have to assume "uncached" to be safe.

Right.

There's a 5th solution, too. Disable it and enable it when i need to. Probably not an optimal solution, but it'll still probably be faster than not using it at all. However, until i can link it directly to some problems i'm having, it's probably OK to leave it on. I'm using bios to mess with hard drive, and i really won't need USB or anything like that (even sound is a feature i can do without for this kernel). I actually don't want my kernel to do much outside of emulating a simple environment from which to learn. Actually, i intend for it to also be written in such a way that it can't be used for all the extra things (i want it to remain a learning tool and not turn into somebody's production OS).

kohlrak wrote:So how do i get that int 0x31 as painlessly as possible? (I can do the memcopy easily, obviously, i just need a way to get to real mode and back.)
To me it looks like you only really need one function that switches to protected mode, copies data, then switches back to real mode. That will allow you to continue with your current code, access more memory (by copying it to/from the 1 MiB that real mode can access) and continue with your current goals.

Cheers,

Brendan

Pretty much. That's what int 0x31 will do. My problem is that i don't know how to get back to real mode. It's like, once you go to pmode, you can't just simply go back by restoring your segment registers (cs via retf) and clearing bit 0 in cr0.

Brendan · Post by **Brendan** » Wed Mar 06, 2013 12:38 am

Hi,

kohlrak wrote:
Brendan wrote:
kohlrak wrote:7: write-back enabled in MTRR for all addresses except 0xC0000000 and up, which have write-combine enabled
That's a bad idea. The MTRRs should reflect what should/shouldn't be cached (and how), and therefore depend on the specific system's physical address space map. Making everything above 0xC0000000 "write-combine" will break the local APIC and most PCI devices that use memory mapped IO (ethernet, sound, USB controllers, etc). Making everything below 0xC0000000 "write-back" (even things that are not RAM) can also cause things to break.
Only a major concern if i'm actually using that hardware (though i admit that i am using some of it, i'm using bios for everything other than vesa right now, but i figure i will have to mess with that later).

It's a concern if the area/s are being used, including if those areas are being used by the firmware's SMM without the OS knowing or being able to prevent it (including things like firmware/SMM using HPET to emulate PIT, etc).

kohlrak wrote:The problem i'm having with treating it properly is the mask. How do i make a mask on the fly that covers the whole linear frame buffer without including some other sections when the address is somewhere in 0xC0000000? It'd be easy if i just pointed to the end instead of using that mask thing, but unfortunately i have to use the mask register to state the end. Though if you know some tricks, please tell me, because it was my best solution that i've come up with after spending a day or two on it.

Fortunately PCI specs say that memory mapped IO areas must have a "power of 2" size that is greater than or equal to 4 KiB, and must start on a "natural boundary". E.g. it could be 4 KiB on a 4 KiB boundary, or 8 KiB on an 8 KiB boundary, or 512 MiB on a 512 MiB boundary, or... This makes it easy to set up a "variable MTRR register", as the MTRR requires things that PCI specs guarantee. Basically, you'd just set the "MTRR_PHYSBASE" MSR to the start of the region and the "MTRR_PHYSMASK" MSR to "~(area_size - 1)".
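As a rough illustration of that formula, here is a minimal C sketch (not from the thread) that computes the two MSR values for one variable-range MTRR. The `mtrr_pair` type and `mtrr_encode` name are hypothetical, and `phys_bits` is an assumption: the caller is expected to know the CPU's physical address width (36 is used in the example below). The sketch only computes values; it does not write any MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Memory type encodings stored in the low byte of IA32_MTRR_PHYSBASEn. */
enum { MTRR_TYPE_UC = 0x00, MTRR_TYPE_WC = 0x01, MTRR_TYPE_WB = 0x06 };

#define MTRR_PHYSMASK_VALID (1ULL << 11)   /* "valid" bit in PHYSMASKn */

/* Hypothetical helper type: the two values to be written with WRMSR. */
struct mtrr_pair {
    uint64_t physbase;   /* value for IA32_MTRR_PHYSBASEn */
    uint64_t physmask;   /* value for IA32_MTRR_PHYSMASKn */
};

/* Nonzero if x is a power of two. */
static int is_pow2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* Encode one range. The size must be a power of two of at least 4 KiB and
 * the start must be naturally aligned to it, as described in the post. */
static struct mtrr_pair mtrr_encode(uint64_t start, uint64_t size,
                                    unsigned type, unsigned phys_bits)
{
    struct mtrr_pair p;
    uint64_t addr_mask;

    assert(phys_bits >= 32 && phys_bits < 64);
    assert(is_pow2(size) && size >= 4096);
    assert((start & (size - 1)) == 0);

    addr_mask  = (1ULL << phys_bits) - 1;            /* keep reserved bits 0 */
    p.physbase = (start & addr_mask) | type;
    p.physmask = (~(size - 1) & addr_mask) | MTRR_PHYSMASK_VALID;
    return p;
}
```

For example, `mtrr_encode(0xE0000000, 4 MiB, MTRR_TYPE_WC, 36)` yields a base value of 0xE0000001 and a mask value of 0xFFFC00800. The asserts make the two preconditions explicit, since the mask formula silently covers the wrong range when either one is violated.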

Unfortunately, VBE is not PCI, and the "PhysBasePtr" in VBE's "mode information" may not be the start of video display memory, and "LinBytesPerScanLine" field multiplied by the vertical resolution is unlikely to be the total size of video display memory. You could do some "conservative estimation" based on the information provided by VBE though (e.g. round the starting address up and the size down until size is a power of 2 and starting address is on a natural boundary). This may not be as good as getting the real info from PCI configuration space, but it's probably much easier if you don't have code to enumerate PCI devices yet.
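One way to read the "conservative estimation" above is sketched below in C (not from the thread). Given a start address and byte count derived from the VBE mode information, it looks for the largest naturally aligned power-of-two block that lies entirely inside that range. The `region` type and `largest_aligned_block` name are hypothetical, and the sketch assumes `start + size` does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a physical address range. */
struct region {
    uint64_t start;
    uint64_t size;    /* 0 means "nothing usable was found" */
};

/* Largest power-of-two block (at least 4 KiB) that is naturally aligned and
 * fully contained in [start, start + size). The start is rounded up and the
 * size rounded down, so the result never covers memory outside the input. */
static struct region largest_aligned_block(uint64_t start, uint64_t size)
{
    struct region r = { 0, 0 };
    uint64_t end = start + size;
    uint64_t block = 4096;

    assert(end >= start);                 /* no wrap-around */

    /* Largest power of two that does not exceed the size. */
    while (block * 2 <= size)
        block *= 2;

    /* Try progressively smaller blocks until one fits after alignment. */
    for (; block >= 4096; block /= 2) {
        uint64_t aligned = (start + block - 1) & ~(block - 1);
        if (aligned + block <= end) {
            r.start = aligned;
            r.size  = block;
            return r;
        }
    }
    return r;
}
```

With a base of 0xE0001234 and 3 MiB of frame buffer, this returns 1 MiB at 0xE0100000; with a base of 0xE0000000 it returns 2 MiB at 0xE0000000. Anything outside the returned block simply keeps whatever caching type it already had.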

kohlrak wrote:
Brendan wrote:To me it looks like you only really need one function that switches to protected mode, copies data, then switches back to real mode. That will allow you to continue with your current code, access more memory (by copying it to/from the 1 MiB that real mode can access) and continue with your current goals.
Pretty much. That's what int 0x31 will do. My problem is that i don't know how to get back to real mode. It's like, once you go to pmode, you can't just simply go back by restoring your segment registers (cs via retf) and clearing bit 0 in cr0.

Once you go to protected mode, you can simply go back to real mode by loading "real mode compatible" segment registers, clearing the bit in CR0, then loading actual real mode segment registers. However, this only works if you haven't modified the state of any hardware that the BIOS relies on (e.g. don't disable PIC chips and enable IO APIC and expect BIOS to work with IO APIC or something); and you can't spend too long in protected mode without worrying about the BIOS losing IRQs. For a simple "copy memory" function that will never copy more than 1 MiB, none of these things matter (you won't be messing with hardware and won't be in protected mode for too long) and it should be relatively easy.

Cheers,

Brendan

kohlrak · Post by **kohlrak** » Wed Mar 06, 2013 1:23 am

Brendan wrote:Hi,

kohlrak wrote:
Brendan wrote:
That's a bad idea. The MTRRs should reflect what should/shouldn't be cached (and how), and therefore depend on the specific system's physical address space map. Making everything above 0xC0000000 "write-combine" will break the local APIC and most PCI devices that use memory mapped IO (ethernet, sound, USB controllers, etc). Making everything below 0xC0000000 "write-back" (even things that are not RAM) can also cause things to break.
Only a major concern if i'm actually using that hardware (though i admit that i am using some of it, i'm using bios for everything other than vesa right now, but i figure i will have to mess with that later).
It's a concern if the area/s are being used, including if those areas are being used by the firmware's SMM without the OS knowing or being able to prevent it (including things like firmware/SMM using HPET to emulate PIT, etc).

True...

kohlrak wrote:The problem i'm having with treating it properly is the mask. How do i make a mask on the fly that covers the whole linear frame buffer without including some other sections when the address is somewhere in 0xC0000000? It'd be easy if i just pointed to the end instead of using that mask thing, but unfortunately i have to use the mask register to state the end. Though if you know some tricks, please tell me, because it was my best solution that i've come up with after spending a day or two on it.
Fortunately PCI specs say that memory mapped IO areas must have a "power of 2" size that is greater than or equal to 4 KiB, and must start on a "natural boundary". E.g. it could be 4 KiB on a 4 KiB boundary, or 8 KiB on an 8 KiB boundary, or 512 MiB on a 512 MiB boundary, or... This makes it easy to set up a "variable MTRR register", as the MTRR requires things that PCI specs guarantee. Basically, you'd just set the "MTRR_PHYSBASE" MSR to the start of the region and the "MTRR_PHYSMASK" MSR to "~(area_size - 1)".

I could've sworn that formula didn't always work (especially with sizes that were smaller than the address), but apparently either my memory is wrong or i suck at math. At any rate, i guess i'll have to yield to that one. I can't come up with a proof of concept problem with that.

Unfortunately, VBE is not PCI, and the "PhysBasePtr" in VBE's "mode information" may not be the start of video display memory, and "LinBytesPerScanLine" field multiplied by the vertical resolution is unlikely to be the total size of video display memory. You could do some "conservative estimation" based on the information provided by VBE though (e.g. round the starting address up and the size down until size is a power of 2 and starting address is on a natural boundary). This may not be as good as getting the real info from PCI configuration space, but it's probably much easier if you don't have code to enumerate PCI devices yet.

Well, it has to be at least 4MB. Fortunately, from what i have seen, it usually likes to start at a nice address. IIRC, intel even stated that write-combine is meant for things like LFBs, so i would assume companies would keep that in mind. Though i probably should test (actually, my MTRR code already does: just AND it).

kohlrak wrote:
Brendan wrote:To me it looks like you only really need one function that switches to protected mode, copies data, then switches back to real mode. That will allow you to continue with your current code, access more memory (by copying it to/from the 1 MiB that real mode can access) and continue with your current goals.
Pretty much. That's what int 0x31 will do. My problem is that i don't know how to get back to real mode. It's like, once you go to pmode, you can't just simply go back by restoring your segment registers (cs via retf) and clearing bit 0 in cr0.
Once you go to protected mode, you can simply go back to real mode by loading "real mode compatible" segment registers, clearing the bit in CR0, then loading actual real mode segment registers. However, this only works if you haven't modified the state of any hardware that the BIOS relies on (e.g. don't disable PIC chips and enable IO APIC and expect BIOS to work with IO APIC or something);

I even tried a barebones example below, but i'm a little curious what you mean by "real mode compatible segment registers."

and you can't spend too long in protected mode without worrying about the BIOS losing IRQs. For a simple "copy memory" function that will never copy more than 1 MiB, none of these things matter (you won't be messing with hardware and won't be in protected mode for too long) and it should be relatively easy.

Cheers,

Brendan

Shouldn't the BIOS poll the hardware for what it needs before returning to the realmode application? If not, i could always use the vrefresh polling i do as a pretty reliable wait. As for after that, what could the BIOS really need (especially when i only need it to change video modes and load and save files)? I plan on spending most of my time in pmode and just switching back to real mode to do some minor tasks like that.

Anyway, here's my most recent attempt at tackling this:

Code:

;Note: There's a lot of nothing happening here. In real life I'd have a lot more to fight with.

org 0x7c00
use16
	xor ax, ax
	mov ds, ax
	mov es, ax
	mov fs, ax
	mov gs, ax
	mov ss, ax
	mov sp, 0x7c00

	sgdt [rmgdt]
	sidt [rmidt]

	;Let's pretend there's a bootloader here, but we don't need one because this is tiny

	cli
	lgdt [GDTR]
	mov eax, cr0
	or al, 1
	mov cr0, eax
	jmp 0x08:@f
use32
@@:	mov ax, 0x10
	mov ds, ax
	mov es, ax
	mov fs, ax
	mov gs, ax
	mov ss, ax
	mov esp, 0x7c00
	;Notice we don't turn on interrupts or load the IDT, here

	;Pretend there's some code here, but actually we're not doing anything here in this demo...

	;Oh no! I did all this work, and I don't have a driver to print to the screen!
	;I need to return to real mode to use a bios interrupt!

	call doRealMode

	;The end, or is it?
@@:	hlt
	jmp @b ;just in case
;===============================================================================================================
rmgdt df 0
rmidt df 0
doRealMode:
	enter 1024, 0
	pushw 0
	pushw @f
	mov eax, cr0
	and al, 0xFE
	mov cr0, eax
	lidt [rmidt]	
	lgdt [rmgdt]
	retf
@@:	xor ax, ax
	mov ds, ax
	mov es, ax
	mov fs, ax
	mov gs, ax
	mov ss, ax
	sti

	call printStuff
	cli
	lgdt [GDTR]
	mov eax, cr0
	or al, 1
	mov cr0, eax
	jmp 0x08:@f
use32
@@:	mov ax, 0x10
	mov ds, ax
	mov es, ax
	mov fs, ax
	mov gs, ax
	mov ss, ax
	leave
	ret
;===============================================================================================================
use16
some_string db "Hello world!", 0
printStuff:
	mov ah,0Fh
	int 10h
	mov si,some_string
	mov ah,0Eh
@@:	lodsb
	or al, al
	jz @f
	int 10h
	jmp @b
@@:	ret
;===============================================================================================================
gdt:
gdt_null db 0x00, 0x00, 0x00, 0x00, 0x00, 0x90, 0x00, 0x00
gdt_code db 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x98, 0xCF, 0x00
gdt_data db 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x92, 0xCF, 0x00
gdt_end:
GDTR:	dw gdt_end - gdt - 1
	dd gdt
times (0x7dfe - $) db 0
	dw 0xAA55

I figure that if I can get that to work, I can reverse the damage of everything else i'm using (i want to keep to the standard pic for this).

Though, on a side note, this arch is getting rather old to me. Aside from a learner OS or a recovery setup, there's very little use I could get out of anything i do for it, because there's so much varying hardware. For real development I'm better off learning ARM and developing on boards where i know the hardware will be consistent (plus it's cheaper so i don't have to worry about frying it when messing with external electronics).

Brendan · Post by **Brendan** » Wed Mar 06, 2013 2:44 am

Hi,

kohlrak wrote:
Brendan wrote:Fortunately PCI specs say that memory mapped IO areas must have a "power of 2" size that is greater than or equal to 4 KiB, and must start on a "natural boundary". E.g. it could be 4 KiB on a 4 KiB boundary, or 8 KiB on an 8 KiB boundary, or 512 MiB on a 512 MiB boundary, or... This makes it easy to set up a "variable MTRR register", as the MTRR requires things that PCI specs guarantee. Basically, you'd just set the "MTRR_PHYSBASE" MSR to the start of the region and the "MTRR_PHYSMASK" MSR to "~(area_size - 1)".
I could've sworn that formula didn't always work (especially with sizes that were smaller than the address), but apparently either my memory is wrong or i suck at math. At any rate, i guess i'll have to yield to that one. I can't come up with a proof of concept problem with that.

The formula should work, but only when size is a power of 2 and the start address is naturally aligned. If the size isn't a power of 2 and/or the start address isn't naturally aligned; then you have to either find the largest piece that is (and forget the remainder) or use multiple variable MTRRs (in an additive or subtractive way).

For example, if you want 3 MiB starting at 0xE0000000 as "write combining", then the choices would be:

pick the biggest area (2 MiB at 0xE0000000)
use multiple areas additively (2 MiB at 0xE0000000 as write-combining plus another 1 MiB at 0xE0200000 as write-combining)
use multiple areas subtractively (4 MiB at 0xE0000000 as write-combining that is too big with another 1 MiB at 0xE0300000 as uncached to cancel out the "too big" part of the first area)

For another example, if you want 2 MiB starting at 0xE0100000 as "write combining", then the choices would be:

pick the biggest area (1 MiB at 0xE0100000 - don't forget that 2 MiB wouldn't be naturally aligned)
use multiple areas additively (1 MiB at 0xE0100000 as write-combining plus another 1 MiB at 0xE0200000 as write-combining)
use multiple areas subtractively (4 MiB at 0xE0000000 as write-combining that is too big, another 1 MiB at 0xE0300000 as uncached to cancel out the end of the first area and another 1 MiB at 0xE0000000 to cancel out the start of the first area)
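The additive choice in both examples can be produced mechanically. The C sketch below (not from the thread) splits a 4 KiB-aligned range into naturally aligned power-of-two blocks, each of which would need its own variable MTRR. The `region` type and `split_additive` name are hypothetical. The function stops early if it runs out of output slots, because a CPU only has a limited number of variable MTRRs (the count is reported in IA32_MTRRCAP).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: a physical address range. */
struct region {
    uint64_t start;
    uint64_t size;
};

/* Split [start, start + size) into naturally aligned power-of-two blocks.
 * Returns the number of blocks written to out (at most max_out). If the
 * slots run out, the tail of the range is left uncovered. */
static size_t split_additive(uint64_t start, uint64_t size,
                             struct region *out, size_t max_out)
{
    size_t n = 0;

    assert((start & 4095) == 0 && (size & 4095) == 0);

    while (size != 0 && n < max_out) {
        /* Largest block the current start address is aligned to
         * (its lowest set bit; address 0 is aligned to everything). */
        uint64_t block = start ? (start & (~start + 1)) : (1ULL << 63);

        /* Shrink until the block also fits in what is left. */
        while (block > size)
            block /= 2;

        out[n].start = start;
        out[n].size  = block;
        n++;

        start += block;
        size  -= block;
    }
    return n;
}
```

For 3 MiB at 0xE0000000 this gives 2 MiB at 0xE0000000 plus 1 MiB at 0xE0200000, and for 2 MiB at 0xE0100000 it gives 1 MiB at 0xE0100000 plus 1 MiB at 0xE0200000, matching the additive entries in the two lists. The subtractive variant is not covered here.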

Due to PCI spec requirements, none of this is normally necessary for PCI devices - you only need to do messy things like this for RAM (but the firmware should've setup RAM correctly for you). Of course (with only VBE's information) you can't even assume the video card is PCI (e.g. might be "VESA local bus" or something) and it's best not to make assumptions.

kohlrak wrote:
Unfortunately, VBE is not PCI, and the "PhysBasePtr" in VBE's "mode information" may not be the start of video display memory, and "LinBytesPerScanLine" field multiplied by the vertical resolution is unlikely to be the total size of video display memory. You could do some "conservative estimation" based on the information provided by VBE though (e.g. round the starting address up and the size down until size is a power of 2 and starting address is on a natural boundary). This may not be as good as getting the real info from PCI configuration space, but it's probably much easier if you don't have code to enumerate PCI devices yet.
Well, it has to be at least 4MB. Fortunately, from what i have seen, it usually likes to start at a nice address. IIRC, intel even stated that write-combine is meant for things like LFBs, so i would assume companies would keep that in mind. Though i probably should test (actually, my MTRR code already does: just AND it).

I'm not sure which video mode you're using (e.g. 1024*768 * 32-bpp is only 3 MiB).

Most video cards will use a nice start address, but there's no guarantee that all of them always will. For example, video display memory might start at 0xE0000000 (and comply with PCI specs) but nothing prevents VBE from telling you that "PhysBasePtr" is 0xE0001234 (where the first 0x1234 bytes of video display memory aren't used by the video mode).

kohlrak wrote:
Brendan wrote:Once you go to protected mode, you can simply go back to real mode by loading "real mode compatible" segment registers, clearing the bit in CR0, then loading actual real mode segment registers. However, this only works if you haven't modified the state of any hardware that the BIOS relies on (e.g. don't disable PIC chips and enable IO APIC and expect BIOS to work with IO APIC or something);
I even tried a barebones example below, but i'm a little curious what you mean by "real mode compatible segment registers."

In real mode, when a segment register is loaded the CPU doesn't touch the segment attributes or the segment limit - the CPU simply assumes they are right (even if they aren't). This means that you need to load the segment attributes and segment limit in protected mode with values that will be right for real mode (e.g. CS is a 16-bit code segment and not a 32-bit code segment, and CS limit is 64 KiB and not anything else).
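To make "values that will be right for real mode" concrete, here is a minimal C sketch (not from the thread) that builds the 8-byte GDT entries for such segments. The function names are hypothetical, and the access bytes 0x9A and 0x92 are an assumption based on the commonly used present, ring 0, readable-code and writable-data encodings. The important part is the flags nibble of 0: 16-bit default size and byte granularity, so the 0xFFFF limit really means 64 KiB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack one segment descriptor in the layout the GDT expects.
 * limit is the 20-bit limit field; flags is the high nibble of byte 6
 * (bit 3 = granularity, bit 2 = default operand size 32-bit). */
static void gdt_encode(uint8_t out[8], uint32_t base, uint32_t limit,
                       uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF && flags <= 0xF);

    out[0] = (uint8_t)(limit & 0xFF);
    out[1] = (uint8_t)((limit >> 8) & 0xFF);
    out[2] = (uint8_t)(base & 0xFF);
    out[3] = (uint8_t)((base >> 8) & 0xFF);
    out[4] = (uint8_t)((base >> 16) & 0xFF);
    out[5] = access;
    out[6] = (uint8_t)(((limit >> 16) & 0x0F) | (flags << 4));
    out[7] = (uint8_t)((base >> 24) & 0xFF);
}

/* "Real mode compatible" code segment: 16-bit, base 0, 64 KiB limit. */
static void gdt_encode_realmode_code(uint8_t out[8])
{
    gdt_encode(out, 0, 0xFFFF, 0x9A, 0x0);
}

/* "Real mode compatible" data segment: 16-bit, base 0, 64 KiB limit. */
static void gdt_encode_realmode_data(uint8_t out[8])
{
    gdt_encode(out, 0, 0xFFFF, 0x92, 0x0);
}
```

These produce `FF FF 00 00 00 9A 00 00` and `FF FF 00 00 00 92 00 00`. For comparison, `gdt_encode(out, 0, 0xFFFFF, 0x92, 0xC)` reproduces the flat 32-bit data entry from the posted GDT (`FF FF 00 00 00 92 CF 00`), whose 4 GiB limit and 32-bit size are what the hidden parts of the segment registers would otherwise still hold after the PE bit is cleared.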

kohlrak wrote:
Brendan wrote:and you can't spend too long in protected mode without worrying about the BIOS losing IRQs. For a simple "copy memory" function that will never copy more than 1 MiB, none of these things matter (you won't be messing with hardware and won't be in protected mode for too long) and it should be relatively easy.
Shouldn't the BIOS poll the hardware for what it needs before returning to the realmode application? If not, i could always use the vrefresh polling i do as a pretty reliable wait. As for after that, what could the BIOS really need (especially when i only need it to change video modes and load and save files)?

The BIOS simply assumes the CPU is in real mode all the time and that it has control of all the hardware. If you switch to protected mode then that's your problem (the BIOS doesn't know and doesn't do anything special in case).

kohlrak wrote:I plan on spending most of my time in pmode and just switching back to real mode to do some minor tasks like that.

That changes things a lot. In this case (in protected mode most of the time) there's only 4 sane options:

Your OS takes control of the hardware; and therefore has no reason to use any real mode/BIOS code.
The BIOS remains in control of the hardware; and your protected mode code frequently switches back to real mode to do "nothing" (e.g. a few NOP instructions) to give the BIOS a chance to handle pending IRQs (and you have a "switch to real mode and do BIOS function then switch back to protected mode" routine).
The BIOS remains in control of the hardware; but you install your own protected mode IRQ handlers that switch back to real mode and pass control to the BIOS's original IRQ handlers (and you have a "switch to real mode and do BIOS function then switch back to protected mode" routine).
The BIOS remains in control of the hardware; and you use virtual8086 to run the BIOS's real mode IRQ handlers (and you have a "switch to real mode and do BIOS function then switch back to protected mode" routine)

Most OSs use the first option, but that means having device drivers, etc. For the remaining options, option #2 is a little dodgy (e.g. can't handle NMI) and is a pain to use (I've used it in boot code, and you need to sprinkle "do nothing" calls everywhere and never know if you've got enough or too many). Option #4 is complicated - I wouldn't use virtual8086 mode unless you're intending to run applications designed for real mode (e.g. DOS applications running under a 32-bit OS, like Win9x).

Option #3 is a little tricky, but relatively easy. It's probably the best option for your case.

Cheers,

Brendan

Brendan · Post by **Brendan** » Wed Mar 06, 2013 2:50 am

Hi,

kohlrak wrote:Anyway, here's my most recent attempt at tackling this:

The main problem in this code is that it doesn't restore "real mode compatible" segment registers while it's in protected mode (before disabling protected mode). This would leave the CPU in a strange "32-bit real mode" state after protected mode is disabled.

Cheers,

Brendan

kohlrak · Post by **kohlrak** » Wed Mar 06, 2013 6:11 am

The formula should work, but only when size is a power of 2 and the start address is naturally aligned. If the size isn't a power of 2 and/or the start address isn't naturally aligned; then you have to either find the largest piece that is (and forget the remainder) or use multiple variable MTRRs (in an additive or subtractive way).

For example, if you want 3 MiB starting at 0xE0000000 as "write combining", then the choices would be:

pick the biggest area (2 MiB at 0xE0000000)
use multiple areas additively (2 MiB at 0xE0000000 as write-combining plus another 1 MiB at 0xE0200000 as write-combining)
use multiple areas subtractively (4 MiB at 0xE0000000 as write-combining that is too big with another 1 MiB at 0xE0300000 as uncached to cancel out the "too big" part of the first area)

For another example, if you want 2 MiB starting at 0xE0100000 as "write combining", then the choices would be:

pick the biggest area (1 MiB at 0xE0100000 - don't forget that 2 MiB wouldn't be naturally aligned)
use multiple areas additively (1 MiB at 0xE0100000 as write-combining plus another 1 MiB at 0xE0200000 as write-combining)
use multiple areas subtractively (4 MiB at 0xE0000000 as write-combining that is too big, another 1 MiB at 0xE0300000 as uncached to cancel out the end of the first area and another 1 MiB at 0xE0000000 to cancel out the start of the first area)

Due to PCI spec requirements, none of this is normally necessary for PCI devices - you only need to do messy things like this for RAM (but the firmware should've setup RAM correctly for you). Of course (with only VBE's information) you can't even assume the video card is PCI (e.g. might be "VESA local bus" or something) and it's best not to make assumptions.

Wait, i just realized, doesn't MTRR occur between the CPU and the memory controller?

I'm not sure which video mode you're using (e.g. 1024*768 * 32-bpp is only 3 MiB).

Most video cards will use a nice start address, but there's no guarantee that all of them always will. For example, video display memory might start at 0xE0000000 (and comply with PCI specs) but nothing prevents VBE from telling you that "PhysBasePtr" is 0xE0001234 (where the first 0x1234 bytes of video display memory aren't used by the video mode).

If it doesn't, its bits will be truncated. But a simple test [vLFB], 0xFFF will check for that.

In real mode, when a segment register is loaded the CPU doesn't touch the segment attributes or the segment limit - the CPU simply assumes they are right (even if they aren't). This means that you need to load the segment attributes and segment limit in protected mode with values that will be right for real mode (e.g. CS is a 16-bit code segment and not a 32-bit code segment, and CS limit is 64 KiB and not anything else).

So basically, set them before i use them, which is natural.

The BIOS simply assumes the CPU is in real mode all the time and that it has control of all the hardware. If you switch to protected mode then that's your problem (the BIOS doesn't know and doesn't do anything special in case).

And when you get back to bios it'll catch that an interrupt occurred. Most hardware should plan ahead and understand this, otherwise it's a race condition when switching to pmode, otherwise pmode can't work with the hardware if the bios can't.

That changes things a lot. In this case (in protected mode most of the time) there's only 4 sane options:

Your OS takes control of the hardware; and therefore has no reason to use any real mode/BIOS code.

Not an option.

The BIOS remains in control of the hardware; and your protected mode code frequently switches back to real mode to do "nothing" (e.g. a few NOP instructions) to give the BIOS a chance to handle pending IRQs (and you have a "switch to real mode and do BIOS function then switch back to protected mode" routine).

A silly option.

The BIOS remains in control of the hardware; but you install your own protected mode IRQ handlers that switch back to real mode and pass control to the BIOS's original IRQ handlers (and you have a "switch to real mode and do BIOS function then switch back to protected mode" routine).

Logical.

The BIOS remains in control of the hardware; and you use virtual8086 to run the BIOS's real mode IRQ handlers (and you have a "switch to real mode and do BIOS function then switch back to protected mode" routine)

Also logical, but with overhead.

Most OSs use the first option, but that means having device drivers, etc. For the remaining options, option #2 is a little dodgy (e.g. can't handle NMI) and is a pain to use (I've used it in boot code, and you need to sprinkle "do nothing" calls everywhere and never know if you've got enough or too many). Option #4 is complicated - I wouldn't use virtual8086 mode unless you're intending to run applications designed for real mode (e.g. DOS applications running under a 32-bit OS, like Win9x).

Option #3 is a little tricky, but relatively easy. It's probably the best option for your case.
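
For option #3 the protected mode IRQ handler has to know which real mode handler to pass control to. A minimal sketch of that lookup in C, assuming the BIOS's default PIC mapping (IRQ 0-7 on vectors 0x08-0x0F, IRQ 8-15 on vectors 0x70-0x77) and a copy of the real mode IVT, where each 4-byte entry is a little-endian offset followed by a segment. All names here are hypothetical, and the mode switch itself is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Real mode far pointer as stored in the IVT. */
struct far_ptr {
    uint16_t offset;
    uint16_t segment;
};

/* Real mode vector the BIOS expects for a legacy PIC IRQ line. */
static uint8_t bios_vector_for_irq(unsigned irq)
{
    assert(irq < 16);
    return (uint8_t)(irq < 8 ? 0x08 + irq : 0x70 + (irq - 8));
}

/* Read one entry from a 1024-byte copy of the IVT. */
static struct far_ptr ivt_entry(const uint8_t *ivt, uint8_t vector)
{
    const uint8_t *e = ivt + (unsigned)vector * 4;
    struct far_ptr p;

    p.offset = (uint16_t)(e[0] | (e[1] << 8));
    p.segment = (uint16_t)(e[2] | (e[3] << 8));
    return p;
}

/* Linear address of a real mode seg:off pair. */
static uint32_t far_to_linear(struct far_ptr p)
{
    return ((uint32_t)p.segment << 4) + p.offset;
}
```

Because the PIC has been remapped, the protected mode handler sees the IRQ on its new vector, works out the IRQ number, and then uses the BIOS's original vector (not the remapped one) to find the handler to call after switching back to real mode.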

The disk controller shouldn't whine about not being babysat, and the video card should be happy with me messing with LFB only. Aside from the PIT, i don't know what should be firing off constantly, and many devices like the keyboard and mouse can be ignored until you want to initialize them. Other things would probably be the various buses themselves (as things external to the computer cannot assume they have bios support), but they shouldn't need anything if i don't need them (even then, shouldn't there be a relatively large queue?). By the time i need drivers on those buses, i'll be implementing drivers for that bus, anyway, which will be run very early in the kernel. If the hardware cannot handle being ignored without frying, it has no purpose throwing interrupts directly instead of having indirect access. These things, so far, have backwards compatibility in mind, and windows certainly isn't implementing step 1. If it doesn't have a driver, it whines, and leaves it alone until it has one.

The main problem in this code is that it doesn't restore "real mode compatible" segment registers while it's in protected mode (before disabling protected mode). This would leave the CPU in a strange "32-bit real mode" state after protected mode is disabled.

If i put them before the switch, it triple faults because it uses them in order to switch (specifically, the segment register). However, if i move all but SS, i end up with the exact same crashing reason i get as if i don't move them from where i originally had them. Something else must be at play, here.

EDIT: Oh, and if i put "jmp $" before sti, the problem doesn't occur. If you put it after sti, it crashes with the error. So logic dictates it's either a problem with the code segment (which it can't be if i can fill the segment registers fine), the IVT, or the GDT, which are getting saved before mode switch then restored to return to real mode.

Owen · Post by **Owen** » Wed Mar 06, 2013 9:47 am

kohlrak wrote:
The BIOS simply assumes the CPU is in real mode all the time and that it has control of all the hardware. If you switch to protected mode then that's your problem (the BIOS doesn't know and doesn't do anything special in case).
And when you get back to bios it'll catch that an interrupt occurred. Most hardware should plan ahead and understand this, otherwise it's a race condition when switching to pmode, otherwise pmode can't work with the hardware if the bios can't.

Protected/Long mode code can easily work with the hardware; it just follows the same assumptions the BIOS does: that it is in control the whole time. The driver will, obviously, re-initialize the hardware as it requires when it loads...

The BIOS remains in control of the hardware; and you use virtual8086 to run the BIOS's real mode IRQ handlers (and you have a "switch to real mode and do BIOS function then switch back to protected mode" routine)
Also logical, but with overhead.

Switching between v8086 mode and protected mode will have lower overhead than between real mode and protected mode (and less interrupt loss issues)

The disk controller shouldn't whine about not being babysat, and the video card should be happy with me messing with LFB only. Aside from the PIT, i don't know what should be firing off constantly, and many devices like the keyboard and mouse can be ignored until you want to initialize them. Other things would probably be the various buses themselves (as things external to the computer cannot assume they have bios support), but they shouldn't need anything if i don't need them (even then, shouldn't there be a relatively large queue?). By the time i need drivers on those buses, i'll be implementing drivers for that bus, anyway, which will be run very early in the kernel. If the hardware cannot handle being ignored without frying, it has no purpose throwing interrupts directly instead of having indirect access. These things, so far, have backwards compatibility in mind, and windows certainly isn't implementing step 1. If it doesn't have a driver, it whines, and leaves it alone until it has one.

If the hardware's interrupts aren't delivered to the BIOS in the quantity and order the BIOS expects them to be, all bets are off. Nowhere are the BIOS' assumptions about the hardware it interacts with documented.

If i put them before the switch, it triple faults because it uses them in order to switch (specifically, the segment register). However, if i move all but SS, i end up with the exact same crashing reason i get as if i don't move them from where i originally had them. Something else must be at play, here.

You must fully switch to 16-bit protected mode, then to real mode. Part of "fully switching" includes loading a 16-bit descriptor into SS.

If you don't load a 16-bit DS/ES/SS, then the BIOS will crash as soon as it attempts to do addressing calculations off of those segment registers. If you don't load a 16-bit CS, then you'll end up in some "32-bit real mode" which doesn't function at all with all of the rest of real mode.

Mode switch code is well documented; I suggest you look at that.

kohlrak · Post by **kohlrak** » Wed Mar 06, 2013 7:36 pm

Protected/Long mode code can easily work with the hardware; it just follows the same assumptions the BIOS does: that it is in control the whole time. The driver will, obviously, re-initialize the hardware as it requires when it loads...

That's what i figured, so basically we have to just keep it in the back of our minds when using the bios to do certain things for us (but really, how much hardware actually needs babysat?).

Switching between v8086 mode and protected mode will have lower overhead than between real mode and protected mode (and less interrupt loss issues)

Do you know of any examples? I understand it's something that you should probably implement early on in your kernel since it requires special permissions be set and things like that, however all the examples i have are far from minimal. To make matters worse, it's like spaghetti code without jumps (some initialization goes here, some goes there, and the brunt goes here and that's what you find).

Then again, i guess i could stop being lazy and trying to learn it all, even if i'm not going to use much of it (aside from v8086).

If the hardware's interrupts aren't delivered to the BIOS in the quantity and order the BIOS expects them to be, all bets are off. Nowhere are the BIOS' assumptions about the hardware it interacts with documented.

No, but most of the hardware can be predicted. The sound card and the NIC are the only things i can think of that should be spamming the bios for attention. I expect, instead, that hardware simply throws out info to tell you it's ready or something like that (and, heck, a lot of hardware you have to ASK it when it's ready instead of it simply telling you). Then again, this could be my lack of experience showing, but what all really needs babysat?

You must fully switch to 16-bit protected mode, then to real mode. Part of "fully switching" includes loading a 16-bit descriptor into SS.

Which i'm doing, but i have to wait until after the switch to do it, otherwise you end up with a GPF (i think that's what it's whining about).

If you don't load a 16-bit DS/ES/SS, then the BIOS will crash as soon as it attempts to do addressing calculations off of those segment registers. If you don't load a 16-bit CS, then you'll end up in some "32-bit real mode" which doesn't function at all with all of the rest of real mode.

Well yeah. If you see my example above, i'm setting them to 0 (which is the proper segment for where they currently are) as soon as i do the retf function to actually get back into real mode.

Mode switch code is well documented; I suggest you look at that.

Where might i find it? I'm anxious to move on.

Gigasoft · Post by **Gigasoft** » Wed Mar 06, 2013 8:58 pm

kohlrak wrote:Which i'm doing, but i have to wait until after the switch to do it, otherwise you end up with a GPF (i think that's what it's whining about).

If you don't load a 16-bit DS/ES/SS, then the BIOS will crash as soon as it attempts to do addressing calculations off of those segment registers. If you don't load a 16-bit CS, then you'll end up in some "32-bit real mode" which doesn't function at all with all of the rest of real mode.
Well yeah. If you see my example above, i'm setting them to 0 (which is the proper segment for where they currently are) as soon as i do the retf function to actually get back into real mode.

No, you have to set CS and SS before you go into real mode. When you execute the retf instruction, you are already in real mode. This means that you have to have GDT entries for a 16 bit code and data segment.
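
As a rough illustration of what those two extra GDT entries contain, here is a sketch in C that packs a descriptor from its fields. It assumes the standard 8-byte protected mode descriptor layout; gdt_descriptor and the macro names are made up for this example. For the return to real mode both entries use base 0, limit 0xFFFF and flags 0 (16-bit, byte granular).

```c
#include <assert.h>
#include <stdint.h>

#define ACCESS_CODE_RX   0x9A  /* present, ring 0, code, readable */
#define ACCESS_DATA_RW   0x92  /* present, ring 0, data, writable */
#define FLAGS_16BIT_BYTE 0x0   /* 16-bit default size, limit in bytes */
#define FLAGS_32BIT_4K   0xC   /* 32-bit default size, limit in 4 KiB units */

/* Pack base, 20-bit limit, access byte and 4-bit flags into one GDT entry. */
static uint64_t gdt_descriptor(uint32_t base, uint32_t limit,
                               uint8_t access, uint8_t flags)
{
    uint64_t d = 0;

    assert(limit <= 0xFFFFF);
    assert(flags <= 0xF);

    d |= (uint64_t)(limit & 0xFFFF);             /* limit bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;      /* base bits 0..23   */
    d |= (uint64_t)access << 40;                 /* access byte       */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;  /* limit bits 16..19 */
    d |= (uint64_t)flags << 52;                  /* flags nibble      */
    d |= (uint64_t)(base >> 24) << 56;           /* base bits 24..31  */
    return d;
}
```

The 16-bit code entry comes out as 0x00009A000000FFFF and the 16-bit data entry as 0x000092000000FFFF, compared with 0x00CF9A000000FFFF for a flat 32-bit code segment. The selectors for these entries get loaded (far jump for CS, plain moves for SS/DS/ES) while still in protected mode, and only then is PE cleared.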

Brendan · Post by **Brendan** » Wed Mar 06, 2013 9:35 pm

Hi,

kohlrak wrote:
Protected/Long mode code can easily work with the hardware; it just follows the same assumptions the BIOS does: that it is in control the whole time. The driver will, obviously, re-initialize the hardware as it requires when it loads...
That's what i figured, so basically we have to just keep it in the back of our minds when using the bios to do certain things for us (but really, how much hardware actually needs babysat?).

Most hardware devices do what they're told to do (which is device specific). When the BIOS is in control of the hardware it'd be wrong to make any assumptions about what devices have been told to do by the BIOS; so "how much hardware actually needs babysat" isn't a question that can be answered accurately (beyond answers that aren't going to be useful, like "typically.." and "for one specific BIOS..."). Basically, when the BIOS is in control of the hardware you should assume all hardware needs to be babysat (even if you know this assumption is wrong in some cases or even in most cases).

Of course the BIOS typically doesn't provide drivers for most of the hardware (mouse, joystick, touchpad, touchscreen, networking, sound, USB, APICs, HPET, etc) and the drivers it does provide are bad (poor video, bad/synchronous disk IO, keyboard that only works for US QWERTY layouts, serial that is almost entirely unusable, poor time keeping, no multi-CPU support, etc).

If you think about it, the advantages of using the BIOS (a few trivial drivers that you shouldn't actually use anyway) don't justify all the disadvantages (kernel and drivers designed around unnecessary limitations where everything has to be crippled to maintain "BIOS compatibility", OS tied to a technology that's already been deprecated by UEFI, etc).

kohlrak wrote:No, but most of the hardware can be predicted. The sound card and the NIC are the only things i can think of that should be spamming the bios for attention. I expect, instead, that hardware simply throws out info to tell you it's ready or something like that (and, heck, a lot of hardware you have to ASK it when it's ready instead of it simply telling you). Then again, this could be my lack of experience showing, but what all really needs babysat?

How do you know the computer hasn't got some sort of "remote management" capability, where the BIOS redirects all keyboard and video to a network card? USB traffic might be redirected to firmware SMM code (for "PS/2 keyboard emulation"). HPET might be being used to emulate PIT. Then there's a whole layer of power management (monitoring user activity, CPU temperature, fan speeds, battery state, screen backlight brightness, etc) going on in the background, plus (maybe) fault monitoring and logging (e.g. ECC, etc). Basically; the BIOS is in charge and your OS is just along for the ride (and needs to stay out of the BIOS's way).

Cheers,

Brendan

kohlrak · Post by **kohlrak** » Thu Mar 07, 2013 4:37 pm

No, you have to set CS and SS before you go into real mode. When you execute the retf instruction, you are already in real mode. This means that you have to have GDT entries for a 16 bit code and data segment.

Isn't the point of RETF to set the CS? Moreover, wouldn't it throw a fit for trying to use the null segment before you got a chance to go back to real mode?

Most hardware devices do what they're told to do (which is device specific). When the BIOS is in control of the hardware it'd be wrong to make any assumptions about what devices have been told to do by the BIOS; so "how much hardware actually needs babysat" isn't a question that can be answered accurately (beyond answers that aren't going to be useful, like "typically.." and "for one specific BIOS..."). Basically, when the BIOS is in control of the hardware you should assume all hardware needs to be babysat (even if you know this assumption is wrong in some cases or even in most cases).

I see no point in preparing for a phantom situation that is very unlikely. At that point we may as well be preparing for the CIA to do a windows check and we have to pass a fake string to some random port to prove we're still easy targets. (Although, I do remember reading Microsoft requiring copyright to them for drivers to work, but i never checked this, and it's outside the scope of OS dev.)

Of course the BIOS typically doesn't provide drivers for most of the hardware (mouse, joystick, touchpad, touchscreen, networking, sound, USB, APICs, HPET, etc) and the drivers it does provide are bad (poor video, bad/synchronous disk IO, keyboard that only works for US QWERTY layouts, serial that is almost entirely unusable, poor time keeping, no multi-CPU support, etc).

Mouse, networking, sound, and USB, actually are supported by many.

Mouse + USB: USB ps/2 emulation.
Networking: PXE is now commonly supported
sound: VESA extension (the level of support i'm not sure of, though, because i haven't tested it)

If you think about it, the advantages of using the BIOS (a few trivial drivers that you shouldn't actually use anyway) don't justify all the disadvantages (kernel and drivers designed around unnecessary limitations where everything has to be crippled to maintain "BIOS compatibility", OS tied to a technology that's already been deprecated by UEFI, etc).

The problems i foretell with UEFI remain for another topic entirely (including increased rate of deprecation, making OSes constantly have to be updated to work, OS locking, spec closing, etc). I see no bright future for UEFI and OS development. As such, there's a reason why i'm keeping this kernel so simple and not having major plans for it. UEFI is one of the bigger reasons. (Fortunately, UEFI supports a bios compatibility mode of sorts).

How do you know the computer hasn't got some sort of "remote management" capability, where the BIOS redirects all keyboard and video to a network card?

Would have to be explicitly supported by my OS to work anyway. Clearly I have no intention of supporting remote management. There's no need for it in this simple OS.

USB traffic might be redirected to firmware SMM code (for "PS/2 keyboard emulation").

Mouse is one thing (but i don't need it for this OS). Keyboard on the other hand is usually embedded in computers without PS/2 ports.

HPET might be being used to emulate PIT.

PIT seems to be working fine on all my test beds. (I'm still trying to figure out whether or not i even want to use PIT in this kernel, too. Right now i'm leaning towards "no.")

Then there's a whole layer of power management (monitoring user activity, CPU temperature, fan speeds, battery state, screen backlight brightness, etc) going on in the background, plus (maybe) fault monitoring and logging (e.g. ECC, etc). Basically; the BIOS is in charge and your OS is just along for the ride (and needs to stay out of the BIOS's way).

All of which i don't need (ECC seems impractical when i'm dealing with upper memory and bios only deals with lower memory, anyway). I plan on implementing exceptions myself. I don't need power management.

The only stuff i NEED for my small kernel:
-Reading and Saving disk sectors
-Video Mode switching (from text to graphics and back)
-RAM
-CPU
-keyboard (not supporting USB)
-A20
-FPU
-32bit memory access
-PIC emulation (apic emulates the legacy pic itself by intel's spec)
-(Optional) sound
-(Optional) ps/2 mouse

As long as that works without triple faulting or taking off on its own (which it shouldn't), the rest (the stuff using that) will be coded by me. The kernel doesn't need to replace windows, linux, mac, bsd, or any of that. The whole point of this kernel is to replace the "scarystuff.inc" files i have to handle the cross OS stuff. People run into macros instead of learning the language, so they get upset wondering when their investment in time is going to pay off and they're actually going to learn how to use assembly instead of red-tape. People don't want to hear "i'll explain later." They want to hear "real stuff now, magic later." This kernel is nothing more than a sandbox that actually runs (unlike other sandboxes that need interpreters). Even if the kernel isn't a practical development platform, it is a practical learning platform from which to learn practical development techniques to be applied to existing practical development platforms, but allowing you to learn those techniques before learning the quirks.

Combuster · Post by **Combuster** » Thu Mar 07, 2013 5:34 pm

kohlrak wrote:Mouse, networking, sound, and USB, actually are supported by many.

Mouse + USB: USB ps/2 emulation.
Networking: PXE is now commonly supported
sound: VESA extension (the level of support i'm not sure of, though, because i haven't tested it)

The BIOS doesn't provide mouse support, only firmware to make a USB mouse appear as a PS/2 one, so no, not quite.
PXE is the bootloader stored on your network card. No network boot = no PXE.
VESA starts with "Video". Sound? No way.

And there are many, many more unfounded statements in the remainder of your post.

A lot to learn, you have.

kohlrak · Post by **kohlrak** » Thu Mar 07, 2013 7:05 pm

Combuster wrote:The BIOS doesn't provide mouse support, only firmware to make a USB mouse appear as a PS/2 one, so no, not quite.

Provision of firmware via BIOS is still support.

PXE is the bootloader stored on your network card. No network boot = no PXE.

And PXE use ends up providing interrupts (both 16bit and a 32bit entry point).

VESA starts with "Video". Sound? No way.

Welcome to VBE3+, even if it is under-supported (level of support is surprising, actually).

RBIL wrote:Int 10/AX=4F13h/BX=0000h - VESA VBE/AI (Audio Interface) - INSTALLATION CHECK
Int 10/AX=4F13h/BX=0001h - VESA VBE/AI (Audio Interface) - LOCATE DEVICE
Int 10/AX=4F13h/BX=0002h - VESA VBE/AI (Audio Interface) - QUERY DEVICE
Int 10/AX=4F13h/BX=0003h - VESA VBE/AI (Audio Interface) - OPEN DEVICE
Int 10/AX=4F13h/BX=0004h - VESA VBE/AI (Audio Interface) - CLOSE DEVICE
Int 10/AX=4F13h/BX=0005h - VESA VBE/AI (Audio Interface) - UNINSTALL DRIVER
Int 10/AX=4F13h/BX=0006h - VESA VBE/AI (Audio Interface) - DRIVER CHAIN/UNCHAIN

And there are many, many more unfounded statements in the remainder of your post.

A lot to learn, you have.

It would seem that you do, as well.

Anyway, I would like to thank you all for contributing your ideas. Somewhere else someone showed me a list of pmode tutorials, and lesson 2 is on returning to real mode, and i think i've already identified my mistake.
