OSDev.org

Posted: **Fri Aug 25, 2006 10:09 am**

When writing some assembly code for the boot loader, I found that the INVLPG instruction is only available on the 80486 and later. I don't expect my OS to be used on 386s, but if I raise the bar each time I run into something like this, my "any-i386" OS will turn into "at least a P4 with SSE3" in no time (e.g., relying on the CPUID instruction would already rule out the 386 and many 486s, and the list goes on). So, does anyone know how to force the 386 to invalidate its TLBs?
Thanks

Posted: **Fri Aug 25, 2006 10:44 am**

Hi,

Habbit wrote: [...] So, does anyone know how to force the 386 to invalidate its TLBs?

On the 80386 the only way is to reload CR3, e.g.:

Code:

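    mov eax,cr3    ;eax = physical address of the current page directory
    mov cr3,eax    ;writing CR3 back flushes the whole TLB on a 386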

Cheers,

Brendan

Posted: **Fri Aug 25, 2006 2:13 pm**

Brendan wrote: The only way for 80386 is to reload CR3 - for e.g.:
Code:
    mov eax,cr3
    mov cr3,eax

Just like that? I... didn't think about that :-[...
I suppose that makes the processor throw away ALL the contents of the TLBs, doesn't it? So it shouldn't be used in, e.g., the scheduler unless really needed...
Hmmm... I think I will have to detect the availability of the instruction somehow. The simplest way would probably be to just execute it and catch the "undefined opcode" exception if it isn't recognized (on the 386). Testing for CPUID might also work, and at least it doesn't need an IDT, but that approach would forfeit the half of the 486s that do have INVLPG but no CPUID... ???

Dammit, I'm losing my miiiind!!!
(bochs running habbitsbrain.elf: 3rd exception without resolution, RIP=4242424242424242)

Posted: **Fri Aug 25, 2006 5:48 pm**

Also, flushing the entire TLB is much more time-consuming than flushing just a couple of pages (though I'm not sure to what extent exactly).

Posted: **Sat Aug 26, 2006 12:03 am**

Hi,

Habbit wrote:Hmmm... I think i will have to somehow detect the availability of the instruction. The simplest way would possibly be just executing it and catching the "undefined opcode" exception if it is not recognized (in the 386).

No, the easiest way is to check whether the "alignment check" (AC) flag in the EFLAGS register can be changed. An 80386 (or compatible) doesn't support the alignment check feature and will have this bit hardwired to '1' or '0', so if the bit can be toggled, the CPU is an 80486 or newer and INVLPG is available. E.g.:

Code:

    pushfd                    ;Save original eflags
    mov eax,[esp]             ;eax = original eflags
    mov ebx,eax               ;ebx = original eflags
    xor eax,0x00040000        ;Invert AC flag (bit 18)
    push eax                  ;Store new eflags on stack
    popfd                     ;Load new eflags
    pushfd                    ;Store eflags as the CPU actually set them
    pop eax                   ;eax = eflags after trying to invert AC
    xor eax,ebx               ;eax = bits that actually changed
    popfd                     ;Restore original eflags
    test eax,0x00040000       ;Did the AC flag change?
    je .CPUold                ; no, must be older than an 80486
    jmp .CPUnew               ; yes, must be 80486 or newer
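The same "can this EFLAGS bit be toggled?" trick also answers the CPUID question raised above: per Intel's documentation, a CPU supports CPUID if software can change the ID flag (bit 21). The C sketch below models only the decision logic; `flag_toggles`, `classify_cpu` and the enum are hypothetical names, and the before/after EFLAGS values are assumed to come from an assembly probe like the one above. Note that Cyrix parts with CPUID disabled (mentioned later in the thread) would fall into the "no CPUID" bucket.

```c
#include <stdbool.h>
#include <stdint.h>

/* EFLAGS bits used by the classic "can this flag be toggled?" probes. */
#define EFLAGS_AC (1u << 18) /* alignment check: writable on 80486 and later */
#define EFLAGS_ID (1u << 21) /* ID flag: writable if CPUID is supported      */

/* `before` is EFLAGS as originally read; `after` is EFLAGS read back after
 * trying to invert the bits in `mask` (the pushfd/popfd sequence above).
 * The flag is implemented if at least one masked bit actually changed. */
static bool flag_toggles(uint32_t before, uint32_t after, uint32_t mask)
{
    return ((before ^ after) & mask) != 0;
}

enum cpu_class { CPU_386, CPU_486_NO_CPUID, CPU_HAS_CPUID };

/* Hypothetical classifier: the two probe results are passed in, so the
 * logic can be exercised without privileged (or even x86) code. */
static enum cpu_class classify_cpu(bool ac_toggles, bool id_toggles)
{
    if (!ac_toggles)
        return CPU_386;          /* no AC flag: 80386, so no INVLPG   */
    if (!id_toggles)
        return CPU_486_NO_CPUID; /* 486-class: INVLPG ok, no CPUID    */
    return CPU_HAS_CPUID;        /* ask CPUID about everything else   */
}
```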

Habbit wrote:I suppose that will make the processor throw away ALL the contents in the TLBs, won't it? So it shouldn't be used in, e.g., the scheduler except in real need...

Reloading CR3 (and flushing the entire TLB) is actually fairly fast. The performance problem comes afterwards, when you're trying to do work and constantly getting TLB misses. How much this affects performance depends on how much of the TLB's contents would have been reused if it hadn't been flushed. If you're only changing one page table entry there'd be a huge difference, but if you're doing a task switch it might not matter as much.
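To make the miss-counting argument concrete, here is a toy model in C. It is only an illustration under loudly simplified assumptions: a hypothetical 8-slot direct-mapped "TLB" of virtual page numbers, not how any real x86 TLB is organized. It compares invalidating one page (as INVLPG would) with dropping everything (as a 386 CR3 reload would).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: a tiny direct-mapped "TLB" caching virtual page numbers.
 * Real TLBs differ (associativity, sizes, separate I/D TLBs), but the
 * miss-counting argument is the same. */
#define TLB_SLOTS 8

struct toy_tlb {
    unsigned long page[TLB_SLOTS];
    bool valid[TLB_SLOTS];
};

/* Look up a page; on a miss, "walk the page tables" and fill the slot.
 * Returns true on a miss so callers can count them. */
static bool tlb_access(struct toy_tlb *t, unsigned long vpn)
{
    size_t slot = vpn % TLB_SLOTS;
    if (t->valid[slot] && t->page[slot] == vpn)
        return false;
    t->page[slot] = vpn;
    t->valid[slot] = true;
    return true;
}

/* Like INVLPG: drop the entry for one page only. */
static void tlb_invalidate_page(struct toy_tlb *t, unsigned long vpn)
{
    size_t slot = vpn % TLB_SLOTS;
    if (t->valid[slot] && t->page[slot] == vpn)
        t->valid[slot] = false;
}

/* Like reloading CR3 on a 386: drop every entry. */
static void tlb_flush_all(struct toy_tlb *t)
{
    for (size_t i = 0; i < TLB_SLOTS; i++)
        t->valid[i] = false;
}

/* Touch pages 0..n-1 once and return how many accesses missed. */
static unsigned touch_working_set(struct toy_tlb *t, unsigned long n)
{
    unsigned misses = 0;
    for (unsigned long vpn = 0; vpn < n; vpn++)
        misses += tlb_access(t, vpn);
    return misses;
}
```

With an 8-page working set that fits in the model TLB, a warm pass has no misses, invalidating one page costs one extra miss on the next pass, and a full flush costs a miss for every page. After a task switch most of those entries belonged to the old task anyway, which is why the full flush hurts less there.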

In addition to detecting support for INVLPG, I'd also recommend detecting support for global pages (introduced with the Pentium Pro and detected via CPUID). The idea is that you mark your entire kernel as "global" (using the "global" flag in page table entries, after enabling the feature in CR4), so that TLB entries for the kernel itself are not flushed when CR3 is changed. This improves task-switch performance: the old task's TLB entries don't really matter because the new task won't use them anyway, but the kernel's TLB entries are kept, so the kernel keeps running without unwanted TLB misses.
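A minimal C sketch of the bit-twiddling involved, using the architectural bit positions from Intel's manuals (CPUID.01H:EDX bit 13 for PGE support, CR4 bit 7 to enable it, PTE bit 8 for "global"). The function names and the idea of passing raw register values in are assumptions for illustration; actually loading CR4 and the page tables still needs privileged code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural bit positions (Intel SDM). */
#define CPUID1_EDX_PGE (1u << 13) /* CPUID.01H:EDX: global pages supported */
#define CR4_PGE        (1u << 7)  /* CR4: enable global pages              */
#define PTE_PRESENT    (1u << 0)
#define PTE_GLOBAL     (1u << 8)  /* PTE: survives CR3 reloads with PGE on */

static bool cpu_has_pge(uint32_t cpuid1_edx)
{
    return (cpuid1_edx & CPUID1_EDX_PGE) != 0;
}

/* Mark every present kernel PTE as global. `kernel_ptes` is a hypothetical
 * array covering the kernel's mappings; user-space PTEs must NOT be global,
 * or stale user translations would survive task switches. */
static void mark_kernel_global(uint32_t *kernel_ptes, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (kernel_ptes[i] & PTE_PRESENT)
            kernel_ptes[i] |= PTE_GLOBAL;
}

/* Returns the CR4 value to load (e.g. with "mov cr4,eax"). PGE is only set
 * when the CPU reports support, since setting a reserved CR4 bit faults. */
static uint32_t cr4_with_pge(uint32_t cr4, uint32_t cpuid1_edx)
{
    return cpu_has_pge(cpuid1_edx) ? (cr4 | CR4_PGE) : cr4;
}
```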

Of course, a simple "is it an 80486?" routine tends to grow into full-fledged CPU detection code (for both supported features and CPU bugs), but that's a different matter...

Cheers,

Brendan

Posted: **Sat Aug 26, 2006 3:24 am**

If you care about performance, you can keep a pointer to your page-flushing subroutine: point it at e.g. reload_cr3 by default and change it to flush_page if you detect a 486+. (If you care even more about performance, try self-modifying code, but that's probably not worth the trouble, although that's what my kernel does.)
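A sketch of the function-pointer approach in C. `reload_cr3` and `flush_page` are the names used in the post; here they are hypothetical stubs that only count calls (in a kernel they would wrap the CR3 reload and an INVLPG). `invalidate_page` and `select_tlb_flush` are also made-up names.

```c
#include <stdint.h>

/* Hypothetical flush routines. In a real kernel these would be tiny asm
 * stubs ("mov eax,cr3 / mov cr3,eax" and "invlpg [addr]"); here they only
 * record which path ran so the dispatch logic can be tested. */
static unsigned cr3_reloads, invlpg_calls;

static void reload_cr3(uintptr_t addr)
{
    (void)addr;              /* a full flush doesn't care which page changed */
    cr3_reloads++;
}

static void flush_page(uintptr_t addr)
{
    (void)addr;              /* real version: invlpg on this address */
    invlpg_calls++;
}

/* Safe default for any 386+: flush everything. */
static void (*invalidate_page)(uintptr_t addr) = reload_cr3;

/* Call once after CPU detection (e.g. the AC-flag test). */
static void select_tlb_flush(int is_486_or_newer)
{
    invalidate_page = is_486_or_newer ? flush_page : reload_cr3;
}
```

Callers just write `invalidate_page(addr)` everywhere; the cost is one indirect call per invalidation, which is the overhead self-modifying code would remove.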

Posted: **Sat Aug 26, 2006 3:43 am**

Well, self-modifying code looks like a real pain in the *ss, so I don't want to take that road if I can avoid it. The IDT-less detection code looks more interesting, but I am starting to realise I should just run a generic "cpu feature detection" when my kernel starts and save the results somewhere accessible for my other classes. That way, the physical memory manager could know whether INVLPG is supported or not, the scheduler would know about SYSCALL/SYSRET, SYSENTER/SYSEXIT, etc.
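One way to picture the "detect once, save the results somewhere accessible" idea is a single feature record filled at boot. This is a sketch under assumptions: the struct and function names are hypothetical, the raw CPUID register values are passed in rather than read, and the decoded bits are the raw ones, which (as discussed later in the thread) can be wrong on some CPUs. The bit positions are the architectural CPUID ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* A hypothetical "detect once, read everywhere" feature record. The bit
 * positions are the architectural CPUID ones; the layout is made up. */
struct cpu_features {
    bool invlpg;   /* 80486+ (from the AC-flag test, not from CPUID) */
    bool pae;      /* CPUID.01H:EDX bit 6                            */
    bool cx8;      /* CPUID.01H:EDX bit 8  (CMPXCHG8B)               */
    bool sep;      /* CPUID.01H:EDX bit 11 (SYSENTER/SYSEXIT)        */
    bool pge;      /* CPUID.01H:EDX bit 13 (global pages)            */
    bool pat;      /* CPUID.01H:EDX bit 16                           */
    bool syscall;  /* CPUID.80000001H:EDX bit 11 (SYSCALL/SYSRET)    */
};

static bool bit(uint32_t reg, unsigned n) { return (reg >> n) & 1u; }

/* Fill the record from raw register values. On CPUs without CPUID, or
 * without the extended leaf, callers pass 0 for the missing registers. */
static struct cpu_features decode_features(bool is_486_or_newer,
                                           uint32_t leaf1_edx,
                                           uint32_t ext1_edx)
{
    struct cpu_features f = {
        .invlpg  = is_486_or_newer,
        .pae     = bit(leaf1_edx, 6),
        .cx8     = bit(leaf1_edx, 8),
        .sep     = bit(leaf1_edx, 11),
        .pge     = bit(leaf1_edx, 13),
        .pat     = bit(leaf1_edx, 16),
        .syscall = bit(ext1_edx, 11),
    };
    return f;
}
```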

Thank you all!!

Posted: **Sat Aug 26, 2006 1:29 pm**

Hi,

Habbit wrote:Well, self-modifying code looks like a real pain in the *ss, so I don't want to take that road if I can avoid it.

There are several approaches to this: "if/else" at run time, indirection (e.g. "call [my_invalidate_page_routine]"), self-modifying code, conditional compilation, etc. Each method has advantages and disadvantages.
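For comparison with the indirection approach, here is a C sketch of two other options from that list: a plain run-time if/else on a cached feature flag, and conditional compilation that drops the 386 path from the build entirely. `MIN_CPU_486` is a made-up build flag, and the flush functions are hypothetical stubs that only count calls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stubs standing in for the real asm flush sequences. */
static unsigned full_flushes, single_flushes;
static void tlb_flush_all_stub(void)         { full_flushes++; }
static void tlb_flush_one_stub(uintptr_t va) { (void)va; single_flushes++; }

/* Filled in once by CPU detection at boot. */
static bool cpu_has_invlpg;

/* Build-time choice: a kernel built with MIN_CPU_486 defined (a made-up
 * build flag) drops the 386 path entirely, at the cost of not booting on one.
 * Otherwise, choose at run time with a simple if/else on the cached flag. */
static void invalidate_page(uintptr_t va)
{
#ifdef MIN_CPU_486
    tlb_flush_one_stub(va);
#else
    if (cpu_has_invlpg)
        tlb_flush_one_stub(va);
    else
        tlb_flush_all_stub();
#endif
}
```

The run-time branch costs a (well-predicted) test on every call; conditional compilation costs nothing at run time but means building and shipping separate kernels, which is the "compile before use" problem mentioned below.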

In general my approach is to use a modular micro-kernel, where the "kernel modules" used as part of the micro-kernel are chosen at run time. This avoids the performance costs of some techniques, while also avoiding the "compile before use" problem. I do full CPU detection in real mode during boot, so that the detected CPU features can affect everything, including auto-selecting 64-bit kernel modules if all CPUs support long mode.
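The "only if all CPUs support long mode" rule can be sketched as below. Long mode support is reported in CPUID.80000001H:EDX bit 29; the function name and the per-CPU array are hypothetical, and each entry is assumed to be 0 when that CPU lacks the extended leaf (check CPUID.80000000H first).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXT1_EDX_LM (1u << 29) /* CPUID.80000001H:EDX bit 29: long mode */

/* Hypothetical module choice: use the 64-bit kernel modules only if EVERY
 * CPU supports long mode; one 32-bit-only CPU forces the 32-bit set.
 * `ext1_edx[i]` holds CPU i's CPUID.80000001H:EDX (0 if the leaf is absent). */
static bool use_64bit_modules(const uint32_t *ext1_edx, size_t cpu_count)
{
    if (cpu_count == 0)
        return false;
    for (size_t i = 0; i < cpu_count; i++)
        if (!(ext1_edx[i] & EXT1_EDX_LM))
            return false;
    return true;
}
```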

Like many people here, I have no intention of supporting the 80386 though: my minimum requirement for single-CPU systems is an 80486DX (with built-in FPU). This is mostly because of INVLPG, WBINVD, CMPXCHG and native FPU exception handling. For multi-CPU systems my minimum requirement is a Pentium, mostly because of differences in local APICs.

To me this comes down to how much effort is required to support older CPUs and whether or not this extra work is justified by the number of end users who might need support for older CPUs when/if the OS is ever functional.

Habbit wrote:The IDT-less detection code looks more interesting, but I am starting to realise I should just run a generic "cpu feature detection" when my kernel starts and save the results somewhere accessible for my other classes. That way, the physical memory manager could know whether INVLPG is supported or not, the scheduler would know about SYSCALL/SYSRET, SYSENTER/SYSEXIT, etc.

Some rough notes that may (or may not) help:

* Due to bugs in old versions of Windows NT, some CPUs disable the CPUID instruction (Cyrix) or don't report some features even though they are supported (CMPXCHG8B for both Centaur and Rise Pentium-compatible CPUs).
* Some manufacturers use the same feature flag for different features, or different feature flags for the same feature. Examples include "Extended MMX" for Cyrix or the 64-bit "LAHFSAHF" flag for Intel.
* There are a large number of bugs in the feature flags returned by CPUID (for all manufacturers). The worst example is Cyrix, where the ASCII characters "tead" are returned in ECX for CPUID=1, but even Intel gets this wrong.
* Some CPUs support a feature that doesn't work reliably. It's worth considering detecting this and clearing the corresponding feature flag so that the OS doesn't rely on the buggy feature. One example is some Pentium II OverDrive chips that don't support the PAT flags correctly (in this case I clear the PAE feature flag to avoid the bug).
* Some CPUs have unrelated bugs that can be "worked around" in software. The best example here is the Pentium 0xF00F bug (which allows "malicious" user-level code to completely lock up the computer). It might be worthwhile to detect these fixable bugs and implement the workarounds if the bugs are detected.
* All CPUs have other bugs that can't be fixed by the OS. In this case it might be worth keeping track of the most severe ones and reporting them to the user and/or administrator. That way the user can decide if the CPU bugs matter - for example, I wouldn't run a bank on Pentium CPUs due to FPU bugs, and wouldn't run a critical 24/7 server on some Pentium II chips due to bus handling errors that can cause the computer to lock-up (but I would still use these CPUs for normal work, where the occasional problem isn't going to cost someone large amounts of money).
* It can be nice to build manufacturer and brand name strings for older CPUs that don't return them from CPUID. It can also be nice to detect some AMD chips (which allow these strings to be set by the BIOS) and make sure the correct identification strings are returned (so that a clever hacker can't modify the flash BIOS and make the CPU report itself as a much more expensive chip).
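A fixup pass over raw CPUID results might look like the C sketch below. It handles only two quirks: the Cyrix "tead" garbage in ECX described above, and one Intel case documented in Intel's manuals (early Pentium Pro steppings, family 6 with model < 3 and stepping < 3, set the SEP bit without supporting SYSENTER/SYSEXIT). The function name and parameter layout are hypothetical; a real table of quirks would be much longer.

```c
#include <stdint.h>
#include <string.h>

#define LEAF1_EDX_SEP  (1u << 11)  /* SYSENTER/SYSEXIT */
#define CYRIX_TEAD     0x64616574u /* "tead" as read from ECX (little-endian) */

/* Sketch of a fixup pass over raw CPUID leaf 1 results. Only two of the many
 * known quirks are shown:
 *  - Cyrix parts that leave "tead" (from "CyrixInstead") in ECX for leaf 1;
 *  - early Pentium Pro steppings (family 6, model < 3, stepping < 3) that set
 *    the SEP bit without actually supporting SYSENTER/SYSEXIT. */
static void sanitize_leaf1(const char vendor[12], unsigned family,
                           unsigned model, unsigned stepping,
                           uint32_t *edx, uint32_t *ecx)
{
    if (memcmp(vendor, "CyrixInstead", 12) == 0 && *ecx == CYRIX_TEAD)
        *ecx = 0;                      /* garbage, not feature flags */

    if (memcmp(vendor, "GenuineIntel", 12) == 0 &&
        family == 6 && model < 3 && stepping < 3)
        *edx &= ~LEAF1_EDX_SEP;        /* reported but unusable */
}
```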

I do all of the above (with the addition of cache size auto-detection and detecting the number of cores and logical CPUs in each CPU). For the bugs I've only really done Intel CPUs from 80486 to Pentium II so far. This adds up to around 200 KB of source code, but I'm expecting it to grow to around 500 KB by the time I call it "complete".

Cheers,

Brendan

Posted: **Sat Aug 26, 2006 1:59 pm**

Brendan wrote:
Some rough notes that may (or may not) help:
* Due to bugs in old versions of Windows NT, some CPUs disable the CPUID instruction (Cyrix) or don't report some features even though they are supported (CMPXCHG8B for both Centaur and Rise Pentium-compatible CPUs).
* (...)
* (...)

I do all of the above (with the addition of cache size auto-detection and detecting the number of cores and logical CPUs in each CPU). For the bugs I've only really done Intel CPUs from 80486 to Pentium II so far. This adds up to around 200 KB of source code, but I'm expecting it to grow to around 500 KB by the time I call it "complete".

You're just trying to scare me, aren't you? Half a MiB just to reliably detect which instructions won't send the system to the dogs??!! Dammit, that would mean that code will be the BIGGEST part of my kernel, well above other things that I considered "more complex", such as the scheduler and the VMM!! :-\

I'm starting to think I'd rather be designing a Pong clone. It will be easier and probably more people will use it anyways ::)

Posted: **Sun Aug 27, 2006 4:06 am**

Well, I think half a MB is an overestimate, unless you're going to compile your kernel at boot time for top performance...

I think I only need 50 lines of assembly to construct a capabilities map, including all the fakes/bad cases. Then maybe 20 more to fix the F00F bug and the coma bug and to patch the CR3/INVLPG code. The rest is up to the kernel designer, depending on whether they really find it worth supporting two memory schemes (PAE on or off), which is where most of the code would go... But then again, I'm building for size efficiency. (So far I've got most of this already built into my kernel, which is 75k by now... ~30k if you stripped the comments and indentation.)

If I ever get to 500 KB I'd consider a rewrite, since that would probably mean an inefficient way of abstraction. But then again, C code IS bloated compared to a 100% ASM kernel...

As for that whole list, some things are just a matter of taste or completely pointless altogether: bus lockups can safely be ignored in the kernel since they can't be avoided anyway, and as for BIOS hacking, there are legitimate reasons for doing that, so I'd be crazy to include something like "1337 protection".

[edit] This seems to be going way off topic, and I hadn't noticed the CPU bugs thread yet, sorry [/edit]

Posted: **Sun Aug 27, 2006 2:27 pm**

Hi,

Combuster wrote:if i ever get to 500kb i'd consider a rewrite since that would probably mean an inefficient way of abstraction. But then again C code IS bloated compared to a 100% ASM kernel...

That would depend on how complete the code is. The "500 KB of source code" is intended as a worst-case estimate (and includes a lot of comments, which IMHO are necessary for code maintenance)...

In my case, the code will generate the manufacturer and brand strings where possible (even when CPUID isn't supported, e.g. it should correctly identify something like "Cyrix Cx486DRx2"). It generates a "standardized" set of feature flags, which involves removing all bugs and ambiguities, and it also detects L1, L2 and L3 cache sizes (if possible).
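Building the manufacturer string is the easy part when CPUID exists: leaf 0 returns a 12-byte vendor ID in EBX, EDX, ECX (in that order), four ASCII bytes per register, lowest byte first. The C sketch below decodes it from raw register values; `vendor_string` is a hypothetical helper, and it does not attempt the much harder job of identifying CPUs without CPUID.

```c
#include <stdint.h>

/* CPUID leaf 0 returns the 12-byte vendor ID in EBX, EDX, ECX (in that
 * order), four ASCII bytes per register, least significant byte first.
 * Writes a NUL-terminated string into `out` (at least 13 bytes). */
static void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}
```

For example, EBX=0x756E6547, EDX=0x49656E69, ECX=0x6C65746E decode to "GenuineIntel".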

In addition to this, it detects bugs that can be worked around, and bugs that can't be fixed but can affect stability and security. The idea is to let system administrators know just how stable the hardware is, so that they can consider upgrading it or using it for less critical purposes (and so they might not blame my software when a CPU bug causes the machine to lock up).

My (admittedly insane) goal is to produce an OS that is better than any other OS, including Windows, Linux, etc - I don't know of any other OS that does all of this, and I do agree it is overkill for most purposes.

Cheers,

Brendan

invalidate TLBs in the 386