CPU bugs database?

Posted: Sat Aug 26, 2006 2:07 pm
by Habbit
This comes from the "invalidate TLBs in the 386" thread, where I learned that CPU detection code is far more complex than I thought, because the lack of certain instructions in older processors is not the only thing to watch out for: some CPUs also have quite nasty bugs.

And the Oscar... er... the question is: Is there any "centralized" DB of "all" the bugs present in large numbers of processors of the x86 architecture? (I refer to things like the infamous Pentium FPU bug, the P2 0xF00F bug or the K6-2 super-fast-LOOP "problem", severe problems that affect a wide number of processors)

If the answer is "no", then can we add this seemingly "common knowledge" (at least among experienced OSdevers) to the wiki?

Thank u all ppl. You are great! ;)

Re:CPU bugs database?

Posted: Sat Aug 26, 2006 3:21 pm
by Kemp
A good place to start would be the errata or processor manuals sections at http://www.x86.org.

Re:CPU bugs database?

Posted: Sat Aug 26, 2006 4:31 pm
by Brendan
Hi,
Habbit wrote:You're just trying to scare me, aren't you? Half a MiB just to reliably detect which instructions won't send the system to the dogs??!! Dammit, that would mean that code will be the BIGGEST part of my kernel, well above other things that I considered "more complex", such as the scheduler and the VMM!!
Hehee - no, I'm not trying to scare you. Just thought you might be interested in how messy it can get. I should point out that my own code is intended to be "as good as possible", which means it's meant to do things that both Windows and Linux don't.

Depending on what sort of OS you want, it may be enough to ignore some of the things I listed - e.g. ignoring most CPU bugs would save a lot of hassle...
Habbit wrote:And the Oscar... er... the question is: Is there any "centralized" DB of "all" the bugs present in large numbers of processors of the x86 architecture? (I refer to things like the infamous Pentium FPU bug, the P2 0xF00F bug or the K6-2 super-fast-LOOP "problem", severe problems that affect a wide number of processors)
The simple answer is "not that I'm aware of". There is one place with some details on the most severe Intel CPU bugs (here), but nothing much else apart from manufacturers' data sheets, errata, etc.

I spent ages googling (mostly unsuccessfully) for documentation for some CPUs (VIA, Rise, Cyrix, Centaur, Transmeta), and then even more time going through the documentation I did find (mostly published Intel and AMD errata, which can be found easily enough on these manufacturers' web sites).

My own code is also online: The CPU errata is currently incomplete (Intel up to Pentium II only), and it's done with respect to my own OS - bugs with things my OS won't use (like bugs with large page sizes, etc) are ignored.


Cheers,

Brendan

Re:CPU bugs database?

Posted: Sun Aug 27, 2006 4:43 am
by Combuster
Well, some things are common knowledge, like F00F, FDIV and FISTP; a lot of others are not, and I have to agree that it might be a good thing to summarize the known and lesser-known things in a wiki page for quick reference.

Brendan: I have Cyrix docs (Cyrix 6x86 to be precise) - if people are interested I can upload these.
I believe the VIA processor is a variation on the MediaGX (designed by Cyrix, bought and improved by AMD, sold to VIA), but non-Intel/AMD docs are hard to find anyway. As for the rest, not a clue.

If more people think a wiki page would be helpful, I'll be happy to document what I found over time, although I only know the common Intel issues, the coma bug, and the problems with overlapping CPUIDs.

Apart from that, I don't think there is a site which summarizes bugs of more than one processor (except for the stubs on Wikipedia), so that'd make it a good idea. Besides, a lot of people do not realize that processors are built by humans and thus fundamentally flawed, so to speak...
Well, that makes two people in favour. Who votes next?

Re:CPU bugs database?

Posted: Sun Aug 27, 2006 5:48 am
by Kemp
You can usually assume my vote for anything of this nature will be positive; I'm the sort who obsessively tries to collect things together, make lists, etc.

Re:CPU bugs database?

Posted: Sun Aug 27, 2006 7:14 am
by Combuster
http://www.mega-tokyo.com/osfaq/CpuBugs

Hope you like the title. It is far from complete, but for wiki's sake, go ahead and add your thing. When I get home I'll dig up the Cyrix docs and generate some sample code for the coma bug, and perhaps document some other Intel things I'm familiar with.

Enjoy

Re:CPU bugs database?

Posted: Sun Aug 27, 2006 7:28 am
by Candy
I'm OK with this, but it'd be quite useful to mark the entries with what exactly makes them go bad (use an assembly sample, even for generic cases just a sample), explain why they go bad (which the two current entries have) and explicitly indicate which processors are affected. I'd vote for doing this with a quick-excluding flowchart-type:
brand != intel => no
proctype != Pentium P5 => no
procspeed > 66 => no
yes
Or something similar, at least, so that you can quickly disqualify the option for most processors. You should in any case be able to tell whether it might apply to your target processor group.
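A quick-exclusion chain like the one above maps naturally onto a series of early returns. The following is only a minimal sketch in C: `struct cpu_info`, its fields and `bug_may_apply` are hypothetical names, and the three conditions are copied from the example flowchart rather than from a verified erratum.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical description of the detected CPU. How these fields get
 * filled in (CPUID, timing loops, ...) is outside this sketch. */
struct cpu_info {
    char vendor[13];   /* CPUID vendor string, e.g. "GenuineIntel" */
    unsigned family;   /* 5 = Pentium (P5) */
    unsigned mhz;      /* nominal clock speed in MHz */
};

/* Mirrors the flowchart: every test rules the CPU out as early as possible. */
static bool bug_may_apply(const struct cpu_info *cpu)
{
    if (strcmp(cpu->vendor, "GenuineIntel") != 0)
        return false;  /* brand != intel => no */
    if (cpu->family != 5)
        return false;  /* proctype != Pentium P5 => no */
    if (cpu->mhz > 66)
        return false;  /* procspeed > 66 => no */
    return true;       /* yes */
}
```

With this shape, each wiki entry could carry its own small predicate, and a reader can stop at the first line that excludes their target processor group.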

Re:CPU bugs database?

Posted: Sun Aug 27, 2006 10:54 am
by Combuster
Candy wrote: I'm OK with this, but it'd be quite useful to mark the entries with what exactly makes them go bad (use an assembly sample, even for generic cases just a sample),
i.e. you'd like to see some sample code reproducing the bug in all circumstances, as well as problem-solving code?
Candy wrote: [...] explain why they go bad (which the two current entries have) and explicitly indicate which processors are affected. I'd vote for doing this with a quick-excluding flowchart-type:
I hope the current approach suffices.

Btw, I was wondering whether it would be useful to add some sort of rating to these bugs, or to let the reader decide on that.

Other remarks are welcome; I'll try to get these two into good condition before documenting some other things, to save myself from rewrites ;)

Re:CPU bugs database?

Posted: Sun Aug 27, 2006 11:02 am
by Candy
It's probably a good idea to add a form of rating indicating the severity of the bug, i.e. how easily you can trigger it without deliberately going out of your way. Suggestion for the rating categories:

1. Bugs that cause generic problems or that fail any program unpredictably. Example is FDIV, which doesn't show up often but can show up on any FDIV, even in compiled code.
2. Bugs that cause generic problems that can appear in compiled code, but only if the user intentionally uses assembly code to this extent. (COMA bug)
3. Bugs that cause problems in hand-crafted assembly code that's pretty impossible to create with a compiler (F00F bug)
4. Bugs that cause problems only in privileged mode, which would otherwise fall under any of the above categories
5. Bugs with less impact.
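
As a rough sketch, the five categories above could be encoded as an enum so that entries can be sorted or filtered by rating. The identifiers are hypothetical and only illustrate the proposal; a lower value means a more severe bug.

```c
#include <stdbool.h>

/* Hypothetical names for the five proposed rating categories.
 * Numbering follows the list above: 1 is the most severe. */
enum bug_rating {
    BUG_RATING_ANY_CODE = 1,     /* 1: hits ordinary compiled code (FDIV) */
    BUG_RATING_INTENTIONAL_ASM,  /* 2: needs intentional assembly (COMA bug) */
    BUG_RATING_HANDCRAFTED_ASM,  /* 3: needs hand-crafted assembly (F00F bug) */
    BUG_RATING_PRIVILEGED_ONLY,  /* 4: only reachable in privileged mode */
    BUG_RATING_MINOR             /* 5: bugs with less impact */
};

/* True if rating a is more severe than rating b. */
static bool bug_more_severe(enum bug_rating a, enum bug_rating b)
{
    return a < b;
}
```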

I've thought about occurrence information but it's pretty irrelevant, especially since each occurrence that fails fails the program (and/or the OS).


Nice website with some X86 info is on www.x86.org. F00F bug at http://www.x86.org/errata/dec97/f00fbug.htm

Re:CPU bugs database?

Posted: Sun Aug 27, 2006 11:56 am
by blip
86BUGS.LST in INTER61D.ZIP from here has a list of some bugs and quirks, but many are irrelevant, pertaining to CPUs like 186s. For people targeting the 386 as the minimum required x86 CPU, there is information about CMPXCHG:
On the A-step of the 486, this Mnemonic was coded using the opcodes for the, discarded, A- to B0-step 386 instructions XBTS (a6) and IBTS (a7). Because of software conflicts with software written for the early 386 DX the opcodes for the 486 were changed to the ones above starting with the B step.
Of course that's not the only relevant thing listed in the file.

Re:CPU bugs database?

Posted: Sun Aug 27, 2006 12:03 pm
by Candy
(C) Copyright 1993, 1994 By Harald Feldmann Revision 04, Nov 3rd 1994.
Not sure that's exactly what we want.

Re:CPU bugs database?

Posted: Sun Aug 27, 2006 1:53 pm
by Brendan
Hi,

Some notes....

Every 80x86 CPU ever made has bugs. For Intel CPUs ranging from Pentium to the latest "Core" CPUs there are between 70 and 190 different bugs for each CPU, listed in the corresponding errata (and published by Intel). I assume AMD CPUs aren't much different.

For the errata lists, some of the things Intel include are insane. One example would be "Item No 8: Mode C paging in SMM causes use of incorrect page tables" (Pentium Pro), which doesn't make much sense because the manual explicitly states that paging isn't supported in SMM mode on any CPU.

Some of the things listed can't affect any OS. An example would be "Item No 43: L2 cache may incorrectly report BIST failure" (Pentium II). In this case the Built In Self Test would only occur during boot before an OS is started, so the problem can't affect an OS after it has started.

There's usually quite a few problems with FRC mode (Functional Redundancy Checking). The idea here is that a pair of CPUs are glued together and one CPU is used to check the output of the other. If the outputs of the two CPUs don't match, an FRCERR is signalled to external circuitry. I've never seen or heard of this feature actually being used - it might only be used by Intel for quality control (although I have heard of similar arrangements for non-Intel high end servers, where extreme fault tolerance is needed - here if you're curious).

For a number of things listed there are motherboard, chipset or BIOS work-arounds, and some items only affect motherboards that do some things in certain ways. In this case it's impossible to know if the work-around has been implemented or not, or if the motherboard is affected by the bug. In these cases I assume the problem is fixed because there's no practical way of detecting otherwise and it's possibly not a good idea to warn users about problems that might not exist.

For how I've been classifying CPU bugs for my OS, it might be worth reading this page in my user manual. For Intel chips from Pentium to Pentium II, I've got 7 classified as "errata" and 8 classified as "flaws", however some of the "flaws" represent several problems that all affect the same CPUs. There's also 33 of them that I've marked as "TO BE REVIEWED" - for these I either don't understand enough about the problem, or I am unsure whether it will affect my OS or not (at least half of these won't be classified until I implement machine check exception handling).

I should also point out that Intel's published errata is very comprehensive - they seem to list everything that could ever affect anything. Because of this I'm quite lenient - I tend to look for reasons why each bug won't matter, rather than looking for reasons why each bug might matter. This is partly because I've been using Intel CPUs (with heaps of "bugs") for many years without ever actually having a problem that can be attributed to a CPU bug.

For a public database of bugs, I'd suggest the following fields:
  • Bug Name
  • Category
  • Affected CPU/s (manufacturer, family, model, stepping, etc)
  • Contributing Factors (a short description)
  • Bug Description (about 1 paragraph)
  • Workaround (about 1 paragraph if a work-around can be implemented by the OS kernel)
For an example:

Bug Name: F00F Bug
Category: SEVERE, FIXABLE
Contributing Factors: LOCK CMPXCHG8B instruction
Bug Description: The LOCK CMPXCHG8B instruction can be used to completely lock up the computer (at any privilege level) due to a CPU bug that leaves the bus locked while trying to invoke an invalid instruction exception.
Workaround: The easiest known work-around is to set the IDT to "write-through" caching (e.g. using the flags in the page table entry). For more information see http://www.x86.org/errata/dec97/f00fbug.htm.
Affected CPUs: All Intel Pentium CPUs.
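
If the database ever ends up in code rather than on a wiki page, the suggested fields could map onto a simple record. This is just a sketch in C: `struct cpu_bug`, `cpu_bugs` and `find_cpu_bug` are hypothetical names, the field set follows the list above, and the single entry repeats the F00F example (with the description shortened).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record layout following the suggested fields. */
struct cpu_bug {
    const char *name;
    const char *category;
    const char *affected_cpus;
    const char *contributing_factors;
    const char *description;
    const char *workaround;  /* NULL if the OS kernel cannot work around it */
};

static const struct cpu_bug cpu_bugs[] = {
    {
        .name = "F00F Bug",
        .category = "SEVERE, FIXABLE",
        .affected_cpus = "All Intel Pentium CPUs",
        .contributing_factors = "LOCK CMPXCHG8B instruction",
        .description = "Bus stays locked while the CPU tries to raise "
                       "an invalid instruction exception.",
        .workaround = "Set the IDT to write-through caching.",
    },
};

static const size_t cpu_bug_count = sizeof cpu_bugs / sizeof cpu_bugs[0];

/* Linear search by bug name; returns NULL if there is no such entry. */
static const struct cpu_bug *find_cpu_bug(const char *name)
{
    for (size_t i = 0; i < cpu_bug_count; i++)
        if (strcmp(cpu_bugs[i].name, name) == 0)
            return &cpu_bugs[i];
    return NULL;
}
```

Keeping the workaround as a nullable field matches the note that it only applies when the OS kernel can implement one.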

The problem here is going to be deciding which pieces of errata published by Intel and AMD actually matter, and finding details for other manufacturers....


Cheers,

Brendan