CPU bugs database?

Habbit

CPU bugs database?

Post by Habbit »

This comes from the "invalidate TLBs in the 386" thread, where I learned that CPU detection code is far more complex than I thought, because the lack of certain instructions in older processors is not the only thing to watch out for: some CPUs also have quite nasty bugs.

And the Oscar... er... the question is: Is there any "centralized" DB of "all" the bugs present in large numbers of processors of the x86 architecture? (I refer to things like the infamous Pentium FPU bug, the Pentium 0xF00F bug or the K6-2 super-fast-LOOP "problem", severe problems that affect a large number of processors)

If the answer is "no", then can we add this seemingly "common knowledge" (at least among experienced OSdevers) to the wiki?

Thank you all, people. You are great! ;)
Kemp

Re:CPU bugs database?

Post by Kemp »

A good place to start would be the errata or processor manuals sections at http://www.x86.org.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re:CPU bugs database?

Post by Brendan »

Hi,
Habbit wrote: You're just trying to scare me, aren't you? Half a MiB just to reliably detect which instructions won't send the system to the dogs??!! Dammit, that would mean that code will be the BIGGEST part of my kernel, well above other things that I considered "more complex", such as the scheduler and the VMM!!
Hehee - no, I'm not trying to scare you. Just thought you might be interested in how messy it can get. I should point out that my own code is intended to be "as good as possible", which means it's meant to do things that both Windows and Linux don't.

Depending on what sort of OS you want, it may be enough to ignore some of the things I listed - e.g. ignoring most CPU bugs would save a lot of hassle...
Habbit wrote: And the Oscar... er... the question is: Is there any "centralized" DB of "all" the bugs present in large numbers of processors of the x86 architecture? (I refer to things like the infamous Pentium FPU bug, the Pentium 0xF00F bug or the K6-2 super-fast-LOOP "problem", severe problems that affect a large number of processors)
The simple answer is "not that I'm aware of". There is one place with some details on the most severe Intel CPU bugs (here), but not much else apart from manufacturers' data sheets, errata, etc.

I spent ages googling (mostly unsuccessfully) for documentation for some CPUs (VIA, Rise, Cyrix, Centaur, Transmeta), and then even more time going through the documentation I did find (mostly published Intel and AMD errata, which can be found easily enough on these manufacturers' web sites).

My own code is also online: The CPU errata list is currently incomplete (Intel up to Pentium II only), and it's done with respect to my own OS - bugs in things my OS won't use (like bugs with large page sizes, etc) are ignored.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re:CPU bugs database?

Post by Combuster »

Well, some things are common knowledge, like F00F, FDIV and FISTP, but a lot of others are not, and I have to agree that it might be a good thing to summarize the known and less-known things in a wiki page for quick reference.

Brendan: I have Cyrix docs (Cyrix 6x86 to be precise) - if people are interested I can upload these.
I believe the VIA processor is a variation on the MediaGX, designed by Cyrix, bought and improved by AMD, and sold to VIA, but non-Intel/AMD docs are hard to find anyway. As for the rest, not a clue.

If more people think a wiki page would be helpful, I'll be happy to document what I have found over time, although I only know the common Intel issues, the coma bug, and the problems with overlapping CPUIDs.

Apart from that, I don't think there is a site which summarizes bugs of more than one processor (except for the stubs on Wikipedia), so that'd make it a good idea. Besides, a lot of people do not realize that processors are built by humans and thus fundamentally flawed, so to speak...
Well, that makes two people in favour. Who votes next?
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
Kemp

Re:CPU bugs database?

Post by Kemp »

You can usually assume my vote for anything of this nature will be positive. I'm the sort who obsessively tries to collect things together, make lists, etc.
Combuster

Re:CPU bugs database?

Post by Combuster »

http://www.mega-tokyo.com/osfaq/CpuBugs

Hope you like the title. It is far from complete, but for wiki's sake, go ahead and add your thing. When I get home I'll dig up the Cyrix docs and generate some sample code for the coma bug, and perhaps document some other Intel things I'm familiar with.

Enjoy
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:CPU bugs database?

Post by Candy »

I'm OK with this, but it'd be quite useful to mark the entries with what exactly makes them go bad (use an assembly sample, even for generic cases just a sample), explain why they go bad (which the two current entries have) and explicitly indicate which processors are affected. I'd vote for doing this with a quick-excluding flowchart-type:
brand != intel => no
proctype != Pentium P5 => no
procspeed > 66 => no
yes
Or something similar, at least, so that you can quickly disqualify the option for most processors. You should in any case be able to tell whether it might apply to your target processor group.
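As a rough illustration, the quick-exclusion chain above could be written in C along these lines. This is only a sketch: the `cpu_info` struct, its field names and the `example_rule_applies` function are hypothetical, and the three conditions are copied from the example rule as posted, not taken from any errata sheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the CPU being checked. How these fields get
 * filled in (CPUID, timing loops, ...) is outside the scope of this sketch. */
struct cpu_info {
    const char *brand;  /* e.g. "intel" */
    const char *type;   /* e.g. "Pentium P5" */
    unsigned    mhz;    /* nominal clock speed in MHz */
};

/* Quick-exclusion check mirroring the example rule: the first test that
 * fails rules the bug out, so most CPUs are dismissed after one compare. */
static bool example_rule_applies(const struct cpu_info *cpu)
{
    assert(cpu != NULL);
    if (strcmp(cpu->brand, "intel") != 0)
        return false;               /* brand != intel => no */
    if (strcmp(cpu->type, "Pentium P5") != 0)
        return false;               /* proctype != Pentium P5 => no */
    if (cpu->mhz > 66)
        return false;               /* procspeed > 66 => no */
    return true;                    /* yes */
}
```

Ordering the tests from the broadest exclusion to the narrowest keeps the common case (a CPU from another vendor) as cheap as possible.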
Combuster

Re:CPU bugs database?

Post by Combuster »

Candy wrote: I'm OK with this, but it'd be quite useful to mark the entries with what exactly makes them go bad (use an assembly sample, even for generic cases just a sample),
I.e. you'd like to see some sample code reproducing the bug in all circumstances, as well as problem-solving code?
Candy wrote: ... explain why they go bad (which the two current entries have) and explicitly indicate which processors are affected. I'd vote for doing this with a quick-excluding flowchart-type:
I hope the current approach suffices.

By the way, I was wondering whether it would be useful to add some sort of rating to these bugs, or whether to let the reader decide on that.

Other remarks are welcome. I'll try to get these two into good condition before documenting other things, to save myself from rewrites ;)
Candy

Re:CPU bugs database?

Post by Candy »

It's probably a good idea to add some form of rating indicating the severity of the bug, i.e. how easily it can be triggered without deliberately setting out to do so. Suggestion for the rating categories:

1. Bugs that cause generic problems or make any program fail unpredictably. An example is FDIV, which rarely shows up but can affect any FDIV instruction, even in compiled code.
2. Bugs that cause generic problems and can appear in compiled programs, but only if the programmer intentionally uses assembly code to trigger them (COMA bug).
3. Bugs that cause problems in hand-crafted assembly code that is practically impossible to produce with a compiler (F00F bug).
4. Bugs that cause problems only in privileged mode, and would otherwise fall under any of the above categories.
5. Bugs with less impact.
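For what it's worth, the five categories map naturally onto an enumeration. The sketch below is only one possible encoding: the names `bug_severity`, `rated_bug`, `example_ratings` and `most_severe` are made up for illustration, and the three sample ratings are the ones given as examples in the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding of the five categories; lower value = more severe. */
enum bug_severity {
    SEV_ANY_CODE        = 1, /* hits ordinary, even compiled, code unpredictably */
    SEV_INTENTIONAL_ASM = 2, /* needs assembly code written on purpose */
    SEV_HANDCRAFTED_ASM = 3, /* needs code no compiler would emit */
    SEV_PRIVILEGED_ONLY = 4, /* only reachable in privileged mode */
    SEV_MINOR           = 5  /* less impact */
};

struct rated_bug {
    const char       *name;
    enum bug_severity severity;
};

/* The three examples named in the category list. */
static const struct rated_bug example_ratings[] = {
    { "FDIV", SEV_ANY_CODE },
    { "COMA", SEV_INTENTIONAL_ASM },
    { "F00F", SEV_HANDCRAFTED_ASM },
};

/* Return the entry with the most severe (numerically lowest) rating. */
static const struct rated_bug *most_severe(const struct rated_bug *bugs,
                                           size_t count)
{
    assert(bugs != NULL && count > 0);
    const struct rated_bug *worst = &bugs[0];
    for (size_t i = 1; i < count; i++) {
        if (bugs[i].severity < worst->severity)
            worst = &bugs[i];
    }
    return worst;
}
```

Keeping the numeric values equal to the category numbers means a wiki table and the code can share the same rating without a translation step.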

I've thought about including occurrence information, but it's pretty irrelevant, especially since every occurrence of a bug takes down the program (and/or the OS).


A nice website with some x86 info is www.x86.org. The F00F bug is described at http://www.x86.org/errata/dec97/f00fbug.htm
blip

Re:CPU bugs database?

Post by blip »

86BUGS.LST in INTER61D.ZIP from here has a list of some bugs and quirks, but many are irrelevant, pertaining to CPUs like 186s. For people targeting the 386 as the minimum required x86 CPU, there is information about CMPXCHG:
On the A-step of the 486, this Mnemonic was coded using the opcodes for the, discarded, A- to B0-step 386 instructions XBTS (a6) and IBTS (a7). Because of software conflicts with software written for the early 386 DX the opcodes for the 486 were changed to the ones above starting with the B step.
Of course that's not the only relevant thing listed in the file.
Candy

Re:CPU bugs database?

Post by Candy »

(C) Copyright 1993, 1994 By Harald Feldmann Revision 04, Nov 3rd 1994.
Not sure that's exactly what we want.
Brendan

Re:CPU bugs database?

Post by Brendan »

Hi,

Some notes....

Every 80x86 CPU ever made has bugs. For Intel CPUs ranging from Pentium to the latest "Core" CPUs there are between 70 and 190 different bugs for each CPU, listed in the corresponding errata (and published by Intel). I assume AMD CPUs aren't much different.

For the errata lists, some of the things Intel include are insane. One example would be "Item No 8: Mode C paging in SMM causes use of incorrect page tables" (Pentium Pro), which doesn't make much sense because the manual explicitly states that paging isn't supported in SMM on any CPU.

Some of the things listed can't affect any OS. An example would be "Item No 43: L2 cache may incorrectly report BIST failure" (Pentium II). In this case the Built In Self Test would only occur during boot before an OS is started, so the problem can't affect an OS after it has started.

There are usually quite a few problems with FRC mode (Functional Redundancy Checking). The idea here is that a pair of CPUs are glued together and one CPU is used to check the output of the other. If the outputs of the two CPUs don't match, an FRCERR is signalled to external circuitry. I've never seen or heard of this feature actually being used - it might only be used by Intel for quality control (although I have heard of similar arrangements for non-Intel high end servers, where extreme fault tolerance is needed - here if you're curious).

For a number of the things listed there are motherboard, chipset or BIOS work-arounds, and some items only affect motherboards that do some things in certain ways. In this case it's impossible to know if the work-around has been implemented or not, or if the motherboard is affected by the bug. In these cases I assume the problem is fixed, because there's no practical way of detecting otherwise and it's possibly not a good idea to warn users about problems that might not exist.

For how I've been classifying CPU bugs for my OS, it might be worth reading this page in my user manual. For Intel chips from Pentium to Pentium II, I've got 7 classified as "errata" and 8 classified as "flaws", however some of the "flaws" represent several problems that all affect the same CPUs. There are also 33 of them that I've marked as "TO BE REVIEWED" - for these I either don't understand enough about the problem, or I am unsure whether it will affect my OS or not (at least half of these won't be classified until I implement machine check exception handling).

I should also point out that Intel's published errata are very comprehensive - they seem to list everything that could ever affect anything. Because of this I'm quite lenient - I tend to look for reasons why each bug won't matter, rather than looking for reasons why each bug might matter. This is partly because I've been using Intel CPUs (with heaps of "bugs") for many years without ever actually having a problem that can be attributed to a CPU bug.

For a public database of bugs, I'd suggest the following fields:
  • Bug Name
  • Category
  • Affected CPU/s (manufacturer, family, model, stepping, etc)
  • Contributing Factors (a short description)
  • Bug Description (about 1 paragraph)
  • Workaround (about 1 paragraph if a work-around can be implemented by the OS kernel)
For an example:

Bug Name: F00F Bug
Category: SEVERE, FIXABLE
Contributing Factors: LOCK CMPXCHG8B instruction with a register operand (an invalid encoding, e.g. the bytes F0 0F C7 C8)
Bug Description: The LOCK CMPXCHG8B instruction can be used to completely lock up the computer (at any privilege level) due to a CPU bug that leaves the bus locked while trying to invoke an invalid opcode exception.
Workaround: The easiest known work-around is to set the IDT to "write-through" caching (e.g. using the flags in the page table entry). For more information see http://www.x86.org/errata/dec97/f00fbug.htm.
Affected CPUs: All Intel Pentium CPUs.
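To make the suggested fields a little more concrete, here is a minimal sketch of how one such record might be stored and matched in C. Everything about the layout is an assumption: the `cpu_bug` struct, the `cpuid_family` and `bug_affects` helpers, and the reduction of the CPU list to a CPUID vendor string plus family number (family 5 standing in for "all Intel Pentium CPUs") are illustrative only. A real table would also need model and stepping ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout following the suggested fields. */
struct cpu_bug {
    const char *name;
    const char *category;
    const char *contributing_factors;
    const char *description;
    const char *workaround;
    const char *vendor;   /* CPUID vendor string, e.g. "GenuineIntel" */
    unsigned    family;   /* CPUID family number (assumed: 5 = Pentium) */
};

static const struct cpu_bug f00f_bug = {
    .name                 = "F00F Bug",
    .category             = "SEVERE, FIXABLE",
    .contributing_factors = "LOCK CMPXCHG8B instruction",
    .description          = "Locks up the computer at any privilege level",
    .workaround           = "Set the IDT to write-through caching",
    .vendor               = "GenuineIntel",
    .family               = 5,
};

/* Family number from the CPUID leaf 1 EAX signature. The extended family
 * field (bits 27:20) is only added when the base family (bits 11:8) is 0xF. */
static unsigned cpuid_family(uint32_t eax)
{
    unsigned base = (eax >> 8) & 0xF;
    unsigned ext  = (eax >> 20) & 0xFF;
    return base == 0xF ? base + ext : base;
}

/* True if the record's vendor and family match the running CPU. */
static bool bug_affects(const struct cpu_bug *bug, const char *vendor,
                        uint32_t signature)
{
    assert(bug != NULL && vendor != NULL);
    return strcmp(bug->vendor, vendor) == 0 &&
           cpuid_family(signature) == bug->family;
}
```

With a table of such records, a kernel could walk the list once at boot and enable only the work-arounds whose entries match the detected vendor string and signature.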

The problem here is going to be deciding which pieces of errata published by Intel and AMD actually matter, and finding details for other manufacturers....


Cheers,

Brendan