CPU bugs database?
This comes from the "invalidate TLBs in the 386" thread, where I learned that CPU detection code is far more complex than I thought, because the thing to watch out for is not only the lack of certain instructions in older processors but also some quite nasty bugs in some CPUs.
And the Oscar... er... the question is: Is there any "centralized" DB of "all" the bugs present in large numbers of processors of the x86 architecture? (I refer to things like the infamous Pentium FPU bug, the Pentium F00F bug or the K6-2 super-fast-LOOP "problem", severe problems that affect a wide number of processors)
If the answer is "no", then can we add this seemingly "common knowledge" (at least among experienced OSdevers) to the wiki?
Thank you all, people. You are great!
Re:CPU bugs database?
A good place to start would be the errata or processor manuals sections at http://www.x86.org.
Re:CPU bugs database?
Hi,
Habbit wrote: You're just trying to scare me, aren't you? Half a MiB just to reliably detect which instructions won't send the system to the dogs??!! Dammit, that would mean that code will be the BIGGEST part of my kernel, well above other things that I considered "more complex", such as the scheduler and the VMM!!
Hehee - no, I'm not trying to scare you. Just thought you might be interested in how messy it can get. I should point out that my own code is intended to be "as good as possible", which means it's meant to do things that both Windows and Linux don't.
Depending on what sort of OS you want, it may be enough to ignore some of the things I listed - e.g. ignoring most CPU bugs would save a lot of hassle...
Habbit wrote: And the Oscar... er... the question is: Is there any "centralized" DB of "all" the bugs present in large numbers of processors of the x86 architecture? (I refer to things like the infamous Pentium FPU bug, the Pentium F00F bug or the K6-2 super-fast-LOOP "problem", severe problems that affect a wide number of processors)
The simple answer is "not that I'm aware of". There is one place with some details on the most severe Intel CPU bugs (here), but not much else apart from manufacturers' data sheets, errata, etc.
I spent ages googling (mostly unsuccessfully) for documentation for some CPUs (VIA, Rise, Cyrix, Centaur, Transmeta), and then even more time going through the documentation I did find (mostly published Intel and AMD errata, which can be found easily enough on these manufacturers' web sites).
My own code is also online:
- initial CPU identification
- cache size detection
- CPU feature detection
- Intel CPU errata
- CPU errata for other manufacturers
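For readers wondering what the "initial CPU identification" step roughly involves, here is a minimal sketch in C. It is not Brendan's code. It assumes the CPUID instruction is available and that the register values have already been read; the struct and function names are made up for illustration. The extended family/model handling follows Intel's documented convention, and other vendors may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the decoded CPUID leaf 1 signature. */
struct cpu_signature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Build the 12-character vendor string from CPUID leaf 0.
 * The characters are stored in EBX, EDX, ECX, lowest byte first. */
static void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx,
                                char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    assert(out != 0);
    for (int r = 0; r < 3; r++)
        for (int i = 0; i < 4; i++)
            out[r * 4 + i] = (char)((regs[r] >> (8 * i)) & 0xFF);
    out[12] = '\0';
}

/* Decode family, model and stepping from the EAX value of CPUID leaf 1. */
static struct cpu_signature decode_signature(uint32_t eax)
{
    struct cpu_signature s;
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;
    unsigned ext_model   = (eax >> 16) & 0xF;

    s.stepping = eax & 0xF;
    /* The extended family only counts when the base family is 0xF. */
    s.family = (base_family == 0xF) ? base_family + ext_family : base_family;
    /* The extended model only counts for base family 6 or 0xF. */
    s.model = (base_family == 0x6 || base_family == 0xF)
                  ? (ext_model << 4) | base_model
                  : base_model;
    return s;
}
```

An errata table would then be keyed on the vendor string plus the family, model and stepping values, which is why every errata entry needs to say exactly which steppings it applies to.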
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- Combuster
Re:CPU bugs database?
Well, some things are common knowledge, like F00F, FDIV and FISTP, but a lot of others are not, and I have to agree that it might be a good thing to summarize the known and lesser-known things in a wiki page for quick reference.
Brendan: I have Cyrix docs (Cyrix 6x86 to be precise) - if people are interested I can upload these.
I believe the VIA processor is a variation on the MediaGX, designed by Cyrix, bought and improved by AMD, then sold to VIA, but non-Intel/AMD docs are hard to find anyway. As for the rest, not a clue.
If more people think a wiki page would be helpful, I'll be happy to document what I found over time, although I only know the common Intel issues, the coma bug, and the problems with overlapping CPUIDs.
Apart from that, I don't think there is a site which summarizes bugs of more than one processor (except for the stubs on Wikipedia), so that'd make it a good idea. Besides, a lot of people do not realize that processors are built by humans and thus fundamentally flawed, so to speak...
Well, that makes two people in favour. Who votes next?
Re:CPU bugs database?
You can usually assume my vote for anything of this nature will be positive; I'm the sort who obsessively tries to collect things together, make lists, etc.
- Combuster
Re:CPU bugs database?
http://www.mega-tokyo.com/osfaq/CpuBugs
Hope you like the title. It is far from complete, but for wiki's sake, go ahead and add your thing. When I get home I'll dig up the Cyrix docs and generate some sample code for the coma bug, and perhaps document some other Intel things I'm familiar with.
Enjoy
Re:CPU bugs database?
I'm OK with this, but it'd be quite useful to mark the entries with what exactly makes them go bad (use an assembly sample, even for generic cases just a sample), explain why they go bad (which the two current entries have) and explicitly indicate which processors are affected. I'd vote for doing this with a quick-excluding flowchart-type:
    brand != intel => no
    proctype != Pentium P5 => no
    procspeed > 66 => no
    yes
Or something similar, at least, so that you can quickly disqualify the option for most processors. You should in any case be able to tell whether it might apply to your target processor group.
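The quick-excluding idea could be sketched in C roughly as follows. This is only an illustration of the flowchart: the struct, its field names and the function name are assumptions, and the three tests are copied from the example rather than taken from any real erratum.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical description of the CPU being checked. */
struct cpu_desc {
    const char *brand;     /* e.g. "intel" */
    const char *proctype;  /* e.g. "Pentium P5" */
    unsigned    mhz;       /* rated clock speed in MHz */
};

/* Mirrors the flowchart: every test that matches rules the bug out early,
 * and only a CPU that survives all of them gets a "yes". */
static bool example_bug_may_apply(const struct cpu_desc *cpu)
{
    assert(cpu != NULL);
    if (strcmp(cpu->brand, "intel") != 0)
        return false;
    if (strcmp(cpu->proctype, "Pentium P5") != 0)
        return false;
    if (cpu->mhz > 66)
        return false;
    return true;
}
```

Putting the cheapest and most selective test first means that most processors drop out after a single comparison, which is the point of the flowchart layout.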
- Combuster
Re:CPU bugs database?
Candy wrote: I'm OK with this, but it'd be quite useful to mark the entries with what exactly makes them go bad (use an assembly sample, even for generic cases just a sample),
I.e. you'd like to see some sample code reproducing the bug in all circumstances, as well as problem-solving code?
Candy wrote: explain why they go bad (which the two current entries have) and explicitly indicate which processors are affected. I'd vote for doing this with a quick-excluding flowchart-type:
I hope the current approach suffices.
Btw, I was wondering whether it would be useful to add some sort of rating to these bugs, or to leave it to the reader to decide.
Other remarks are welcome; I'll try to get these two into good condition before documenting other things, to save myself from rewrites.
Re:CPU bugs database?
It's probably a good idea to add a form of rating indicating the severity of the bug, i.e. how easily it can be triggered without going to explicit trouble. Suggestion for the rating categories:
1. Bugs that cause generic problems or that fail any program unpredictably. An example is FDIV, which doesn't show up often but can show up on any FDIV, even in compiled code.
2. Bugs that cause generic problems that could appear in compiled code, but only if the user intentionally uses assembly code to that end. (COMA bug)
3. Bugs that cause problems in hand-crafted assembly code that's practically impossible to create with a compiler (F00F bug)
4. Bugs that cause problems only in privileged mode, which would otherwise fall under any of the above categories
5. Bugs with less impact.
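If the wiki entries were ever turned into data, the five categories could be written down as a C enum along these lines. This is a sketch only: the identifier names and label strings are invented here, and the numbers simply follow the list.

```c
#include <assert.h>

/* Rating categories from the list; identifier names are hypothetical. */
enum bug_rating {
    RATING_ANY_CODE        = 1, /* can fail any program, even compiled code (FDIV) */
    RATING_DELIBERATE_ASM  = 2, /* generic problem, needs intentional assembly (COMA) */
    RATING_HANDCRAFTED_ASM = 3, /* only hand-crafted sequences (F00F) */
    RATING_PRIVILEGED_ONLY = 4, /* only shows up in privileged mode */
    RATING_LOW_IMPACT      = 5  /* bugs with less impact */
};

/* Convert the number used on the wiki page into the enum. */
static enum bug_rating rating_from_number(int n)
{
    assert(n >= 1 && n <= 5);
    return (enum bug_rating)n;
}

/* Short human-readable label for listings. */
static const char *rating_label(enum bug_rating r)
{
    switch (r) {
    case RATING_ANY_CODE:        return "any program";
    case RATING_DELIBERATE_ASM:  return "deliberate assembly";
    case RATING_HANDCRAFTED_ASM: return "hand-crafted assembly";
    case RATING_PRIVILEGED_ONLY: return "privileged mode only";
    case RATING_LOW_IMPACT:      return "less impact";
    }
    return "unknown rating";
}
```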
I've thought about occurrence information but it's pretty irrelevant, especially since each occurrence that fails fails the program (and/or the OS).
Nice website with some X86 info is on www.x86.org. F00F bug at http://www.x86.org/errata/dec97/f00fbug.htm
Re:CPU bugs database?
86BUGS.LST in INTER61D.ZIP from here has a list of some bugs and quirks, but many are irrelevant, pertaining to CPUs like 186s. For people targeting the 386 as the minimum required x86 CPU, there is information about CMPXCHG:
"On the A-step of the 486, this Mnemonic was coded using the opcodes for the, discarded, A- to B0-step 386 instructions XBTS (a6) and IBTS (a7). Because of software conflicts with software written for the early 386 DX the opcodes for the 486 were changed to the ones above starting with the B step."
Of course that's not the only relevant thing listed in the file.
Re:CPU bugs database?
"(C) Copyright 1993, 1994 By Harald Feldmann Revision 04, Nov 3rd 1994."
Not sure that's exactly what we want.
Re:CPU bugs database?
Hi,
Some notes....
Every 80x86 CPU ever made has bugs. For Intel CPUs ranging from Pentium to the latest "Core" CPUs there are between 70 and 190 different bugs for each CPU, listed in the corresponding errata (and published by Intel). I assume AMD CPUs aren't much different.
For the errata lists, some of the things Intel includes are insane. One example would be "Item No 8: Mode C paging in SMM causes use of incorrect page tables" (Pentium Pro), which doesn't make much sense because the manual explicitly states that paging isn't supported in SMM mode on any CPU.
Some of the things listed can't affect any OS. An example would be "Item No 43: L2 cache may incorrectly report BIST failure" (Pentium II). In this case the Built In Self Test would only occur during boot before an OS is started, so the problem can't affect an OS after it has started.
There are usually quite a few problems with FRC mode (Functional Redundancy Checking). The idea here is that a pair of CPUs are glued together and one CPU is used to check the output of the other. If the outputs of the two CPUs don't match, an FRCERR is signalled to external circuitry. I've never seen or heard of this feature actually being used - it might only be used by Intel for quality control (although I have heard of similar arrangements for non-Intel high end servers, where extreme fault tolerance is needed - here if you're curious).
For a number of things listed there are motherboard, chipset or BIOS work-arounds, and some items only affect motherboards that do some things in certain ways. In this case it's impossible to know if the work-around has been implemented or not, or if the motherboard is affected by the bug. In these cases I assume the problem is fixed, because there's no practical way of detecting otherwise and it's possibly not a good idea to warn users about problems that might not exist.
For how I've been classifying CPU bugs for my OS, it might be worth reading this page in my user manual. For Intel chips from Pentium to Pentium II, I've got 7 classified as "errata" and 8 classified as "flaws"; however, some of the "flaws" represent several problems that all affect the same CPUs. There are also 33 of them that I've marked as "TO BE REVIEWED" - for these I either don't understand enough about the problem, or I am unsure whether it will affect my OS or not (at least half of these won't be classified until I implement machine check exception handling).
I should also point out that Intel's published errata are very comprehensive - they seem to list everything that could ever affect anything. Because of this I'm quite lenient - I tend to look for reasons why each bug won't matter, rather than looking for reasons why each bug might matter. This is partly because I've been using Intel CPUs (with heaps of "bugs") for many years without ever actually having a problem that can be attributed to a CPU bug.
For a public database of bugs, I'd suggest the following fields:
- Bug Name
- Category
- Affected CPU/s (manufacturer, family, model, stepping, etc)
- Contributing Factors (a short description)
- Bug Description (about 1 paragraph)
- Workaround (about 1 paragraph if a work-around can be implemented by the OS kernel)
Bug Name: F00F Bug
Category: SEVERE, FIXABLE
Contributing Factors: LOCK CMPXCHG8B instruction
Bug Description: The LOCK CMPXCHG8B instruction can be used to completely lock up the computer (at any privilege level) due to a CPU bug that leaves the bus locked while trying to invoke an invalid instruction exception.
Workaround: The easiest known work-around is to set the IDT to "write-through" caching (e.g. using the flags in the page table entry). For more information see http://www.x86.org/errata/dec97/f00fbug.htm.
Affected CPUs: All Intel Pentium CPUs.
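As a rough sketch of how such a record might look if the database were kept as data rather than wiki text, the fields can be mapped onto a C struct. The type, flag and function names are hypothetical, treating SEVERE and FIXABLE as independent flags is an assumption, and the F00F strings are shortened from the entry above.

```c
#include <assert.h>
#include <stddef.h>

/* Category flags; only SEVERE and FIXABLE appear in the example entry. */
enum bug_category_flags {
    BUG_SEVERE  = 1u << 0,
    BUG_FIXABLE = 1u << 1
};

/* One database record, one member per suggested field. */
struct cpu_bug_record {
    const char *name;
    unsigned    category;             /* OR of bug_category_flags */
    const char *affected_cpus;        /* manufacturer, family, model, stepping */
    const char *contributing_factors; /* short description */
    const char *description;          /* about one paragraph */
    const char *workaround;           /* NULL if the kernel cannot work around it */
};

static const struct cpu_bug_record f00f_record = {
    "F00F Bug",
    BUG_SEVERE | BUG_FIXABLE,
    "All Intel Pentium CPUs",
    "LOCK CMPXCHG8B instruction",
    "Can lock up the computer at any privilege level because the bus stays "
    "locked while the CPU tries to invoke an invalid instruction exception.",
    "Set the IDT to write-through caching, e.g. via page table entry flags."
};

/* True when the record is marked fixable and actually describes a workaround. */
static int bug_has_os_workaround(const struct cpu_bug_record *rec)
{
    assert(rec != NULL);
    return (rec->category & BUG_FIXABLE) != 0 && rec->workaround != NULL;
}
```

A kernel could walk an array of such records at boot and either apply the workaround or log the entries it cannot fix.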
The problem here is going to be deciding which pieces of errata published by Intel and AMD actually matter, and finding details for other manufacturers....
Cheers,
Brendan