CPU Errata Resource?

Post by quok »

I'm hoping someone here could point me to a single resource that lists all known CPU errata. I'd rather not have to go digging through OS source and tons of different websites or PDFs looking for these things. If something like this doesn't exist, it'd be a good thing to get on the wiki. I'm mostly interested in x86 and x86-64 at this point, but going forward, errata for other architectures would be greatly appreciated as well.
Post by ru2aqare »

If I remember correctly, Ralf Brown's Interrupt List package used to have one, but it may be pretty outdated (it only includes bugs up to the Pentium).
Post by Brynet-Inc »

After discussing it with quok, here is what we have:

ftp://download.intel.com/design/processor/specupdt/
ftp://download.intel.com/design/mobile/SPECUPDT/

AMD doesn't have a public FTP server AFAIK, but their processor errata documents appear to be called "Revision Guides".

"Revision Guide for AMD Athlon(tm) 64 and AMD Opteron(tm) Processors"
http://www.amd.com/us-en/assets/content ... /25759.pdf

"Revision Guide for AMD Family 10h Processors"
http://www.amd.com/us-en/assets/content ... /41322.pdf

Why hasn't anyone tried to organize this better for errata tracking? :|
Post by bewing »

Probably because there is just too damned much of it, and it changes too fast.
Post by Brynet-Inc »

bewing wrote: Probably because there is just too damned much of it, and it changes too fast.
Still, as developers, we should be tracking these changes, and perhaps warning users of known problems.

OpenBSD currently does this: if a known erratum is detected, it recommends a BIOS update.
Post by quok »

Brynet-Inc wrote:
bewing wrote: Probably because there is just too damned much of it, and it changes too fast.
Still, as developers, we should be tracking these changes, and perhaps warning users of known problems.

OpenBSD currently does this: if a known erratum is detected, it recommends a BIOS update.
I agree, I think it should be tracked. There is a lot of it, yes, and it does change very fast. However, manufacturers also pull documents for older processors from their websites as new ones come out. I think a central place to find this information would be worth the hassle, especially as I'm sure I'm not the only one who's going to be implementing the proper workarounds and such in their OS.
Post by Combuster »

There was a page on this in the wiki: CPU Bugs. Maybe it would be an idea to update it?
Post by Brendan »

Hi,
quok wrote: I'm hoping someone here could point me to a single resource that lists all known CPU errata. I'd rather not have to go digging through OS source and tons of different websites or PDFs looking for these things. If something like this doesn't exist, it'd be a good thing to get on the wiki. I'm mostly interested in x86 and x86-64 at this point, but going forward, errata for other architectures would be greatly appreciated as well.
For Intel and AMD, the "specification updates" or "revision guides" are on their web sites. IMHO the fastest way to find them is a Google search such as "specification update site:intel.com". For both companies, errata for old CPUs may still be on the company's web site but not listed alongside recent CPUs (e.g. it could be in an "archived" area). Old CPUs might also be reclassified: I remember finding 80486 errata in Intel's "embedded CPU" section about a year ago, as the chip had reached end-of-life for desktop use but was still being sold for embedded systems until a few years ago.

For Intel and AMD, if someone discovers a new problem then the corresponding errata is updated, so (if you're seriously vigilant) you'd need to double check everything every few months. Of course errata for new CPUs is more likely to be updated than errata for older CPUs. In your source code, I'd recommend keeping track of the version or date of the errata document the code was derived from.
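
One way to keep track of the document version is to store it next to each work-around. The following is only a sketch: the struct, the function names and every table entry are made-up placeholders, not real errata or real document revisions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record tying a work-around to the document it was derived from. */
struct erratum_source {
    const char *erratum_id;  /* ID as printed in the vendor document */
    const char *document;    /* title of the specification update or revision guide */
    const char *revision;    /* revision of the copy that was consulted */
    const char *checked_on;  /* ISO 8601 date the document was last reviewed */
};

/* Placeholder entries for illustration only. */
static const struct erratum_source erratum_sources[] = {
    { "EXAMPLE-1", "Example Specification Update", "rev 001",  "2009-01-15" },
    { "EXAMPLE-2", "Example Revision Guide",       "rev 3.10", "2008-06-30" },
};

/* ISO 8601 dates sort lexicographically, so strcmp() is enough here. */
static bool erratum_source_is_stale(const struct erratum_source *src,
                                    const char *cutoff)
{
    return strcmp(src->checked_on, cutoff) < 0;
}

/* Count the entries that have not been re-checked since the cutoff date. */
static size_t count_stale_sources(const char *cutoff)
{
    size_t n = sizeof erratum_sources / sizeof erratum_sources[0];
    size_t stale = 0;

    for (size_t i = 0; i < n; i++) {
        if (erratum_source_is_stale(&erratum_sources[i], cutoff))
            stale++;
    }
    return stale;
}
```

A build script or debug command could call count_stale_sources() with a date a few months back to list which work-arounds are due for a re-check against the latest documents.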

For other 80x86 CPU manufacturers (IBM, Cyrix/VIA, NSC, Centaur, IDT, SiS, NexGen, Rise, Transmeta, UMC, STM, Texas Instruments, ZF) you'd first want to work out which CPUs you support. For example, if your OS requires MMX or something then you can forget most of these. In any case, good errata is hard to find (any documentation for any old CPU is hard to find, especially if the company is no longer in business). The only advice I can give here is to find what you can while you can and archive it. Sandpile.org is a good place to start, as they list all the documentation - even though you can't download it from their private links you can find the document titles and file names, and use this information to improve your web searches.

Also note that some of these "old" CPUs seem like they're discontinued, but aren't. For example, about a month ago I bought a new computer (a tiny diskless thing, that's entirely "PC compatible" and came with USB, video, ethernet, etc all built-in, including PXE support). This computer came with a "Vortex86" CPU - an "80486 compatible" CPU made by SiS that's normally used for embedded systems.

The information you get from most web sites (including the OSdev wiki) is normally "minimal" - usually only significant problems are mentioned (e.g. Cyrix Coma bug, Intel Pentium F00F bug) and a huge number of other problems aren't mentioned (if you've seen the errata for any modern/recent CPU you'll know what I mean).

IMHO it's important to consider what you're going to do with the errata information, and to deal with this information in a responsible way. Too many times I've seen average users on news sites and forums in a panic (saying how broken CPUs are, advocating boycotts, etc), because they lack the knowledge needed to understand the errata information provided by the CPU manufacturer and how it affects (or more commonly, doesn't affect) real systems. Worse, sometimes some useless journalist will sensationalize a recently released piece of errata. It's all bad, because with enough bad publicity the CPU manufacturers will be very tempted to restrict access to their errata information (e.g. make people sign an NDA before they can see the errata, or perhaps only give the errata information to chipset/motherboard manufacturers and Microsoft). I hope you can understand why all OS developers would like to avoid that... ;)

If you carefully look at the full list of errata for a CPU you'll notice that some of it is irrelevant. For example, it might involve something that isn't used (e.g. FRC mode, where the pins on one CPU are used to monitor the signals on a second CPU) or it might involve software that does something that the programming manuals say causes undefined behavior. Some of the errata might not affect your OS anyway (e.g. my OS will never support Virtual86 mode, so any bugs involving Virtual86 mode will never affect my OS). Some of the errata can be easily fixed by the OS (e.g. dodgy CPUID return values are common, but are easily fixed by the OS because all software should use the OS's "CPU data" rather than using CPUID directly). In all of these cases the OS should be silent (don't mention the errata to any user).
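
As a rough sketch of the "CPU data" idea, the kernel can copy the raw CPUID values once and patch them before anything else reads them. The struct and function names below are hypothetical. The one fix-up shown (early P6 parts that report SEP although SYSENTER is not usable) is how I recall Intel's SYSENTER documentation, so check it against the manual. The sketch also skips the vendor check and the extended family/model fields.

```c
#include <stdint.h>

#define CPUID_EDX_SEP (1u << 11)  /* CPUID.1:EDX bit 11, SYSENTER/SYSEXIT */

/* Hypothetical OS-owned copy of CPUID data that all software should query. */
struct cpu_data {
    uint32_t signature;     /* raw CPUID.1:EAX */
    uint32_t features_edx;  /* CPUID.1:EDX after fix-ups */
};

/* Build the OS's view of the CPU from raw CPUID leaf 1 values. */
static struct cpu_data build_cpu_data(uint32_t raw_eax, uint32_t raw_edx)
{
    struct cpu_data d = { raw_eax, raw_edx };
    uint32_t family   = (raw_eax >> 8) & 0xF;
    uint32_t model    = (raw_eax >> 4) & 0xF;
    uint32_t stepping = raw_eax & 0xF;

    /* Assumed rule: family 6, model < 3, stepping < 3 sets SEP but
       SYSENTER must not be used, so hide the bit from everyone else. */
    if (family == 6 && model < 3 && stepping < 3)
        d.features_edx &= ~CPUID_EDX_SEP;

    return d;
}
```

Because applications only ever see the patched copy, the user never needs to be told that the raw CPUID value was wrong.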

Some of the errata can be fixed by a work-around in the kernel, and won't affect normal software (applications, etc). For these I'd have a set of flags to keep track of them (so the kernel knows if the work-around/s are needed or not). The user doesn't need to know about these either.
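
A minimal sketch of such a set of flags, assuming a single global bitmask that is filled in during CPU detection. The enum and function names are hypothetical; the two entries are just the well-known examples mentioned earlier.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per kernel work-around (hypothetical names). */
enum cpu_workaround {
    WORKAROUND_F00F       = 1u << 0,  /* Intel Pentium F00F bug */
    WORKAROUND_CYRIX_COMA = 1u << 1,  /* Cyrix Coma bug */
};

/* Set once during CPU detection, read by the rest of the kernel. */
static uint32_t cpu_workarounds;

static void workaround_enable(enum cpu_workaround w)
{
    cpu_workarounds |= (uint32_t)w;
}

static bool workaround_needed(enum cpu_workaround w)
{
    return (cpu_workarounds & (uint32_t)w) != 0;
}
```

Kernel code that sets up the IDT, for example, would test workaround_needed(WORKAROUND_F00F) and only then apply the extra handling, without ever reporting anything to the user.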

Some of the errata is for chipset/motherboard designers to worry about, and an OS can't assume the problem affects the chipset/motherboard. IMHO the user doesn't need to know about these problems either (unless the OS knows for sure that the chipset/motherboard is affected).

Some of the errata may cause problems (typically only in rare circumstances) for normal users. Most normal users don't really need to know about these problems either, but the OS should make this information available to system administrators so they know the computer is affected by the problem. If the OS does make the information available, then it should also provide an adequate explanation of the problem (including the chance of the errata causing any problem) and advice for the system administrators.

Also, if you're going to be thorough, the first thing I'd recommend is having an "unknown" flag, so that if there isn't enough good information (or if you haven't had time to examine each piece of errata yet) you can set the "unknown" flag to let people know that there may or may not be errata that the OS hasn't detected.
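
The "unknown" flag could be sketched like this, assuming the kernel keeps a table of CPU signatures whose errata have actually been reviewed. All names are hypothetical and the table entries are placeholders rather than real review results. A real table would also need to key on the vendor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Result of errata detection for the running CPU. */
struct errata_status {
    uint32_t workarounds;  /* bitmask of work-arounds to enable */
    bool unknown;          /* true if this CPU's errata were never reviewed */
};

/* CPUs whose errata documents have been examined (placeholder values). */
struct reviewed_cpu {
    uint32_t signature;    /* CPUID.1:EAX */
    uint32_t workarounds;
};

static const struct reviewed_cpu reviewed_cpus[] = {
    { 0x00000513u, 0x1u },
    { 0x00000634u, 0x0u },
};

static struct errata_status errata_lookup(uint32_t signature)
{
    /* Default: nothing known, so say so instead of implying "no errata". */
    struct errata_status st = { 0, true };
    size_t n = sizeof reviewed_cpus / sizeof reviewed_cpus[0];

    for (size_t i = 0; i < n; i++) {
        if (reviewed_cpus[i].signature == signature) {
            st.workarounds = reviewed_cpus[i].workarounds;
            st.unknown = false;
            break;
        }
    }
    return st;
}
```

The point of defaulting to unknown is that an empty work-around mask on an unreviewed CPU is never mistaken for a clean bill of health.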

Finally, for Intel CPUs some of the errata is fixed by CPU microcode updates; but there's no way to determine which problems are fixed by which microcode updates. This means (for example) your OS might tell system administrators about a problem that might or might not have been fixed, that might or might not affect the reliability of software they use. It is possible, however, to detect if any microcode update has been installed, and to use this to improve the information the OS gives system administrators. For example, if there's no microcode update installed, then the OS could tell system administrators there definitely is a problem and recommend a microcode update; but if there is some sort of microcode update installed, then the OS could stay silent or perhaps mention that the problem may or may not exist.
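
A sketch of that decision, assuming the kernel has already read the IA32_BIOS_SIGN_ID MSR (0x8B). As far as I know the documented sequence is to write 0 to the MSR, execute CPUID with EAX=1, then read the MSR back; the upper 32 bits then hold the microcode update revision, or zero if no update is loaded. The enum and function names are hypothetical, and the MSR access itself is left out so that the logic stays testable outside a kernel.

```c
#include <stdint.h>

/* What the OS tells administrators about a microcode-fixable erratum. */
enum errata_report {
    REPORT_DEFINITE,  /* no update loaded: the problem is present */
    REPORT_POSSIBLE,  /* some update loaded: may or may not be fixed */
};

/* Extract the update revision from a raw IA32_BIOS_SIGN_ID value. */
static uint32_t microcode_revision(uint64_t bios_sign_id)
{
    return (uint32_t)(bios_sign_id >> 32);
}

static enum errata_report classify_report(uint64_t bios_sign_id)
{
    if (microcode_revision(bios_sign_id) == 0)
        return REPORT_DEFINITE;
    return REPORT_POSSIBLE;
}
```

This only makes sense for errata that the specification update says can be fixed by microcode; everything else still needs its own detection.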

For AMD CPUs, I'd recommend reading Chapter 17, "OS-Visible Workaround Information" from a recent copy of "AMD64 Architecture Programmer’s Manual Volume 2: System Programming". You'll find these "OS-visible workaround/s" listed in AMD's revision guides. It complicates things a little for the OS, but should also be useful because it lets the OS know if the problem is fixed by hardware or not (I wish Intel would implement this feature).
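
To illustrate how an OS might interpret OSVW, here is a sketch based on my reading of that chapter: support is reported in CPUID Fn8000_0001 ECX bit 9, MSR C001_0140h gives the number of valid status bits, and MSR C001_0141h holds the status bits, where a set bit means the work-around is needed. Treat those details as assumptions to verify against the manual. The function and enum names are hypothetical, the MSR values are passed in as parameters, and only the first status register (IDs 0 to 63) is handled.

```c
#include <stdbool.h>
#include <stdint.h>

enum osvw_result {
    OSVW_NOT_NEEDED,  /* hardware reports the erratum as fixed */
    OSVW_NEEDED,      /* hardware reports the work-around is required */
    OSVW_UNKNOWN,     /* no OSVW information: fall back to family/model/stepping */
};

static enum osvw_result osvw_check(bool osvw_supported,
                                   uint64_t id_length_msr,
                                   uint64_t status_msr,
                                   unsigned id)
{
    /* The low 16 bits give the number of valid status bits. */
    uint32_t length = (uint32_t)(id_length_msr & 0xFFFFu);

    if (!osvw_supported || id >= length || id >= 64)
        return OSVW_UNKNOWN;

    return ((status_msr >> id) & 1u) ? OSVW_NEEDED : OSVW_NOT_NEEDED;
}
```

When the result is OSVW_UNKNOWN the OS has to fall back to matching the CPU's family, model and stepping against the revision guide, just as it would on a CPU without OSVW.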


Cheers,

Brendan