Detecting Usable Physical Memory

Antti
Member
Posts: 923
Joined: Thu Jul 05, 2012 5:12 am
Location: Finland

Post by Antti »

This question may seem simple, but I think several aspects can turn it into an advanced topic. There are several ways of approaching the problem, and I would like to differentiate between the "practical way" and the "defensive/advanced way," the latter potentially making this a deep question. So the question is: how do you detect usable physical memory? To begin with, I would like to base the discussion on the "80x86 platform with BIOS firmware" and list some random notes.
  • The most obvious way is to let the firmware (BIOS) list all usable memory ranges. There are many different methods available, and using them reliably (working around all known bugs, for example) is going to be difficult. It should be possible (with defensive programming) to get reliable results and sort the memory map properly. Please note that there are many memory types, but I am simplifying a little: all memory that is reported exclusively as "free" (i.e. not overlapping with reserved ranges) is potentially usable physical memory.
  • Well-known reserved memory areas (whether or not reported by the BIOS) should be avoided, e.g. from 0xA0000 to 0xFFFFF. Is it possible for a memory range to be reported as "free" (not reported does not mean "free") and still be reserved? The ISA memory hole at 15-16 MiB is well known, but it should not be reported as "free". Should we trust that it is reported correctly?
  • Memory-mapped devices (e.g. PCI). This is the biggest open question at the moment. Is it certain that memory areas reported as "free" are not used by memory-mapped devices? Here I mean the default resource allocation done by the BIOS, not some later remapping.
  • Linear frame buffer for video devices (e.g. VBE2). Is it certain that enabling the linear frame buffer does not change the memory map, i.e. if the memory map is read before setting the video mode?
This is overly simplified but I guess it is enough for the first post. There have been old topics about these issues, but I would rather start a fresh one because there are a lot of new members and it should not hurt to revisit this topic anyway.
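The "exclusively free" rule from the first bullet can be sketched in C. This is a minimal illustration, not a complete memory-map sanitizer: the names `e820_entry`, `range` and `usable_ranges` are made up for this example, the entry layout follows the base/length/type triple that E820 returns, and the code assumes `base + length` never wraps around 64 bits. Every entry boundary becomes a cut point, and an interval between two cuts is kept only if some type 1 ("free") entry covers it and no entry of any other type does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { E820_USABLE = 1 };               /* "free" RAM in the E820 map */

struct e820_entry {                     /* hypothetical in-memory layout */
    uint64_t base;
    uint64_t length;
    uint32_t type;
};

struct range { uint64_t start, end; };  /* half-open interval [start, end) */

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Writes at most max_out usable ranges to out, in ascending order,
 * and returns how many were written. The input may be unsorted and
 * may contain overlapping or zero-length entries. */
static size_t usable_ranges(const struct e820_entry *map, size_t n,
                            struct range *out, size_t max_out)
{
    assert(map != NULL || n == 0);
    uint64_t *cut = malloc(2 * n * sizeof *cut);
    size_t ncut = 0, nout = 0;
    if (cut == NULL)
        return 0;

    /* Collect every entry boundary as a cut point. */
    for (size_t i = 0; i < n; i++) {
        if (map[i].length == 0)
            continue;
        cut[ncut++] = map[i].base;
        cut[ncut++] = map[i].base + map[i].length;
    }
    qsort(cut, ncut, sizeof *cut, cmp_u64);

    /* Classify each elementary interval between neighbouring cuts. */
    for (size_t c = 0; c + 1 < ncut; c++) {
        uint64_t lo = cut[c], hi = cut[c + 1];
        int is_free = 0, is_claimed = 0;
        if (lo == hi)
            continue;
        for (size_t i = 0; i < n; i++) {
            uint64_t b = map[i].base, e = b + map[i].length;
            if (map[i].length == 0 || lo < b || hi > e)
                continue;               /* entry does not cover [lo, hi) */
            if (map[i].type == E820_USABLE)
                is_free = 1;
            else
                is_claimed = 1;
        }
        if (!is_free || is_claimed)
            continue;
        if (nout > 0 && out[nout - 1].end == lo)
            out[nout - 1].end = hi;     /* extend the previous range */
        else if (nout < max_out)
            out[nout++] = (struct range){ lo, hi };
    }
    free(cut);
    return nout;
}
```

For a map in which a reserved entry at 4-5 MiB overlaps a free entry at 1-8 MiB, the function returns 1-4 MiB and 5-8 MiB as separate ranges. Ranges that do not fit in `out` are silently dropped, which a real boot loader would probably want to report instead.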
BrightLight
Member
Posts: 901
Joined: Sat Dec 27, 2014 9:11 am
Location: Maadi, Cairo, Egypt

Re: Detecting Usable Physical Memory

Post by BrightLight »

BIOS function INT 0x15, EAX = 0xE820 lists all memory ranges and can detect memory above 4 GB. Most PCI devices will be listed as reserved or ACPI NVS. ACPI tables that can be discarded after use are listed as ACPI reclaimable memory. The VBE framebuffer (on my 2 test PCs) is detected simply as "reserved" memory even when it is not enabled: even in a text mode, function 0xE820 shows the VBE framebuffer as reserved memory. So you can simply avoid writing to ACPI NVS, reserved and bad memory. If a well-known memory location (e.g. 0xA0000) is marked as free, just avoid using it. It may be a BIOS bug.
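The advice above amounts to two filters, sketched here in C under stated assumptions. The type numbers are the ones the ACPI specification assigns to E820 entries (1 usable, 2 reserved, 3 ACPI reclaimable, 4 ACPI NVS, 5 unusable); the names `e820_is_allocatable` and `clip_legacy_hole` are hypothetical, and treating the whole 0xA0000-0xFFFFF window as off-limits is the conservative choice suggested in this thread rather than something the firmware guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum e820_type {
    E820_USABLE      = 1,
    E820_RESERVED    = 2,
    E820_ACPI_RECL   = 3,   /* reusable once the ACPI tables are consumed */
    E820_ACPI_NVS    = 4,
    E820_UNUSABLE    = 5    /* memory in which errors were detected */
};

#define LEGACY_HOLE_START 0x000A0000ULL  /* start of legacy video memory */
#define LEGACY_HOLE_END   0x00100000ULL  /* 1 MiB */

struct range { uint64_t start, end; };   /* half-open interval [start, end) */

/* Only plain usable RAM is handed to the allocator at boot. */
static int e820_is_allocatable(uint32_t type)
{
    return type == E820_USABLE;
}

/* Splits r around the legacy hole, even if the firmware called it free.
 * Writes 0, 1 or 2 pieces to out and returns the count. */
static size_t clip_legacy_hole(struct range r, struct range out[2])
{
    size_t n = 0;
    assert(out != NULL);
    if (r.start >= r.end)
        return 0;
    if (r.end <= LEGACY_HOLE_START || r.start >= LEGACY_HOLE_END) {
        out[n++] = r;                    /* no overlap with the hole */
        return n;
    }
    if (r.start < LEGACY_HOLE_START)
        out[n++] = (struct range){ r.start, LEGACY_HOLE_START };
    if (r.end > LEGACY_HOLE_END)
        out[n++] = (struct range){ LEGACY_HOLE_END, r.end };
    return n;
}
```

A range from 0x90000 to 0x200000 that a buggy BIOS reports as free would come back as two pieces, 0x90000-0xA0000 and 0x100000-0x200000.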
You know your OS is advanced when you stop using the Intel programming guide as a reference.

Re: Detecting Usable Physical Memory

Post by Antti »

ACPI Specification, Revision 5.0 wrote:E820 Assumptions and Limitations
  • The BIOS returns address ranges describing baseboard memory.
  • The BIOS does not return a range description for the memory mapping of PCI devices, ISA Option ROMs, and ISA Plug and Play cards because the OS has mechanisms available to detect them.
  • The BIOS returns chip set-defined address holes that are not being used by devices as reserved.
  • Address ranges defined for baseboard memory-mapped I/O devices, such as APICs, are returned as reserved.
  • All occurrences of the system BIOS are mapped as reserved, including the areas below 1 MB, at 16 MB (if present), and at end of the 4-GB address space.
  • Standard PC address ranges are not reported. For example, video memory at A0000 to BFFFF physical addresses are not described by this function. The range from E0000 to EFFFF is specific to the baseboard and is reported as it applies to that baseboard.
  • All of lower memory is reported as normal memory. The OS must handle standard RAM locations that are reserved for specific uses, such as the interrupt vector table (0:0) and the BIOS data area (40:0).
We can read this in three ways as far as PCI devices are concerned:

a) the range is not reported at all, i.e. it is a hole in the memory map

b) the type "AddressRangePCI" does not exist, so it does not "return a range description" but it could be the type "AddressRangeReserved" that does exist

c) any of the above, but it is never reported as "free"

There may be other ways to read it so this is not a comprehensive list.
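Whichever reading is right, a boot loader can at least make the holes explicit and treat them like reserved space. A rough C sketch, assuming the map has already been sorted by base address and that `base + length` does not overflow; `find_holes` and the struct names are invented for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct e820_entry {                     /* hypothetical in-memory layout */
    uint64_t base;
    uint64_t length;
    uint32_t type;
};

struct range { uint64_t start, end; };  /* half-open interval [start, end) */

/* Reports address ranges below the last entry that no entry describes.
 * map must be sorted by base; overlapping entries are tolerated.
 * Returns the number of holes written (at most max_out). */
static size_t find_holes(const struct e820_entry *map, size_t n,
                         struct range *out, size_t max_out)
{
    uint64_t covered_to = 0;            /* all addresses below this are described */
    size_t nout = 0;

    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t end = map[i].base + map[i].length;
        if (map[i].base > covered_to && nout < max_out)
            out[nout++] = (struct range){ covered_to, map[i].base };
        if (end > covered_to)
            covered_to = end;
    }
    return nout;
}
```

On a typical map this reports at least the 0xA0000 video window, which the specification says is not described by the function. None of these holes should be handed to the allocator.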
embryo2
Member
Posts: 397
Joined: Wed Jun 03, 2015 5:03 am

Re: Detecting Usable Physical Memory

Post by embryo2 »

Antti wrote:First of all, there are several ways of approaching this problem and I would like to differentiate between the "practical way" and the "defensive/advanced way,"
For the "practical" way, I would follow what Windows or Linux do. If a computer doesn't support the approach Microsoft expects, then it's just niche hardware and can be ignored.

For the "defensive" way, many spears can be broken and years spent on the "best" implementation. We need to stop somewhere before it's too late. So the problem is: where should we stop?
My previous account (embryo) was accidentally deleted, so I have no chance but to use something new. But may be it was a good lesson about software reliability :)

Re: Detecting Usable Physical Memory

Post by Antti »

embryo2 wrote:If a computer doesn't support the way Microsoft wants then it's just a niche hardware and can be ignored.
In general, it could be considered acceptable to cut out support for very rare hardware. The way of doing it is another topic. You could just let the undefined behavior happen, or handle it in a controlled way. I prefer the latter, so if my boot code detects a very unlikely hardware setup I may simply not support it and give an error message. That is very different from relying on a fuzzy "undefined behavior will reveal it sooner or later" approach.

Support for not to support is a valuable feature.
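A controlled refusal of this kind could look roughly like the following C sketch. The checks and the names (`check_map`, `map_status`, `map_status_text`) are illustrative choices, not taken from any particular boot loader, and rejecting every overlap between entries of different types is a deliberately strict policy; a more lenient loader could let the more restrictive type win instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct e820_entry {                     /* hypothetical in-memory layout */
    uint64_t base;
    uint64_t length;
    uint32_t type;
};

enum map_status { MAP_OK, MAP_EMPTY, MAP_WRAPS, MAP_CONFLICT, MAP_NO_USABLE };

/* Returns MAP_OK only if the map passes every sanity check. */
static enum map_status check_map(const struct e820_entry *map, size_t n)
{
    int have_usable = 0;

    assert(map != NULL || n == 0);
    if (n == 0)
        return MAP_EMPTY;

    for (size_t i = 0; i < n; i++) {
        /* base + length must not run past the 64-bit address space. */
        if (map[i].length > UINT64_MAX - map[i].base)
            return MAP_WRAPS;
        if (map[i].type == 1 && map[i].length > 0)
            have_usable = 1;
    }

    /* Strict policy: overlapping entries must agree on the type. */
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            uint64_t a0 = map[i].base, a1 = a0 + map[i].length;
            uint64_t b0 = map[j].base, b1 = b0 + map[j].length;
            if (map[i].length == 0 || map[j].length == 0)
                continue;
            if (a0 < b1 && b0 < a1 && map[i].type != map[j].type)
                return MAP_CONFLICT;
        }
    }
    return have_usable ? MAP_OK : MAP_NO_USABLE;
}

/* Message the boot code could print before halting. */
static const char *map_status_text(enum map_status s)
{
    switch (s) {
    case MAP_OK:        return "memory map accepted";
    case MAP_EMPTY:     return "firmware returned no memory map";
    case MAP_WRAPS:     return "memory map entry wraps past 2^64";
    case MAP_CONFLICT:  return "memory map entries overlap with different types";
    case MAP_NO_USABLE: return "memory map contains no usable RAM";
    }
    return "unknown memory map status";
}
```

The boot code would print `map_status_text(status)` and halt for anything other than `MAP_OK`, instead of continuing with a map it does not understand.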
intx13
Member
Posts: 112
Joined: Wed Sep 07, 2011 3:34 pm

Re: Detecting Usable Physical Memory

Post by intx13 »

Merging the firmware-provided memory map with well-known regions and the regions reported in the ACPI tables should give you full coverage. If a system has non-DRAM address ranges that don't appear in the firmware-provided map or in the ACPI tables and aren't well-known then you'd need system-specific patches to support it.

Re: Detecting Usable Physical Memory

Post by Antti »

intx13 wrote:Merging the firmware-provided memory map with well-known regions and the regions reported in the ACPI tables should give you full coverage.
Which regions in the ACPI tables should I read? I am prepared to read the tables in my boot loader (no AML, of course) because I need the IA-PC Boot Architecture Flags for checking e.g. "8042" and "VGA" availability. With the legacy pre-"E820" memory detection methods, we are usually limited to memory ranges below about 64 MiB. For backward compatibility, the legacy BIOS functions should report safe memory ranges even when running on modern hardware. This is not necessarily the case with "E820", because it is modern enough that the firmware might assume the OS knows what it is doing. It seems that I would get a "practically working" system if I just trusted the "free" memory ranges provided by "E820". However, I want a slightly more robust system, and I already have a boot framework that could easily support some extra verification. Now the question is: which verifications should I do?

It is also possible that memory areas are simply corrupted. I will not put great effort into supporting a computer with such faults. However, if I happen to find any faults (without actively looking for them, e.g. by scanning everything), I will refuse to boot and avoid "undefined behavior" as much as possible.

Re: Detecting Usable Physical Memory

Post by embryo2 »

Antti wrote:Support for not to support is a valuable feature.
Yes. But it's towards the "defensive" way. And the question "where to stop" is still in play here. You can detect unsupported hardware or you can detect hardware failures or you can detect user's misbehavior - all this can be a valuable feature. And the choice is personal, there's no "general" approach.

Re: Detecting Usable Physical Memory

Post by intx13 »

Antti wrote:
intx13 wrote:Merging the firmware-provided memory map with well-known regions and the regions reported in the ACPI tables should give you full coverage.
Which regions in the ACPI tables should I read? I am prepared to read the tables in my boot loader (no AML, of course) because I need the IA-PC Boot Architecture Flags for checking e.g. "8042" and "VGA" availability. With the legacy pre-"E820" memory detection methods, we are usually limited to memory ranges below about 64 MiB. For backward compatibility, the legacy BIOS functions should report safe memory ranges even when running on modern hardware. This is not necessarily the case with "E820", because it is modern enough that the firmware might assume the OS knows what it is doing. It seems that I would get a "practically working" system if I just trusted the "free" memory ranges provided by "E820". However, I want a slightly more robust system, and I already have a boot framework that could easily support some extra verification. Now the question is: which verifications should I do?
The DSDT contains memory (and port) mapping information, but it is in AML. However, I'd be pretty surprised if those regions aren't included in the firmware-provided memory map, since the same devs wrote both.
It is also possible that memory areas are simply corrupted. I will not put great effort into supporting a computer with such faults. However, if I happen to find any faults (without actively looking for them, e.g. by scanning everything), I will refuse to boot and avoid "undefined behavior" as much as possible.
I think that the firmware can detect faulty/damaged DRAM chips during the configuration of the DRAM controller and will not map to those chips. Not 100% sure though.

Re: Detecting Usable Physical Memory

Post by Antti »

intx13 wrote:However, I'd be pretty surprised if those regions aren't included in the firmware-provided memory map, since the same devs wrote both.
Hopefully. At least there are a lot of users assuming it is properly implemented. Perhaps mainstream operating systems would have failed to operate at some point if those did not match.
intx13 wrote:I think that the firmware can detect faulty/damaged DRAM chips during the configuration of the DRAM controller and will not map to those chips.
To some extent I would assume it can detect faulty RAM. There is a memory type reserved for it, "AddressRangeUnusable" (this range of addresses contains memory in which errors have been detected). That memory type has not always existed, and I also assume that older implementations are more error-prone in this respect.

Re: Detecting Usable Physical Memory

Post by intx13 »

Antti wrote: To some extent I would assume it can detect faulty RAM. There is a memory type reserved for it, "AddressRangeUnusable" (this range of addresses contains memory in which errors have been detected). That memory type has not always existed, and I also assume that older implementations are more error-prone in this respect.
The DRAM controller might not map to the faulty chips at all, rather than map to them and then mark those physical addresses as unusable. But I guess that depends on the chipset and how fine-grained the address mapping can be configured.

Re: Detecting Usable Physical Memory

Post by Antti »

intx13 wrote:The DRAM controller might not map to the faulty chips at all, rather than map to them and then mark those physical addresses as unusable.
That would be the easiest option and perhaps quite likely too. For operating system developers, it may be valuable to see whether there are faulty chips. In that regard, I hope the memory type "AddressRangeUnusable" is actually used. For very critical systems it might be acceptable to treat the computer as not reliable enough if it has faulty chips. I have no statistics on how common faulty chips actually are on modern hardware.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Detecting Usable Physical Memory

Post by Brendan »

Hi,
Antti wrote:
intx13 wrote:The DRAM controller might not map to the faulty chips at all, rather than map to them and then mark those physical addresses as unusable.
That would be the easiest option and perhaps quite likely too. For operating system developers, it may be valuable to see whether there are faulty chips. In that regard, I hope the memory type "AddressRangeUnusable" is actually used. For very critical systems it might be acceptable to treat the computer as not reliable enough if it has faulty chips. I have no statistics on how common faulty chips actually are on modern hardware.
There are 2 different types of failures: persistent failures and transient failures. Persistent failures are relatively rare (and very easy to detect). The half-baked "RAM test" that the firmware does will only detect a small number of persistent failures if/when you're lucky (because a thorough test takes far too long). Transient failures are far more common and are much harder to detect (without ECC).

Several companies have done large-scale studies of RAM chip failure rates (mostly using ECC to get corrected and uncorrected error statistics). The best-known study is possibly Google's, which reports "an average of 25,000–75,000 FIT (failures in time per billion hours of operation) per Mbit", which translates to roughly 4.8 to 14.4 failures per day per GiB.

If you assume that RAM without ECC has similar failure rates as ECC RAM (and that instead of corrected and uncorrected errors you just get silent data corruption); that would imply that for a typical modern 80x86 desktop/laptop machine with 8 GiB of RAM you'd be expecting something to get corrupted every 6 minutes (on average). 8)
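The unit conversion can be checked with a few lines of C. This sketch assumes 8192 Mbit per GiB and independent, evenly distributed errors; the function names are made up for this example. With those assumptions the study's range works out to about 4.9 to 14.7 failures per day per GiB (the 4.8 to 14.4 quoted above corresponds to rounding to 8000 Mbit per GiB), and for 8 GiB the mean interval comes out at roughly 12 to 37 minutes, so the 6-minute figure seems to rest on an assumption that is not spelled out in the post.

```c
#include <assert.h>

#define MBIT_PER_GIB     8192.0   /* assumption: 1 GiB = 8192 Mbit */
#define HOURS_PER_DAY    24.0
#define MINUTES_PER_DAY  1440.0
#define FIT_HOURS        1e9      /* FIT = failures per 10^9 device-hours */

/* Converts a FIT-per-Mbit rate into expected failures per day per GiB. */
static double failures_per_day_per_gib(double fit_per_mbit)
{
    return fit_per_mbit * MBIT_PER_GIB * HOURS_PER_DAY / FIT_HOURS;
}

/* Mean time between failures, in minutes, for a machine with gib GiB. */
static double minutes_between_failures(double fit_per_mbit, double gib)
{
    assert(fit_per_mbit > 0.0 && gib > 0.0);
    return MINUTES_PER_DAY / (failures_per_day_per_gib(fit_per_mbit) * gib);
}
```

These are averages over a whole population of machines; they say nothing about how the errors are distributed between individual DIMMs.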


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Octocontrabass
Member
Posts: 5588
Joined: Mon Mar 25, 2013 7:01 pm

Re: Detecting Usable Physical Memory

Post by Octocontrabass »

Brendan wrote:If you assume that RAM without ECC has similar failure rates as ECC RAM (and that instead of corrected and uncorrected errors you just get silent data corruption); that would imply that for a typical modern 80x86 desktop/laptop machine with 8 GiB of RAM you'd be expecting something to get corrupted every 6 minutes (on average). 8)
Statistics don't work that way. If you really could expect something to get corrupted every 6 minutes, everyone would be using ECC RAM. :roll:

If you assume that RAM without ECC has similar failure rates as ECC RAM, you can expect most RAM to perform flawlessly for many years.
ggodw000
Member
Posts: 396
Joined: Wed Nov 18, 2015 3:04 pm
Location: San Jose San Francisco Bay Area

Re: Detecting Usable Physical Memory

Post by ggodw000 »

Hi, I would like to chip in here. I have working code that enters protected mode with interrupts enabled.
Now I need to allocate available memory to user code and data, which I was planning to determine based on E820.
So I prepared a little test program that walks through the E820 map. The first call returns valid data (judging by the addresses), but after that the function returns with carry set, so my code has to exit. I even let it continue 2-3 times after the carry to see what happens, but the subsequent calls all leave FFh in the buffer.

I am wondering what the issue is. The only hint I have is that there seems to be an error code in AX, since the low word of EAX changes after the first call. I looked around for E820 error codes, including in the ACPI v4.0 spec, so far without success.

Here is the output; can someone tell what is going on?
So far I have tried on 2 different PCs:
- ASUS motherboard
- Hyper-V virtual machine
Both boot to DOS and run the E820 program.
Since both return a similar error (carry after the 1st call), I cannot blame a defective board or BIOS; something must be wrong with my code:


CODE:

    IF      ENABLE_E820_TEST
	mov		eax, 0e820h			; (EAX) = e820 function specifier.
	sub		ebx, ebx			; (EBX) = 0 first call of E820 must be 0.
	mov		ecx, 24				; (ECX) = size of buffer to fill
	mov		edx, 'SMAP'
	mov		edx, 0534D4150h
	mov		di, DATA
	mov		es, di
	lea		di, DATA:e820Buffer	; (ES:DI) buffer pointer. 	
	lea		si, DATA:e820BufferEnd
e820TestLoop:
	int 	15h	
	jnc		e820TestLab1
	M_PRINTF "\nReturned with carry. Some error encountered. "
	inc		bp
	cmp		bp, 3	
	ja		exit
		
e820TestLab1:
	M_PRINT_R32_NL eax
	M_PRINT_R32_NL ecx
	M_PRINT_R32_NL ebx
	M_PRINTF "\nES:DI: "
	M_PRINT_1616_SPC es, di
	M_PRINTF "\nES:SI: "
	M_PRINT_1616_SPC es, si
	
	M_PRINTF "\nBuffer content: "
	M_PRINTF "\nBaseLow:   "
	M_PRINTDWORD es:[di]
	M_PRINTF "\nBaseHigh:  "
	M_PRINTDWORD es:[di+4]
	M_PRINTF "\nLimitLow:  "
	M_PRINTDWORD es:[di+8]
	M_PRINTF "\nLimitHigh: "
	M_PRINTDWORD es:[di+12]
	M_PRINTF "\nType:      "
	M_PRINTDWORD es:[di+16]
;	M_PRINTSTR_1616_NL es, di, 0, 20
	
	cmp		ebx, 0				; last call if ebx is 0.
	je		exit			
	add		di, 20h				; next pointer in memory. 
	cmp		di, si
	jb		e820TestLoop
    ENDIF   ; ENABLE_E820_TEST


OUTPUT:
eax: 534D4150

ecx: 00000014
ebx: 00000001
ES:DI:  0703:0020
ES:SI:  0703:0278
Buffer content: 
BaseLow:   00000000
BaseHigh:  00000000
LimitLow:  0009FC00
LimitHigh: 00000000
Type:      00000001
Returned with carry. Some error encountered. 
eax: 534D8650
ecx: 00000014
ebx: 00000001
ES:DI:  0703:0040
ES:SI:  0703:0278

Buffer content: 
BaseLow:   FFFFFFFF
BaseHigh:  FFFFFFFF
LimitLow:  FFFFFFFF
LimitHigh: FFFFFFFF
Type:      FFFFFFFF

Returned with carry. Some error encountered. 

eax: 534D0050
ecx: 00000014
ebx: 00000001
ES:DI:  0703:0060
ES:SI:  0703:0278

Buffer content: 
BaseLow:   FFFFFFFF
BaseHigh:  FFFFFFFF
LimitLow:  FFFFFFFF
LimitHigh: FFFFFFFF
Type:      FFFFFFFF

Returned with carry. Some error encountered. 
eax: 534D8650
ecx: 00000014
ebx: 00000001
ES:DI:  0703:0080
ES:SI:  0703:0278

Buffer content: 
BaseLow:   FFFFFFFF
BaseHigh:  FFFFFFFF
LimitLow:  FFFFFFFF
LimitHigh: FFFFFFFF
Type:      FFFFFFFF

Returned with carry. Some error encountered. 
Thanks.
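One likely cause, judging only from the listing and the output: EAX, ECX and EDX are loaded once before `e820TestLoop`, but the E820 call returns the 'SMAP' signature in EAX, so the second `int 15h` is issued with AX = 4150h instead of E820h. BIOSes conventionally answer an unsupported INT 15h function with carry set and AH = 86h, which would match the `534D8650` in the dump. The C model below is a sketch of that register protocol, not real firmware: `mock_int15`, `read_map` and the three-entry table are hypothetical stand-ins. It shows the loop stopping after one entry when the registers are not reloaded, and walking the whole map when they are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMAP 0x534D4150u                /* 'SMAP' signature */

struct regs { uint32_t eax, ebx, ecx, edx; int carry; };
struct e820_entry { uint64_t base, length; uint32_t type; };

/* Hypothetical stand-in for INT 15h: serves E820 from a fixed table and
 * rejects any other function number with carry set and AH = 86h. */
static void mock_int15(struct regs *r, struct e820_entry *buf)
{
    static const struct e820_entry table[] = {
        { 0x00000000, 0x0009FC00, 1 },
        { 0x0009FC00, 0x00000400, 2 },
        { 0x00100000, 0x07F00000, 1 },
    };
    const uint32_t count = sizeof table / sizeof table[0];

    assert(r != NULL && buf != NULL);
    if ((r->eax & 0xFFFFu) != 0xE820u || r->edx != SMAP ||
        r->ebx >= count || r->ecx < 20) {
        r->eax = (r->eax & 0xFFFF00FFu) | 0x8600u;  /* AH = 86h */
        r->carry = 1;
        return;
    }
    *buf = table[r->ebx];
    r->ebx = (r->ebx + 1 < count) ? r->ebx + 1 : 0; /* 0 = that was the last entry */
    r->eax = SMAP;                      /* overwrites the function number */
    r->ecx = 20;                        /* bytes written */
    r->carry = 0;
}

/* Reads the map into out. With reload_each_call == 0 the registers are
 * set up only once, as in the posted listing. */
static size_t read_map(struct e820_entry *out, size_t max, int reload_each_call)
{
    struct regs r = { 0xE820u, 0, 20, SMAP, 0 };
    size_t n = 0;

    while (n < max) {
        if (reload_each_call) {
            r.eax = 0xE820u;            /* the call clobbers EAX */
            r.ecx = 20;                 /* and reports the size in ECX */
            r.edx = SMAP;               /* some BIOSes trash EDX as well */
        }
        mock_int15(&r, &out[n]);
        if (r.carry)
            break;
        n++;
        if (r.ebx == 0)
            break;                      /* continuation value 0: map complete */
    }
    return n;
}
```

In the assembly above, the equivalent change would be to move the `mov eax, 0e820h`, `mov ecx, 24` and `mov edx, 0534D4150h` instructions to just after the `e820TestLoop:` label. `bp` is also never initialised before it is used as an error counter, and the print macros may clobber registers, which the listing does not show.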
key takeaway after spending yrs on sw industry: big issue small because everyone jumps on it and fixes it. small issue is big since everyone ignores and it causes catastrophy later. #devilisinthedetails