Physical Memory - Proper Maximum Limit

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
azblue
Member
Posts: 147
Joined: Sat Feb 27, 2010 8:55 pm

Physical Memory - Proper Maximum Limit

Post by azblue »

I was reading this article and a couple quotes caught my eye:
The maximum 2TB limit of 64-bit Windows Server 2008 Datacenter doesn’t come from any implementation or hardware limitation, but Microsoft will only support configurations they can test.
My first question, and the one referenced in the thread's title, is: Is this reasonable? I understand things get a little weird in the 3GB-4GB range, but beyond that is there any reason not to support as much physical memory as the CPU will allow?
...the Windows team started broadly testing Windows XP on systems with more than 4GB of memory. Windows XP SP2 also enabled Physical Address Extensions (PAE) support by default on hardware that implements no-execute memory because it's required for Data Execution Prevention (DEP), but that also enables support for more than 4GB of memory.

What they found was that many of the systems would crash, hang, or become unbootable because some device drivers, commonly those for video and audio devices that are found typically on clients but not servers, were not programmed to expect physical addresses larger than 4GB. As a result, the drivers truncated such addresses, resulting in memory corruptions and corruption side effects.
I'm really lost here. Doesn't the OS get to dictate where in the address space devices are mapped to? If it maps them to >4GB, even drivers only keeping the lower 32 bits of an address should be fine, right?
Geri
Member
Posts: 442
Joined: Sun Jul 14, 2013 6:01 pm

Re: Physical Memory - Proper Maximum Limit

Post by Geri »

1. For one thing, this is a licensing limitation, so if you need Windows for your very own supercomputer, you get to pay Microsoft a few hundred thousand USD for it.

2. Second, current CPUs with today's algorithms don't have the performance to effectively find holes in multiple terabytes of RAM. You can read RAM at 10-30 GB/s on a single thread with the CPU. In principle this is not an issue, since as RAM sizes increase, CPU speeds also increase, but it is an issue with current designs.
Operating system for SUBLEQ cpu architecture:
http://users.atw.hu/gerigeri/DawnOS/download.html
Octocontrabass
Member
Posts: 5486
Joined: Mon Mar 25, 2013 7:01 pm

Re: Physical Memory - Proper Maximum Limit

Post by Octocontrabass »

azblue wrote:My first question, and the one referenced in the thread's title, is: Is this reasonable? I understand things get a little weird in the 3GB-4GB range, but beyond that is there any reason not to support as much physical memory as the CPU will allow?
Microsoft has to support the OS. It's reasonable to expect that they would like to be able to test it themselves before claiming that it will work, especially when third-party drivers are involved.
azblue wrote:I'm really lost here. Doesn't the OS get to dictate where in the address space devices are mapped to? If it maps them to >4GB, even drivers only keeping the lower 32 bits of an address should be fine, right?
The problem is that drivers need to use physical addresses for things like DMA. When the RAM being used for DMA comes from a physical address above 4GB but the driver tells the hardware to use a physical address below 4GB, bad things happen.
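A contrived fragment (my own illustration, not from any real driver) showing the failure mode: the buffer's physical address is above 4GB, but the descriptor field the driver programs into the device is only 32 bits wide, so the address silently wraps.

Code: Select all
#include <stdint.h>
#include <stdio.h>

/* Hypothetical DMA descriptor as an old 32-bit-only driver might define it. */
struct dma_descriptor {
    uint32_t buffer_addr;   /* BUG: only 32 bits wide */
    uint32_t length;
};

int main(void)
{
    /* Physical address of a buffer the OS handed out above 4GB. */
    uint64_t phys = 0x100004000ULL;          /* 4GB + 16KB */

    struct dma_descriptor desc;
    desc.buffer_addr = (uint32_t)phys;       /* silently truncates to 0x00004000 */
    desc.length      = 4096;

    /* The device would now DMA to physical 0x4000, corrupting whatever the
     * OS keeps there, instead of hitting the intended buffer at 4GB + 16KB. */
    printf("intended: %#llx, programmed: %#x\n",
           (unsigned long long)phys, (unsigned)desc.buffer_addr);
    return 0;
}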
BrightLight
Member
Posts: 901
Joined: Sat Dec 27, 2014 9:11 am
Location: Maadi, Cairo, Egypt

Re: Physical Memory - Proper Maximum Limit

Post by BrightLight »

azblue wrote:I'm really lost here. Doesn't the OS get to dictate where in the address space devices are mapped to? If it maps them to >4GB, even drivers only keeping the lower 32 bits of an address should be fine, right?
The device drivers mainly communicate with the device itself using memory-mapped I/O and DMA, both of which use physical addresses, not linear addresses. Under PAE, linear addresses are still only 32 bits, but physical addresses are normally 36 bits. So even if the memory is mapped at, for example, 3 GB, or any other address in the 32-bit linear address space, the actual physical address of that page may be 6 GB, or any other address in the 36-bit physical address space. A driver that is not aware of the existence of physical addresses larger than 32 bits may truncate a 36-bit address, keeping only the lowest 32 bits. A buffer at physical address 4 GB (0x100000000), for example, would be sent to the device as address zero by a driver that is not aware of PAE. My instincts tell me that the device would not work properly that way. ;)

In general, PAE caused compatibility issues with existing software, as you have read, so to maintain compatibility IMHO 32-bit kernels should stick to i386 paging, while 64-bit kernels are the ones that should make use of all the available RAM and the CPU's full physical address width. You can use CPUID to detect how many bits wide the CPU's physical address bus is.
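For example, here is a quick user-mode sketch of that CPUID query using GCC/Clang's <cpuid.h> (in a kernel you would issue the CPUID instruction directly). Leaf 0x80000008 returns the physical address width in EAX bits 7:0 and the linear address width in bits 15:8.

Code: Select all
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 0x80000008: EAX[7:0] = physical address bits,
     * EAX[15:8] = linear (virtual) address bits. */
    if (__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)) {
        printf("physical address bits: %u\n", eax & 0xFF);
        printf("linear address bits:   %u\n", (eax >> 8) & 0xFF);
    } else {
        /* Leaf not available; older CPUs with PAE are effectively 36-bit. */
        puts("CPUID leaf 0x80000008 not supported");
    }
    return 0;
}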

Another workaround that lets you use PAE while still avoiding problems with devices that only support 32-bit addressing is to provide functions that let device drivers allocate memory with specific attributes; that is, instead of a traditional malloc, implement a similar function that lets drivers tell the kernel things like "give me 32 MB of uncacheable contiguous physical memory in the range of 2 GB to 3 GB, aligned on a 64-byte boundary", and other similar requests.
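As a rough sketch of what such an interface could look like (phys_alloc and phys_alloc_attr_t are names I made up for illustration, not from any existing kernel):

Code: Select all
#include <stdint.h>
#include <stddef.h>

/* Hypothetical allocation attributes a driver could pass to the kernel. */
typedef struct {
    uint64_t min_phys;     /* lowest acceptable physical address              */
    uint64_t max_phys;     /* highest acceptable physical address (exclusive) */
    size_t   alignment;    /* required alignment in bytes, e.g. 64            */
    int      contiguous;   /* must be physically contiguous                   */
    int      cacheable;    /* 0 = map uncacheable                             */
} phys_alloc_attr_t;

/* Returns a kernel virtual address and reports the physical address the
 * driver should program into the device. Interface sketch only. */
void *phys_alloc(size_t size, const phys_alloc_attr_t *attr, uint64_t *phys_out);

/* Example: "32 MB of uncacheable contiguous physical memory between 2 GB and
 * 3 GB, aligned on a 64-byte boundary" for a 32-bit-only device. */
void *alloc_legacy_dma_buffer(uint64_t *phys_out)
{
    phys_alloc_attr_t attr = {
        .min_phys   = 2ULL << 30,
        .max_phys   = 3ULL << 30,
        .alignment  = 64,
        .contiguous = 1,
        .cacheable  = 0,
    };
    return phys_alloc(32 * 1024 * 1024, &attr, phys_out);
}

The kernel is then free to satisfy ordinary allocations from anywhere in the 36-bit space, while drivers for 32-bit-only devices explicitly ask for memory below 4 GB.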

I'm open to other suggestions, if people see flaws or have better solutions.
You know your OS is advanced when you stop using the Intel programming guide as a reference.
simeonz
Member
Posts: 360
Joined: Fri Aug 19, 2016 10:28 pm

Re: Physical Memory - Proper Maximum Limit

Post by simeonz »

AFAIK, Windows used to have a first-fit physical page allocator implemented with a memory bitmap. While this is feasible with the officially supported amounts of RAM and modern processors, it is not very scalable. As a first-fit allocator, it can at least gracefully recover from fragmentation over time by pushing free memory toward the end of the address space (which increases the probability of contiguous free chunks).

Edit: Most sources indicate that Windows uses bitmaps for its virtual memory management, not its physical memory management. Although physical memory could be occupied by process pages and the non-paged pool, if RAM is significantly over-provisioned it could still be used for filesystem caching. Considering that Windows keeps free and zeroed pages in pfn lists, and that physical memory fragmentation is not that much of an issue for Windows specifically, since it doesn't use direct mapping like Linux, my original statement above could be assumed incorrect.

I am not sure if physical memory had similar issues, but the Windows virtual memory layout was at one point limited due to bit-packing in pointers, in connection with lock-free algorithms. Maybe particular drivers were doing something similar with physical memory descriptors, difficult as that may be to imagine.
Last edited by simeonz on Mon Jun 26, 2017 3:20 am, edited 3 times in total.
~
Member
Posts: 1225
Joined: Tue Mar 06, 2007 11:17 am
Libera.chat IRC: ArcheFire

Re: Physical Memory - Proper Maximum Limit

Post by ~ »

Many probably haven't thought about what I'm about to say, except for those few who have actually managed to implement proper paged memory managers:

Paging is largely intended to prevent memory fragmentation after many program loads, memory allocations and deallocations, block resizes, etc. You might think that paging will solve physical fragmentation for you, and it will, but then you get fragmentation inside the virtual address space instead, and you need an increasingly good memory management implementation, one smart enough to plan ahead of time how to lay out and treat memory blocks to deal with that fragmentation.

Higher-level languages might take care of that through their library functions and specifications, but in languages like C and assembly under an OS, applications reasonably need algorithms of their own to link together the chunks of data belonging to one object and so deal with memory fragmentation at the application level, and the kernel needs to do the same. Managing memory fragmentation is therefore a persistent task at all levels, and always will be in a rigid memory system, which is to say in every digital electronic system that exists today.
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: Physical Memory - Proper Maximum Limit

Post by bluemoon »

You need a structure to hold the list of free pages. Even with the simplest "page stack", i.e. 8 bytes per page, you need 2TB / 4K * 8 = 4GB to hold the page stack. This poses a restriction on which logical address region you allocate for this data structure, and for most of us, who just put everything in the last 2GB zone, this surely breaks.
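To spell out the arithmetic (same assumptions: 4 KiB pages, one 8-byte entry per frame):

Code: Select all
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Footprint of a "page stack" free list: one 8-byte entry per 4 KiB frame. */
    uint64_t ram         = 2ULL << 40;                /* 2 TiB         */
    uint64_t frames      = ram / 4096;                /* 512 Mi frames */
    uint64_t stack_bytes = frames * sizeof(uint64_t); /* 4 GiB         */

    printf("frames: %llu, page-stack footprint: %llu GiB\n",
           (unsigned long long)frames,
           (unsigned long long)(stack_bytes >> 30));
    /* 4 GiB of bookkeeping obviously does not fit in a 2 GiB kernel window,
     * so the structure itself has to be placed (or paged) more carefully. */
    return 0;
}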
simeonz
Member
Posts: 360
Joined: Fri Aug 19, 2016 10:28 pm

Re: Physical Memory - Proper Maximum Limit

Post by simeonz »

~ wrote: Paging is largely intended to prevent memory fragmentation after many program loads, memory allocations and deallocations, block resizes, etc. You might think that paging will solve physical fragmentation for you, and it will, but then you get fragmentation inside the virtual address space instead, and you need an increasingly good memory management implementation, one smart enough to plan ahead of time how to lay out and treat memory blocks to deal with that fragmentation.
I am not even sure that there is a real benefit to keeping data contiguous in physical memory, aside from using large pages if possible to reduce TLB thrashing. This even made me double-check my previous post, where I stated that Windows uses a first-fit physical memory allocator, but that makes no sense, because memory is mapped on demand on Windows. In contrast, Linux uses direct mapping of physical memory, which makes virtual and physical fragmentation coincident. After enough I/O, which is cached at page granularity, the primary kernel allocator gets fragmented. And even though the buddy system helps a bit, big multi-page requests are prone to failure. This is one of the reasons why Linux thread stacks are small: making them big would be a hazard for thread creation. What I am trying to say is that virtual memory fragmentation can indeed have important design implications.
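To make the multi-page point concrete, this is roughly how a buddy allocator rounds a request up to a power-of-two "order" (a simplified sketch, not the kernel's actual get_order):

Code: Select all
#include <stddef.h>

/* Simplified version of how a buddy allocator rounds a request up to a
 * power-of-two number of pages ("order"); not the kernel's actual code. */
unsigned int buddy_order(size_t bytes, size_t page_size)
{
    size_t pages = (bytes + page_size - 1) / page_size;
    unsigned int order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;   /* allocation needs 2^order contiguous free pages */
}

With 4 KiB pages, a 16 KiB kernel stack, for example, is an order-2 request: four physically contiguous pages, which get harder to find once the direct-mapped memory is fragmented, and that is exactly why large multi-page requests are prone to failure.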
zaval
Member
Posts: 653
Joined: Fri Feb 17, 2017 4:01 pm
Location: Ukraine, Bachmut

Re: Physical Memory - Proper Maximum Limit

Post by zaval »

simeonz wrote: Edit: Most sources indicate that Windows uses bitmaps for its virtual memory management, not its physical memory management. Although physical memory could be occupied by process pages and the non-paged pool, if RAM is significantly over-provisioned it could still be used for filesystem caching. Considering that Windows keeps free and zeroed pages in pfn lists, and that physical memory fragmentation is not that much of an issue for Windows specifically, since it doesn't use direct mapping like Linux, my original statement above could be assumed incorrect.
I am still plain dumb regarding MM, but I am not sure Windows uses bitmaps for VM allocations.
Windows Internals, Chapter 7: Memory Management, p. 449 wrote: With the lazy-evaluation algorithm, allocating even large blocks of memory is a fast operation. When a thread allocates memory, the memory manager must respond with a range of addresses for the thread to use. To do this, the memory manager maintains another set of data structures to keep track of which virtual addresses have been reserved in the process's address space and which have not. These data structures are known as virtual address descriptors (VADs). For each process, the memory manager maintains a set of VADs that describes the status of the process's address space. VADs are structured as a self-balancing binary tree to make lookups efficient. Windows Server 2003 implements an AVL tree algorithm (named after their inventors, Adelson-Velskii and Landis) that better balances the VAD tree, resulting in, on average, fewer comparisons when searching for a VAD corresponding with a virtual address. A diagram of a VAD tree is shown in Figure 7-28.
ANT - NT-like OS for x64 and arm64.
efify - UEFI for a couple of boards (mips and arm). suspended due to the loss of all the target boards (russians destroyed our town).
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Physical Memory - Proper Maximum Limit

Post by LtG »

As others have said, MS actually has to _support_ their customers (to varying degrees). If they don't have access to a machine with more than 2TB of RAM, then they couldn't have tested their software on even a single such configuration. The reality is that you will quite likely introduce limits into your code (as people have been doing by using uint32_t, as they did when they used two digits to encode the year, causing Y2K, and as they did with the 32-bit Unix epoch, causing Y2K38).

So I would say the only sane thing is to actually test it at least a little, and getting 2TB machines is probably a bit difficult; I'm not sure what kind of servers are available these days.

If drivers lived fully in the virtual address space, and thus had to request DMA (and other such things) from Windows, then this wouldn't be much of an issue driver-wise. However, if drivers deal with "raw" physical addresses and internally use uint32_t (or any pointer type, with the code compiled as 32-bit so the pointers are 32-bit), then larger addresses will be truncated and of course cause weird behavior until they crash.



As for omarrx024 asking for other suggestions, I'm not sure, but utilizing the IOMMU might be an alternative. I don't know nearly enough about Windows's internals to provide anything more useful =)
simeonz
Member
Posts: 360
Joined: Fri Aug 19, 2016 10:28 pm

Re: Physical Memory - Proper Maximum Limit

Post by simeonz »

zaval wrote: I am still plain dumb regarding MM, but I am not sure Windows uses bitmaps for VM allocations.
I meant that bitmaps are used in kernel virtual memory management, so I wasn't precise. You are right - VADs are used for user-space virtual memory management. The bitmaps for the non-paged and paged pool allocators are mentioned on CodeMachine. (You can search for "bitmap".) There is also a mention in this MS paper. (Search for "bitmap" again if interested.)
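For illustration only (not NT's actual code), a bitmap allocator of this kind boils down to a first-fit scan for a run of clear bits, one bit per page:

Code: Select all
#include <stddef.h>
#include <limits.h>

/* One bit per page: 0 = free, 1 = allocated. Returns the index of the first
 * run of 'run' consecutive free pages, or (size_t)-1 if none exists. */
size_t bitmap_find_free_run(const unsigned long *bitmap,
                            size_t total_pages, size_t run)
{
    const size_t bits_per_word = sizeof(unsigned long) * CHAR_BIT;
    size_t start = 0, len = 0;

    for (size_t i = 0; i < total_pages; i++) {
        if (bitmap[i / bits_per_word] & (1UL << (i % bits_per_word))) {
            len = 0;                    /* page in use: restart the run */
            start = i + 1;
        } else if (++len == run) {      /* found enough consecutive free pages */
            return start;
        }
    }
    return (size_t)-1;                  /* no hole big enough */
}

A linear scan like this is fine for a pool address range of a few gigabytes, but it is easy to see why it would not scale gracefully to arbitrarily large ranges.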
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Physical Memory - Proper Maximum Limit

Post by LtG »

Geri wrote: Current CPUs with today's algorithms don't have the performance to effectively find holes in multiple terabytes of RAM. You can read RAM at 10-30 GB/s on a single thread with the CPU. In principle this is not an issue, since as RAM sizes increase, CPU speeds also increase, but it is an issue with current designs.
Find what holes? Why do you want to find holes?

You can "easily" read RAM at 120GB/s (four NUMA domains). Not sure how threads are involved here..?

Also, I thought CPUs wouldn't be increasing in speed, or rather they haven't in terms of Hz for quite some time. Some performance increases will still come, but the expectation is that future gains will come largely through more cores, not single-core performance. Though even that's somewhat irrelevant: it's not the CPU speed that matters, it's the RAM speed that's the bottleneck.
zaval
Member
Posts: 653
Joined: Fri Feb 17, 2017 4:01 pm
Location: Ukraine, Bachmut

Re: Physical Memory - Proper Maximum Limit

Post by zaval »

simeonz wrote: I meant that bitmaps are used in kernel virtual memory management, so I wasn't precise. You are right - VADs are used for user-space virtual memory management. The bitmaps for the non-paged and paged pool allocators are mentioned on CodeMachine. (You can search for "bitmap".) There is also a mention in this MS paper. (Search for "bitmap" again if interested.)
ah. thanks for the links. it's interesting.

I just remember that I read in the WDK documentation that the buddy mechanism is used for the pools. Yes, I found it; it's in the WDK Glossary -> pool memory.
The memory manager allocates entities from both pools using a buddy scheme.
ANT - NT-like OS for x64 and arm64.
efify - UEFI for a couple of boards (mips and arm). suspended due to the loss of all the target boards (russians destroyed our town).
Geri
Member
Posts: 442
Joined: Sun Jul 14, 2013 6:01 pm

Re: Physical Memory - Proper Maximum Limit

Post by Geri »

LtG wrote: You can "easily" read RAM at 120GB/s (four NUMA domains). Not sure how threads are involved here..?
You are wrong, and you acknowledged it yourself when you put "easily" in quotes.
You will never get more than a few GB/s per thread in a real algorithm, and that's a fact, not a question of opinion.
LtG wrote: Also, I thought CPUs wouldn't be increasing in speed, or rather they haven't in terms of Hz for quite some time.
Nowadays we have superscalar CPUs capable of executing multiple instructions per clock cycle. 20 years ago CPUs could do about 1.5 instructions per cycle on average, 10 years ago about 2.5; now we are at 4-5, and the theoretical limit of current architectures is 3 for ARM, 6 for AMD and 8 for Intel, as that is how many execution units there are per core (although some of them can only do one specific thing, so we will never fully reach these numbers in reality).
Operating system for SUBLEQ cpu architecture:
http://users.atw.hu/gerigeri/DawnOS/download.html
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Physical Memory - Proper Maximum Limit

Post by LtG »

Geri wrote:
LtG wrote: You can "easily" read RAM at 120GB/s (four NUMA domains). Not sure how threads are involved here..?
You are wrong, and you acknowledged it yourself when you put "easily" in quotes.
You will never get more than a few GB/s per thread in a real algorithm, and that's a fact, not a question of opinion.
LtG wrote: Also, I thought CPUs wouldn't be increasing in speed, or rather they haven't in terms of Hz for quite some time.
Nowadays we have superscalar CPUs capable of executing multiple instructions per clock cycle. 20 years ago CPUs could do about 1.5 instructions per cycle on average, 10 years ago about 2.5; now we are at 4-5, and the theoretical limit of current architectures is 3 for ARM, 6 for AMD and 8 for Intel, as that is how many execution units there are per core (although some of them can only do one specific thing, so we will never fully reach these numbers in reality).
What are the holes you were talking about?

We were talking about _READING_ RAM (memory bandwidth), not some imaginary "real" algorithm, and the simplest algos can probably get up to 30GB/s (per core).

"you can read RAM with 10-30 gbyte/sec on a thread with cpu. this is not an issue, as with ram size incrases, cpu speeds are also incrasing"
I was referring to this earlier; it sounds like you're saying that today's CPUs can read RAM at 10-30GB/s, and that it's not an issue because when RAM sizes increase, CPU speeds will increase too.

The issue I have with the above is that the 10-30GB/s is not a limit of the CPU, it's the RAM bandwidth. The CPU can easily consume more, which is why cache is these days the largest single "component" on a CPU: it's the RAM that is too slow, not the CPU.