OSDev.org

Posted: **Mon Apr 04, 2005 12:34 pm**

I have already implemented a simple memory manager for both user and kernel space, and now I want to implement a more complex one.
I have read a lot of docs, and they always talk about external/internal fragmentation and other problems, but I haven't understood them well.
For example, what is the problem if a program gets non-contiguous physical memory?
Which parts of the kernel/user space should have or need contiguous physical memory?

Posted: **Mon Apr 04, 2005 12:47 pm**

Any memory allocated by malloc, kmalloc, new and family should be at least virtually contiguous, which paging provides. Let's say a process allocates 1 MB of memory through one single call to malloc... you simply can't cut that into 2 or more pieces unless you use paging.

Posted: **Mon Apr 04, 2005 1:50 pm**

All the problems discussed for memory management are about non-contiguous/contiguous physical memory, not the virtual memory provided by paging.
Surely when the user/kernel requests 1 MB it gets a contiguous virtual address range, but that doesn't mean the physical addresses are contiguous! And the latter is mostly what memory management implementations (algorithms) try to improve.
The question is: why do they want physical memory to be as contiguous as possible?

Posted: **Mon Apr 04, 2005 1:55 pm**

In that case, the only things that need to be contiguous are the GDT, the IDT and the paging tables... basically all the CPU internals need to be contiguous.

Posted: **Mon Apr 04, 2005 4:44 pm**

The primary need for being able to allocate physically contiguous pages is DMA. This is because DMA doesn't go through the standard virtual memory translation and thus requires the pages to be physically contiguous.

proxy

Posted: **Mon Apr 04, 2005 8:01 pm**

Hi,

amirsadig wrote: All the problems discussed for memory management are about non-contiguous/contiguous physical memory, not the virtual memory provided by paging.
Surely when the user/kernel requests 1 MB it gets a contiguous virtual address range, but that doesn't mean the physical addresses are contiguous! And the latter is mostly what memory management implementations (algorithms) try to improve.
The question is: why do they want physical memory to be as contiguous as possible?

For linear/virtual memory, the fragmentation caused by malloc/free can affect the size of the largest allocatable area of memory. In general the amount of usable linear/virtual address space is so large that this rarely matters.

For physical memory, fragmentation is only a problem for DMA (as already mentioned). To minimize this I use a bitmap to manage memory below 16 MB (low memory), and free page stacks for everything else. When a normal page is allocated the free page stacks are preferred, so that as much "low memory" stays free as possible. When a normal page has to be allocated from low memory, the lowest suitable page is chosen in an attempt to maximize the largest free physical low memory area.
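A minimal sketch of the two-tier scheme described above, in C. The names (`alloc_page`, `free_page`, `LOW_PAGES`, `STACK_MAX`) and the fixed sizes are assumptions made for illustration, not Brendan's actual code; a real kernel would also need locking and would build the structures from the firmware memory map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOW_PAGES 4096u       /* 16 MB / 4 KB: frames reachable by ISA DMA */
#define STACK_MAX 1024u       /* capacity of the illustrative free page stack */
#define NO_PAGE   UINT32_MAX  /* returned when nothing is free */

static uint8_t  low_bitmap[LOW_PAGES / 8]; /* one bit per low frame, 1 = in use */
static uint32_t high_stack[STACK_MAX];     /* free frame numbers >= LOW_PAGES */
static size_t   high_top;                  /* number of entries on the stack */

static int low_used(uint32_t pfn)
{
    return (low_bitmap[pfn / 8] >> (pfn % 8)) & 1;
}

static void low_set(uint32_t pfn)
{
    low_bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

static void low_clear(uint32_t pfn)
{
    low_bitmap[pfn / 8] &= (uint8_t)~(1u << (pfn % 8));
}

/* Return a frame to whichever structure owns it. */
void free_page(uint32_t pfn)
{
    if (pfn < LOW_PAGES) {
        low_clear(pfn);
    } else {
        assert(high_top < STACK_MAX);
        high_stack[high_top++] = pfn;
    }
}

/* Allocate one frame for normal use: prefer the free page stack, and only
 * fall back to low memory, lowest frame first, when the stack is empty. */
uint32_t alloc_page(void)
{
    if (high_top > 0)
        return high_stack[--high_top];

    for (uint32_t pfn = 0; pfn < LOW_PAGES; pfn++) {
        if (!low_used(pfn)) {
            low_set(pfn);
            return pfn;
        }
    }
    return NO_PAGE;
}
```

Taking the lowest free frame keeps the in-use low pages packed at the bottom, so the largest free run stays near the top of the 16 MB area for a later DMA request.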

The other issue is the CPU caches. Most CPU caches are implemented such that where data from physical memory is stored within the cache depends on several of the bits in the physical address. This can limit the effectiveness of the cache. For example, a 1 MB cache may only be able to hold 4 pages of data whose physical addresses have the same bits 12 to 17, i.e. the cache index bits above the 4 KB page offset (even though the same cache would hold a total of 256 pages).
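The arithmetic behind that example can be sketched as follows. This assumes the 1 MB cache is 4-way set associative and physically indexed (the post does not state the associativity; 4 ways is what makes the "4 pages" figure come out), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Number of page colours = bytes covered by one cache way / page size. */
uint32_t page_colours(uint32_t cache_bytes, uint32_t ways)
{
    assert(ways != 0 && cache_bytes % (ways * PAGE_SIZE) == 0);
    return cache_bytes / ways / PAGE_SIZE;
}

/* Colour of a physical address: the frame-number bits that also index the
 * cache. Frames with the same colour compete for the same cache sets. */
uint32_t page_colour(uint64_t phys_addr, uint32_t colours)
{
    return (uint32_t)((phys_addr / PAGE_SIZE) % colours);
}
```

With these assumptions `page_colours(1u << 20, 4)` gives 64 colours (one way covers 256 KB, so address bits 12 to 17 select the colour), and physical addresses 0x00001000 and 0x00041000 both have colour 1. Only 4 such pages, one per way, can be cached at once.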

To improve the effectiveness of these caches some OSs use "page colouring", in an attempt to ensure that logically contiguous pages seem to be contiguous as far as the cache is concerned. This doesn't mean that the pages are physically contiguous, only that the relevant bits within the physical addresses are contiguous.

Linux doesn't use page colouring. AFAIK FreeBSD does, but expects the number of "page colours" to be set manually (default is 4 page colours). I don't know what Microsoft's OSs do.

My OS uses CPUID to determine the number of page colours automatically (where possible) or uses a default depending on CPU family for older CPUs. In any case my code over-estimates the number of page colours as there's little disadvantage of this (and I can't get information on external caches using CPUID).

Cheers,

Brendan

Posted: **Tue Apr 05, 2005 12:10 am**

My current physical memory manager is a bitmap allocator. It just returns one page per request and releases one page at a time. The kernel uses kmalloc to satisfy kernel requests for free memory. Depending on the size requested, it asks the physical allocator for one page or (inside a loop) for a number of pages. kmalloc then maps each physical address to a virtual address inside the kernel's linear address space, and this is repeated until the request is satisfied.
I will try to implement a slab allocator to divide the memory management into a number of layers. This makes memory management easier than putting everything into one function for all purposes!
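The page-at-a-time kmalloc loop described in this post could look roughly like the toy model below. Everything here is an assumption for illustration: the "page table" is a plain array, the virtual range is handed out by a bump pointer, and the names (`kmalloc_pages`, `phys_alloc_page`, `virt_to_phys_frame`) are hypothetical rather than taken from the poster's kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096u
#define PHYS_PAGES 64u        /* size of the toy physical memory, in frames */
#define VIRT_PAGES 64u        /* size of the toy kernel virtual window */
#define UNMAPPED   UINT32_MAX

static uint8_t  phys_used[PHYS_PAGES];  /* stand-in for the bitmap */
static uint32_t page_table[VIRT_PAGES]; /* virtual page -> physical frame */
static uint32_t next_virt;              /* next unused virtual page */

void vm_init(void)
{
    for (uint32_t i = 0; i < PHYS_PAGES; i++) phys_used[i] = 0;
    for (uint32_t i = 0; i < VIRT_PAGES; i++) page_table[i] = UNMAPPED;
    next_virt = 0;
}

/* One frame per request; returns UNMAPPED when memory is exhausted. */
uint32_t phys_alloc_page(void)
{
    for (uint32_t pfn = 0; pfn < PHYS_PAGES; pfn++) {
        if (!phys_used[pfn]) {
            phys_used[pfn] = 1;
            return pfn;
        }
    }
    return UNMAPPED;
}

void phys_free_page(uint32_t pfn)
{
    assert(pfn < PHYS_PAGES);
    phys_used[pfn] = 0;
}

/* Map enough frames, one at a time, to cover `bytes`. Returns the first
 * virtual page of a contiguous virtual run, or UNMAPPED on failure. */
uint32_t kmalloc_pages(size_t bytes)
{
    uint32_t count = (uint32_t)((bytes + PAGE_SIZE - 1) / PAGE_SIZE);
    if (count == 0 || count > VIRT_PAGES - next_virt)
        return UNMAPPED;

    uint32_t first = next_virt;
    for (uint32_t i = 0; i < count; i++) {
        uint32_t pfn = phys_alloc_page();
        if (pfn == UNMAPPED) {
            /* Out of frames: undo the mappings made so far. */
            while (i-- > 0) {
                phys_free_page(page_table[first + i]);
                page_table[first + i] = UNMAPPED;
            }
            return UNMAPPED;
        }
        page_table[first + i] = pfn;
    }
    next_virt += count;
    return first;
}

uint32_t virt_to_phys_frame(uint32_t vpage)
{
    return page_table[vpage];
}
```

If frames 1 and 2 are already taken, a 3-page request ends up on frames 0, 3 and 4 while the caller still sees virtual pages 0, 1 and 2 in a row. That is the sense in which fragmented physical memory does not matter to ordinary allocations.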

Posted: **Tue Apr 05, 2005 1:41 am**

DruG5t0r3 wrote: In that case, the only things that need to be contiguous are the GDT, the IDT and the paging tables... basically all the CPU internals need to be contiguous.

Not to my knowledge. People usually prefer them contiguous so that they're independent of whether paging is enabled or not, but that's not a requirement.

Posted: **Tue Apr 05, 2005 7:12 am**

According to my Intel book here, if a TSS spans more than one page, it needs to be in physically contiguous memory.

Posted: **Tue Apr 05, 2005 8:52 am**

DruG5t0r3 wrote: According to my Intel book here, if a TSS spans more than one page, it needs to be in physically contiguous memory.

True for the TSS, but that's roughly the only system structure that needs it.

For devices that need contiguous memory for DMA transfers, the best thing to do is usually to bring the data into a contiguous buffer allocated from a pool that is separate from the 'main' memory pool.
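As a rough illustration of such a separate pool, the sketch below hands out runs of physically contiguous pages from a small reserved region using a lowest-first search. The pool size and the names `dma_alloc` and `dma_free` are assumptions for the example; copying the data into the buffer and programming the device are left out.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_POOL_PAGES 32u        /* pages reserved for DMA at boot */
#define DMA_NONE       UINT32_MAX /* returned when no run is large enough */

static uint8_t dma_used[DMA_POOL_PAGES]; /* 1 = page in use */

/* Find the lowest run of `count` free pages, mark it used, and return the
 * index of its first page within the pool. */
uint32_t dma_alloc(uint32_t count)
{
    if (count == 0 || count > DMA_POOL_PAGES)
        return DMA_NONE;

    uint32_t run = 0; /* length of the free run ending at page i */
    for (uint32_t i = 0; i < DMA_POOL_PAGES; i++) {
        run = dma_used[i] ? 0 : run + 1;
        if (run == count) {
            uint32_t first = i + 1 - count;
            for (uint32_t j = first; j <= i; j++)
                dma_used[j] = 1;
            return first;
        }
    }
    return DMA_NONE;
}

/* Release a run previously returned by dma_alloc(). */
void dma_free(uint32_t first, uint32_t count)
{
    assert(first <= DMA_POOL_PAGES && count <= DMA_POOL_PAGES - first);
    for (uint32_t j = first; j < first + count; j++)
        dma_used[j] = 0;
}
```

Because ordinary allocations never touch this pool, it can only be fragmented by DMA buffers themselves, which are usually few and short-lived.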

Posted: **Wed Apr 06, 2005 2:45 pm**

Looks like I was wrong about the GDT, IDT and PDT.

GDT calculations are based on physical references and are also done before any paging and (of course) any segmentation is done.

A GDT permission can override a page permission, but not the contrary.

The same applies to the IDT.

And as for the PDT... as long as they are physically aligned to 4096 bytes, they can be non-contiguous.

Posted: **Wed Apr 06, 2005 4:51 pm**

There is a nice picture about that in "IA-32 - Developer's Manual Vol3 - System Programming Guide", in chapter "3.1 Memory Management Overview".

Posted: **Wed Apr 06, 2005 6:20 pm**

The only possible advantage of physical contiguity I have speculated about is that it *may* speed up the CPU's caching of data structures/arrays that cross page boundaries. I am not terribly familiar with the details of accessing RAM, but I am pretty sure the initial access to an address is slower than reading the following segments of memory (where the segment size depends on the bus width). This is similar to hard disks, where you may spend 10 ms to reach an address, but can then transfer very quickly.
In a typical VMM, data that crosses a page boundary would force the RAM access to locate the new physical address, creating a super tiny delay. Theoretically, you could remove those delays.
However, the odds of the data crossing the boundary aren't very high (it depends on the size of the data being cached, but 32 bytes / 4096 bytes = 1/128, i.e. less than a 1% chance). On top of that, the RAM mechanism may always perform an address lookup for each data block the CPU cache requests. I'm not sure how the hardware actually functions at that level.
Most likely the overhead of the implementation would outweigh the benefits, if any.

Posted: **Wed Apr 06, 2005 9:40 pm**

Hi,

Some notes on page colouring..

From http://www.kernelnewbies.org/glossary/ :

"page colouring
A system to ensure that virtually-contiguous pages are not mapped to the same cache lines, improving performance. FreeBSD does this, but Linux, after some discussion, doesn't (yet)."

I haven't been able to find decent performance comparisons, but most sources estimate that page colouring can result in up to a 10% performance improvement. The figure is vague mainly because the amount of improvement depends on cache usage (by software) and cache design (in hardware). For example, a "fully associative" cache isn't going to get any performance improvement, an "8-way set associative" cache will get some performance improvement and a "2-way set associative" cache will get more performance improvement.

For Intel CPUs there's L1, L2 and (maybe) L3 caches. For example, my Pentium 4 has an internal 4-way L1 data cache and internal 8-way L2 cache. The Pentium Pro used an internal 2-way set associative L1 data cache and an internal 4-way set associative L2 cache. In all cases it's difficult to predict the type of the external cache (if any).

In any case there are no real performance disadvantages to supporting page colouring.

It isn't too easy to "retro-fit" to an existing OS though, as the physical memory manager needs to know what the linear address of a page will be when allocating a physical page. For new OSs (where old code isn't a concern) it's fairly easy to support (depending on how physical memory is managed).

Cheers,

Brendan

Posted: **Thu Apr 07, 2005 3:19 am**

In the case of "demand paging", one can rather easily add page colouring by creating N "page pools" (one per colour) and changing the virtual memory manager so that the page for virtual colour i is requested from the pool of colour i.

Things become harder if you want to request several physical pages in a row.
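A small sketch of the per-colour pools idea, assuming 4 colours and fixed-size pools purely for illustration. The names (`colour_alloc`, `colour_free`) are hypothetical, and the fallback to a neighbouring colour when the wanted pool is empty is an added assumption, not something the post specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE   4096u
#define NUM_COLOURS 4u         /* illustrative; depends on the real cache */
#define POOL_DEPTH  16u        /* capacity of each per-colour pool */
#define NO_FRAME    UINT32_MAX

static uint32_t pool[NUM_COLOURS][POOL_DEPTH]; /* free frames, by colour */
static size_t   pool_count[NUM_COLOURS];

/* Colour of a page or frame number: its low bits. */
static uint32_t colour_of(uint32_t page_number)
{
    return page_number % NUM_COLOURS;
}

/* Put a free physical frame into the pool matching its own colour. */
void colour_free(uint32_t pfn)
{
    uint32_t c = colour_of(pfn);
    assert(pool_count[c] < POOL_DEPTH);
    pool[c][pool_count[c]++] = pfn;
}

/* Called on a page fault: pick a frame whose colour matches the faulting
 * virtual page. If that pool is empty, try the other colours in turn
 * (assumed policy) rather than failing the allocation. */
uint32_t colour_alloc(uintptr_t fault_addr)
{
    uint32_t want = colour_of((uint32_t)(fault_addr / PAGE_SIZE));

    for (uint32_t k = 0; k < NUM_COLOURS; k++) {
        uint32_t c = (want + k) % NUM_COLOURS;
        if (pool_count[c] > 0)
            return pool[c][--pool_count[c]];
    }
    return NO_FRAME;
}
```

This works for demand paging because each fault asks for exactly one frame and the faulting virtual address is known at that moment, so the matching pool can be chosen directly.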

DruG5t0r3 wrote: GDT calculations are based on physical references and are also done before any paging and (of course) any segmentation is done.

No. Look at sections 2.4.1 and 2.4.3 in the Intel manuals: both GDTR and IDTR contain a linear address, not a physical address. That clearly means (according to figure 3-1) that this address still has to go through the paging system before it is resolved.

(Don't take it hard: I made the same mistake a year ago.)

DruG5t0r3 wrote: A GDT permission can override a page permission, but not the contrary.

What do you mean by "a GDT permission"? And how should I interpret "override"? If a segment descriptor says I can write to a segment but the page isn't writable, I cannot write. And similarly, if a segment says I cannot write, I cannot write regardless of what the page allows. So you actually AND the permissions together to know what you can do.
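Spelled out as code, the rule is a plain AND. This is a deliberately simplified model with a made-up function name: it covers user-mode writes only and does not model the supervisor-mode case where CR0.WP is clear and the page-level write protection is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* A user-mode write succeeds only if BOTH the segment descriptor and the
 * page table entry allow it; neither level can override the other. */
bool user_can_write(bool segment_writable, bool page_writable)
{
    return segment_writable && page_writable;
}
```

So a writable data segment over a read-only page still faults on write, and a read-only segment over a writable page faults as well.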

DruG5t0r3 wrote: The same applies to the IDT.

Indeed, the IDT works like the GDT (that is, it is affected by paging).

DruG5t0r3 wrote: And as for the PDT... as long as they are physically aligned to 4096 bytes, they can be non-contiguous.

I found no reference to "PDT" in the Intel manual. What are you talking about, exactly? Page tables?

Memory Management: non-contineuos/contineuos
