
Memory Management Idea

Posted: Tue Aug 02, 2005 4:32 am
by Kemp
Ok, very little preamble/handshaking/init here, straight to my thoughts. I was having a scan through the 'Enable Paging' thread and something Freanan said got me thinking (this may be almost identical to his idea or completely different, I didn't read the thread properly).

What if I have three memory managers on my system? Splitting the functionality up means that, on a different architecture, the parts that are implemented differently in hardware can easily be swapped out and the source recompiled, without having to duplicate the entire manager as you would if it were all one piece. Basically: have a physical manager, a logical manager and a virtual manager.

Physical manager:
Handles tracking of what 4kb chunks of memory are still available and responds to requests by finding an appropriate one and passing the address back.

Logical manager:
Asks for pages from the physical manager, takes the returned address and maps it in the appropriate page table.

Virtual manager:
Responsible for any page swapping to disk that occurs.
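
(A minimal sketch of how that split might look as C interfaces. All names and signatures here are assumptions for illustration, not a fixed design; the point is just that each piece could be reimplemented per architecture while the rest of the kernel only ever talks to the logical manager.)

Code:
/* Hypothetical interfaces for the three managers -- a sketch, not a spec. */
#include <stdint.h>
#include <stddef.h>

/* Physical manager: hands out 4 KB frames by physical address. */
uint32_t pmm_alloc_frame(void);            /* returns a physical address, 0 on failure */
void     pmm_free_frame(uint32_t phys);

/* Logical manager: asks the physical manager for frames and maps them
 * into the page tables, returning a pointer the caller can actually use. */
void *lmm_alloc_pages(size_t count);
void  lmm_free_pages(void *virt, size_t count);

/* Virtual manager: swapping policy only; it never touches the hardware
 * tables directly, it just decides what gets written out to disk and when. */
void vmm_swap_out_victim(void);
void vmm_swap_in(void *virt);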

This also means the system would be very flexible in development. I can start off with just a physical and logical manager and slot the virtual manager in later; and if I decided to create a real-mode version of my OS for some insane reason, I could remove the logical manager and have programs talk directly to the physical manager.

My one problem with this at the moment is the virtual manager. The others operate relatively normally (the program asks the logical manager for memory, it passes the request off to the physical manager, which passes back to the logical manager, which passes back to the program), but how does the virtual manager run? It could be scheduled in its own thread or something, but to be honest I don't know under what circumstances the page-swapping logic is normally run.

Any thoughts on this? (including "That's what we all do anyway you f00l!" ;) )

Re:Memory Management Idea

Posted: Tue Aug 02, 2005 6:18 am
by smiddy
I like the idea behind the modularity of the three memory managers. The thing that comes to mind is you would, I think, want to make the interfaces similar so that a user application wouldn't (need to) know the difference between them.

A question for my own edification: why not use memory in byte blocks instead of 4 kB blocks in the Physical Memory Manager? I assume this is for paging and the granularity makes sense, but is that necessary for the Physical Memory Manager? I ask since I am not using paging and am implementing my (Physical) Memory Manager at the byte level of granularity, using a database to manage the parts, unlike others who use a bitmap. In this vein I'm using a hysteresis gap to facilitate growing and shrinking the database's own blocks of memory. At present it isn't tested under an extreme load, but it seems to work for a one-sy two-sy increase and decrease.

Re:Memory Management Idea

Posted: Tue Aug 02, 2005 6:41 am
by Solar
smiddy wrote: A question for my own edification: why not use memory in byte blocks instead of 4 kB blocks in the Physical Memory Manager?
Allowing for byte granularity this low in the system could severely degrade performance and lead to memory fragmentation (in the end you have thousands of little memory blocks but cannot satisfy the request for a contiguous 4k block). This is why paging is so state-of-the-art: it avoids the fragmentation issue.

Note that most malloc() implementations use "pools", blocks of memory of a given size, to parcel out upon request; those which I have seen (including the venerable dlmalloc()) use a granularity no smaller than 16 bytes. That's right - you request one byte, and malloc() marks 16 bytes as "used". You need some bookkeeping overhead for malloc() to be performant, and you have to cater for worst-case alignment requirements - thus 8 byte bookkeeping, 8 byte data.
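
(A rough illustration of that bookkeeping, assuming an 8-byte header and 16-byte granularity as described above; the layout is invented for the example and is not dlmalloc's actual chunk format.)

Code:
#include <stddef.h>
#include <stdint.h>

/* Invented 8-byte header sitting in front of every allocation. */
struct chunk_header {
    uint32_t size;   /* size of the whole chunk, header included */
    uint32_t flags;  /* e.g. an in-use bit                       */
};

/* Round a request up to the 16-byte granularity described above. */
static size_t chunk_size(size_t request)
{
    size_t total = request + sizeof(struct chunk_header);  /* payload + bookkeeping */
    return (total + 15) & ~(size_t)15;                     /* align up to 16 bytes  */
}
/* chunk_size(1) == 16: requesting a single byte still costs a full 16-byte chunk. */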

And that is speaking of a very well-performing, user-space library. Now why should an OS service provide a smaller granularity than even the user-space malloc()?

;)

Re:Memory Management Idea

Posted: Tue Aug 02, 2005 7:17 am
by smiddy
Unfortunately (or fortunately, depending on your perspective) I haven't read enough about paging. I certainly understand the fragmentation issue, though I can't imagine not having enough RAM for a decent sized block, unless you're running gobs (yeah, not a real technical term) of applications that demand memory resources. This, I suppose, lends itself to the virtualization portion too, which I haven't delved into either. There is also the issue of boundaries, which you didn't mention, that I suspect can be a problem. I expect, though, that fragmentation could be handled by the virtualization aspect, since it seems reasonable to defragment behind a virtualized address space. But then, as you point out, there is a performance hit. Has there been any analysis (hard numbers and associated code) that supports this thought process? On the surface this reads right to me, but I'm into the numbers myself in order to make specific determinations on the usefulness (best value) of an approach. I assume defragmenting 4kB blocks is simpler than multidimensional blocks too.

P.S. my apologies for taking this off-topic.

I have a thought about page swapping though: you can base it on the utilization rate of the process. Another hysteresis gap for the application(s) in question: if it drops below a certain rate, swap it out, but only after a certain amount of time; then, when its activity rate increases versus other processes, swap it back in. Does this make sense? So utilization first, and then time, versus the entire number of processes...
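
(A sketch of that policy in C, just to make the idea concrete; the thresholds, the proc structure and the scheduler hook are all invented for the example.)

Code:
#include <stdint.h>
#include <stdbool.h>

/* Invented per-process bookkeeping for the policy described above. */
struct proc {
    uint32_t cpu_percent;   /* recent utilisation, 0-100                  */
    uint32_t ticks_idle;    /* how long it has stayed under the low mark  */
    bool     swapped_out;
};

#define SWAP_LOW_MARK   5    /* swap out below this utilisation...                */
#define SWAP_HIGH_MARK 20    /* ...swap back in above this one (a hysteresis gap) */
#define SWAP_MIN_TICKS 500   /* but only after it has been idle this long         */

/* Assumed to be called periodically by the scheduler for each process. */
void swap_policy_tick(struct proc *p)
{
    if (!p->swapped_out && p->cpu_percent < SWAP_LOW_MARK) {
        if (++p->ticks_idle >= SWAP_MIN_TICKS) {
            p->swapped_out = true;     /* the page swapper would be invoked here */
            p->ticks_idle  = 0;
        }
    } else if (p->swapped_out && p->cpu_percent > SWAP_HIGH_MARK) {
        p->swapped_out = false;        /* activity has picked up: bring its pages back */
    } else if (p->cpu_percent >= SWAP_LOW_MARK) {
        p->ticks_idle = 0;             /* any real activity resets the idle timer */
    }
}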

Re:Memory Management Idea

Posted: Tue Aug 02, 2005 7:39 am
by Kemp
The thing that comes to mind is you would, I think, want to make the interfaces similar so that a user application wouldn't (need to) know the difference between them.
That was something I was thinking of, but really the only time that would come in useful would be porting an app that runs on my OS to a (hypothetical) real-mode version of my OS, and I'm guessing there would probably be enough changes there to make it worthwhile rewriting the app as a whole anyway. As it is though, my thoughts on what calls are needed etc. seem to be gravitating towards a very similar interface for them.
A question for my own edification: why not use memory in byte blocks instead of 4 kB blocks in the Physical Memory Manager? I assume this is for paging and the granularity makes sense, but is that necessary for the Physical Memory Manager?
That is an interesting point. I could simply have it so that the LMM (LM2 [LM^2] is easier to say ;D ) always asks for 4kb blocks rather than having this fixed in the PMM, but I'm afraid my tracking techniques aren't up to the level yours appear to be. I'm working on the assumption that I'll be using a slightly optimised style of bitmapping at the moment, which would mean that tracking bytes would take up an enormous amount of space (about 4000 times what tracking 4kb pages would take in fact ;) ). In fact, on that point:
hysteresis gap
You mention that a lot, but you've lost me there; the term sounds familiar but I honestly can't remember what it means.
P.S. my apologies for taking this off-topic
No apology needed, what is off-topic for you is interesting discussion for me ;)

Also, while I'm on the topic of memory management, is there anything in newer systems (no ISA slots, just PCI) that is still restricted to DMA within the first 16Mb? According to the specs, newer devices should be able to use any memory region for it.

Re:Memory Management Idea

Posted: Tue Aug 02, 2005 8:14 am
by Solar
smiddy wrote: I certainly understand the fragmentation issue, though I can't imagine not having enough RAM for a decent sized block, unless you're running gobs (yeah, not a real technical term) of applications that demand memory resources.
You have 10 bogobyte of RAM, passed out by "first fit". App A requests 3 bogobyte, app B 2 bogobyte, app C 4 bogobyte. You have 1 bogobyte left (bogobyte #10). App B terminates, freeing 2 bogobyte (#4 and #5). App D requests 3 bogobyte, but you can't satisfy the request because your memory is fragmented, and you don't have 3 bogobyte en bloc. You also cannot move any of the apps around, since you would break any pointers.

That's the theory. Everything beyond that is just a question of resources vs. load and uptime.
This I suppose lends itself to the virtualization portion too, which I haven't delved into either.
Virtualization is the logical extension to paging. You pre-fragment memory into pages, and add a layer between the physical addresses and those your apps see. The result is that you can get a "continuous" virtual block of memory by piecing together e.g. 4kb-sized pages of physical memory and adjust the virtual-to-physical mapping accordingly.

The 4k page size, then, is a thing dictated by the MMU, which does the virtualization. (Would be a bit hard to keep a virtual-to-physical mapping for every individual byte of RAM. ;) )
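
(For the x86 case under discussion, the mapping Solar describes comes down to filling in 32-bit page-table entries. A simplified sketch, assuming the relevant page table is already present and the run stays within the 4 MB it covers; the names are invented.)

Code:
#include <stdint.h>

#define PAGE_SIZE     4096u
#define PAGE_PRESENT  0x1u
#define PAGE_WRITABLE 0x2u

/* Write one page-table entry: virtual page 'virt' now refers to physical
 * frame 'phys'.  'page_table' is assumed to be the table covering 'virt'. */
void map_page(uint32_t *page_table, uint32_t virt, uint32_t phys)
{
    uint32_t index = (virt >> 12) & 0x3FFu;   /* which of the table's 1024 entries */
    page_table[index] = (phys & ~0xFFFu) | PAGE_PRESENT | PAGE_WRITABLE;
}

/* Glue scattered physical frames into one "continuous" virtual block.
 * Assumes the whole run fits inside the 4 MB covered by this one table. */
void map_run(uint32_t *page_table, uint32_t virt_start,
             const uint32_t *frames, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        map_page(page_table, virt_start + i * PAGE_SIZE, frames[i]);
}
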
But then as you point out, there is a performance hit. Has there been some analysis (hard numbers and associated code) that supports this thought process?
What do you mean? The performance hit introduced by using virtual addressing? I don't think there have been many benchmarks made on modern CPUs in that regard, because the advantages are pretty obvious and quite convincing. The last OS I know of not using virtual addresses was AmigaOS, and they've been struggling for years to implement it. ;)
On the surface this reads right to me, but I'm into the numbers myself in order to make specific determinations on the usefulness (best value) of an approach.
I usually feel like that myself, but the advantages of paged memory were too convincing even for me, once I got the idea. AFAIK, even most embedded CPUs sport an MMU these days, and I don't know a (non-Amigan) app programmer who actually could wrap his mind around a non-paged, physical RAM environment...

Besides, you would also forfeit memory protection... that's quite a sacrifice for a performance benefit that shouldn't reach the two-digit percentage range even in the worst case.
I assume defragmenting 4kB blocks is simpler than multidimensional blocks too.
Huh?
I have a thought about page swapping though: you can base it on the utilization rate of the process. Another hysteresis gap for the application(s) in question: if it drops below a certain rate, swap it out, but only after a certain amount of time; then, when its activity rate increases versus other processes, swap it back in. Does this make sense? So utilization first, and then time, versus the entire number of processes...
One way to do it. There are probably hundreds of others. Least-Recently-Used vs. Most-Recently-Used can become quite religious. Just like the One True Bracing Style. ;)

Re:Memory Management Idea

Posted: Tue Aug 02, 2005 8:26 am
by Kemp
I think by multi-dimensional blocks he's talking about differently sized blocks rather than as in multi-dimensional arrays, that's the only way it makes sense to me.

In relation to virtualisation, if I read your description right then that's what I always saw paging as a way of doing anyway: you take blocks from anywhere in physical memory and tell the app they're contiguous.


Edit:
No, actually, that's what I saw virtual memory as being (implemented by the LM2 in my design); I just never thought of a system with proper paging but no virtual memory setup, so they kinda merged in my mind.

Guess having the logical manager handling virtual memory and the virtual manager handling page swapping may be a bit confusing to people new to my system, but it makes sense to me, lol.


Edit2:
Name change due to my randomness:
Physical Memory Manager (PM2) stays
Logical Memory Manager -> Virtual Memory Manager (VM2)
Virtual Memory Manager -> Page Swap Manager (PSM)

To make it more coherent and because virtual means artificial to me and hence the virtual memory manager should be the one that's lying to the apps.

Re:Memory Management Idea

Posted: Tue Aug 02, 2005 8:43 am
by Kemp
Map of how calls would work:

1) Application or a library the application uses requests x amount of pages.
2) VM2 asks the PM2 for x pages
3) PM2 searches the availability bitmap for a contiguous block of x pages
4a) If a block is found then a pointer to the beginning of the block and the number of pages (x) is returned
.. OR ..
4b) If there is no appropriately sized block available but there are enough individual pages then a list of pointers along with the number of pages at each place is returned (step 4a simply returns a list of one item)
5) The VM2 maps the pages into the appropriate page table and returns a pointer to the starting logical address of the range to the app

That all seems a pretty logical way of doing it to me.
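
(A sketch of what step 3 and the simple case of step 4a might look like over a bitmap; the bitmap layout, sizes and names are assumptions, and the scattered-run case of 4b is left out.)

Code:
#include <stdint.h>
#include <stddef.h>

#define TOTAL_PAGES (128u * 1024u * 1024u / 4096u)   /* e.g. 128 MB of RAM */

uint8_t page_bitmap[TOTAL_PAGES / 8];                /* 1 bit per 4 KB page, 1 = used */

static int page_used(size_t i)
{
    return page_bitmap[i / 8] & (1u << (i % 8));
}

/* Steps 3/4a: find x contiguous free pages, mark them used and return the
 * physical address of the first one (0 meaning failure, for this sketch). */
uint32_t pm2_alloc_contiguous(size_t x)
{
    size_t run = 0;

    for (size_t i = 0; i < TOTAL_PAGES; i++) {
        run = page_used(i) ? 0 : run + 1;
        if (run == x) {
            size_t first = i + 1 - x;
            for (size_t j = first; j <= i; j++)          /* mark the run used */
                page_bitmap[j / 8] |= (uint8_t)(1u << (j % 8));
            return (uint32_t)(first * 4096u);
        }
    }
    return 0;   /* step 4b (a list of scattered runs) would go here instead */
}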

Re:Memory Management Idea

Posted: Tue Aug 02, 2005 9:28 am
by smiddy
Solar wrote: Virtualization is the logical extension to paging. You pre-fragment memory into pages, and add a layer between the physical addresses and those your apps see. The result is that you can get a "continuous" virtual block of memory by piecing together e.g. 4kb-sized pages of physical memory and adjust the virtual-to-physical mapping accordingly.

The 4k page size, then, is a thing dictated by the MMU, which does the virtualization. (Would be a bit hard to keep a virtual-to-physical mapping for every individual byte of RAM. ;) )
A bit hard, but do-able. I'll have to read more about the MMU. I was not going to implement memory protection in my own system, as you point out, but I may have to reconsider after digesting some MMU. ;)
Solar wrote: What do you mean? The performance hit introduced by using virtual addressing? I don't think there have been many benchmarks made on modern CPUs in that regard, because the advantages are pretty obvious and quite convincing. The last OS I know of not using virtual addresses was AmigaOS, and they've been struggling for years to implement it. ;)
I don't know much about the AmigaOS (feminine friend's OS, or the Amiga OS from Amiga computers?), though my intention is to only use physical memory within my design. I made the assumption (bad smiddy) that a real-time OS would benefit from not pushing virtualization. My main concern is timing. I will be designing DSP PCI boards eventually for timing-critical items. An associated OS has to be ready for use at a moment's notice (2 to 3 ms, and may need µs or ns speeds; specific analysis is ongoing).
On the surface this reads right to me, but I'm into the numbers myself in order to make specific determinations on the usefulness (best value) of an approach.
Solar wrote: I usually feel like that myself, but the advantages of paged memory were too convincing even for me, once I got the idea. AFAIK, even most embedded CPUs sport an MMU these days, and I don't know a (non-Amigan) app programmer who actually could wrap his mind around a non-paged, physical RAM environment...

Besides, you would also forfeit memory protection... that's quite a sacrifice for a performance benefit that shouldn't reach the two-digit percentage range even in the worst case.
Memory protection isn't a requirement for my system. It won't necessarily be a desktop environment; more of a technical-geek-fest OS. ;) You say the performance difference may not reach the two-digit range. It would be interesting to see some numbers. An order of magnitude better is quite an increase on a linear scale.

AFAIK, as CPU development continues into 64-bit, MMU development hasn't kept pace; the focus has shifted more towards multi-threading and multi-processing. Do you think that lack of development in the MMU area will hurt this, or will the sheer speed of processing keep the MMU happy for a few more years?
Solar wrote:
I assume defragmenting 4kB blocks is simpler than multidimensional blocks too.
Huh?
Well, after reading the thread further, I made an assumption that the pages (me not having my mind wrapped around paging or the MMU, yet) would need to be defragmented too. As in Kemp's writings:

Re:Memory Management Idea

Posted: Tue Aug 02, 2005 9:29 am
by smiddy
Kemp wrote: I think by multi-dimensional blocks he's talking about differently sized blocks rather than as in multi-dimensional arrays, that's the only way it makes sense to me.

In relation to virtualisation, if I read your description right then that's what I always saw paging as a way of doing anyway, you take blocks from anywhere in physical memory and tell the app they're contiguous.
Yes, I meant multidimensional blocks as in a 10kB block, a 2 byte block, a 4.678 MB block, not an array specifically; though arrays fit into the flat memory model, they're abstracted so that others may conceptualize them as multidimensional. (Deep poo here!)

Solar wrote: One way to do it. There are probably hundreds of others. Least-Recently-Used vs. Most-Recently-Used can become quite religious. Just like the One True Bracing Style. ;)
Thanks man, this has been very insightful! I obviously have some reading to do in order to wrap my mind around some of these things.
Kemp wrote: Name change due to my randomness:
Physical Memory Manager (PM2) stays
Logical Memory Manager -> Virtual Memory Manager (VM2)
Virtual Memory Manager -> Page Swap Manager (PSM)

To make it more coherent and because virtual means artificial to me and hence the virtual memory manager should be the one that's lying to the apps.
I like this better as it gives the name a perspective of what it is doing! Acronyms also make it mean more to the designer and user.
Kemp wrote: Map of how calls would work:

1) Application or a library the application uses requests x amount of pages.
2) VM2 asks the PM2 for x pages
3) PM2 searches the availability bitmap for a contiguous block of x pages
4a) If a block is found then a pointer to the beginning of the block and the number of pages (x) is returned
.. OR ..
4b) If there is no appropriately sized block available but there are enough individual pages then a list of pointers along with the number of pages at each place is returned (step 4a simply returns a list of one item)
5) The VM2 maps the pages into the appropriate page table and returns a pointer to the starting logical address of the range to the app

That all seems a pretty logical way of doing it to me.
Being a systems engineer: how would you test these requirements? ;) You've just laid your groundwork, but before you start coding you may want to consider how you plan on testing the implementation. This helps you visualize the usage better. At least it does for me. Mileage may vary.

Re:Memory Management Idea

Posted: Tue Aug 02, 2005 9:57 am
by Kemp
I can start with the PM2 and have an app (compiled into the OS at that point) that asks it for blocks in various ways (different sizes if I implement that, when all has been used up, when there's enough contiguous, when there's enough scattered, when there's some available but not enough, etc). That's the easy one; the VM2 will probably be a bit harder due to the app needing to check that the page tables are updated correctly, etc. I can ask for certain numbers of pages when there are enough available and when there aren't, and check the resulting table modifications. The PSM will come later and be even harder to test properly.
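
(A sketch of what such a compiled-in test might look like; pm2_alloc_contiguous is the assumed allocator from the earlier bitmap sketch, and kprintf stands in for whatever kernel print routine exists.)

Code:
#include <stddef.h>
#include <stdint.h>

extern uint32_t pm2_alloc_contiguous(size_t pages);   /* assumed PM2 call (see earlier sketch) */
extern void     kprintf(const char *fmt, ...);        /* assumed kernel print routine */

/* Crude compiled-in test: poke PM2 with a few of the cases listed above. */
void pm2_selftest(void)
{
    uint32_t one   = pm2_alloc_contiguous(1);           /* a single page                */
    uint32_t run   = pm2_alloc_contiguous(16);          /* a small contiguous run       */
    uint32_t silly = pm2_alloc_contiguous((size_t)-1);  /* impossible request: expect 0 */

    kprintf("pm2 test: 1 page -> %x, 16 pages -> %x, absurd -> %x\n", one, run, silly);
}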

The advantage of using a bitmap over a linked list of available space (my other choice) is that I don't need to check whether the pages on each side of a recently freed one can be combined into one larger free section in the list (taking up one entry saying x bytes are free rather than several entries). In a bitmap I just set the page to free and it can be checked like any other. Which brings me to the other thing I need to test... freeing the pages. PM2 doesn't really need much testing beyond making sure it sets the correct bits to free; VM2 needs to be tested to ensure the page tables are correctly updated. Overall, much less testing than with allocation.
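
(That freeing advantage is easy to see in code: with the bitmap, a free is a single bit clear and there is no neighbour-merging step. A sketch, reusing the assumed page_bitmap from the earlier PM2 sketch.)

Code:
#include <stdint.h>
#include <stddef.h>

extern uint8_t page_bitmap[];   /* the per-page bitmap from the earlier PM2 sketch */

/* Free one 4 KB page: clear its bit and nothing else.  A linked list of free
 * regions would additionally have to check whether the entries either side of
 * this page touch it and merge them into one larger entry. */
void pm2_free_page(uint32_t phys)
{
    size_t i = (size_t)(phys / 4096u);
    page_bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}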

Re:Memory Management Idea

Posted: Tue Aug 02, 2005 10:27 am
by smiddy
I forgot to add what hysteresis is... it is most often used, and thought of, in thermostats, where you set the temperature to a specific point. That setting is really the mean point as temperatures rise and fall. Because of that rise-and-fall cycle, in order to save components and put less work on the system, there is a point above and below the set point; the gap between these points is known as the hysteresis gap. So, if you set your thermostat to 68 degrees F, the top point would be around 70 and the bottom would be around 66. At 70 the heater would turn off (or the cooler would turn on) until it hit the other value, and then the opposite would occur, depending on which system was running. In the case of memory management, I use a watermark, so to speak, to add or subtract memory for the database, based on the percentage of memory already used in the block(s) allocated to it. So if a block is 80% full then I add another block, and if a block has been allocated and the previous block drops below 60% then the last block can be deallocated. Does this clear it up?
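
(A sketch of that watermark scheme for the allocation database; the 80%/60% marks are smiddy's figures, while the helper functions and block handling are invented for the example.)

Code:
#include <stddef.h>

#define GROW_MARK   80   /* add a block once the newest block is this % full       */
#define SHRINK_MARK 60   /* drop the newest block once the previous one falls here */

/* Assumed helpers provided by the database code elsewhere. */
extern size_t   db_block_count(void);
extern unsigned db_block_fill_percent(size_t block);   /* 0-100 */
extern void     db_add_block(void);
extern void     db_remove_last_block(void);

/* Called after each insert or delete in the memory database. */
void db_rebalance(void)
{
    size_t n = db_block_count();

    if (db_block_fill_percent(n - 1) >= GROW_MARK) {
        db_add_block();                      /* growing side of the gap */
    } else if (n > 1 && db_block_fill_percent(n - 2) < SHRINK_MARK) {
        db_remove_last_block();              /* shrinking side; the 60-80% gap keeps
                                                the database from adding and removing
                                                a block over and over at the boundary */
    }
}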

As for your last post, it all looks sound to me. Hopefully I helped you get your mind wrapped around the subject better. Good luck with your implementation.

Re:Memory Management Idea

Posted: Tue Aug 02, 2005 10:47 am
by Kemp
Ah yes, I remember about that now. It makes it so that if the cutoff was set to 68 and the temperature was almost exactly on 68 then the system doesn't constantly switch from on to off and back again.

Thoughts on the interface to these managers:
VM2 would have a call that asked for a certain number of pages (or a certain amount of space that's a multiple of page size) and returned a pointer to the start of the virtual address they were mapped in at.
PM2 would have a call that asked for a certain number of 4kb blocks and returned the list of pointers and sizes as noted before.
I'm also thinking of having a call in PM2 that simply asks for a block of a certain size (eg, 2Mb) and is returned a pointer to the start of it. This would be essentially identical to the VM2 call except working with real memory. Obviously a contiguous block of that size would have to exist. This could be useful for the OS to call and would be very useful for a real-mode or non-virtual memory version of the OS (ie, with VM2 stripped out).
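
(Those calls might look something like the following prototypes; the run-list structure and all names are assumptions about an interface that isn't pinned down yet.)

Code:
#include <stdint.h>
#include <stddef.h>

/* One physically contiguous run of pages, as returned by step 4a/4b. */
struct page_run {
    uint32_t phys;    /* physical address of the first page in the run */
    size_t   pages;   /* how many 4 KB pages the run contains          */
};

/* VM2: give me n pages, mapped; I get back a usable virtual pointer. */
void *vm2_alloc_pages(size_t n);

/* PM2: give me n pages as a list of runs (a single run if a contiguous
 * block was available); returns how many runs were filled in. */
size_t pm2_alloc_pages(size_t n, struct page_run *runs, size_t max_runs);

/* The convenience call discussed above: one contiguous physical block of a
 * given size (e.g. 2 MB), for the kernel or a non-paged build of the OS. */
uint32_t pm2_alloc_contiguous_bytes(size_t bytes);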

I'm still debating whether to have a flag in the call to say you want old-style DMA-compatible memory, reserving the first 16Mb for that. I'm not supporting hardware old enough to have ISA cards and the like so I really shouldn't need to, but then there's the floppy drive. That still requires the region for DMA transfers to be in the low end, doesn't it?


My overall plan is gradually dropping into place one piece at a time.

Re:Memory Management Idea

Posted: Wed Aug 03, 2005 12:04 am
by Solar
smiddy wrote:
Solar wrote: The 4k page size, then, is a thing dictated by the MMU, which does the virtualization. (Would be a bit hard to keep a virtual-to-physical mapping for every individual byte of RAM. ;) )
A bit hard, but do-able.
Ahem... that means for every byte of RAM you need four bytes telling you which physical address it maps to... a 4:1 overhead...
I don't know much about the AmigaOS (feminine friend's OS, or the Amiga OS from Amiga computers?), though my intention is to only use physical memory within my design.
Amigas were a line of personal computers based on the 680x0 CPU. When first conceived in the mid-'80s, MMUs were only available as (expensive) stand-alone chips (the 68851, IIRC), and thus not considered for inclusion in the system. The result was the only "mainstream" desktop OS I know of that supported neither virtual address spaces nor virtual (swap) memory.

The result was a microkernel (Exec) that was very quick and efficient, because in an unprotected environment every message is just a 32-bit pointer.

The lack of memory protection was a real setback, however.
I made the assumption (bad smiddy) that real time OS would benefit from not pushing virtualization.
Embedded real-time is a different beast, I agree. Here you can easily get away with not having memory protection or virtual addressing, since all components of the system are usually known beforehand.
You say two digit performance difference may not be reached. It would be interesting to see some numbers.
As I said, it's probably hard to get them, for lack of comparable OSes that don't employ an MMU. You're the one interested in the numbers, you do the Google. ;)
AFAIK, as CPU development continues into 64-bit, MMU development hasn't kept pace...
I don't really see how those two relate to each other.

The job of an MMU is pretty clear-cut, and doesn't have the large optimization potential of integer pipelining or SIMD units. The driver code controlling them has seen quite some improvement over time in most OSes.
...the focus has shifted more towards multi-threading and multi-processing. Do you think that lack of development in the MMU area will hurt this, or will the sheer speed of processing keep the MMU happy for a few more years?
I don't see MMUs as the bottleneck anywhere. They provide an essential service for desktop OSes, and the algorithms for using them efficiently are well researched and understood. Bottlenecks are elsewhere, so the "leaps" are made where it counts (PCIe, front-side buses, ...).
Well, after reading the thread further, I made an assumption that the pages (me not having my mind wrapped around paging or the MMU, yet) would need to be defragmented too.
Of course you have intra-page defragmentation, too; that's what your malloc() implementation has to worry about, among other things. (Try the dlmalloc() homepage; it makes for quite an interesting read on the subject of malloc() algorithms.)