
Possible virtual memory problem in microkernels

Posted: Mon Aug 28, 2006 8:25 pm
by João Jerónimo
While I was thinking about the implementation of virtual memory, I realized that there is no easy way to make it work with a microkernel design without breaking the base design characteristics of a microkernel...

In a microkernel design, device drivers are, from the point of view of the kernel core, like ordinary programs, right? And, in order to load and save memory pages, one needs to access some hardware device (ideally one chosen by the user to hold the virtual memory)... So, whatever storage device in the system ends up holding the virtual memory, as long as someone has written a device driver for it, the virtual memory manager only has to communicate with that driver through some standard convention, given that the driver is loaded... no problem!
If you had a monolithic design, you would just have to make sure that the kernel is a special process... special because its pages are *always* in memory...

Now, taking into account microkernel designs, what if the virtual memory manager needs more free physical memory and pages the IDE driver (or the driver for whatever device the virtual memory is stored on) out to the swap device?
As microkernels usually use some form of message passing to communicate between kernel components, the page fault handler needs to wait until the disk driver process is resumed, performs the write to disk, and responds with some kind of DONE message... When the DONE message is received, the page fault handler would mark the evicted pages as not present, flush the TLB, and then send another message to the disk driver asking it to read in the data it freed the space for, and would wait for... nothing! The driver would never respond because, in the least bad case, when the driver is resumed by the kernel core, the processor would find that the page CS:EIP points to is not present and page fault... and that page fault would generate another unanswerable message to the disk driver (i.e. its own process!), which would then wait for itself to respond...
The result depends on many factors, but the system would no longer be usable... even if the kernel doesn't panic, one would be unable to load new programs and, sooner or later, as more and more processes request services from the disk driver, more and more processes would stop responding to the user...
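The circular wait described above can be sketched as a toy simulation (all names here are illustrative, not from any real kernel): bringing any page in means a synchronous request to the disk driver, but if the disk driver's own pages were evicted, servicing that request first requires paging the disk driver back in, which requires the disk driver...

```python
# Toy model of the self-deadlock: a synchronous pager that must message
# the disk driver, after the disk driver's own pages have been evicted.
# All names are illustrative.

def page_in(process, resident, depth=0, max_depth=5):
    """Return True if `process` could be made resident, False on deadlock."""
    if depth > max_depth:
        return False  # unbounded recursion = the circular wait
    if process in resident:
        return True
    # Bringing a page in means sending a message to the disk driver,
    # which must itself be resident to run...
    if not page_in("disk_driver", resident, depth + 1, max_depth):
        return False
    resident.add(process)
    return True

# Healthy case: the disk driver is locked in memory.
assert page_in("editor", resident={"disk_driver"})

# Broken case: the pager evicted the disk driver itself.
assert not page_in("editor", resident=set())
```

The recursion depth limit stands in for the infinite chain of page faults; in a real kernel nothing bounds it, so the driver simply never answers.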

The obvious solution is to make sure that the disk driver one chooses for accessing the virtual memory remains in memory forever, but the microkernel design doesn't know the concept of a device driver because, for microkernels, everything is an ordinary program, and ordinary programs are pieces of code that execute in a 4 GB virtual address space and can be transparently paged in and out...

How do microkernels usually solve this problem?

JJ

Re:Possible virtual memory problem in microkernels

Posted: Mon Aug 28, 2006 10:59 pm
by Colonel Kernel
I'm facing this same problem in my microkernel design. I think in this case it's practical to make all processes involved in virtual memory "special" as far as the kernel is concerned. One rule for these special processes would be that they cannot allocate virtual memory without immediately committing it to physical memory and locking it. I haven't completely figured out all the corner cases yet.
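Colonel Kernel's rule for "special" processes could be sketched like this (a minimal model, assuming invented names; the real corner cases he mentions are not handled): special processes commit and lock physical frames at allocation time, while normal processes get lazily backed, pageable memory.

```python
# Sketch of the "special process" rule: processes involved in paging must
# back every virtual allocation with physical frames immediately, and
# those frames are locked (never eligible for eviction). Illustrative only.

class FrameAllocator:
    def __init__(self, total_frames):
        self.free = total_frames

    def alloc(self, n):
        if n > self.free:
            raise MemoryError("not enough physical frames to commit")
        self.free -= n
        return n

class Process:
    def __init__(self, allocator, special=False):
        self.allocator = allocator
        self.special = special
        self.locked_pages = 0
        self.demand_pages = 0  # backed lazily, may be paged out

    def allocate(self, pages):
        if self.special:
            # Commit physical memory up front and lock it.
            self.allocator.alloc(pages)
            self.locked_pages += pages
        else:
            self.demand_pages += pages  # backed on first touch

alloc = FrameAllocator(total_frames=64)
pager = Process(alloc, special=True)
pager.allocate(16)
assert pager.locked_pages == 16 and alloc.free == 48

app = Process(alloc)
app.allocate(1000)  # fine: nothing committed until first touch
assert alloc.free == 48
```

The point of the rule is visible in the failure mode: a special process that asks for more than the remaining physical frames fails immediately, instead of succeeding now and deadlocking the pager later.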

As for how other microkernels do it, the only one I know enough about is L4. It gets around the problem by delegating virtual memory management policy to a user process. The kernel itself only implements the mechanisms via map, grant, and unmap operations that allow for the construction of recursive address spaces. It's up to the user-space VM servers to avoid paging out critical code and data. It's hard to explain how L4 memory management works -- google for it. I'm sure there are papers out there that can explain it better than I can.
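The map/grant/unmap trio mentioned above can be caricatured in a few lines (this is a toy model of the idea, not the real L4 API; the class and method names are invented): each address space maps pages either from frames it owns or from another space, and unmap recursively revokes everything derived from a page.

```python
# Rough sketch of L4-style map/grant/unmap over recursive address spaces.
# Illustrative only; real L4 uses flexpages and a mapping database.

class Space:
    def __init__(self):
        self.pages = {}     # vpage -> (backer_space, backer_vpage)
        self.children = {}  # vpage -> list of (space, vpage) mapped from here

    def map(self, vpage, dest, dest_vpage):
        """Share: dest sees our page; we keep it and may unmap it later."""
        dest.pages[dest_vpage] = (self, vpage)
        self.children.setdefault(vpage, []).append((dest, dest_vpage))

    def grant(self, vpage, dest, dest_vpage):
        """Give the page away entirely: we lose access to it."""
        dest.pages[dest_vpage] = self.pages.pop(vpage, (self, vpage))

    def unmap(self, vpage):
        """Recursively revoke the page from everyone we mapped it to."""
        for dest, dvp in self.children.pop(vpage, []):
            dest.unmap(dvp)
            dest.pages.pop(dvp, None)

sigma0, pager, app = Space(), Space(), Space()
sigma0.pages[0] = ("ram", 0)   # the root pager initially owns all memory
sigma0.map(0, pager, 7)
pager.map(7, app, 3)
assert 3 in app.pages

sigma0.unmap(0)                # revocation cascades down the mapping tree
assert 3 not in app.pages and 7 not in pager.pages
```

The recursive revoke is what lets a user-space VM server reclaim memory without trusting the spaces below it, and it is also why such a server must simply never unmap its own critical pages.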

There are other microkernels that just avoid the problem entirely -- Minix and Singularity spring to mind. Neither supports virtual memory. AFAIK Minix implements process isolation using segmentation, while Singularity implements "software isolated processes" by insisting that all apps be compiled to verifiably type-safe intermediate code.

Re:Possible virtual memory problem in microkernels

Posted: Tue Aug 29, 2006 11:15 am
by João Jerónimo
Colonel Kernel wrote: I'm facing this same problem in my microkernel design. I think in this case it's practical to make all processes involved in virtual memory "special" as far as the kernel is concerned. One rule for these special processes would be that they cannot allocate virtual memory without immediately committing it to physical memory and locking it. I haven't completely figured out all the corner cases yet.
Thanks... perhaps I will follow your example and create the concepts of "special process" and "locked page"...
Colonel Kernel wrote: As for how other microkernels do it, the only one I know enough about is L4. It gets around the problem by delegating virtual memory management policy to a user process. The kernel itself only implements the mechanisms via map, grant, and unmap operations that allow for the construction of recursive address spaces. It's up to the user-space VM servers to avoid paging out critical code and data. It's hard to explain how L4 memory management works -- google for it. I'm sure there are papers out there that can explain it better than I can.
It seems quite complex, but perhaps in the future I will invest some time in exploring that...
Colonel Kernel wrote: There are other microkernels that just avoid the problem entirely -- Minix and Singularity spring to mind. Neither supports virtual memory.
Yes, yes, yes... but it doesn't fit my needs, because I want to make as many programs as possible persistent, so that I can simply flush the RAM to the disk and then halt (so one can turn off the computer)... when I boot the system once more, it would resume all the persistent programs, which would continue running as if nothing had happened!
Somewhat like suspend-to-disk... But I also want to keep files in virtual memory (perhaps in the memory space of a file server), and to make file opening a simple memory-sharing operation...

Most of these ideas about orthogonal persistence were taken from what I read about Unununium...
Colonel Kernel wrote: AFAIK Minix implements process isolation using segmentation,
It seemed like nonsense to me when I read, in the "Linux is obsolete" debate, Andy saying that Minix was originally written to run on 8088 machines with no hard disk (i.e. the IBM PC)...
It sounds really strange, as Minix is supposed to be a UNIX clone && UNIX has memory protection && the Intel 8088 doesn't implement memory protection... very strange, really... Someone even suggested that Andy update MINIX to take advantage of i386 protected mode...

JJ

Re:Possible virtual memory problem in microkernels

Posted: Tue Aug 29, 2006 11:43 am
by Habbit
I think I've read somewhere that Minix 3 does have paging, but I'm not really sure...

Re:Possible virtual memory problem in microkernels

Posted: Wed Aug 30, 2006 4:01 am
by Combuster
For microkernel memory management, I thought up the following idea. No idea whether it's used or not.

Depending on the design, you can create a userspace app that eats all or most of memory before proclaiming itself the new memory manager. That way you can have user-space virtual memory management.
Because you steal the kernel's resources only once you're running, you can only effectively share the memory you asked from the kernel, which means that the VMM and disk drivers are automatically locked. You can also eat, say, 75% and then allow the user app to choose between virtual and locked memory by asking either the driver or the kernel respectively, or have the remaining memory be used by the kernel to support its task structures.
To support this transparently, the trick your kernel must support is that kernel calls can be overridden from userspace, which is basically a design decision. Otherwise, you'll have to make all post-VMM apps virtual-memory aware.
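Combuster's scheme, as I read it, could be sketched like this (a toy model with invented names and numbers): at boot a userspace task grabs most of physical memory from the kernel and then serves as the memory manager, so everything it holds, including itself and the drivers it backs, is locked by construction.

```python
# Sketch of a userspace VMM that "eats" 75% of RAM at boot and then
# hands memory out itself. The kernel keeps the remainder for its own
# task structures. Numbers and names are illustrative.

KERNEL_FRAMES = 1024

def kernel_alloc(n, free=[KERNEL_FRAMES]):
    # Mutable default models the kernel's one global free-frame count.
    taken = min(n, free[0])
    free[0] -= taken
    return taken

class UserspaceVMM:
    def __init__(self, fraction=0.75):
        # Grab most of RAM up front; memory obtained from the kernel
        # is never paged, so our pool is implicitly locked.
        self.pool = kernel_alloc(int(KERNEL_FRAMES * fraction))

    def alloc(self, n, locked=False):
        # Locked requests are forwarded to the kernel; pageable
        # requests come from our own (lockable, swappable) pool.
        if locked:
            return kernel_alloc(n)
        taken = min(n, self.pool)
        self.pool -= taken
        return taken

vmm = UserspaceVMM()
assert vmm.pool == 768                    # 75% of 1024 frames
assert vmm.alloc(100) == 100              # served from the VMM's pool
assert vmm.alloc(10, locked=True) == 10   # served by the kernel's remainder
```

The "override kernel calls" part of the idea is what lets later apps keep calling the ordinary allocation API while actually being served by this process.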

Re:Possible virtual memory problem in microkernels

Posted: Wed Aug 30, 2006 8:35 am
by Colonel Kernel
Hmm... The part about a user-space process taking most of physical memory reminds me of "sigma 0" in L4. The rest just gets a bit confusing...

The main reason I haven't gone with something like the L4 scheme is that I don't require that level of flexibility -- for example, I don't want to support shared memory at all.

Re:Possible virtual memory problem in microkernels

Posted: Wed Aug 30, 2006 6:57 pm
by Brendan
Hi,
João Jerónimo wrote:In a microkernel design, device drivers are, from the point of view of the kernel core, like ordinary programs, right?
They are like ordinary programs, but not always identical...

For my design there are differences between "system processes" and normal processes. These differences include:
  - system processes may have an entirely "locked" address space, where all virtual memory always uses physical RAM (determined by a flag in the executable's header).
  - users and normal processes can't start system processes directly. Only the kernel and other system processes can start more system processes.
  - system processes are not allowed to use networking unless higher-level networking code started the process. This is mainly to prevent security problems - e.g. a device driver that bypasses security by accessing raw data and then sending this data to someone else on the internet (keypresses, disk data, etc.).
  - when the OS is shut down, normal processes are told to shut down first and system processes are told to shut down last. This just means an application can save modified data before the file system and disk drivers disappear.
  - the scheduler will let system code use "highest priority" threads, while normal processes can't.
Everything that isn't listed above is identical for normal processes and system processes.
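The first difference, a header flag that pins the whole address space at load time, could be sketched like this (the header layout, magic value, and field names are invented for illustration; Brendan's actual format is not specified):

```python
# Sketch of a "locked address space" flag in an executable header: at
# load time the kernel reads one bit and decides whether the process's
# pages are ever eligible for page-out. Header layout is hypothetical.

import struct

FLAG_LOCKED_ADDRESS_SPACE = 0x01

def make_header(flags):
    # magic, flags  (hypothetical 8-byte header)
    return struct.pack("<4sI", b"SYSP", flags)

def load_process(image):
    magic, flags = struct.unpack_from("<4sI", image)
    assert magic == b"SYSP", "not a valid executable"
    return {
        "locked": bool(flags & FLAG_LOCKED_ADDRESS_SPACE),
        # a locked process is never considered by the page-out scan
        "pageable": not (flags & FLAG_LOCKED_ADDRESS_SPACE),
    }

driver = load_process(make_header(FLAG_LOCKED_ADDRESS_SPACE))
assert driver["locked"] and not driver["pageable"]
```

Putting the decision in the executable image means the pager never has to guess which processes are safe to evict; the driver declares it once, at build time.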

Cheers,

Brendan

Re:Possible virtual memory problem in microkernels

Posted: Thu Aug 31, 2006 12:09 pm
by Candy
Brendan wrote:
João Jerónimo wrote:In a microkernel design, device drivers are, from the point of view of the kernel core, like ordinary programs, right?
They are like ordinary programs, but not always identical...

For my design there are differences between "system processes" and normal processes. These differences include:
  - system processes may have an entirely "locked" address space, where all virtual memory always uses physical RAM (determined by a flag in the executable's header).
  - users and normal processes can't start system processes directly. Only the kernel and other system processes can start more system processes.
  - system processes are not allowed to use networking unless higher-level networking code started the process. This is mainly to prevent security problems - e.g. a device driver that bypasses security by accessing raw data and then sending this data to someone else on the internet (keypresses, disk data, etc.).
Uhm... just a small question, but when you have a device driver running in kernel mode - what point is there in arbitrarily limiting them? How do you start a first system process? Also, when a system process fails, how do you restart it?

The rest seems pretty logical and obvious.

Re:Possible virtual memory problem in microkernels

Posted: Thu Aug 31, 2006 12:28 pm
by bluecode
Candy wrote:...but when you have a device driver running in kernel mode...
Does Brendan do that? I'm certainly not doing that in my microkernel.
How do you start a first system process?
I use GRUB as a loader. From GRUB's boot modules (ELF executables) the kernel creates "system processes" (or servers).
Also, when a system process fails, how do you restart it?
Perhaps some sort of "system process manager"/"server manager", which can ping the servers and try to restart them on failure?

Re:Possible virtual memory problem in microkernels

Posted: Thu Aug 31, 2006 1:01 pm
by Candy
bluecode wrote:
Candy wrote:...but when you have a device driver running in kernel mode...
Does brendan do that? I'm certainly not going to do/doing this in my microkernel.
You either have a kernel-level process or a process with kernel-level killing object access. Both are lethal. Pretty moot difference except for memory faults, which should kill most of your test systems anyway. I'm going to keep everything in the kernel except for having a userland-wrapper so some modules can run in the userland space if they have actually got no point being in the kernel. Call it a hybrid.
How do you start a first system process?
I use GRUB as a loader. From GRUB's boot modules (ELF executables) the kernel creates "system processes" (or servers).
So, you're going to shield all users from the system (user processes can't create system processes). I can imagine that in a humane world where humans protect themselves from their foes that you would have such protection, but in a computer world the computer should still be MY computer, even when I want to kill it or make it spam the world. Don't arbitrarily limit. Period.
Also, when a system process fails, how do you restart it?
Perhaps some sort of "system process manager"/"server manager", which can ping the servers and try to restart them on failure?
It's an idea. Where do you get the data for the restart? Can the device restart? Will attempting this crash the system even more? Will you create extra pointless load?

Re:Possible virtual memory problem in microkernels

Posted: Thu Aug 31, 2006 3:01 pm
by Colonel Kernel
Candy, if you're assuming a hybrid architecture, then your comments are not really relevant to the original poster's question. We're talking about microkernel systems with user-space drivers in this thread. Whether this is a good idea is a separate issue that deserves its own thread IMO.

Re:Possible virtual memory problem in microkernels

Posted: Thu Aug 31, 2006 8:54 pm
by Brendan
Hi,
Candy wrote:You either have a kernel-level process or a process with kernel-level killing object access. Both are lethal. Pretty moot difference except for memory faults, which should kill most of your test systems anyway. I'm going to keep everything in the kernel except for having a userland-wrapper so some modules can run in the userland space if they have actually got no point being in the kernel. Call it a hybrid.
For me, everything runs as user-level processes, except for the kernel which is never killed.
Candy wrote:So, you're going to shield all users from the system (user processes can't create system processes). I can imagine that in a humane world where humans protect themselves from their foes that you would have such protection, but in a computer world the computer should still be MY computer, even when I want to kill it or make it spam the world. Don't arbitrarily limit. Period.
For me, the boot code starts the kernel, then the virtual file system and device manager. All other system code is "pulled" onto the computer as a (direct or indirect) result of the device manager's hardware auto-detection. For example, the device manager might scan the PCI bus, find a USB controller and start the USB controller's device driver. The USB controller's device driver would search for connected USB devices, and might end up starting device drivers for the keyboard, mouse and some flash memory. The flash memory device driver would check the contents of the device and start some file system code.

What you end up with is a tree of processes, where each process is a child process and has a parent. The parent process "looks after" its children - giving them permission to access what they need (I/O ports, memory ranges, etc.), restarting them if they crash, telling them to die if requested, managing "device driver changeover", etc. The only exceptions to this are the device manager and virtual file system, which have no parent (the boot code that started them terminates itself after boot) - these processes can't be killed, and if they crash the entire system will need to be rebooted.

This doesn't mean that user processes (and users) can't start system processes in all cases. For example, if a user process asks the virtual file system to mount the file "floppy.img" using the SFS file system code, then the virtual file system code (a system process) would start the SFS file system code (another system processes), after doing some security checks and possibly imposing restrictions on which operations the VFS code passes on to the newly mounted file system.
Candy wrote:
Perhaps some sort of "system process manager"/"server manager", which can ping the servers and try to restart them on failure?
Is an idea. Where do you get the data for the restart? Can the device restart? Will attempting this crash the system even more? Will you create an extra pointless load?
I never really did like polling. For me, any process can elect to receive "obituaries", which are messages sent when something is terminated. These are used for cleanup. For example, if a file system driver crashes then its obituary would be sent out to whoever has them enabled. The file system process's parent would receive this obituary and restart the process, while the virtual file system would also receive the obituary and send out error messages to anyone who has open file handles, etc. The parent process has all of the data needed to restart the crashed process, because the parent process was the one that started it running to begin with.

This also needs to happen in reverse - if a process receives an obituary saying its parent died then the process terminates itself. In this case the parent's parent would restart the parent, and the new parent would then restart its children. For example:

[tt]A Process A started
A -> B Process B started
A -> B -> C Process C started
A C Process B crashed
A Process C terminates itself because its parent died
A -> B Process B restarted by process A
A -> B -> C Process C restarted by process B[/tt]
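The restart cascade in that example can be sketched as a toy (class and method names are mine, not Brendan's): on a crash the obituary makes the children terminate themselves, the parent restarts the crashed process, and the restarted process restarts its own children in turn.

```python
# Sketch of obituary-driven restart in a process tree. Illustrative only.

class Proc:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children, self.alive = name, parent, [], True
        if parent:
            parent.children.append(self)

    def crash(self):
        self.alive = False
        # Obituary goes out: children terminate themselves...
        for c in self.children:
            c.alive = False
        # ...and the parent, holding all the data it used to start this
        # process originally, restarts it.
        if self.parent:
            self.parent.restart_child(self)

    def restart_child(self, child):
        child.alive = True
        for grandchild in child.children:
            child.restart_child(grandchild)  # restarted parent restarts its kids

a = Proc("A")
b = Proc("B", parent=a)
c = Proc("C", parent=b)

b.crash()
assert a.alive and b.alive and c.alive  # B restarted by A, C restarted by B
```

The key property is that recovery knowledge lives exactly one level up: no process needs global knowledge of the tree, only of its own children.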

As is the case for all micro-kernels, the ability to recover from system code failure depends on the amount of extra code built in for "fault tolerance". I would expect that for "version 1.0" of my OS any problem with system code will probably make the entire computer completely unusable (reboot required). Hopefully, for "version 121.0" it will be able to seamlessly recover from almost every possible problem (software and hardware failures).


Cheers,

Brendan

Re:Possible virtual memory problem in microkernels

Posted: Sat Sep 02, 2006 9:43 am
by Candy
Colonel Kernel wrote: Candy, if you're assuming a hybrid architecture, then your comments are not really relevant to the original poster's question. We're talking about microkernel systems with user-space drivers in this thread. Whether this is a good idea is a separate issue that deserves its own thread IMO.
My comments aren't based on a hybrid architecture, the note that I'm developing one is more about that I prefer that architecture and that I would probably defend it against other ideas. It is a reason for me to think the way I think in some cases.

My problem with a microkernel is that there are a lot of processes that cannot be restarted cleanly if the kernel has to restart them after an error. I don't disagree with putting printer drivers etc. in user space (hence not monolithic), but I do disagree with putting everything there (hence not a microkernel). My disagreement with the second (which is the only relevant one here) is that I don't see how you can make some hardware devices restart or free up resources, especially in the context of security, in a way that works. You'd need microkernel-level tracking of all hardware resources, including freeing them upon error, etc. Also, you'd need a lot of hardware protection: you'd need to check pretty much every outw or inw. For a device that does a lot of I/O, that's one hell of a lot of accesses. Agreed, memory-mapped devices and DMA clean up the interface a lot, but it still limits the microkernel design.

One small thing: I didn't respond to this thread to boast or to start a flamewar. I posted because I disagree with the general mood here; I don't understand why we need to think differently, and I'd like to be convinced of your opinion, or to convince you of mine. I'm not out to seek a war.

Re:Possible virtual memory problem in microkernels

Posted: Sat Sep 02, 2006 1:24 pm
by fascist-fox
Brendan wrote: This also needs to happen in reverse - if a process receives an obituary saying its parent died then the process terminates itself. In this case the parent's parent would restart the parent, and the new parent would then restart its children. For example:

[tt]A Process A started
A -> B Process B started
A -> B -> C Process C started
A C Process B crashed
A Process C terminates itself because its parent died
A -> B Process B restarted by process A
A -> B -> C Process C restarted by process B[/tt]
What would happen if process C didn't terminate after B died (whether it be malice or just a buggy program)? Would the system automatically kill it after a certain amount of time?

Re:Possible virtual memory problem in microkernels

Posted: Sat Sep 02, 2006 11:20 pm
by Brendan
Hi,
fascist-fox wrote:What would happen if process C didn't terminate after B died (whether it be malice or just a buggy program)? Would the system automatically kill it after a certain amount of time?
It would need to be killed by the kernel if it doesn't terminate itself. In addition, it would need to be automatically removed from active service before its replacement is started.

For example, if the parent process is a device driver for a USB controller and the child is a flash memory driver, then if the parent process crashes the flash memory driver can no longer access its device. In this case the virtual file system needs to abort any outstanding requests and remove the flash memory driver from its list of mounted devices, so that no further requests are sent to the flash memory device driver.

I'd make sure each child process is completely terminated before restarting the parent process. This isn't strictly necessary (only removing it from active service is required), but it is conceptually cleaner and better for resource usage (i.e. freeing memory before allocating more for the replacement).

Also note that the same thing applies to the entire tree - from the crashed process to every "end-point". For example, imagine part of the system code tree looks like this:

[tt]Device Manager
|
|_SCSI Controller
| |__Hard Disk
| | |__File System (partition 1)
| | |__File System (partition 2)
| | |__Swap Space (partition 3)
| |__CD-ROM
: |__File System[/tt]

In this case, if the SCSI driver crashes you'd need to terminate 6 additional processes. When the SCSI driver is started (or restarted) it'd detect the hard disk and the CD-ROM and start processes for those (and when the hard disk and CD-ROM device drivers are started they'd detect the file systems, etc. needed). This means that system code doesn't need to know whether it's being restarted or whether the system is booting - it does the same thing during startup regardless.
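The teardown for that tree can be sketched as a depth-first walk (the tree and process names mirror the diagram above; the code is a toy, not Brendan's implementation): every descendant is terminated before its parent, then restarting the subtree root re-detects the hardware and respawns everything below it.

```python
# Sketch of terminating a whole driver subtree before restarting it.
# The tree mirrors the SCSI example; names are illustrative.

tree = {
    "scsi": ["hard_disk", "cdrom"],
    "hard_disk": ["fs_part1", "fs_part2", "swap_part3"],
    "cdrom": ["fs_cd"],
}

def terminate_subtree(node, killed):
    """Depth-first termination of `node` and everything below it."""
    for child in tree.get(node, []):
        terminate_subtree(child, killed)
    killed.append(node)

killed = []
terminate_subtree("scsi", killed)

# 7 processes go away: the SCSI driver plus its 6 descendants...
assert len(killed) == 7
# ...and every child dies before its own parent does.
assert killed.index("fs_part1") < killed.index("hard_disk") < killed.index("scsi")
```

Killing children first is what makes the restart path identical to the boot path: by the time the SCSI driver comes back, nothing below it exists, so its normal device-detection logic rebuilds the subtree from scratch.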

Of course this isn't restricted to crashes - the SCSI controller might be in a "hot-plug" PCI slot, or a newer/better SCSI device driver may have been downloaded. It should handle software crashes, hardware failures, hardware upgrades and software upgrades (and should make writing device drivers much easier - no rebooting to test it).

Of course the real problem is trying to keep normal processes running when part of the system code tree disappears, but that's probably better discussed in a less off-topic thread... ;)


Cheers,

Brendan