OSDev.org

Posted: **Wed Apr 28, 2010 11:57 am**

I have a few questions on design of a microkernel (although some of them aren't necessarily microkernel-related) as I plan to write one at some point (not for a while, but I want to make plans).

1. What things absolutely must be done in the kernel?
2. Is it possible to have a server to handle scheduling and is it worth it?
3. What syscalls are needed? I've come up with a few[1]
4. Is this[2] a practical use of hardware rings? If not, can you suggest a better layout?
5. Could you list some pros/cons of AMP vs. SMP? The Wiki doesn't really say anything about AMP and doesn't list pros and cons of SMP. I've also googled this and checked the forums.

Thank you.

[1]

+--------+----------------------+------+-------------------------------------+
| Device | Syscall              | Ring | Description                         |
+--------+----------------------+------+-------------------------------------+
| Memory | peek[b/w/d]()        | 2    | Read a byte/word/dword from memory  |
| Memory | poke[b/w/d]()        | 2    | Write a byte/word/dword to memory   |
| Memory | idt_set_vector()     | 2    | Create an interrupt vector          |
| Video  | change_vmode()       | 2    | Change the video mode               |
| Video  | draw_pixel()         | 2    | Draw a pixel (if vmode appropriate) |
| Video  | draw_char()          | 2    | Draw a character (if appropriate)   |
| Video  | get_screen_attribs() | 3    | Get screen attributes (w, h, bpp)   |
| Video  | set_screen_attribs() | 3    | Set screen attributes (w, h, bpp)   |
| Disk   | read_disk_bytes()    | 2    | Read data from disk                 |
| Disk   | write_disk_bytes()   | 2    | Write data to disk                  |
| Disk   | get_disk_geometry()  | 3    | Get the disk geometry               |
| Disk   | set_disk_geometry()  | 2    | Set the disk geometry               |
+--------+----------------------+------+-------------------------------------+

[2]
Ring 0 : Kernel
Ring 1 : Drivers
Ring 2 : Servers
Ring 3 : Userspace programs

Posted: **Wed Apr 28, 2010 12:27 pm**

Asymmetric multiprocessing is just not done in the traditional sense, because you normally can't transfer tasks between processors: you'd have to relocate the image every time you did, if not change or recompile the binary for a different architecture. Functionally, AMP is much like having two distinct computers (even though the communication between them may be much faster than over a normal network). Things get especially nasty when one of the processors does not support the features of the other. A GPU, for example, has no protection, so you can't just run something on it and expect to be safe. That aside, AMP can be much faster when all cores are used for their designed purposes.

SMP (and NUMA) have the advantage of being much simpler. The penalties of moving a task to another core are much lower, and you don't have address or architecture differences.

As for the rings, they are unsupported in 64-bit mode, and you don't have the isolation between processes and drivers individually which forms the basis of a microkernel. Also paging works only between "user" and "kernel" space, so the kernel space can't be protected from servers and drivers via that mechanism.

Also, the syscall instructions expect that you use ring 0 and 3 exclusively. And so do non-x86 architectures.

Posted: **Wed Apr 28, 2010 12:45 pm**

So 64-bit and microkernels are mutually exclusive? That sucks! I wanted 64-bit support

I guess I have to make a hybrid then?

Posted: **Wed Apr 28, 2010 1:13 pm**

What I referred to is that segmentation is disabled in 64-bit mode, so you're left with paging only. Which doesn't distinguish between more than two protection levels - kernel and userspace.
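
To make the two-level point concrete, here is a minimal sketch in C of how the user/supervisor check in paging behaves. The bit positions are those of x86 page-table entries; the helper name `page_access_allowed` is hypothetical, and the model is deliberately simplified (it assumes CR0.WP = 0 and ignores SMEP/SMAP, NX and the multi-level table walk).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-table entry flag bits (low bits of each paging-structure entry). */
#define PTE_PRESENT  (1ull << 0)
#define PTE_WRITABLE (1ull << 1)
#define PTE_USER     (1ull << 2)   /* U/S bit: set = accessible from user mode */

/*
 * Hypothetical helper: would paging allow this access?
 * Simplified model: assumes CR0.WP = 0 (supervisor writes ignore the R/W bit)
 * and ignores SMEP/SMAP and NX. cpl is the current privilege level (0..3).
 */
static bool page_access_allowed(uint64_t pte, unsigned cpl, bool is_write)
{
    assert(cpl <= 3);

    if (!(pte & PTE_PRESENT))
        return false;

    /* Paging sees only two classes: CPL 3 is "user", CPL 0-2 are "supervisor". */
    bool user_mode = (cpl == 3);

    if (user_mode && !(pte & PTE_USER))
        return false;
    if (user_mode && is_write && !(pte & PTE_WRITABLE))
        return false;

    return true;
}
```

In this model rings 0, 1 and 2 get identical answers for every page, which is why paging alone cannot protect the kernel from code running in ring 1 or ring 2.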

Posted: **Wed Apr 28, 2010 6:12 pm**

Combuster wrote:As for the rings, they are unsupported in 64-bit mode, and you don't have the isolation between processes and drivers individually which forms the basis of a microkernel. Also paging works only between "user" and "kernel" space, so the kernel space can't be protected from servers and drivers via that mechanism.

Whoa whoa whoa. Since when has any sane (i.e. modern, i.e. fast, i.e. not Mach) micro kernel done anything other than treat drivers as processes (Often with special memory maps)?

Posted: **Wed Apr 28, 2010 7:33 pm**

Synon wrote:So 64-bit and microkernels are mutually exclusive? That sucks! I wanted 64-bit support

I guess I have to make a hybrid then?

Do you really understand the terms "mutually exclusive", "microkernel" and "hybrid kernel"? And who told you that microkernels cannot run in 64-bit mode? I think you need more reading...
Microkernel
Hybrid Kernel
X86-64
Semaphore
http://en.wikipedia.org/wiki/Mutual_exclusion

Posted: **Wed Apr 28, 2010 7:37 pm**

Combuster, you're right. There are only 2 privilege levels for data (U and S). But IOPL continues to be supported, and ring0 allows certain privileged instructions. So the rings may/do exist and differ slightly from one another.
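
As a rough illustration of the IOPL mechanism mentioned here, the sketch below reads and writes the IOPL field, which occupies bits 12-13 of EFLAGS/RFLAGS. The function names are hypothetical; only the bit layout and the "CPL <= IOPL" rule come from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IOPL lives in bits 12-13 of EFLAGS/RFLAGS. */
#define EFLAGS_IOPL_SHIFT 12
#define EFLAGS_IOPL_MASK  (3u << EFLAGS_IOPL_SHIFT)

/* Hypothetical helper: extract the IOPL field from a saved flags value. */
static unsigned eflags_get_iopl(uint32_t eflags)
{
    return (eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;
}

/* Hypothetical helper: return a copy of eflags with IOPL replaced. */
static uint32_t eflags_set_iopl(uint32_t eflags, unsigned iopl)
{
    assert(iopl <= 3);
    return (eflags & ~EFLAGS_IOPL_MASK) | ((uint32_t)iopl << EFLAGS_IOPL_SHIFT);
}

/* True if IN/OUT may execute without consulting the TSS I/O bitmap. */
static bool io_allowed_by_iopl(uint32_t eflags, unsigned cpl)
{
    return cpl <= eflags_get_iopl(eflags);
}
```

A kernel that wants a ring 3 driver to have blanket port access could, under these assumptions, set IOPL to 3 in the flags image it restores for that thread; finer-grained control needs the I/O bitmap instead.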

For my 32bit microkernel I was using

Ring0: kernel/isr
Ring1: process/memory management process (still has access to entire kernel memory, page tables etc.)
Ring2: NOT USED
Ring3: Non privileged, normal user apps AND drivers (running as userspace apps with IO privilege)

I had planned to continue with this setup now that I'm in 64bit mode.

BTW, in 32bit mode, I've found that while SYSENTER seems to work from Ring1 to Ring0, SYSEXIT does not work and I find myself (after making sure the interrupt stack is correct) having to do an IRET to get back to my ring1 threads. I'm still working out whether this scheme is worth it for 64bit.

- gerryg400

Posted: **Wed Apr 28, 2010 9:06 pm**

1. What things absolutely must be done in the kernel?

Let's say that by kernel you mean ring0 or supervisor mode. The following must be in the kernel.
- Setting up and maintaining system structures (gdt, idt, tss etc.)
- Context switching (processor and memory context)
- Basic interrupt handling - although there are ways to move the processing involved in an ISR to user space.
- Trap handling - although again the service routines may be in user space.

The following should/could/may be, and often or usually are, in the kernel, mainly for efficiency
- Scheduling
- System resource handling
- Some level of process management
- Some level of memory management

The following are often NOT in a microkernel. It depends on the philosophy of the designer
- Drivers (including screen and disk)

2. Is it possible to have a server to handle scheduling and is it worth it?

Yes. To decide whether it's worth it you compare the benefit to the cost. I'm not sure what the benefit would be (perhaps user-installed scheduling?). There would seem to be some cost if you had to switch context to the scheduler just to find out whose context to switch to.

3. What syscalls are needed? I've come up with a few[1]

I assume by system call you mean 'function supplied by kernel'. The reason I ask is that many traditional system calls are not implemented as system calls in a microkernel but rather as a message to a system server. Most of the calls you list (apart from idt_set_vector) would usually not be system calls in a microkernel.
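
To illustrate the difference, here is a minimal sketch of how something like read_disk_bytes() might become a message to a disk server rather than a kernel entry point. Everything here is an assumption made for illustration: the `struct message` layout, the opcode values and the function names are hypothetical and not taken from any particular microkernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request opcodes understood by a disk server. */
enum { MSG_DISK_READ = 1, MSG_DISK_WRITE = 2 };

/* Hypothetical fixed-size IPC message. */
struct message {
    uint32_t sender;     /* would be filled in by the kernel, not the caller */
    uint32_t type;       /* one of the MSG_* opcodes */
    uint64_t args[4];    /* meaning depends on type */
};

/* Client side: build the request that replaces a read_disk_bytes() syscall. */
static struct message make_disk_read(uint64_t offset, uint64_t length,
                                     uint64_t buffer_id)
{
    struct message m;
    memset(&m, 0, sizeof m);
    m.type    = MSG_DISK_READ;
    m.args[0] = offset;
    m.args[1] = length;
    m.args[2] = buffer_id;   /* e.g. a handle to shared memory */
    return m;
}

/* Server side: returns 0 for a known request, -1 for an unknown one. */
static int disk_server_dispatch(const struct message *m)
{
    assert(m != NULL);
    switch (m->type) {
    case MSG_DISK_READ:
    case MSG_DISK_WRITE:
        return 0;            /* a real server would talk to the hardware here */
    default:
        return -1;
    }
}
```

In such a design the kernel only needs a generic send/receive primitive; it never has to know what a disk is.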

- gerryg400

Posted: **Wed Apr 28, 2010 11:03 pm**

gerryg400 wrote:
3. What syscalls are needed? I've come up with a few[1]
I assume by system call you mean 'function supplied by kernel'. The reason I ask is that many traditional system calls are not implemented as system calls in a microkernel but rather as a message to a system server. Most of the calls you list (apart from idt_set_vector) would usually not be system calls in a microkernel.

- gerryg400

It's actually safe to say that none of the calls he/she mentioned should ever be present. If a microkernel were to use system calls, the ones supplied would most likely be those for memory management (allocation, freeing, getting memory information, etc.), I/O management (acquiring port ranges, releasing port ranges, etc.), BIOS access (interrupt calls, etc.) and IPC (setting up a port, setting up shared memory, etc.). Normal applications (e.g. a text editor) wouldn't even use the ones for I/O management and BIOS access, since those are meant to be used by drivers (e.g. a graphics driver). Normal applications do, however, use the other two: memory management solely for the purpose of acquiring memory and using it (just as in a usual monolithic kernel), and IPC to communicate with drivers, or more likely with the servers in between, which in turn communicate with the drivers to get things done (e.g. draw a cube on the screen, play a song, request an HTTP page).

Synon wrote:4. Is this a practical use of hardware rings? If not, can you suggest a better layout?

Most newer kernels don't use ring 1 and ring 2 and basically only use ring 0 (supervisor mode) and ring 3 (user mode). Other architectures such as ARM don't even support these "sub-modes" and only provide a supervisor and user mode. That, however, doesn't mean you can't use ring 1 and ring 2, it's just less common to do so.

Regards,
Stephan J.R. van Schaik.

Posted: **Thu Apr 29, 2010 1:27 am**

@quanganht,
What? No, I misinterpreted what Combuster said:

Combuster wrote:As for the rings, they are unsupported in 64-bit mode, and you don't have the isolation between processes and drivers individually which forms the basis of a microkernel.

Also, I know I have more reading to do (although, I've read all the pages you linked before, except the Wikipedia link, and I know what mutual exclusion is). I was merely asking a few questions.

@gerryg400,
1. Ok, thanks. I guess as I only get to choose between two privilege levels in long mode I'll have to have
Oh and yes, by "in the kernel", I meant "in kernel mode".
2. Yes, you're right. I thought there'd be some overhead. I'm reading Operating Systems: Design and Implementation and that got me interested in microkernels in the first place (although I think Tanenbaum is a little overzealous and I also think some more stuff perhaps could be done in kernel space).
3. I was thinking those would be the bare minimum (I think I missed some stuff out though) of kernel-provided syscalls that would be needed by drivers (if you'll notice, I wrote down what privilege level they would be callable from (before Combuster told me I couldn't use rings 1 and 2 in long mode)). Surely you can't implement memory access in userspace? Doesn't paging stop that? I thought paging stopped

@StephenVanSchaik,
How can all of those things be done in userspace? Surely some of them have to be done by the kernel?

@Combuster,
Thanks for the info on AMP vs. SMP. I guess I'll go with SMP.

@Topic,
If the syscalls I mentioned were incorrect, could someone tell me what ones absolutely must be done in the kernel? I want to make my microkernel as micro as I can.

Posted: **Thu Apr 29, 2010 2:49 am**

Your microkernel needs to provide

IPC (incl. some way to send pages of memory around)
Process management

To drivers, it also needs to provide

Some way to request delivery of interrupt notifications (perhaps to an IPC port)
A way to map in locations of the physical address space (For MMIO registers and such), and the ability to set the right cache flags on them
On x86, some way of specifying I/O port access
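
As one possible shape for the interrupt-notification item, the sketch below keeps a table mapping IRQ numbers to IPC ports. All names are hypothetical, the 16-entry table assumes the legacy PIC range, and the actual message send is left out: the dispatch function only reports which port should be notified.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_IRQS 16u   /* assumption: legacy PIC range only */
#define NO_PORT  0u    /* assumption: port 0 means "nobody registered" */

static uint32_t irq_port[NUM_IRQS];      /* IRQ -> IPC port of the driver */
static uint32_t irq_pending[NUM_IRQS];   /* notifications raised per IRQ */

/* Syscall side: a driver asks for an IRQ to be delivered to its port.
 * Returns 0 on success, -1 if the request is invalid or the IRQ is taken. */
static int irq_register(unsigned irq, uint32_t port)
{
    if (irq >= NUM_IRQS || port == NO_PORT)
        return -1;
    if (irq_port[irq] != NO_PORT)
        return -1;
    irq_port[irq] = port;
    return 0;
}

/* ISR side: called by the kernel's interrupt stub.
 * Returns the port to notify, or NO_PORT if no driver claimed the IRQ. */
static uint32_t irq_dispatch(unsigned irq)
{
    assert(irq < NUM_IRQS);
    if (irq_port[irq] != NO_PORT)
        irq_pending[irq]++;
    return irq_port[irq];
}
```

The kernel's part stays tiny this way: acknowledge the interrupt, look up the port, queue a notification, and let the driver do the real work in its own address space.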

To some extent, memory management can be done in userspace: Provide the root task with all the free memory in the system, and it is then responsible for handing it out via IPC (You could perhaps, like L4, use a special form of kernel-handled IPC to handle page faults). It is also responsible for providing it to the kernel as required for the kernel to operate (Though your kernel should only require memory to do things like create processes and sockets, never in response to, say, an interrupt, as the system may be out of memory at the time!)
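
A very small sketch of the "root task owns all free memory" idea follows. It is not how L4 does it; it is just a first-fit frame pool that a root task could sit behind an IPC interface. The names, the 4 KiB page size and the fixed pool limit are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096u
#define MAX_FRAMES 1024u   /* assumption: small fixed-size pool */

/* Hypothetical pool of physical frames handed to the root task at boot. */
struct frame_pool {
    uint64_t base;               /* physical address of the first frame */
    size_t   count;              /* number of frames in the pool */
    uint8_t  used[MAX_FRAMES];   /* 1 = handed out, 0 = free */
};

static void pool_init(struct frame_pool *p, uint64_t base, size_t count)
{
    /* base must be non-zero because 0 is used as the "out of memory" value */
    assert(base != 0 && base % PAGE_SIZE == 0 && count <= MAX_FRAMES);
    p->base  = base;
    p->count = count;
    for (size_t i = 0; i < count; i++)
        p->used[i] = 0;
}

/* Returns the physical address of a free frame, or 0 if none is left. */
static uint64_t pool_alloc(struct frame_pool *p)
{
    for (size_t i = 0; i < p->count; i++) {
        if (!p->used[i]) {
            p->used[i] = 1;
            return p->base + (uint64_t)i * PAGE_SIZE;
        }
    }
    return 0;
}

/* Returns 0 on success, -1 if addr is not an allocated frame of this pool. */
static int pool_free(struct frame_pool *p, uint64_t addr)
{
    if (addr < p->base || addr % PAGE_SIZE != 0)
        return -1;
    size_t i = (size_t)((addr - p->base) / PAGE_SIZE);
    if (i >= p->count || !p->used[i])
        return -1;
    p->used[i] = 0;
    return 0;
}
```

The root task would answer "give me a page" messages with `pool_alloc()` and then ask the kernel to map the frame into the requester; the kernel itself never has to search for free memory.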

The kernel needs to provide next to nothing else: That is all the important stuff.

Posted: **Thu Apr 29, 2010 5:38 am**

3. I was thinking those would be the bare minimum (I think I missed some stuff out though) of kernel-provided syscalls that would be needed by drivers (if you'll notice, I wrote down what privilege level they would be callable from (before Combuster told me I couldn't use rings 1 and 2 in long mode)).

Think of your microkernel as the system arbitrator. Your ring3 driver asks the kernel for hardware access (system call) and then accesses the hardware directly. You don't need or want to be going thru the kernel for the actual hardware access. It's too inefficient.

Surely you can't implement memory access in userspace? Doesn't paging stop that? I thought paging stopped

It depends how you define userspace. If you define userspace as everything outside the kernel then you definitely can have a memory manager in userspace. However, to make things simpler, it's better if the memory manager runs in ring2, 1 or 0. Then it has supervisor memory access rights.

Rings 0, 1 and 2 all have equivalent memory access rights in 64bit mode. So you can run a memory manager in ring1 and it can still access the entire address space.

You can even have a user space interrupt routine to handle page faults, you can do fork, spawn, process loading, etc all from ring1.

- gerryg400

Posted: **Thu Apr 29, 2010 6:18 am**

@Owen,
Thanks! I can make a new plan now.
@gerryg400,
Ok. So I have something in my scheduler's process table that says what each process can do, and have a syscall that changes that variable to signify that the process gets to access hardware directly?

Edit: something like

/*
 * +-----+---------------------------+
 * | Bit | Permission granted if set |
 * +-----+---------------------------+
 * |  0  | Memory access             |
 * +-----+---------------------------+
 * |  1  | Graphics access           |
 * +-----+---------------------------+
 * |  2  | Disk access               |
 * +-----+---------------------------+
 * |  3  | Something else            |
 * +-----+---------------------------+
 * |  4  | ...                       |
 * +-----+---------------------------+
 * |  5  | ...                       |
 * +-----+---------------------------+
 * |  6  | ...                       |
 * +-----+---------------------------+
 * |  7  | ...                       |
 * +-----+---------------------------+
 */
#include <stdint.h>

struct process {
        /* ... */
        uint8_t flags;  /* permission bits, see table above */
};

?
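
For what it's worth, a flags byte like this could be manipulated with a few helpers, sketched below. The `PERM_*` names and the helper functions are hypothetical; the bit assignments follow the table in the comment (bit 0 memory, bit 1 graphics, bit 2 disk), and the struct is reduced to the one field for the sake of a self-contained example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the bits in the table (bit 0, 1, 2). */
enum {
    PERM_MEMORY   = 1u << 0,
    PERM_GRAPHICS = 1u << 1,
    PERM_DISK     = 1u << 2
};

/* Reduced to the single field that matters for this sketch. */
struct process {
    uint8_t flags;
};

/* Set one or more permission bits. */
static void perm_grant(struct process *p, uint8_t perm)
{
    assert(p != NULL);
    p->flags |= perm;
}

/* Clear one or more permission bits. */
static void perm_revoke(struct process *p, uint8_t perm)
{
    assert(p != NULL);
    p->flags &= (uint8_t)~perm;
}

/* True only if every bit in perm is set. */
static bool perm_has(const struct process *p, uint8_t perm)
{
    assert(p != NULL);
    return (p->flags & perm) == perm;
}
```

A flag like this only records the decision. On x86 the enforcement still has to come from the hardware, for example page tables for memory and the I/O map for ports, so the syscall that sets a bit would also have to update those structures.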

Also, I just noticed all the unfinished sentences in my previous post. I think I need to get more sleep :S

Posted: **Thu Apr 29, 2010 8:43 am**

Synon wrote: @StephenVanSchaik,
How can all of those things be done in userspace? Surely some of them have to be done by the kernel?

No, I mentioned what system calls the kernel should provide, that means they're called in user mode and handled in supervisor mode.

Synon wrote:@Topic,
If the syscalls I mentioned were incorrect, could someone tell me what ones absolutely must be done in the kernel? I want to make my microkernel as micro as I can.

I provided a general overview of the system calls, but you misinterpreted them for being handled in user mode.

Synon wrote:@gerryg400,
Ok. So I have something in my scheduler's process table that says what each process can do, and have a syscall that changes that variable to signify that the process gets to access hardware directly?

Generally the three things you definitely want to prevent, or better yet limit, are memory access, I/O access and perhaps BIOS access. Paging prevents an application from using memory they don't own. I/O access can be limited by filling in the IO-map in the TSS. BIOS access, if using emulation (including VM86-mode), can be limited in the system call itself, since you'll have a system call that can execute a BIOS-interrupt. There are a few other things you can prevent being executed in user mode as well (e.g. some instructions), but the Intel/AMD manuals might provide a better overview on what you can prevent from happening in user mode.
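
As a sketch of the IO-map approach: the I/O permission bitmap in the TSS has one bit per port, where a clear bit means the port is allowed and a set bit means access faults (it is only consulted when CPL > IOPL). The struct and function names below are hypothetical; the bit polarity and the trailing all-ones byte follow the x86 rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IO_PORTS   65536u
#define IOPB_BYTES (IO_PORTS / 8)

/* Hypothetical in-memory image of a full I/O permission bitmap. */
struct io_bitmap {
    uint8_t bits[IOPB_BYTES + 1];   /* +1: bitmap must end with a 0xFF byte */
};

/* Deny every port; also sets the required trailing 0xFF byte. */
static void iopb_deny_all(struct io_bitmap *b)
{
    memset(b->bits, 0xFF, sizeof b->bits);
}

/* Allow ports [first, first + count) by clearing their bits. */
static void iopb_allow(struct io_bitmap *b, unsigned first, unsigned count)
{
    assert(first < IO_PORTS && count <= IO_PORTS - first);
    for (unsigned port = first; port < first + count; port++)
        b->bits[port / 8] &= (uint8_t)~(1u << (port % 8));
}

/* True if a byte-sized access to this port would be permitted. */
static bool iopb_allowed(const struct io_bitmap *b, unsigned port)
{
    assert(port < IO_PORTS);
    return (b->bits[port / 8] & (1u << (port % 8))) == 0;
}
```

For example, a serial driver could be granted the eight ports starting at 0x3F8 with `iopb_allow(&b, 0x3F8, 8)` while every other port stays denied.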

Regards,
Stephan J.R. van Schaik.

Posted: **Sun May 02, 2010 11:15 pm**

About the scheduling issue: when designing my own microkernel, I decided that user-mode processes are not useful unless they're created equal (except for some security permissions that should only have a fairly limited range of uses). Notably, this means:
- Scheduling in the kernel (because otherwise a user process that is not bound by scheduling has power over every other process).
- Security token handling in the kernel (because otherwise a user process can grant itself insanely powerful rights and get power over every other process).

However, it's just a trick that helps me make design decisions, and it is PROVIDED AS-IS, WITHOUT ANY WARRANTY OF etc etc

Microkernel design
