Scheduler in User Level

mrkaktus · Post by **mrkaktus** » Wed Apr 26, 2006 11:00 pm

Hi,
I'm working on my OS (a microkernel) and I'm trying to divide it in such a way that the scheduling function runs at user level (DPL-3). It would switch to DPL-0 only when an exception occurs (INT 0 to 31), or when some basic function is called that has to operate on physical RAM or on the paging structures of a process (like sharing a page, or adding and removing frames).

So at the beginning of user virtual memory (UVM) there will be a block of system data and procedures, to which all normal interrupts and IRQs will be directed when they occur. After that block comes the rest of the memory for the user program.

Now to the main question: how can I protect that block of system data and procedures at the beginning of UVM?

Basic goals are:

1) - The process can only read from that block.
2) - Jumping into this memory is forbidden (so a process cannot call, for example, the keyboard ISR).
3) - Execution can be switched to this memory block only by an interrupt.

Is it possible to do this?

I was thinking about setting these pages to read-only (that would cover the 1st point). But I'm not sure: if I also set the Supervisor flag on them (that would cover point 2), would I still be able to run an interrupt handler in that area?

Da_Maestro · Post by **Da_Maestro** » Wed Apr 26, 2006 11:00 pm

Interrupts must always run in supervisor mode so setting the supervisor flag on this memory will work.

I don't see why you would want to do task switching in user mode. Are you giving control of task switching to the user mode applications, or are you having part of the kernel run in user mode? How do you plan to implement process blocking? What will happen if a user mode application enters an infinite loop before it has a chance to switch to a different task?

mrkaktus · Post by **mrkaktus** » Thu Apr 27, 2006 11:00 pm

I want to split my microkernel into several subsystems / servers. So I want to have one such server to manage processes, and to have it at user level, because that way I can gain speed (I'm also writing it in asm to make it faster).

When the scheduler is at user level it could switch processes almost as fast as threads (only a stack switch and a CR3 reload; OK, also saving to and loading from the Process Description Block, but even that will be fast). I think changing the privilege level is slow (how many processor ticks does it take, 200?), so this way I will save a lot of processor time.

A) It will be a user-level part of the kernel that is mapped into every process's memory.

B) A process will be interrupted when one of the interrupts occurs, and control is then given to the scheduler at the beginning of virtual memory. It will then switch CR3 to reach the rest of its memory (the process state has been saved by then); from that memory it can take the other resources it needs to dispatch a new process. Then it reloads the registers and CR3 and executes iretd (everything at the same PL).

Only INTs 0 to 31 will switch to DPL0, to the main kernel (as will low-level calls from the scheduler / other modules).

So if a process hangs, that is no problem for me, because there will always be some IRQ that switches to the scheduler (e.g. IRQ0).

gaf · Post by **gaf** » Thu Apr 27, 2006 11:00 pm

Hello MrKaktus

mrkaktus wrote: When the scheduler is at user level it could switch processes almost as fast as threads (only a stack switch and a CR3 reload; OK, also saving to and loading from the Process Description Block, but even that will be fast). I think changing the privilege level is slow (how many processor ticks does it take, 200?), so this way I will save a lot of processor time.

Privilege-level changes are slow, but compared to the enormous cost of a TLB invalidation, which is necessary whenever the context is switched, their cost really doesn't matter all that much: reloading the CR3 register flushes the TLB to get rid of the old mappings. Until it's filled again with the working set of the new task, almost every memory reference will cause a TLB miss. This means that the CPU has to look up in the page tables which physical address corresponds to the virtual address that is to be accessed before it can actually access memory.

Apart from that I'm wondering how you want to reload CR3 from user-space anyway..

mrkaktus wrote: It will then switch CR3 to reach the rest of its memory (the process state has been saved by then); from that memory it can take the other resources it needs to dispatch a new process.

As context switches are so expensive you should really try to avoid them at all costs. Most operating systems thus map the whole kernel space into each task's address space. Since you have 4GB of virtual memory available there really shouldn't be any problem in sparing a sufficiently large portion for your kernel (especially if it's a µ-kernel).

mrkaktus wrote: Only INTs 0 to 31 will switch to DPL0, to the main kernel (as will low-level calls from the scheduler / other modules).

All interrupts between 0-31 are either used for exceptions or reserved by Intel. Apart from that it - in my opinion - doesn't make much sense to use multiple interrupts for system calls. If you restrict yourself to one common interrupt vector you might need some additional dispatcher code - which could cost a cycle or two - but on the other hand it also enables you to use the much more efficient sysenter/sysexit mechanism, which can really make a difference when it comes to system-call performance.
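The "one common vector plus dispatcher" idea can be sketched roughly as follows. This is only an illustration in plain C: the call numbers, the error code and the two handlers are hypothetical and do not come from any real kernel, and the register-passing and entry stub (int or sysenter) are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical system-call numbers; the thread does not define any. */
enum { SYS_YIELD = 0, SYS_SHARE_PAGE = 1, SYS_COUNT };

#define ERR_NO_SUCH_CALL (-1)   /* illustrative error code */

typedef int32_t (*syscall_fn)(uint32_t a0, uint32_t a1);

/* Placeholder handler: a real kernel would invoke the scheduler here. */
static int32_t sys_yield(uint32_t a0, uint32_t a1)
{
    (void)a0;
    (void)a1;
    return 0;
}

/* Placeholder handler: only validates that the address is 4 KiB aligned. */
static int32_t sys_share_page(uint32_t page_addr, uint32_t target_pid)
{
    (void)target_pid;
    return (page_addr % 4096u == 0) ? 0 : -2;
}

/* One table, indexed by the call number the caller passes in a register. */
static const syscall_fn syscall_table[SYS_COUNT] = {
    [SYS_YIELD]      = sys_yield,
    [SYS_SHARE_PAGE] = sys_share_page,
};

/* Common entry point reached from the single system-call vector. */
int32_t syscall_dispatch(uint32_t number, uint32_t a0, uint32_t a1)
{
    if (number >= SYS_COUNT)
        return ERR_NO_SUCH_CALL;
    assert(syscall_table[number] != NULL);  /* every slot below SYS_COUNT is filled */
    return syscall_table[number](a0, a1);
}
```

The extra cost compared with one vector per call is the bounds check and the indirect call through the table.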

regards,
gaf

mrkaktus · Post by **mrkaktus** » Fri Apr 28, 2006 11:00 pm

You write that you are wondering how I will reload CR3 at DPL3. Is it impossible :> ? I think there must be some way to do it, because I read about a microkernel that has such scheduling. The Intel manuals say that MOV CRn is protected from applications, which really is a problem, but maybe I can do it from a supervisor memory area at CPL=3?

Yes, you are right. I think the scheduler can have all its code mapped into the user-space virtual memory and really doesn't need to switch to an additional VM space for the rest of its data (good point, gaf).

About interrupts 0 to 31: I know they are Intel exceptions. This is why I want to handle them at DPL-0. I'm thinking of having 2 system INTs: one that will be only for subsystem use, e.g. to call for creating the paging structures of a process and other low-level things (it will go to DPL-0), and a second that will be the main system API, providing all functions for processes and executing at DPL-3.

gaf · Post by **gaf** » Fri Apr 28, 2006 11:00 pm

mrkaktus wrote: The Intel manuals say that MOV CRn is protected from applications, which really is a problem, but maybe I can do it from a supervisor memory area at CPL=3?

From ring3 you couldn't even access a supervisor page - that's actually the whole point of protecting it. Just imagine what would happen if an ordinary user-mode task was able to switch its context: A malicious task could use this to create its own page directory and thus access all physical memory without any protection.

mrkaktus wrote: I think there must be some way to do it, because I read about a microkernel that has such scheduling.

You can move the scheduling policy to a user-space module, but the mechanism itself must reside in kernel space. A very simple method would be to have the kernel call the user-space manager every time a clock tick occurs. The external scheduler may then decide which task should run next and inform the kernel about it, which then switches to the task. The biggest advantages of this approach are its conceptual simplicity and the huge liberties the external scheduler has. Unfortunately it does cause some overhead, as two extra privilege-level transitions and one context switch become necessary:

monolithic: timer_irq -> kernel's internal scheduler -> task
external : timer_irq -> kernel -> external scheduler -> kernel -> task
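As a rough sketch of this split, the C below keeps the mechanism (`on_timer_tick`) apart from the policy (`round_robin`). Everything here is hypothetical: the function pointer merely stands in for the upcall into the external scheduler, and the actual context switch and privilege transitions are not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 8

struct task {
    int id;
    int runnable;
};

/* Policy hook: returns the index of the task that should run next. */
typedef size_t (*pick_next_fn)(const struct task *tasks, size_t count,
                               size_t current);

struct kernel {
    struct task tasks[MAX_TASKS];
    size_t count;
    size_t current;
    pick_next_fn policy;   /* stands in for the call to the external scheduler */
};

/* Example policy: round-robin over runnable tasks. */
size_t round_robin(const struct task *tasks, size_t count, size_t current)
{
    for (size_t step = 1; step <= count; step++) {
        size_t candidate = (current + step) % count;
        if (tasks[candidate].runnable)
            return candidate;
    }
    return current;        /* nothing else is runnable: keep the current task */
}

/* Mechanism: what the kernel does on every timer tick. */
size_t on_timer_tick(struct kernel *k)
{
    assert(k->policy != NULL && k->count > 0);
    size_t next = k->policy(k->tasks, k->count, k->current);
    /* The kernel validates the answer instead of trusting the policy. */
    if (next < k->count && k->tasks[next].runnable)
        k->current = next; /* a real kernel would switch context here */
    return k->current;
}
```

Swapping the policy only means installing a different `pick_next_fn`; the tick handler itself stays unchanged.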

Another approach would be to provide some privileged system call that allows the external scheduler to set up the scheduling order in advance. This means that it can directly influence how the kernel's internal scheduler will work. Provided that the kernel scheduler uses some basic round-robin mechanism, the external scheduler may for example choose the time-slice length and the ordering of the tasks. It is vital for this approach that the internal scheduler is basic enough to allow the user-space scheduler to really control the policy. Unfortunately it's also inherently difficult to find the right mechanism for the kernel scheduler and, even if you were to succeed at developing a sufficiently well performing algorithm, you could probably never reach the flexibility of the first approach. On the other hand this method does provide optimal performance, since it's just as efficient as a monolithic scheduler.
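A minimal sketch of this second approach, assuming a fixed-size table of (task, slice length) pairs: `sched_set_order` plays the role of the privileged system call and `sched_tick` is the kernel's internal round-robin walker. All names and sizes are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 8

/* One entry of the run order chosen by the external scheduler. */
struct slot {
    int task_id;
    unsigned ticks;     /* time-slice length in timer ticks */
};

struct kernel_sched {
    struct slot order[MAX_SLOTS];
    size_t count;       /* number of valid slots */
    size_t pos;         /* slot that is currently running */
    unsigned left;      /* ticks left in the current slice */
};

/* Hypothetical privileged call: install a new run order. Returns 0 on success. */
int sched_set_order(struct kernel_sched *s, const struct slot *order, size_t count)
{
    if (count == 0 || count > MAX_SLOTS)
        return -1;
    for (size_t i = 0; i < count; i++) {
        if (order[i].ticks == 0)
            return -1;  /* a zero-length slice could never run */
    }
    for (size_t i = 0; i < count; i++)
        s->order[i] = order[i];
    s->count = count;
    s->pos = 0;
    s->left = order[0].ticks;
    return 0;
}

/* Kernel-internal tick handler: no upcall, it only walks the table.
 * Returns the id of the task that gets the coming tick. */
int sched_tick(struct kernel_sched *s)
{
    assert(s->count > 0);
    if (s->left == 0) {                 /* slice used up: go to the next slot */
        s->pos = (s->pos + 1) % s->count;
        s->left = s->order[s->pos].ticks;
    }
    s->left--;
    return s->order[s->pos].task_id;
}
```

With an order of {task 10 for 2 ticks, task 20 for 1 tick}, successive ticks run 10, 10, 20, 10, 10, 20 and so on, without ever leaving the kernel to ask the policy.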

mrkaktus wrote: About interrupts 0 to 31: I know they are Intel exceptions. This is why I want to handle them at DPL-0.

But shouldn't exceptions especially be delivered to the user-space application itself? After all there's in general little a kernel, and a µ-kernel even more so, could do to handle the exception: A page fault for example has to be handled by the external pager, and a lot of other exceptions, like the divide-by-zero fault, can only be handled by the application itself. In general the program knows best what it's doing and may thus also react to faults that the kernel couldn't interpret.

mrkaktus wrote: I'm thinking of having 2 system INTs: one that will be only for subsystem use, e.g. to call for creating the paging structures of a process and other low-level things (it will go to DPL-0), and a second that will be the main system API, providing all functions for processes and executing at DPL-3.

Almost all µ-kernels actually use some IPC mechanism for what you want to do with your second system call. This means that the applications may send messages to all kinds of servers that offer a variety of services like paging, scheduling or device management. The kernel doesn't have to know about the contents of these messages, but only has to make sure that they get delivered properly. The advantage is a much more flexible API that does not rely on some hard-wired calls to the kernel, but may be defined by each server as necessary.
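To illustrate what "the kernel only delivers the message" could look like, here is a small sketch with fixed-size messages and one bounded queue per server port. The port numbering, the sizes and the function names are assumptions made for the example; real µ-kernel IPC (blocking, rights, copying between address spaces) is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSG_BYTES   32   /* illustrative fixed payload size */
#define QUEUE_DEPTH 4
#define MAX_PORTS   4

struct message {
    int sender;
    unsigned char payload[MSG_BYTES];  /* opaque to the kernel */
};

struct port {
    struct message queue[QUEUE_DEPTH];
    size_t head;
    size_t count;
};

static struct port ports[MAX_PORTS];

/* Copy an opaque payload into the destination port's queue. */
int ipc_send(int sender, size_t dest_port, const void *data, size_t len)
{
    if (dest_port >= MAX_PORTS || data == NULL || len > MSG_BYTES)
        return -1;
    struct port *p = &ports[dest_port];
    if (p->count == QUEUE_DEPTH)
        return -1;                       /* queue full */
    struct message *m = &p->queue[(p->head + p->count) % QUEUE_DEPTH];
    m->sender = sender;
    memset(m->payload, 0, MSG_BYTES);
    memcpy(m->payload, data, len);
    p->count++;
    return 0;
}

/* Take the oldest message from a port; the server interprets the payload. */
int ipc_receive(size_t port, struct message *out)
{
    assert(out != NULL);
    if (port >= MAX_PORTS || ports[port].count == 0)
        return -1;
    struct port *p = &ports[port];
    *out = p->queue[p->head];
    p->head = (p->head + 1) % QUEUE_DEPTH;
    p->count--;
    return 0;
}
```

Note that neither function looks inside `payload`: what a request such as "map" means is defined by the server listening on that port, not by the kernel.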

regards,
gaf

mrkaktus · Post by **mrkaktus** » Sat Apr 29, 2006 11:00 pm

I'm not sure you understood me properly. I want the user-level program space to look like this (DPL3 segment):

0            e.g. 4MB                                   4GB
| scheduler  | user program . . . . user stack at end |

| supervisor | user                                   |
| read only  | read/write                             |

You write:
"From ring3 you couldn't even access a supervisor page - that's actually the whole point of protecting it."

But isn't this correct?

Now interrupt 0x20 occurs in a process, and its handler is located in
the scheduler's supervisor space, to which execution is switched:

| o <-----|-------- user code |

Execution is now in a supervisor, read-only page and still at DPL3
(am I right here? Can I have an interrupt handler that runs at DPL3
and executes in supervisor space?).

Main problem - can I now do a DPL switch in supervisor space?

If YES, my path would look like this:
timer_irq ~> scheduler dpl3 ~> new task (no DPL change).

If the answer is NO, I will still have something like this:
timer_irq ~> scheduler dpl3 -> kernel -> new task (2 DPL changes).

Yes, you are really right about the exceptions; I will need to think about them again later (when I write support for them). And the second system INT is a solution until I have written a working IPC.

gaf · Post by **gaf** » Sat Apr 29, 2006 11:00 pm

Paging always works in combination with the segmentation privilege-levels. You can therefore only run in a supervisor page if your CPL is between 0 and 2 - trying to call your scheduler with CPL being 3 will result in a page-fault:

"The segment privilege levels map to the page privilege levels as follows. If the processor is currently operating at a CPL of 0, 1, or 2, it is in supervisor mode; if it is operating at a CPL of 3, it is in user mode. When the processor is in supervisor mode, it can access all pages; when in user mode, it can access only user-level pages." (Intel Manual 3a - 4.11.2)

Apart from that I'm wondering why you want to set the supervisor area to read-only ? This would mean that the scheduler code couldn't write to any variables, and I've some problems conceiving a stateless scheduler..

I'm afraid that - at least in this case - there's really no alternative but doing it the traditional way: The scheduler runs in a supervisor page as a part of the kernel. Upon return it switches to ring3 and jumps to the user-level application's entrypoint.
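The rule from the Intel manual quoted above can be written down as a small check. The sketch below uses the x86 page-table bit positions for P, R/W and U/S, but it is deliberately simplified: it looks at a single entry only (real hardware combines the page-directory and page-table entries), and it assumes CR0.WP is clear, so supervisor code may write to read-only pages. The function name is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u   /* P bit   */
#define PTE_WRITE   0x2u   /* R/W bit */
#define PTE_USER    0x4u   /* U/S bit */

/* Would an access at the given CPL succeed, or raise a page fault? */
bool page_access_allowed(unsigned cpl, uint32_t pte_flags, bool is_write)
{
    assert(cpl <= 3);
    if (!(pte_flags & PTE_PRESENT))
        return false;      /* a not-present page always faults */
    if (cpl < 3)
        return true;       /* CPL 0-2 is supervisor mode (CR0.WP assumed clear) */
    if (!(pte_flags & PTE_USER))
        return false;      /* user mode may not touch supervisor pages at all */
    if (is_write && !(pte_flags & PTE_WRITE))
        return false;      /* user mode may not write to read-only pages */
    return true;
}
```

For the layout proposed in this thread, `page_access_allowed(3, PTE_PRESENT, false)` is false: code running at CPL 3 cannot even read or fetch from the supervisor block, whichever way it got there.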

regards,
gaf

carbonBased · Post by **carbonBased** » Sat Apr 29, 2006 11:00 pm

gaf wrote: Apart from that I'm wondering why you want to set the supervisor area to read-only ? This would mean that the scheduler code couldn't write to any variables, and I've some problems conceiving a stateless scheduler..

I'm afraid that - at least in this case - there's really no alternative but doing it the traditional way: The scheduler runs in a supervisor page as a part of the kernel. Upon return it switches to ring3 and jumps to the user-level application's entrypoint.

One thing you could potentially do, however, is have the scheduling 'policy' in a p3 task. I'm not sure if this would be acceptable to the original poster or not, though.

In this case, the scheduler continues to run in the kernel, at p0. However, the p3 task will tell the scheduler what the next active task should be. This would allow for swappable scheduling policies.

Keep in mind, however, that this is an extremely high level discussion -- there are *a lot* of issues to overcome if this is to work (I'm not even sure it's worth it... it just popped into my head). For example -- the scheduling policy is, in itself, a task. Who controls when it gets CPU time?

And this would mean that a task switch would only happen when the scheduler policy had assigned a new "next task," which it can't do unless it's received focus, which it can't do unless it's scheduled itself as the next task, etc, etc, etc... it seems the best approach would be to intersperse this scheduling task in between every other task switch. This makes this regular p3 task a special-cased task, however... and also one that gets scheduled more often than any other. It is, however, a very quickly executing task that immediately yields to other threads once it's done.

Anyway, like I said, just a thought off the top of my head... not sure how good it is, but someone might be able to sculpt it into something useful.

Cheers,
Jeff

mrkaktus · Post by **mrkaktus** » Sat Apr 29, 2006 11:00 pm

With that read-only flag I was a little bit dizzy after reading all that stuff and thinking, and I just made a logical mistake. Of course, if I set those pages to supervisor level I don't need to protect them with the read-only flag.

Yes, I see that there are things in my way that I really can't force ;/.
It is sad that an interrupt cannot switch execution from a user page into a supervisor page while staying at CPL3 :[. I think it is all the fault of that old segmentation stuff; why is it still in the PC?

OK, I have decided that I will write the scheduler as a subsystem in DPL3, but then I will need to make it work like this (switching only CPL, no paging):

irq -> (CPL3->0) scheduler (at the beginning of RAM) -> (CPL0->3) newly executed task

And if I want to do something on the paging structures of processes or
on physical RAM, I will call that 1st low-level system INT, which will switch like this:

process irq -> (CPL3->0) scheduler decides LOW irq ~> working on mem ~> back to normal state -> (CPL0->3) dispatched new executing task

Where:
-> means switching CPL
~> means switching paging structure

I think that losing some performance on that part will let me gain it back from the fact that almost all ISRs will be DPL3 (almost all INTs go through the scheduler, because whenever an INT/IRQ occurs my system can switch execution to some other task).

gaf · Post by **gaf** » Sat Apr 29, 2006 11:00 pm

@Jeff:
You're of course right that there's still the possibility to have the policy in a user-space scheduler, and in fact that's actually what I'm planning to do in my operating system. In my second post in this thread I did mention a bit more about it..

@mrkaktus:
I don't think that you can really blame segmentation for it - in my opinion it's more of a conceptual problem. Actually the whole segment-based protection is more or less disabled anyway if you're using flat mode with paging. What remains is however really necessary, and even architectures that don't know segments must somehow support the notion of privilege levels.

I'm by the way not quite sure if I really understood your (new) scheduler design. On the one hand you say that it's supposed to be a DPL3 task, but on the other hand the two calling paths seem to imply that it runs in kernel mode. Are you planning to use two schedulers - external and internal ?

regards,
gaf

mrkaktus · Post by **mrkaktus** » Sun Apr 30, 2006 11:00 pm

Yes, it will be something like that; I will try to divide it to gain some modularity and maybe speed. But after this whole conversation I will need to rethink the entire design and problem, because a lot of new questions and paths that could be chosen have shown up (like carbonBased says, it is a very high-level conversation). I will need to sit down with my PC and run some new tests. I will post some new thoughts and ideas here later.
