
Re:Minimalist Microkernel Memory Management

Posted: Tue Aug 16, 2005 10:53 am
by QuiTeVexat
About round robins and linear-vectors: I was referring to my last comment on the topic in my post, not the first:
Argh! No! Round robins do not offer the control that linear-vectors give, and they are not the same. You rotate around a list of timeslices, not tasks.
Also, they are not equal just because they both cycle thru a list. I suppose you could say the linear-vector works on the timeslices in a round-robin fashion. This does not make it equal to a traditional round-robin. As I said (and I hope you can agree with):
In a round-robin scheduler, there is no defined point at which tasks will run or for how long. The defined part is order of tasks. The linear-vector of timeslices defines when and how long, because it queues timeslices, not tasks.
So I suppose I was just trying to distinguish between a round-robin working on tasks and a round-robin working on timeslices, calling the task one a round-robin (since that's normally what one means by a round-robin) and the timeslice one a linear-vector. Grouping them all as round-robin doesn't work because their functionality actually ends up being quite different, as noted above.
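To make the distinction concrete, here is a rough C sketch of the two structures (the names are mine, not anything from Xok): a task round-robin rotates a queue of tasks, so only the order is fixed, while the linear-vector gives every slot a fixed owner and just steps through the slots on each tick.
[pre]
/* Illustrative sketch only: made-up structures, not actual Xok code. */
#include <stddef.h>

#define NTASKS   8      /* tasks in the round-robin ready queue */
#define NSLICES  128    /* slots in the linear vector           */

struct task;            /* opaque task descriptor */

/* Task round-robin: the *order* of tasks is defined, the timing is not. */
static struct task *ready_queue[NTASKS];
static unsigned     ready_head;

struct task *rr_pick(void)
{
    struct task *t = ready_queue[ready_head];
    ready_head = (ready_head + 1) % NTASKS;   /* rotate the task list */
    return t;
}

/* Linear vector: each *slice position* is owned by a task (or by nobody),
 * so both when and for how long a task runs are fixed in advance.        */
static struct task *slice_vector[NSLICES];

struct task *vector_pick(unsigned tick)
{
    return slice_vector[tick % NSLICES];      /* owner of this slot, or NULL */
}
[/pre]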


About halting: A normal system would not be dealing with timeslices at all; it would be dealing with tasks. So when there are no tasks to run, it would halt. That's all I meant.


About user-schedulers being required by Xok: I don't think this would be so, since I don't see anything that a user-scheduler is going to provide that a good libOS won't. I was actually trying to avoid external managers by trying to develop a logical system for determining shares and such in-kernel.


About improving what ain't perfect: I think the issue is that I don't see what you're trying to do as an improvement. So I see a good system with no better system being proposed. Hence comes my "don't fix what ain't broken". And I still hold to hobbyists making more mistakes than people with more experience. I also smell hubris when I hear "I can do it better" with no supporting numbers.


About realtime: I think the Xok design has the potential to efficiently allow hard realtime for the tasks that need it while allowing for a full processor load, since it is known what will be happening during all timeslices. Also, since you know where the timeslices will be, a task can easily determine whether it will be able to meet its deadlines. For example, if another realtime task is in too many critical locations, the newly starting task would realize it can't meet its deadlines without trying and failing. As I too am no expert on realtime, I could be missing some things.


So in total, I think the problem is that I don't understand why you think the Xok design is too low-level to work, and thus why it needs improvement. So the core question: Why do you think the Xok design is too low-level to work?

Pax tecum,
Qui te vexat.

Re:Minimalist Microkernel Memory Management

Posted: Tue Aug 16, 2005 11:08 am
by QuiTeVexat
Colonel Kernel wrote:A "linear vector of timeslices" makes a small amount of intuitive sense, but not much more than that.
In what ways does it not make sense? Perhaps I can help :).
Colonel Kernel wrote:Can either of you provide an example of how a libOS might implement priority round-robin (for example) on top of Xok's mechanism...?
I think with Xok's mechanism, there is no reason for a priority round-robin, since the applications effectively schedule themselves, or more likely, the apps' libOSes do the scheduling, possibly with some input from the app.

Pax tecum,
Qui te vexat.

Re:Minimalist Microkernel Memory Management

Posted: Tue Aug 16, 2005 11:32 am
by Colonel Kernel
Ok, let me put it another way. Let's say I want to port Linux to an exokernel (maybe not a good idea, but that's beside the point -- I'm trying to understand the Xok mechanisms). Are the mechanisms in Xok sufficient to re-implement scheduling that is both comfortable and familiar to me? :) If so, how does it work? The problem is that it isn't explained very well in the papers I've read so far...

Re:Minimalist Microkernel Memory Management

Posted: Tue Aug 16, 2005 1:28 pm
by QuiTeVexat
As one big Linux server running as an app on top of the exokernel? In that case, when scheduling, Linux could choose a process to run and yield() to it. This is the same as for any user-mode scheduler, and those are supported by Xok (although I personally would avoid it; it seems like just extra overhead to me).

Re:Minimalist Microkernel Memory Management

Posted: Tue Aug 16, 2005 1:36 pm
by gaf
@Colonel Kernel
Colonel Kernel wrote:Can either of you provide an example of how a libOS might implement priority round-robin (for example) on top of Xok's mechanism...?
The scheduling mechanism in Xok mainly serves to multiplex the CPU between external schedulers. If you want to provide a different policy, you'd normally just give a certain share of the system's time-slices to a user-space scheduler. This scheduler will then be called on all of these time-slices, can run whatever policy it wants, and then pass control to one of its apps using the yield() system call.
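For instance, a priority round-robin could be built in such a user-space scheduler roughly like this. This is a sketch only; on_my_slice() and do_yield() stand in for whatever upcall hook and yield() wrapper the libOS actually provides, and all names are invented:
[pre]
/* Sketch of a user-space priority round-robin running on the time-slices
 * it owns.  on_my_slice() and do_yield() are assumed hook/wrapper names,
 * not the real Xok API.                                                   */

#define NPRIO    4       /* priority classes, 0 = highest          */
#define MAXTASK  16      /* tasks per class (fixed for the sketch) */

struct utask { int pid; int runnable; };

static struct utask runq[NPRIO][MAXTASK];
static unsigned     next[NPRIO];             /* round-robin cursor per class */

extern void do_yield(int pid);               /* assumed wrapper around yield() */

/* Called at the start of every time-slice this scheduler was granted. */
void on_my_slice(void)
{
    for (int p = 0; p < NPRIO; p++) {                 /* highest class first */
        for (unsigned i = 0; i < MAXTASK; i++) {
            unsigned idx = (next[p] + i) % MAXTASK;
            if (runq[p][idx].runnable) {
                next[p] = (idx + 1) % MAXTASK;        /* rotate within class */
                do_yield(runq[p][idx].pid);           /* hand the slice over */
                return;
            }
        }
    }
    do_yield(-1);        /* nothing runnable: give the slice back / idle */
}
[/pre]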

As you might have noted, I'm not a big fan of this idea:
[*]What's the point of having such a great multiplexing system if any policy other than "linear-vector" has to be implemented in user-space schedulers anyway? Xok could as well just use privileged tasks as in L4, which is much simpler and saves processing time. The original idea was to facilitate the implementation of new policies, and there are a lot of other policies that do make sense, especially on expert systems (e.g. realtime).
[*]Isn't it a bit too complicated to deal with absolute values? I've tried to point out some of the problems that will arise in my first few posts on this topic, and, as I still think that it all comes down to percentages anyway, I'm wondering whether it wouldn't be easier to go that way right from the start.
[*]In systems with multiple external schedulers, who decides how exactly the time-slices are divided? Of course each scheduler has to get 1/n-th of them, but simply dividing the slices into n contiguous blocks won't do, as a scheduler couldn't ensure any response time like that (scheduler A controls 100 ts, then schedulers B and C run for 100 ts each, which makes any latency consideration pointless). As the linear-vector also defines the exact position of the slices, one would have to decide on a maximum quantum length in advance, which would make the conflict between interactive and batch tasks even worse, and this decision can't even be changed later without reallocating all time-slices.
Colonel Kernel wrote:Let's say I want to port Linux to an exokernel (maybe not a good idea, but that's beside the point -- I'm trying to understand the Xok mechanisms).
It's not as unrealistic as you might think! The L4ka project has adapted a standard Linux kernel to run on top of L4 to prove that the performance loss wouldn't be that big and that an exokernel can also get some real work done.
Colonel Kernel wrote:And how does Xok's mechanism relate to scheduler activations?
It doesn't - at least not directly, which is a pity in my opinion, as the idea made some sense to me. It might, however, be possible to implement something similar on top of the Xok mechanism, as it exposes the revocation of the CPU to the application. If a thread's time-slice has ended, Xok doesn't switch automatically but rather sends the thread a message (an upcall) informing it about the event and giving it the chance to save its registers as far as necessary. Only then is the task switched, and once it gets scheduled again it is informed as well, so that it can reload the old registers from the stack. These two points should be sufficient for a user-level thread scheduler to be constructed, as they would allow it to take system-wide scheduling decisions into account (the scheduler hooks these upcalls). On a multiprocessor some more hacks would be necessary, but it should be feasible, although the system could never be as clean as the "scheduler activations" scheme.
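A rough sketch of the two hooks described above, with all names (cpu_context, uthread, current_thread) invented for illustration rather than taken from Xok:
[pre]
/* Sketch: save/restore pair a libOS thread package might hook, assuming
 * the kernel delivers a "slice is ending" upcall and a "you were resumed"
 * upcall.  Illustrative names only.                                       */

struct cpu_context { unsigned long regs[16]; };   /* whatever the arch needs */

struct uthread {
    struct cpu_context ctx;                       /* saved user-level state */
    int                id;
};

static struct uthread *current_thread;

/* Upcall: our time-slice is about to be revoked. */
void upcall_slice_end(const struct cpu_context *interrupted)
{
    current_thread->ctx = *interrupted;   /* save as much state as we need */
    /* a user-level thread scheduler could also pick the next uthread here */
}

/* Upcall: we have been handed the CPU again. */
void upcall_resumed(struct cpu_context *restore_into)
{
    *restore_into = current_thread->ctx;  /* reload the saved registers */
}
[/pre]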

regards,
gaf

Re:Minimalist Microkernel Memory Management

Posted: Tue Aug 16, 2005 1:37 pm
by gaf
@QuiTeVexat
QuiTeVexat wrote:Also, they are not equal just because they both cycle thru a list. I suppose you could say the linear-vector works on the timeslices in a round-robin fashion. This does not make it equal to a traditional round-robin.
If the linear-vector does not belong to the category of round-robin schedulers, then I insist on calling my scheme not lottery scheduling but gaf-scheduling, which is something fundamentally different. After all, I also added some of my own ideas to it.

Can we please settle this now? I think we both know what is meant, and this discussion isn't leading anywhere.
QuiTeVexat wrote:So when there are no tasks to run, it would halt. That's all I meant.
If there are no other active tasks, there's still the idle task, which can't be shut down and is in charge of halting the system. If this task weren't there, the system would probably just continue walking through the list of tasks until one of them eventually becomes active again. This can also be seen as some kind of waiting, but the CPU wouldn't have halted.
QuiTeVexat wrote:About user-schedulers being required by Xok: I don't think this would be so, since I don't see anything that a user-scheduler is going to provide that a good libOS won't.
The only policy you can build directly on top of the Xok kernel without having to use an external manager is "linear vector", and there are certainly a lot of apps which would prefer a different policy. I always thought that exokernels were meant to leave the decision about which policy should be used to the apps.
QuiTeVexat wrote:I also smell hubris when I hear "I can do it better" with no supporting numbers.
How come there are only practical guys on this board who need everything proven by some excessive benchmarks, possibly on three different systems (real-time, server, desktop) with real apps ported from Linux? Numbers don't prove anything, as they can always be interpreted as you wish them to be (have you ever listened to politicians?). Just use your common sense and you won't need all these benchmarks.
QuiTeVexat wrote:So in total, I think the problem is that I don't understand why you think the Xok design is too low-level to work, and thus why it needs improvement. So the core question: Why do you think the Xok design is too low-level to work?
The system is too low-level to support policies other than "linear-vector" without external schedulers, and therefore may only serve to separate these managers from each other. For this task, however, it's by far too high-level, as a single "privileged" bit in the TCB (as used in L4) would be sufficient.

I've already enumerated some of the most important points at the beginning of the preceding post..

regards,
gaf

Re:Minimalist Microkernel Memory Management

Posted: Tue Aug 16, 2005 1:57 pm
by QuiTeVexat
gaf wrote: If there are no other active tasks, there's still the idle task, which can't be shut down and is in charge of halting the system. If this task weren't there, the system would probably just continue walking through the list of tasks until one of them eventually becomes active again. This can also be seen as some kind of waiting, but the CPU wouldn't have halted.
So the "built-in idle task" is nothing new. Walking thru the list of tasks without halting is a waste of power. You normally only walk thru the ready-list anyway in a task round-robin, so when its empty you would just halt.
gaf wrote: The only policy you can build directly on top of the Xok kernel without having to use an external manager is "linear vector", and there are certainly a lot of apps which would prefer a different policy.
Give an example of different policies.
gaf wrote: How come there are only practical guys on this board who need everything proven by some excessive benchmarks, possibly on three different systems (real-time, server, desktop) with real apps ported from Linux? Numbers don't prove anything, as they can always be interpreted as you wish them to be (have you ever listened to politicians?). Just use your common sense and you won't need all these benchmarks.
Because numbers *do* prove things; "common sense" does not.
gaf wrote: The system is too low-level to support policies other than "linear-vector" without external schedulers, and therefore may only serve to separate these managers from each other. For this task, however, it's by far too high-level, as a single "privileged" bit in the TCB (as used in L4) would be sufficient.
Again, give me some examples of differing policies.

I do not understand why you think external schedulers would be needed.

Pax tecum,
Qui te vexat.

Re:Minimalist Microkernel Memory Management

Posted: Tue Aug 16, 2005 2:15 pm
by Colonel Kernel
If you're confusing each other, it's no wonder I'm confused. ;D

I'll try to characterize what I think the difference is between linear vector and round-robin and you let me know if I'm on the right track...

In round-robin, the position of tasks in the queue depends entirely on which tasks block and when. In linear vector, the position of tasks in the "queue" is fixed based on which slices each task requests.

My understanding is that Xok never blocks tasks on its own -- instead a task will request to be put to sleep until some wake predicate is satisfied (or perhaps until a timer tick happens...?). Therefore the usual round-robin dynamics cannot apply.

So this begs the question (perhaps gaf asked it first? I've lost track): What happens to the current timeslice if a task decides to sleep? My guess is that if the current timeslice is X, then Xok simply decides by fiat that the current timeslice is now X + 1, and schedules the task that has requested every X + 1 slice to run (assuming it is runnable). In other words, timeslices that are periodic in the vector need not be periodic in real time. Also, I'm assuming that the vector can grow or shrink depending on how many tasks there are. Is the time-slice length itself fixed in Xok...?

Am I right, or am I smoking the good stuff?


On benchmarks, hubris, and all that.... Every time researchers come up with a brilliant idea, it solves some problems and creates some new ones. I don't think it's hubris to want to solve the new ones -- that's how we advance the state of the art.

Re:Minimalist Microkernel Memory Management

Posted: Tue Aug 16, 2005 3:04 pm
by QuiTeVexat
Colonel Kernel wrote: In round-robin, the position of tasks in the queue depends entirely on which tasks block and when. In linear vector, the position of tasks in the "queue" is fixed based on which slices each task requests.

My understanding is that Xok never blocks tasks on its own -- instead a task will request to be put to sleep until some wake predicate is satisfied (or perhaps until a timer tick happens...?). Therefore the usual round-robin dynamics cannot apply.
Thank you. Sounds about right to me.
Colonel Kernel wrote: So this begs the question (perhaps gaf asked it first? I've lost track): What happens to the current timeslice if a task decides to sleep? My guess is that if the current timeslice is X, then Xok simply decides by fiat that the current timeslice is now X + 1, and schedules the task that has requested every X + 1 slice to run (assuming it is runnable). In other words, timeslices that are periodic in the vector need not be periodic in real time. Also, I'm assuming that the vector can grow or shrink depending on how many tasks there are. Is the time-slice length itself fixed in Xok...?
If a task returns the rest of a slice to the kernel, the only reasonable behaviour I can think of is to halt or donate it to some other task that asked for free slices until the end of the timeslice. Otherwise the timing of slices is disturbed, and another task could have been depending on that timing.

If the vector needs to grow, you can multiply its size by some number and split up the current slices proportionally.

I think the timeslice length in Xok is fixed, being the time between timer ticks. You can allocate more than one in a row if desired. If you wanted to adapt the system to have variably-sized timeslices, you could have instead an ordered list of slices, and just ensure that they don't overlap. Maybe have it operate like a delta queue... I'll have to think about it.
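As a sketch of the proportional split mentioned above (made-up names, nothing from Xok): growing the vector by an integer factor k gives each owner k adjacent copies of its old slot, so every task keeps its share and its relative position.
[pre]
/* Sketch: grow the slice vector by an integer factor k.  Illustrative only. */
#include <stdlib.h>
#include <stddef.h>

struct task;

struct slice_vector {
    struct task **slot;
    unsigned      len;
};

int grow_vector(struct slice_vector *v, unsigned k)
{
    struct task **bigger = calloc((size_t)v->len * k, sizeof *bigger);
    if (bigger == NULL)
        return -1;

    for (unsigned i = 0; i < v->len; i++)
        for (unsigned j = 0; j < k; j++)
            bigger[i * k + j] = v->slot[i];   /* old slot i -> slots i*k .. i*k+k-1 */

    free(v->slot);
    v->slot = bigger;
    v->len *= k;
    return 0;
}
[/pre]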
Colonel Kernel wrote: On benchmarks, hubris, and all that.... Every time researchers come up with a brilliant idea, it solves some problems and creates some new ones. I don't think it's hubris to want to solve the new ones -- that's how we advance the state of the art.
It's not hubris to want to solve them. It's hubris to think you can immediately do better than people who have worked on it more and have far more experience than you based on nothing more than your "common sense."

Re:Minimalist Microkernel Memory Management

Posted: Tue Aug 16, 2005 10:46 pm
by Colonel Kernel
QuiTeVexat wrote:If a task returns the rest of a slice to the kernel, the only reasonable behaviour I can think of is to halt or donate it to some other task that asked for free slices until the end of the timeslice. Otherwise the timing of slices is disturbed, and another task could have been depending on that timing.
So timeslices actually are periodic in time, not just in terms of scheduling order? That makes things awkward, but I can see how it allows for stronger guarantees.

If tasks can just ask for free slices whenever, doesn't the situation degenerate to the round-robin-tasks case then...? Or is this when I should go back and read all those other posts more carefully? ;)

A higher-level question now. The exokernel research so far has shown the benefits of letting apps control their own memory, disk blocks, and packets... Has it shown the same benefit for letting apps schedule themselves with this linear vector scheme? I'm wondering if perhaps the idea that all resources are inherently worth policing in applications rather than the kernel is taking things a bit too far. From "Fast and Flexible Application-Level Networking on Exokernel Systems", Section 8 "Discussion":
Process Scheduling. Xok's process scheduling mechanism does not explicitly consider interactive responsiveness. With application-level networking, this can have a negative impact on packet processing latencies. Although Xok's scheduler will give the CPU to an awakened process whose turn it is to run, it will not preempt another process's turn. Although this is more fair for processes that care when they run, it does create performance problems for some forms of ping-pong communication. Appropriate CPU scheduling support for application-level networking remains an open area for research.
If it causes problems for networking, I can see how it would cause problems in other cases as well.

@gaf:
Does your lottery scheduling idea fix this problem?

Re:Minimalist Microkernel Memory Management

Posted: Tue Aug 16, 2005 11:01 pm
by Colonel Kernel
A quick BTW, just in case anyone else is still following along. ;D

I found one compelling assertion in the Engler paper that, in my mind at least, answers the question of why we should care about exokernels regardless of the "flexibility argument" (from engler95, Section 2.3 "Library Operating Systems"):
Finally, the number of kernel transitions in an exokernel system can be smaller, since most of the operating system is running in the address space of the application.
What if exokernels actually place the kernel-mode/user-mode boundary at the optimal spot? That would be pretty cool... Ridiculously difficult to prove, but a pretty seductive idea. :)

Re:Minimalist Microkernel Memory Management

Posted: Tue Aug 16, 2005 11:34 pm
by QuiTeVexat
Colonel Kernel wrote:
So timeslices actually are periodic in time, not just in terms of scheduling order? That makes things awkward, but I can see how it allows for stronger guarantees.

If tasks can just ask for free slices whenever, doesn't the situation degenerate to the round-robin-tasks case then...? Or is this when I should go back and read all those other posts more carefully? ;)
Allowing applications to put themselves on a list to automatically request newly freed slices was an idea I had, to try to get free slices to compute-bound tasks as soon as possible, as well as to reduce user-kernel traffic and scanning for slices. As far as I know, it's not used in Xok. I don't think it degenerates to round-robin-tasks because we're still working with timeslices, not tasks.
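Roughly what I have in mind, as a sketch with invented names: tasks register on a waitlist, and whenever a slice (or the rest of one) is released it goes to the next waiter instead of leaving the slot idle.
[pre]
/* Sketch of the "ask for free slices" idea.  Names are invented. */
#include <stddef.h>

struct task;

#define MAXWAIT 32
static struct task *waitlist[MAXWAIT];
static unsigned     wl_head, wl_count;

/* Called when some owner gives a slice (or its remainder) back. */
struct task *claim_freed_slice(void)
{
    if (wl_count == 0)
        return NULL;                    /* nobody wants it: halt/idle    */
    struct task *t = waitlist[wl_head];
    wl_head = (wl_head + 1) % MAXWAIT;  /* simple FIFO among the waiters */
    wl_count--;
    return t;                           /* compute-bound task gets the CPU */
}
[/pre]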
Colonel Kernel wrote: If it causes problems for networking, I can see how it would cause problems in other cases as well.
Yes. Refusing to preempt can reduce responsiveness to events. Any system allowing for preemption would help with this. So allowing Xok to preempt, or using a more usual scheduling system would both work. With any preemption though, the guarantees are less solid. Hmm, to deal with all these things... *waltzes away to thought land*

Pax vobiscum,
Qui te vexat.

Re:Minimalist Microkernel Memory Management

Posted: Wed Aug 17, 2005 6:40 am
by gaf
QuiTeVexat wrote:Give an example of different policies.
[*]Round Robin with priority classes
[*]Earliest Deadline First
[*]Lottery scheduling
[*]Numerous algorithms for batch systems

Check out Wikipedia/Google for more.
QuiTeVexat wrote:Because numbers *do* prove things; "common sense" does not.
I'm sure your president gets numbers presented by his generals every day proving that everything works perfectly well in Iraq.

Numbers don't prove anything, as they're biased even if you don't intend them to be. The choice of how to benchmark and which programs to use already has a big influence on the numbers, and one can thus determine the result of the benchmark in advance.

Don't just believe what any politician/exokernel-god/general says, use your own brain..
QuiTeVexat wrote:It's hubris to think you can immediately do better than people who have worked on it more and have far more experience than you based on nothing more than your "common sense."
What do you mean by "immediately"? I've delved into exokernel design for the past half year, and I do think that by now I have some idea of what's going on. I've told you before that they're just ordinary students/profs and that there's no need to worship them.

Engler on his website:
"I am a joint EE/CS professor. Before that, I was an irresponsible graduate student in Frans Kaashoek's PDOS group at MIT's Lab for Computer Science, where I co-founded the exokernel operating system project, which formed the basis of my thesis work"

regards,
gaf

Re:Minimalist Microkernel Memory Management

Posted: Wed Aug 17, 2005 6:46 am
by gaf
Colonel Kernel wrote:What happens to the current timeslice if a task decides to sleep? My guess is that if the current timeslice is X, then Xok simply decides by fiat that the current timeslice is now X + 1, and schedules the task that has requested every X + 1 slice to run (assuming it is runnable).
If a task is scheduled for time-slice x, it may not run any earlier than that, even if the processor is idle. This is why I've come to the conclusion that in such a system tasks always have to donate their time-slices to other tasks, which then run on their behalf. Due to the server/client architecture of the system this is possible most of the time, but it doesn't work, for example, for a driver that blocks on I/O. Once the driver has sent its request, it therefore has to free its time-slices to allow other tasks to get hold of them.
What makes the whole thing even trickier is that all the driver's time-slices have to be in use before it gets scheduled again (otherwise the CPU will just halt), and that there's no real system for how the tasks divide the time-slices among themselves. As soon as the device has finished its work and triggers an IRQ, the driver has to get its old time-slices back, which might cause additional problems, as the tasks now holding these slices may depend on them.
Colonel Kernel wrote:Also, I'm assuming that the vector can grow or shrink depending on how many tasks there are. Is the time-slice length itself fixed in Xok...?
If the vector had a variable size, it could no longer be guaranteed that a time-slice is scheduled at a certain instant (this also goes for multiplying the size).
Colonel Kernel wrote:If it causes problems for networking, I can see how it would cause problems in other cases as well.

Does your lottery scheduling idea fix this problem?
The problem could probably be solved with priorities, which I planned to support in my system, although not in a very fine-grained way:
[pre]o------------------------o-----------------o
|      drivers - 70%     |   groups - 30%  |   1st level
o------------------------o-----------------o

o-------------------o----------------------o
|    userA - 50%    |      userB - 50%     |   2nd level
o-------------------o----------------------o[/pre]
As long as all drivers are idle, the groups get 100% of the CPU, which means that both users get 50% each. Once a driver gets active, a lottery is held in which it has a 70% chance of winning, so that it would get scheduled immediately. The groups, however, also have a 30% chance, which ensures that, even when the drivers are very busy, user tasks won't starve completely.

A similar scheme could improve responsiveness for user-level networking if the tasks of each user are further divided into two or more groups - interactive (level 3) and batch (level 4).
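A minimal sketch of the per-level draw in such a hierarchy (rand() as a stand-in PRNG, all names invented): idle groups are skipped, so when the drivers have nothing to do the user groups automatically receive every draw, as described above; the same function would then be applied again inside the winning group.
[pre]
/* Sketch of one level of a hierarchical lottery.  Illustrative only. */
#include <stdlib.h>

struct group {
    int tickets;      /* share at this level                */
    int runnable;     /* does the group have work right now */
};

/* Pick among groups by ticket count, skipping idle groups. */
int lottery_pick(struct group *g, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        if (g[i].runnable)
            total += g[i].tickets;
    if (total == 0)
        return -1;                        /* nothing to run: idle */

    int draw = rand() % total;
    for (int i = 0; i < n; i++) {
        if (!g[i].runnable)
            continue;
        if (draw < g[i].tickets)
            return i;                     /* winner at this level */
        draw -= g[i].tickets;
    }
    return -1;                            /* not reached */
}
[/pre]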

I believe that my lottery approach could overcome most of the implementation-related issues of the "linear-vector" scheduler, but the real problem, which is that other policies can only be supported in user-space schedulers, isn't solved. I'm also a bit sceptical that it will work as well in different kinds of systems like realtime or batch, as new requirements for the schedulers might arise - one I could imagine is to allow deadlines to be taken into account. As it's not feasible to support all possible policies in a single mechanism so that no external schedulers are needed, I'm right now working on ways to extend the scheduler activations idea to the whole system and thus make it more flexible.
For the root scheduler, some simple round-robin/lottery might be sufficient, as it only manages a few user-level schedulers, which can run their own apps, which in turn may run their own tasks. The whole process would be exposed to the lower levels using the scheduler activations scheme and thus allow scheduling policies to be constructed efficiently.
This is all just an idea, and I'm still a bit worried about real-time tasks, but I do think that it would be an improvement to at least have a well-defined mechanism to transfer control to the user-level schedulers.

It's interesting, by the way, to see that even the official papers admit that there are some problems with Xok's scheduling.

regards,
gaf

Re:Minimalist Microkernel Memory Management

Posted: Wed Aug 17, 2005 9:52 am
by QuiTeVexat
About the policies: None of those are necessary when applications can schedule themselves.

>> EDIT: I misinterpreted gaf's statement. Ignore this.
About Bush: That comment is off-topic and below-the-belt. I am here to talk about operating systems, not politics.

About Numbers: Numbers prove things much more than common sense does. Yes, they are biased, but so is common sense and to much more of a degree. And no, I do not just believe what they tell me; I think about it and try to get my own numbers before I decide another way is better. And I *am* using my own brain, otherwise I would have said "oh wow you're right gaf!" a long time ago.

Yes, people are imperfect, and so are their ideas. I would still say that the designers and implementors of two exokernel systems over several years have the advantage of experience over both of us. That does not mean one should accept their ideas blindly. It *does* mean that their ideas are probably pretty good, and that they should be well considered and shown to have overwhelming problems before they are thrown out. You have not convinced me that there are overwhelming problems that should cause us to throw the ideas out.

No worship or immediate acceptance is going on here. I have looked over the idea and found it to be sound. You looked over the idea and found it to be not sound. I am trying to consider ways to fix any problems without throwing the baby out with the bathwater. You decided it was unworkable. We're both using our brains; the output just happens to be different. No reason to get testy.

About a variably-sized vector: The whole point of doing it in some multiple is so that you can proportionally expand the timeslices to multiple timeslices in the new vector. Timing would still be guaranteed.

Pax tecum,
Qui te vexat.