Minimalist Microkernel Memory Management


Re:Minimalist Microkernel Memory Management

Post by gaf »

QuiTeVexat wrote:First, I'd like to note that even if there are tasks with allocated timeslices, there isn't necessarily any work to do. A task could, for instance, intending to sleep, have reserved a timeslice in the future at the time it wants to awaken, or have something set up on a polling interval. In that case, it needs no more timeslice than it requested, and should not be given extra timeslices it didn't ask for.
Such a task would block after it has polled for its data and therefore shouldn't get more than it wants in any other system either. In such a situation the free time-slices should be given to an idle task that can handle power management much better than the kernel ever could.
QuiTeVexat wrote:And there's nothing that keeps the other tasks from acquiring their fair share afterwards. :)
Do you really want the apps to fight for their time-slices?!
QuiTeVexat wrote:Tasks would be informed when they receive timeslice by this method (possibly just by informing them of the timeslice number upon entry), so the allocation would be both exposed and requested.
Well.. that's nice, but what's the advantage for the tasks if they now know that a time-slice has been revoked from them? After all there's nothing they have to save, and the only way to get the time-slice back is to steal it again..
QuiTeVexat wrote:The driver has the right to do this because it is a higher priority task. If it is not, then it would have to wait for a lower priority task to begin executing.
Making decisions based on task priority directly in the kernel doesn't sound like such a good idea to me. And what happens if it's not a driver that wants to claim a TS but a normal app? Who makes sure that the less privileged tasks won't starve?
QuiTeVexat wrote:So we just need to come up with some clever ideas so we don't have to sacrifice the benefits we're seeking.
I don't doubt that it can be made to work, the question is just how many kernel hacks and how much user-level scheduler cooperation it will require. Lottery scheduling is in my opinion a much more natural approach as it doesn't try to multiplex the CPU like just another block-device, but in a way that is really appropriate for it.
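Just to make this concrete, the entire kernel-side decision in a lottery scheduler can be as small as the sketch below (all names invented for illustration; this is not code from any real kernel):

#include <stdio.h>
#include <stdlib.h>

struct task { int tickets; };   /* each task holds a share of the tickets */

/* Draw one ticket at random and walk the list until the running
   total covers it; the owner of that ticket gets the next quantum. */
int lottery_pick(const struct task *t, int n, int total_tickets)
{
    int draw = rand() % total_tickets;
    for (int i = 0; i < n; i++) {
        draw -= t[i].tickets;
        if (draw < 0)
            return i;           /* task i wins this quantum */
    }
    return -1;                  /* unreachable if total_tickets is correct */
}

int main(void)
{
    struct task t[] = { {50}, {20}, {30} };   /* 50%/20%/30% of the CPU */
    printf("next quantum goes to task %d\n", lottery_pick(t, 3, 100));
    return 0;
}

Over many quanta each task statistically receives exactly its ticket share of the CPU, and that's all the policy the kernel ever has to know.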

regards,
gaf

Re:Minimalist Microkernel Memory Management

Post by QuiTeVexat »

gaf wrote: Such a task would block after it has polled for its data and therefore shouldn't get more than it wants in any other system either. In such a situation the free time-slices should be given to an idle task that can handle power management much better than the kernel ever could.
How exactly would it handle power management better than by doing nothing thru timeslices nobody wants? If you want a task to do something like this during free slices, write a task that allocates free slices and manages them, and give it lowest priority. Any tasks that want those slices can take them, and meanwhile the task can do whatever during those slices.
gaf wrote: Do you really want the apps to fight for their time-slices?!
There wouldn't actually be too much fighting. Tasks know what they are entitled to, and so does the kernel. The word "fight" implies that they are trying to deliberately hinder other tasks. On the contrary, they would simply be requesting the resources they know they have claim to. If apps decided to "fight", the resources would end up being split perfectly along quota lines, and the apps would stop.
gaf wrote:
QuiTeVexat wrote:Tasks would be informed when they receive timeslice by this method (possibly just by informing them of the timeslice number upon entry), so the allocation would be both exposed and requested.
Well.. that's nice, but what's the advantage for the tasks if they now know that a time-slice has been revoked from them? After all there's nothing they have to save, and the only way to get the time-slice back is to steal it again..
I was talking about the allocation of free slices in that quotation. An IRQ causing preemption would not be a permanent timeslice transfer, and in the case of a higher priority task causing revocation, giving a task higher priority implies that you trust that application not to step on the toes of lower priority programs in undesirable ways. If you trust those lower programs, up their priority. Giving them lower priority suggests that you want them to give up slices to higher priority tasks if required.
gaf wrote: Making decisions based on task priority directly in the kernel doesn't sound like such a good idea to me.
How else does the kernel decide from whom resources should be revoked when necessary? Priorities are a simple way to handle this. I am open to ways to work out the hierarchy of revokers and revokees better.
gaf wrote: And what happens if it's not a driver that wants to claim a TS but a normal app?
What's the difference between a driver and a "normal" app that traps IRQs? A user-space driver is just an app that does different things than other apps.
gaf wrote: Who makes sure that the less privileged tasks won't starve?
For one, the fact that the higher priority tasks have been given that priority suggests a degree of trust in the program; the higher priority tasks won't consume all the resources. If needed, however, quotas could ensure that less privileged tasks won't starve.
gaf wrote: I don't doubt that it can be made to work, the question is just how many kernel hacks and how much user-level scheduler cooperation it will require.
Implementing the lottery scheduler would involve hacking up the kernel, too. Your system for Associating and Revoking timeslices isn't a kernel hack?
gaf wrote: Lottery scheduling is in my opinion a much more natural approach as it doesn't try to multiplex the CPU like just another block-device, but in a way that is really appropriate for it.
Since the timer ticks at a given interval, CPU time *does* consist of "blocks" of time, so it's natural to divide these between programs. As I said before, I'm hesitant to sacrifice the control and awareness that this gives to applications. Also, if you want a lottery scheduler to manage some block of time, you could build one on top. The low-level system gives you the control and awareness; if you want the abstractions, etc., build them on top. Irreplaceable abstractions have no place in an exokernel.
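To make the vector idea concrete, here is roughly the structure I have in mind (a sketch with invented names, not the actual Xok interface):

#define NSLOTS 100             /* the vector covers 100 ticks, then repeats */
static int slot_owner[NSLOTS]; /* owning task id per slot, 0 = free */

/* An app asks for one specific slot and gets it only if it's free,
   so every allocation is both exposed and requested. */
int ts_request(int slot, int task)
{
    if (slot_owner[slot] != 0)
        return -1;             /* already owned: the app can try another slot */
    slot_owner[slot] = task;
    return 0;
}

An app that wants, say, every tenth tick just requests slots 0, 10, 20, ... and afterwards knows exactly when it will run.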

Pax tecum,
Qui te vexat.

Re:Minimalist Microkernel Memory Management

Post by gaf »

QuiTeVexat wrote:How exactly would it handle power management better than by doing nothing thru timeslices nobody wants?
An idle task should provide power-management by, for example, starting a screen-saver, lowering the CPU clock or shutting down devices that aren't needed.
QuiTeVexat wrote:Tasks know what they are entitled to, and so does the kernel.
I could imagine this working if the tasks all use the same libOS and therefore the same policy, but in a heterogeneous system there are often apps that pursue mutually exclusive goals. Let's assume we have a batch task that wants to allocate long quanta and an interactive task that needs to run for only a short time at a high frequency. I'm afraid that in such a system the second task would try to split the batch task's long quanta to run more often, and the batch task would fight against this by revoking the interactive task's time-slices again. How does the batch task know that there's a good reason why it can't run for such long quanta?
QuiTeVexat wrote:If apps decided to "fight", the resources would end up being split perfectly along quota lines, and the apps would stop.
Wouldn't it then be easier to directly split the resource according to these quotas, skipping all the preceding "negotiations"? Every other solution would in my opinion mean that tasks have taken more than they are entitled to..
QuiTeVexat wrote:How else does the kernel decide from whom resources should be revoked when necessary? Priorities are a simple way to handle this. I am open to ways to work out the hierarchy of revokers and revokees better.
Revocation should be avoided as far as possible; in your system it might take place several hundred times a second. In my opinion it'd be best if only the task's creator could revoke a time-slice, as it's also the one that has donated it.
QuiTeVexat wrote:Implementing the lottery scheduler would involve hacking up the kernel, too. Your system for Associating and Revoking timeslices isn't a kernel hack?
We obviously have a different idea of what a hack is: For me it's an ugly workaround that is necessary due to certain limitations in the original design; according to your definition it seems to be anything that differs from the official papers. The Xok design is however experimental and as such certainly far from perfect. If I think that something could be done better, I'll attempt to do so - after all, copying the original design bit by bit won't bring any new insights..
QuiTeVexat wrote:Since the timer ticks at a given interval, CPU time *does* consist of "blocks" of time, so it's natural to divide these between programs.
Have you ever seen a task manager that says an app runs for X time-slices? The value as such is absolutely meaningless, and this only changes if you see it as a certain fraction of the CPU time by comparing it to the other apps. Then it's however no longer just a time-slice..

CPU_share = frequency * quantum_length

The round-robin algorithm repeats the list of time-slices again and again at a certain frequency (that depends on the number of time-slices on the system) and thus gives the value a real meaning. The result is exactly the same as in any lottery scheduler, but the design is very awkward and makes things harder than they would have to be.

If you want a design that doesn't result in a CPU_share value, the time-slices may only be used once: Apps constantly push time-slices into an execution queue and the CPU "consumes" one after the other by scheduling it.
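A sketch of that one-shot variant (names invented for illustration):

#define QLEN 64
static int queue[QLEN];        /* task ids, pushed by the apps themselves */
static int head, tail;         /* consumer and producer positions */

int push_slice(int task)       /* an app donates one future time-slice */
{
    if ((tail + 1) % QLEN == head)
        return -1;             /* queue full: try again later */
    queue[tail] = task;
    tail = (tail + 1) % QLEN;
    return 0;
}

int consume_slice(void)        /* timer tick: the CPU consumes the next slice */
{
    if (head == tail)
        return -1;             /* queue empty: idle */
    int t = queue[head];
    head = (head + 1) % QLEN;
    return t;                  /* run task t for this quantum */
}

Here no standing CPU_share ever forms; a slice is used once and is gone.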
QuiTeVexat wrote:As I said before, I'm hesitant to sacrifice the control and awareness that this gives to applications.
I don't think that a lottery scheduler would mean any more kernel policy than the traditional round-robin design..
QuiTeVexat wrote:Also, if you want a lottery scheduler to manage some block of time, you could build one on top.
Well, it'd also be possible to create a priority round-robin scheduler on top of lottery scheduling. Both algorithms are low-level enough to be mechanisms, and the big advantage of lottery scheduling is that it really multiplexes.

regards,
gaf

Re:Minimalist Microkernel Memory Management

Post by QuiTeVexat »

gaf wrote: An idle task should provide power-management by, for example, starting a screen-saver, lowering the CPU clock or shutting down devices that aren't needed.
Starting a screensaver should be done when there's been no user input for some period of time, throttling the CPU should be done by a power-saving task that works on a configuration the user gives it, and devices should be shut down by the driver when they haven't been in use for some period of time. None of these is something you'd do in an idle task.
gaf wrote: Let's assume we have a batch task that wants to allocate long quanta and an interactive task that needs to run for only a short time at a high frequency. I'm afraid that in such a system the second task would try to split the batch task's long quanta to run more often, and the batch task would fight against this by revoking the interactive task's time-slices again. How does the batch task know that there's a good reason why it can't run for such long quanta?
Interactive tasks have higher priority than batch tasks. They can arrange their timeslices as needed, and the batch tasks can fit in between their slices. Since the interactive tasks have higher priority, their slices cannot be moved by the batch tasks.
gaf wrote: Wouldn't it then be easier to directly split the resource according to these quotas, skipping all the preceding "negotiations"? Every other solution would in my opinion mean that tasks have taken more than they are entitled to..
Splitting a resource for the tasks takes control of resource management away from the tasks. The whole point of building an exokernel was to see what benefits could be had by allowing tasks as much control of resource management as possible, no? The only way to really resolve this linear-vector vs. scheduler debate is probably to just implement a few systems and try them out. And even then, it might not really be resolved.
gaf wrote: Revocation should be avoided as far as possible; in your system it might take place several hundred times a second. In my opinion it'd be best if only the task's creator could revoke a time-slice, as it's also the one that has donated it.
I don't see why it would take place several hundred times a second. Higher priority tasks would take their picks within the limits of their quotas, and then the lower priority tasks and on down would take what they wanted out of what's left. It's not like they could take resources from the higher priority tasks, and go back and forth revoking and allocating. There might be some revocations when a higher priority task realizes it needs a resource used by a lower priority task, but it's over after the resource is transferred. And you don't get rid of revocation overhead by hiding revocation.
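Something like the following is all the "hierarchy" I have in mind (a sketch with invented names, nothing from the actual Xok sources):

struct slot { int owner, prio; };  /* one timeslice: owning task and its priority */

/* A claimant may only take a slice from a strictly lower-priority owner,
   so transfers settle in one direction and cannot ping-pong back and forth. */
int ts_claim(struct slot *s, int task, int prio)
{
    if (s->owner != 0 && s->prio >= prio)
        return -1;             /* the owner outranks us: wait for a free slice */
    /* here the old owner would be notified of the revocation */
    s->owner = task;
    s->prio  = prio;
    return 0;
}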
gaf wrote: We obviously have a different idea of what a hack is: For me it's an ugly workaround that is necessary due to certain limitations in the original design; according to your definition it seems to be anything that differs from the official papers.
Sorry for the combative attack against your idea. I was gonna say "hack for getting better performance out of userland schedulers" but then I was unsure if that's really what you meant it for, and got rid of it without getting rid of the initial lead-in, too. I was also getting touchy about *my* stuff being called hackish. Sorry.

I think one of my issues was that from my point of view it would be adding unnecessary functionality/bloat/overhead/complication to the kernel, in order to achieve a slight unproven performance benefit. Also, it seemed slightly hackish because it seemed to be meant to work around the design issue of not having a clear idea of who's entitled to what, forcing applications to hold on to timeslices for fear they'd be stolen, never to be returned.

Anyway, if the ideas are "clever", as I said they'd be, they wouldn't be hacks :). Also, you wouldn't be working around limitations in the design; you would be filling in the blanks that the vaguer over-all design didn't specify.

-- continues --

Re:Minimalist Microkernel Memory Management

Post by QuiTeVexat »

-- continued --
gaf wrote: The Xok design is however experimental and as such certainly far from perfect. If I think that something could be done better, I'll attempt to do so - after all copying the original design bit by bit won't bring any new insights..
The exokernel architects (who have much more experience in this area than both you and I) repeated the design from Aegis to Xok, leading me to believe that it wasn't as terrible as you seem to think it is. I would say that it's not worth fixing what ain't broken. Of course, if you think you can do better, go right ahead.

About insights: If you're writing an OS because you have new ideas you want to implement, this doesn't necessarily mean you want to change everything. Say you think the scheduler is fine, but you'd like to add user-space drivers. Why change the scheduler? Second, I think it would be useful to have something to compare to, so I know if new ideas are actually an improvement. Since I have not actually gone thru the full implementation of the linear-vector, I don't even have a full set of insights from that experience to compare.
gaf wrote: Have you ever seen a task manager that says an app runs for X time-slices? The value as such is absolutely meaningless, and this only changes if you see it as a certain fraction of the CPU time by comparing it to the other apps. Then it's however no longer just a time-slice..
It's not meaningless to applications. They have access to the tick count and know exactly what a timeslice is.
gaf wrote: The result is exactly the same as in any lottery scheduler, but the design is very awkward and makes things harder than they would have to be.
It's not the same. With the linear-vector, apps know *exactly* when and for how long they are scheduled to run. They are also aware of when and how long other apps are scheduled to run. They know if they can get the slices they want, etc.
gaf wrote: If you want a design that doesn't result in a CPU_share value, the time-slices may only be used once: Apps constantly push time-slices into an execution queue and the CPU "consumes" one after the other by scheduling it.
A thought like this occurred to me while traveling today, only with a rotating list. I haven't thought too much on it, since it seems to me that an app is gonna want more or less the same pattern each time we go thru the vector. If it didn't, it'd change it.
gaf wrote: I don't think that a lottery scheduler would mean any more kernel policy than the traditional round-robin design..
Then we are at an impasse :). With the lottery scheduler the kernel ends up taking control and awareness away from the applications. And the linear-vector is not a traditional round-robin design.
gaf wrote: Well, it'd also be possible to create a priority round-robin scheduler on top of lottery scheduling. Both algorithms are low-level enough to be mechanisms, and the big advantage of lottery scheduling is that it really multiplexes.
The linear-vector is *not* a prioritized round-robin, and I can't think of a way you'd implement it on top of lottery scheduling, without it ceasing to be lottery scheduling anymore. The "on-top" scheduler cannot pass on guarantees it doesn't have to begin with.

Whew. This is getting a bit long, isn't it?

Pax tecum,
Qui te vexat.

Re:Minimalist Microkernel Memory Management

Post by gaf »

QuiTeVexat wrote:Starting a screensaver should be done when there's been no user input for some period of time, throttling the CPU should be done by a power-saving task that works on a configuration the user gives it, and devices should be shut down by the driver when they haven't been in use for some period of time. None of these is something you'd do in an idle task.
Why shouldn't the idle-task be the power-saving task? I agree that the devices should disable themselves, but in order to do so they need to know under which conditions this is desired by the user, and as you'd not want power-management to be spread throughout the whole system, some central instance may make sense.
QuiTeVexat wrote:Splitting a resource for the tasks takes control of resource management away from the tasks. The whole point of building an exokernel was to see what benefits could be had by allowing tasks as much control of resource management as possible, no?
Control over resource management doesn't necessarily mean that the applications have to do everything by themselves. What's really important is that the kernel only exports a mechanism that can be controlled by the apps and thus allows them to decide which policy should be used.
QuiTeVexat wrote:The only way to really resolve this linear-vector vs. scheduler debate is probably to just implement a few systems and try them out. And even then, it might not really be resolved.
Ehm, the linear-vector system is nothing but an advanced round-robin scheduler..
QuiTeVexat wrote:Higher priority tasks would take their picks within the limits of their quotas, and then the lower priority tasks and on down would take what they wanted out of what's left. It's not like they could take resources from the higher priority tasks, and go back and forth revoking and allocating.
Do you really expect the high priority tasks to take less than they are allowed to according to their quotas? The normal behaviour is that each task takes as much as it can get, so that in the end the time-slices will be allocated according to the quota limitations. This however could also have been done directly, without any user-space intervention. If a task doesn't need as much as its quota allows, it should either get a smaller one or simply block so that it won't be scheduled any longer.
QuiTeVexat wrote:I think one of my issues was that from my point of view it would be adding unnecessary functionality/bloat/overhead/complication to the kernel, in order to achieve a slight unproven performance benefit.
Performance is not what I'm after, otherwise the "huge" context switch costs of any µ-kernel would have held me off from even looking at the concept. What I'm afraid of is that the low-level mechanism in Xok isn't very practical and will therefore result in user-space schedulers with monolithic abstractions. My goal is therefore to find a more intuitive design that multiplexes better..

(continues..)

Re:Minimalist Microkernel Memory Management

Post by gaf »

QuiTeVexat wrote:Also, it seemed slightly hackish because it seemed to be meant to work around the design issue of not having a clear idea of who's entitled to what.
What the Xok design really lacks is a possibility for user-level schedulers to set up rules in the kernel for their client apps. This means that they have to agree on everything these apps do, otherwise the schedulers would lose control. Your idea of using priorities for tasks (which I btw. couldn't find anywhere in the exokernel docs either..) is nothing but an attempt to overcome this issue, but in my opinion it's still a bit too low-level.

The idea of not returning time-slices was actually meant for your design; my concept doesn't work with them at all. If a system runs three tasks at 50% + 20% + 30% and one of them goes idle, its cpu-share is simply excluded from the list and the remaining values are normalized. Once it goes active again it's added to the list and everything gets normalized again. If an app only wants to give its cpu-share to a certain set of tasks, groups can be used - if it fancies a special task it blocks on IPC anyways.
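The renormalization step itself is trivial; roughly (illustrative code, made-up names):

/* Drop the shares of blocked tasks and rescale the rest so that
   the active shares sum to 100% again. */
void normalize(double share[], const int active[], int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        if (active[i])
            sum += share[i];
    for (int i = 0; i < n; i++)
        share[i] = active[i] ? share[i] * 100.0 / sum : 0.0;
}

With 50% + 20% + 30%, if the 20% task blocks, the other two simply run at 62.5% and 37.5% until it returns.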
QuiTeVexat wrote:The exokernel architects (who have much more experience in this area than both you and I) repeated the design from Aegis to Xok, leading me to believe that it wasn't as terrible as you seem to think it is.
And yet both systems are just experimental, so that nobody can say if it'll also work in a "real" system that actually has a user base larger than 10 and gets some real work done.
QuiTeVexat wrote:I think it would be useful to have something to compare to, so I know if new ideas are actually an improvement. Since I have not actually gone thru the full implementation of the linear-vector, I don't even have a full set of insights from that experience to compare.
Do I really have to try everything first before I know whether I like it or not? For me it's enough to know the theory; practical experience is a plus, but not really necessary.
QuiTeVexat wrote:It's not meaningless to applications. They have access to the tick count and know exactly what a timeslice is.
But you do agree that knowing you have x TS without any further information (except the TS length, of course) won't really help anybody? After all, depending on the overall number of time-slices on the system, it can mean everything ranging from 100% to 0.0...01% of the CPU.

cpu_share = my_timeslices/all_timeslices
QuiTeVexat wrote:It's not the same. With the linear-vector, apps know *exactly* when and for how long they are scheduled to run.
The "how long" is not a problem, as apps can also define their desired quantum length in a lottery scheduling system. I admit that there are some problems with real-time tasks which need a guarantee that they'll run at a certain point of time, which my system can't offer (it might happen that a task is just "unlucky" in the lottery). I'm not yet sure how much of a problem this really is, as to my knowledge neither of the two real-time algorithms I know (earliest deadline first, rate monotonic scheduling) could be realized in Xok without a user-space scheduler.
QuiTeVexat wrote:The linear-vector is *not* a prioritized round-robin, and I can't think of a way you'd implement it on top of lottery scheduling, without it ceasing to be lottery scheduling anymore.
Linear-vector is just the same as round-robin; I said priority because this is harder to accomplish. What you lose with my design is the possibility to guarantee that a task runs at a certain point of time (although I can say that on average it runs every x ms), but this property is only of interest for real-time apps and I'm confident that I'll find a solution for this.

regards,
gaf

Re:Minimalist Microkernel Memory Management

Post by QuiTeVexat »

gaf wrote: Why shouldn't the idle-task be the power-saving task? I agree that the devices should disable themselves, but in order to do so they need to know under which conditions this is desired by the user, and as you'd not want power-management to be spread throughout the whole system, some central instance may make sense.
It *is* saving power on the device it works with, the processor, by halting. A central instance spreads power-management throughout the system, too. Not that having power-management be global is a bad idea anyway. It's not like it only affects a single task or user. Anyway, those other jobs have no place in the idle task. You could have a task run only when no one else wants the timeslices, but that's a separate task, not the halt-on-free-slice implemented in the kernel.
gaf wrote: Control over resource management doesn't necessarily mean that the applications have to do everything by themselves. What's really important is that the kernel only exports a mechanism that can be controlled by the apps and thus allows them to decide which policy should be used.
The linear-vector gives more of that control to the applications. The only way to allow for this high level of control *is* to have the applications do it by themselves. Same with many other things in an exokernel: Apps must do filesystem work themselves, networking protocol work, and so forth because this gives them more control. Also, this *does* end up most likely being in the libOS, so unless you want to override something, the default abstractions over it would be fine (e.g.: give me long quanta, give me short quanta, give me short intervals). The libOS, as usual, would deal with the kernel.
gaf wrote: Ehm, the linear-vector system is nothing but an advanced round-robin scheduler..
With a round-robin scheduler, you put tasks in a queue and use some algorithm to figure out what should run next. With the linear-vector, the choice is already made, and you simply execute in the timeslice you're on what it was allocated for. It's not the same thing.
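Put as code, the difference is roughly this (hypothetical names again, not real scheduler code):

#define NSLOTS 100
static int slot_owner[NSLOTS]; /* filled in ahead of time by the apps */

/* Round-robin: the scheduler decides at every tick. */
int rr_next(int current, int ntasks)
{
    return (current + 1) % ntasks;  /* plus skip-the-blocked logic etc. */
}

/* Linear-vector: the decision was already made; the tick just looks it up. */
int vector_next(int tick)
{
    return slot_owner[tick % NSLOTS];
}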
gaf wrote: Do you really expect the high priority tasks to take less than they are allowed to according to their quotas? The normal behaviour is that each task takes as much as it can get, so that in the end the time-slices will be allocated according to the quota limitations.
Yes, I do. Do you expect applications in a GUI to poll waiting for input, sucking up CPU slices and making everything run slower? I don't. Can they? Yes. The normal behaviour is in fact to take what is needed; the quotas are only needed for misbehaving or badly written programs.
gaf wrote: This however could also have been done directly, without any user-space intervention. If a task doesn't need as much as its quota allows, it should either get a smaller one or simply block so that it won't be scheduled any longer.
Tasks need different amounts of resources at different times, and this varies more than just doing-nothing and must-consume-everything. And varying between these extremes, other applications can be given use of the unused resources.
gaf wrote: What I'm afraid of is that the low-level mechanism in Xok isn't very practical and will therefore result in user-space schedulers with monolithic abstractions. My goal is therefore to find a more intuitive design that multiplexes better..
I do not see this. I see a design which is not broken. Here's the impasse :).
gaf wrote: What the Xok design really lacks is a possibility for user-level schedulers to set up rules in the kernel for their client apps. This means that they have to agree on everything these apps do, otherwise the schedulers would lose control.
How? The user-level scheduler owns the timeslices. For other apps to use the timeslices, the user-level scheduler must yield() to them.
gaf wrote: Your idea of using priorities for tasks (which I btw. couldn't find anywhere in the exokernel docs either..) is nothing but an attempt to overcome this issue, but in my opinion it's still a bit too low-level.
As you have mentioned, it doesn't have to be in the exokernel docs. It seemed to me to be a logical extension: a logical way to determine the hierarchy of revokers and revokees. Low-level is my goal. :)

-- continues --

Re:Minimalist Microkernel Memory Management

Post by QuiTeVexat »

-- continued --
gaf wrote: And yet both systems are just experimental, so that nobody can say if it'll also work in a "real" system that actually has a user base larger than 10 and gets some real work done.
And our systems are experimental as well, and will probably never have a user base larger than 1! The exokernel architects still have the advantage of experience.
gaf wrote: Do I really have to try everything first before I know whether I like it or not?
No. But to actually have hard numbers and experience, and really be able to know which works better? Yes.
gaf wrote: But you do agree that knowing you have x TS without any further information (except the TS length, of course) won't really help anybody? After all, depending on the overall number of time-slices on the system, it can mean everything ranging from 100% to 0.0...01% of the CPU.

cpu_share = my_timeslices/all_timeslices
An app can calculate this for itself if it actually matters to the app. Also, the app knows more than just that it has so many timeslices. It knows exactly where they are and how they are arranged. There is much more information there than in a percentage.
gaf wrote: The "how long" is not a problem, as apps can also define their desired quantum length in a lottery scheduling system. I admit that there are some problems with real-time tasks which need a guarantee that they'll run at a certain point of time, which my system can't offer (it might happen that a task is just "unlucky" in the lottery). I'm not yet sure how much of a problem this really is, as to my knowledge neither of the two real-time algorithms I know (earliest deadline first, rate monotonic scheduling) could be realized in Xok without a user-space scheduler.
It may not be a problem in many cases, but for the applications that want the control, it's not there. With real-time tasks, on a linear-vector, the task can find out whether or not it can meet its deadlines, and arrange its timeslices as needed. Since you want them to be realtime, you would give them priorities/quotas that would prevent their arrangement from being messed with. There's no need for an RMS scheduler that can only take a processor load of .7.
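(For reference, that .7 is the usual Liu/Layland worst-case utilization bound for rate-monotonic scheduling:

U(n) = n * (2^(1/n) - 1), which sinks toward ln 2 ≈ 0.69 as the number of tasks grows.)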
gaf wrote: Linear-vector is just the same as round-robin; I said priority because this is harder to accomplish.
Argh! No! Round robins do not offer the control that linear-vectors give, and they are not the same. You rotate around a list of timeslices, not tasks.
gaf wrote: What you lose with my design is the possibility to guarantee that a task runs at a certain point of time
And I don't want to take that control from the apps. Impasse.

Pax tecum,
Qui te vexat.

Re:Minimalist Microkernel Memory Management

Post by gaf »

QuiTeVexat wrote:The only way to allow for this high level of control *is* to have the applications do it by themselves.
What counts is that applications can define exactly what they want. If the kernel's mechanism isn't strong enough, user-level managers become necessary and you'll end up with a µ-kernel in which the apps can't really change their own policy. In an exokernel the system therefore has to allow user-level managers to set up restrictions for their clients in advance, so that the kernel can decide based upon these quotas. This does not mean that the user-level managers can't control their apps any longer..
QuiTeVexat wrote:With a round-robin scheduler, you put tasks in a queue and use some algorithm to figure out what should run next. With the linear-vector, the choice is already made, and you simply execute in the timeslice you're on what it was allocated for. It's not the same thing.
This definition is really new to me, maybe you should also edit the entry at wikipedia as it's obviously wrong:
"Since the operating system must have a reference to the start of the list, and a reference to the current application, it can easily decide who to run next - just follow the array or chain of PDBs to the next element, and if you reach the end of the array or list, reset back to the start."

There are two differences that made me say "advanced":
- tasks can have several time-slices
- the built-in idle thread
QuiTeVexat wrote:Tasks need different amounts of resources at different times, and this varies more than just doing-nothing and must-consume-everything. And varying between these extremes, other applications can be given use of the unused resources.
If the app doesn't need any more CPU time, all it has to do is to block. The other apps will then be scheduled during the remaining time-slices of its quantum. As an option it could also donate the rest to a specific app using yield(), although I doubt that this will happen very often.
QuiTeVexat wrote:How? The user-level scheduler owns the timeslices. For other apps to use the timeslices, the user-level scheduler must yield() to them.
Ehm, wait a second.. until quite recently I thought that the apps would schedule themselves. If you're using a design in which the scheduler owns all time-slices and the apps are merely called (that's what yield(x) is for: donate the rest of the time-slice to x), all my worries about monolithic external schedulers have come true..
QuiTeVexat wrote:And our systems are experimental as well, and will probably never have a user base larger than 1! The exokernel architects still have the advantage of experience.
I herewith bow before the almighty exokernel creators - amen ;)

Don't get me wrong, but they're just students/profs and might also make some mistakes every now and then..
QuiTeVexat wrote:An app can calculate this for itself if it actually matters to the app.
That's not the point: Any app would have to calculate this value, as it's the only way to say something about the performance that can be expected. The question is if knowing exactly (!) when a time-slice will execute is really worth all the trouble when this doesn't matter for 99% of the apps.
QuiTeVexat wrote:Also, the app knows more than just that it has so many timeslices. It knows exactly where they are and how they are arranged. There is much more information there than in a percentage.
My system will allow both, percentage and desired quantum size in time-slices, to be defined. The equation (share = freq * quantum_length) is therefore determined, although I can't guarantee deadlines.
QuiTeVexat wrote:There's no need for an RMS scheduler that can only take a processor load of .7.
I tend to agree, although this is of course the user's choice. What about the other one, EDF? While the Xok system would in theory allow the scheduling of real-time tasks, it's also very static as it doesn't allow any events to be taken into account, and its resolution is pretty coarse-grained since tasks can only meet deadlines that aren't smaller than the time-slice length. For a real system it would in my opinion probably be necessary to use a different approach anyways..

regards,
gaf

Re:Minimalist Microkernel Memory Management

Post by QuiTeVexat »

gaf wrote: What counts is that applications can define exactly what they want.
With a linear-vector, apps can define *exactly* what they want, more so than with the lottery scheduler.
gaf wrote:In an exokernel the system therefore has to allow user-level managers to set up restrictions for their clients in advance, so that the kernel can decide based upon these quotas.
This is inverting the hierarchy, making the exokernel dependent on the user-level managers.
gaf wrote: This definition is really new to me, maybe you should also edit the entry at wikipedia as it's obviously wrong:
The article is not wrong. Like I said (and what's in the article), a round-robin queues up tasks and rotates thru them. The linear-vector of timeslices queues up timeslices and goes thru them. In a round-robin scheduler, there is no defined point at which tasks will run or for how long. The defined part is the order of tasks. The linear-vector of timeslices defines when and how long, because it queues timeslices, not tasks.
gaf wrote: - the build-in idle thread
This is nothing new. From what I understand, normal schedulers halt until there's something to do as well.
gaf wrote: If the app doesn't need any more CPU time, all it has to do is to block.
That's an all-or-none. If the app doesn't need *any* more CPU time though, it can free all its timeslices, and other apps can take them.
gaf wrote:Ehm, wait a second.. until quite recently I thought that the apps would schedule themselves. If you're using a design in which the scheduler owns all time-slices and the apps are merely called (that's what yield(x) is for: donate the rest of the time-slice to x), all my worries about monolithic external schedulers have come true..
The apps *would* schedule themselves. I was simply clarifying that a user-level scheduler could be implemented on top of the Xok design and maintain control over its timeslices. You seemed to be concerned that this wouldn't be possible. If you were concerned and thought it was bad that apps could bypass the scheduler by scheduling timeslices for themselves, I would say that your worries of monolithic external schedulers *have* come true. :)

-- continues --

Re:Minimalist Microkernel Memory Management

Post by QuiTeVexat »

-- continued --
gaf wrote: Don't get me wrong, but they're just students/profs and might also make some mistakes every now and then..
And hobbyists make more of them. I'm just going with don't fix what ain't broken.
gaf wrote: That's not the point: Any app would have to calculate this value, as it's the only way to say something about the performance that can be expected. The question is if knowing exactly (!) when a time-slice will execute is really worth all the trouble when this doesn't matter for 99% of the apps.
Those 99% of apps would be using a libOS that wouldn't make things any more complicated than they need to be. Of course, if we're not worried about those 1% of apps, and the usual policies/abstractions work in general... why are we building exokernels?
gaf wrote:My system will allow both, percentage and desired quantum size in time-slices, to be defined. The equation (share = freq * quantum_length) is therefore determined, although I can't guarantee deadlines.
Like I said, a libOS could allow for a more abstracted interface (defining desired quanta sizes and percentages), and deal with the specifics itself. The difference is, Xok's system can provide for deadlines.
gaf wrote:What about the other one, EDF?
I'm unsure, but if I understand correctly, EDF doesn't do hard realtime.
gaf wrote:While the Xok system would in theory allow the scheduling of real-time tasks, it's also very static as it doesn't allow any events to be taken into account
You have to deal with this issue in any system that deals with realtime and deadlines. I think for realtime tasks you'd just have to defer the handling of interrupts that happened during their execution until a point at which they wouldn't interfere with anyone's deadlines.
gaf wrote:and its resolution is pretty coarse-grained since tasks can only meet deadlines that aren't smaller than the time-slice length.
The timeslice length is equal to the time between timer ticks, and you aren't gonna get better resolution than the timer'll give you anyway.

Pax tecum,
Qui te vexat.

Re:Minimalist Microkernel Memory Management

Post by gaf »

QuiTeVexat wrote:With a linear-vector, apps can define *exactly* what they want, more so than with the lottery scheduler.
We already had this and I never denied it. What you get is the ability to guarantee that an app runs at a certain instant, but you have to pay for it with a system that's hard to handle for the external schedulers.
QuiTeVexat wrote:This is inverting the hierarchy, making the exokernel dependent on the user-level managers.
The exokernel multiplexes the system resources and offers an interface for allocation/deallocation to the apps. User-level managers can set up quotas that define how much of a resource their child tasks get; the kernel checks these quotas before granting any resources.
A user-level memory manager would, for example, tell the kernel that childA will be allowed to get 10MBs and childB 5MBs. If one of the two apps requests more memory, the kernel first checks the task's quota, which says whether they're allowed to allocate the memory or not. This check is also necessary in your system, it just might take place in the user-space managers (which may lead to issues with stacked managers, btw).

How does this reverse the operating system's hierarchy?
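In code, the check I mean is roughly the following (invented names, not a real kernel interface):

struct quota { unsigned limit, used; };  /* per child task, in pages say */

/* The manager installed 'limit' in advance; from then on the kernel
   grants or denies on its own, without upcalls to the manager. */
int kernel_alloc_pages(struct quota *q, unsigned npages)
{
    if (q->used + npages > q->limit)
        return -1;             /* over quota: denied */
    q->used += npages;
    return 0;                  /* granted, manager never on the fast path */
}

The manager keeps full control over the policy (it set the limits); it's just no longer involved in every single allocation.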
QuiTeVexat wrote:The article is not wrong. Like I said (and what's in the article), a round-robin queues up tasks and rotates thru them. The linear-vector of timeslices queues up timeslices and goes thru them. In a round-robin scheduler, there is no defined point at which tasks will run or for how long.
It was actually irony when I proposed changing the article. In your last post you phrased it very differently ("With a round-robin scheduler, you put tasks in a queue and use some algorithm to figure out what should run next") and I think that sentence is at least ambiguous. To me it sounds as if the scheduler would do some complex calculations to find out which one to schedule next, while all it really does is take the next one in the list and schedule it unless it's blocked. This is also what I meant with the "built-in" idle task, as your system also schedules empty time-slices. The two systems might operate on a different level (tasks, time-slices), but apart from that they're absolutely equal..
QuiTeVexat wrote:This is nothing new. From what I understand, normal schedulers halt until there's something to do as well.
A normal system would just skip time-slices that aren't in use rather than halting on them.
QuiTeVexat wrote:That's an all-or-none. If the app doesn't need *any* more CPU time though, it can free all its timeslices, and other apps can take them.
Why should this be all-or-none? The task does its work until it has finished and then blocks, as there's no point in waiting until the CPU is eventually preempted. If your concern is that the task has to give away all of the remainder of its current time-slice, please remember that tasks that need to be run frequently already have very small quanta, and that there's really no point in running the same app twice in a row with one or two time-slices in between. Once the task gets new work (from another task via IPC) it'll go active again and be scheduled next time. There's no use in busy-waiting until some message arrives either..
QuiTeVexat wrote:I was simply clarifying that a user-level scheduler could be implemented on top of the Xok design and maintain control over its timeslices. You seemed to be concerned that this wouldn't be possible.
Not really.. I'm concerned that it is not only possible but actually necessary to use user-space schedulers, as the mechanism offered by the Xok kernel is too low-level to be used by the apps. I'm not worried about the app programmers either, but think that the Xok system is just too rigid to be used by a greater number of apps (not schedulers) in a reasonable way.

(post will be continued..)

Re:Minimalist Microkernel Memory Management

Post by gaf »

QuiTeVexat wrote:If you were concerned and thought it was bad that apps could bypass the scheduler by scheduling timeslices for themselves, I would say that your worries of monolithic external schedulers *have* come true. :)
I actually want to attempt building a system in which the user-space managers play a minor role as resource managers but leave (almost) all of the policy to the apps. In such a system the external scheduler would not have to be asked for permission by the apps all the time, as it has already installed a quota in the kernel, but this also requires a kernel mechanism that is powerful enough and I'm afraid that the one used in Xok isn't..
QuiTeVexat wrote:And hobbyists make more of them. I'm just going with don't fix what ain't broken.
And I improve what ain't perfect ;)
QuiTeVexat wrote:The difference is, Xok's system can provide for deadlines.
Only for a static system; adding a new real-time task will mean quite a lot of revocations and is far from trivial.
QuiTeVexat wrote:I'm unsure, but if I understand correctly, EDF doesn't do hard realtime.
I must admit that I'm not an expert on real-time systems, but nevertheless I'm pretty sure that EDF does allow hard deadlines. You have probably also read the wikipedia article in which it says "Because deadlines are dynamic, when the system is overloaded the process that misses its deadline will be unpredictable", but this only applies if the system would have to run at more than 100%. Thus you can easily avoid the problem by just making sure that there are never more tasks than the system can process (which is a pretty good idea in general..).
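As far as I know, the schedulability condition is simply that the total utilization stays within the processor:

sum over all tasks i of C_i / T_i <= 1   (C_i = worst-case execution time, T_i = period of task i)

and under that condition preemptive EDF meets every hard deadline.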
QuiTeVexat wrote:The timeslice length is equal to the time between timer ticks, and you aren't gonna get better resolution than the timer'll give you anyway.
The periodic timer normally runs at a rather low frequency to reduce context switch costs, so that a second clock with a higher resolution might be necessary (RTC, APIC). I'm not saying that this isn't feasible in Xok, but it would have to be done in user-space, so that the advantage of knowing when a time-slice is executed vanishes.

regards,
gaf

Re:Minimalist Microkernel Memory Management

Post by Colonel Kernel »

<watches the tennis ball bounce back and forth>

A "linear vector of timeslices" makes a small amount of intuitive sense, but not much more than that.

Can either of you provide an example of how a libOS might implement priority round-robin (for example) on top of Xok's mechanism...?

And how does Xok's mechanism relate to scheduler activations?

This mere mortal is confused. :)
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!