CPU scheduling implementation question
Posted: Sun Feb 27, 2022 5:47 pm
by YDeeps1
So I have a pretty straightforward CPU scheduler in place (at least from a high-level viewpoint): grab the thread at the front of the queue, restore its registers, run it, wait for the PIT (not APIC) interrupt, save its registers, and place it at the back of the queue (unless the thread yields early).
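Roughly, the core of it is nothing more than a loop like this (a simplified sketch; every name here is a placeholder rather than my actual code):

Code:
/* Simplified sketch of the round-robin switch described above;
 * every name here is a placeholder, not actual kernel code. */
struct cpu_regs { unsigned long gpr[16]; };   /* stand-in for the saved frame */

struct thread {
    struct cpu_regs regs;      /* register state saved at preemption */
    struct thread  *next;      /* intrusive run-queue link           */
};

static struct thread *queue_head, *queue_tail, *current;

/* Called from the PIT interrupt handler, interrupts disabled,
 * assuming the run queue is never empty. */
void scheduler_tick(struct cpu_regs *frame)
{
    /* Save the outgoing thread and append it to the back of the queue. */
    current->regs = *frame;
    current->next = NULL;
    queue_tail->next = current;
    queue_tail = current;

    /* Grab the thread at the front and restore its registers;
     * returning from the interrupt then resumes that thread. */
    current = queue_head;
    queue_head = current->next;
    *frame = current->regs;
}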
Now I want to improve it and make it more advanced, but I have a lot of questions, mostly because the answers tend to be OS-implementation specific:
I am mostly leaning towards how Linux does things, but I am open to better algorithms.
1 - Pretty much everywhere you read, the amount of time something gets to run on a CPU is determined at the process level, but what about threads? Does the process get an assigned time slice which is then shared across all of its threads, or are all threads treated separately and assigned their own time slices?
2 - In my scenario, I am using the PIT to preempt a thread and keep a rough track of time. However, would it be a better approach (assuming thread time slices are calculated separately from the process) to configure the PIT to give the thread x amount of processing time, OR would it be better to keep a standard PIT frequency (e.g. 1 ms) and, when the PIT preempts, do some tiny amount of processing to check whether the time slice has run out and, if not, resume the thread immediately?
I still have a lot to learn and would love it if someone could answer these questions, since this is the best forum for them. Thank you!
Re: CPU scheduling implementation question
Posted: Sun Mar 06, 2022 10:05 am
by vhaudiquet
From my point of view, your questions can be answered by benchmarking different implementations, if your OS is complete enough; and if it is not, there is no need for such a precise scheduler, a naive one will do.
YDeeps1 wrote:2 - In my scenario, I am using the PIT to preempt a thread and keep a rough track of time. However, would it be a better approach (assuming thread time slices are calculated separately from the process) to configure the PIT to give the thread x amount of processing time, OR would it be better to keep a standard PIT frequency (e.g. 1 ms) and, when the PIT preempts, do some tiny amount of processing to check whether the time slice has run out and, if not, resume the thread immediately?
I don't think it is a good idea to preempt and check every "short amount" of time; it will waste CPU time.
What I would do is have a constant time between each schedule call, but if a process has a higher priority it gets scheduled more often.
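Something like this, as a rough sketch (fixed tick, per-thread budget scaled by priority; all names made up):

Code:
/* Sketch: fixed-frequency tick, per-thread budget scaled by priority.
 * Higher-priority threads get a bigger budget per round, which is one
 * way of giving them more CPU time; names are placeholders. */
#define BASE_TICKS 1                    /* ticks granted per priority level */

void schedule(void);                    /* hypothetical: pick the next thread */

struct thread {
    int priority;                       /* >= 1, higher means more CPU      */
    int ticks_left;                     /* budget remaining in this slice   */
};

void on_timer_tick(struct thread *cur)
{
    if (--cur->ticks_left > 0)
        return;                         /* slice not exhausted: just resume */

    /* Slice used up: refill from the priority and do a real reschedule. */
    cur->ticks_left = BASE_TICKS * cur->priority;
    schedule();
}

That way the common case on each tick is just a decrement and a compare; the expensive reschedule only happens when the budget runs out.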
Re: CPU scheduling implementation question
Posted: Mon Mar 07, 2022 1:37 am
by rdos
I think time slices should be per thread, not per process. I use a fixed time slice of 1 ms for every thread, but I also have priorities. Priorities are only used by kernel threads that serve time-critical tasks like audio or network chips. These will typically never use a complete 1 ms time slice; rather, they give up control as soon as they have completed their task.
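In outline, such a kernel thread is just a blocking loop; a sketch with made-up names (not my actual driver code):

Code:
/* Sketch (made-up names): a high-priority kernel service thread that
 * blocks again as soon as its work is done, instead of burning its
 * whole 1 ms slice. */
struct event { int pending; };             /* placeholder event object          */
struct audio_dev { struct event irq_event; /* signalled from the IRQ handler */ };

void wait_for_event(struct event *ev);     /* placeholder: block until signalled */
void refill_fifo(struct audio_dev *dev);   /* placeholder: the actual device work */

void audio_feeder_thread(void *arg)
{
    struct audio_dev *dev = arg;

    for (;;) {
        wait_for_event(&dev->irq_event);   /* sleep until the chip needs service */
        refill_fifo(dev);                  /* short, time-critical work          */
        /* Looping back into wait_for_event() yields the CPU immediately,
         * so the thread rarely uses anywhere near its full time slice. */
    }
}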
Re: CPU scheduling implementation question
Posted: Mon Mar 07, 2022 1:20 pm
by thewrongchristian
rdos wrote:I think time slices should be per thread, not per process. I use a fixed time slice of 1 ms for every thread, but I also have priorities. Priorities are only used by kernel threads that serve time-critical tasks like audio or network chips. These will typically never use a complete 1 ms time slice; rather, they give up control as soon as they have completed their task.
Agree and disagree.
The thread should be the entity that the kernel schedules. Otherwise, you couldn't have kernel-only threads (unless you also have a kernel-only process to attach them to).
But, accounting should arguably be done by process, or perhaps even by user, and thread time slices adjusted accordingly.
Otherwise, a process could effectively hijack the majority of the CPU just by creating a large number of runnable threads. The user creating those threads would thus starve out other processes that have only a small number of threads.
Probably not a problem on a personal machine like a PC or phone/tablet, but potentially a problem on a shared machine.
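A crude sketch of that kind of accounting (names invented for the example): each process gets a fixed share per scheduling round, and its runnable threads split that share, so spawning more threads buys no extra total CPU.

Code:
/* Crude sketch (invented names): per-process CPU share split among the
 * process's runnable threads, so more threads != more total CPU time. */
#define PROCESS_SHARE_TICKS 16             /* ticks per process per round   */

struct process {
    int nr_runnable;                       /* runnable threads it owns      */
};

struct thread {
    struct process *proc;
};

int thread_slice_ticks(const struct thread *t)
{
    int n = t->proc->nr_runnable;

    if (n < 1)
        n = 1;
    /* Integer division; still hand out at least one tick per thread so
     * nothing is starved outright. */
    int slice = PROCESS_SHARE_TICKS / n;
    return slice > 0 ? slice : 1;
}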
Re: CPU scheduling implementation question
Posted: Tue Mar 08, 2022 1:12 am
by rdos
thewrongchristian wrote:rdos wrote:I think time slices should be per thread, not per process. I use a fixed time slice of 1 ms for every thread, but I also have priorities. Priorities are only used by kernel threads that serve time-critical tasks like audio or network chips. These will typically never use a complete 1 ms time slice; rather, they give up control as soon as they have completed their task.
Agree and disagree.
The thread should be the entity that the kernel schedules. Otherwise, you couldn't have kernel-only threads (unless you also have a kernel-only process to attach them to).
But, accounting should arguably be done by process, or perhaps even by user, and thread time slices adjusted accordingly.
Otherwise, a process could effectively hijack the majority of the CPU just by creating a large number of runnable threads. The user creating those threads would thus starve out other processes that have only a small number of threads.
Probably not a problem on a personal machine like a PC or phone/tablet, but potentially a problem on a shared machine.
Agreed. It depends on your target use case. For multiuser systems, it makes sense to limit CPU time per process or perhaps per user, but not for personal machines or embedded hardware. I don't even have a user model in my OS, as I don't target multiuser environments. The programs running on my system are typically controlled by configuration, and so there is no need to limit their CPU usage. It's possible to create user threads with above-normal priority, but in practice I never use this. It might be useful for audio feeders or applications doing video playback, but other than that, there is no reason for it.
Re: CPU scheduling implementation question
Posted: Fri Mar 11, 2022 11:48 am
by nexos
rdos wrote: use a fixed time slice of 1 ms for every thread
So if context switching takes half a millisecond, you're spending more than 25% of your time context switching (0.5 ms of switching for every 1 ms of running works out to 0.5/1.5, i.e. roughly 33% overhead). Unless you're targeting some unusual use case, that definitely isn't enough time!
Re: CPU scheduling implementation question
Posted: Fri Mar 11, 2022 1:15 pm
by thewrongchristian
nexos wrote:rdos wrote: use a fixed time slice of 1 ms for every thread
So if context switching takes half a millisecond, you're spending more than 25% of your time context switching (0.5 ms of switching for every 1 ms of running works out to 0.5/1.5, i.e. roughly 33% overhead). Unless you're targeting some unusual use case, that definitely isn't enough time!
If your context switch takes half a millisecond, you have bigger problems!
A context switch should be on the order of tens of microseconds:
https://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html
My rule of thumb is that it'll cost you about 30µs of CPU overhead
That includes TLB and cache effects.
Even this paper from the mid-90s, on ye olde SPARC (running at 20 MHz), with register windows to flush, seems to put the context switch cost at around 70 µs.
https://www.researchgate.net/publication/2281328_MythOS_--_A_Micro-Kernel_Threads_Operating_System
Re: CPU scheduling implementation question
Posted: Fri Mar 11, 2022 2:25 pm
by rdos
Right. I decided to use the 1 ms time slice back when I ran my OS on 20 MHz 386SX processors, and even on that hardware it consumed considerably less than 10% of the CPU time. On modern hardware, the overhead is so small that it can be ignored.
Besides, if context switching takes 0.5 ms, then IO-intensive applications will run slow as hell, since they do far more context switches than programs that are mostly CPU intensive, perhaps with heavy calculations or graphics.
Re: CPU scheduling implementation question
Posted: Fri Mar 11, 2022 10:02 pm
by linguofreak
rdos wrote:
Agreed. It depends on your target use case. For multiuser systems, it makes sense to limit CPU time per process or perhaps per user, but not for personal machines or embedded hardware. I don't even have a user model in my OS, as I don't target multiuser environments. The programs running on my system are typically controlled by configuration, and so there is no need to limit their CPU usage. It's possible to create user threads with above-normal priority, but in practice I never use this. It might be useful for audio feeders or applications doing video playback, but other than that, there is no reason for it.
IIRC your OS is mostly running in embedded environments, so this wouldn't apply to you so much, but in general, even if a system is used by only one user, it should at least have separate admin and non-admin accounts if it is network-accessible, to prevent a remote vulnerability in an application from automatically becoming a compromise of the whole machine.
I'd further argue that, multiuser or not, networked systems should generally have subusers: for example, a browser visiting google.com on behalf of user fred would operate as the user fred:google.com, and would only have the subset of fred's permissions that he granted to programs connecting to google.com.
Re: CPU scheduling implementation question
Posted: Sat Mar 12, 2022 2:48 pm
by rdos
linguofreak wrote:rdos wrote:
Agreed. It depends on your target use case. For multiuser systems, it makes sense to limit CPU time per process or perhaps per user, but not for personal machines or embedded hardware. I don't even have a user model in my OS, as I don't target multiuser environments. The programs running on my system are typically controlled by configuration, and so there is no need to limit their CPU usage. It's possible to create user threads with above-normal priority, but in practice I never use this. It might be useful for audio feeders or applications doing video playback, but other than that, there is no reason for it.
IIRC your OS is mostly running in embedded environments, so this wouldn't apply to you so much, but in general, even if a system is used by only one user, it should at least have separate admin and non-admin accounts if it is network-accessible, to prevent a remote vulnerability in an application from automatically becoming a compromise of the whole machine.
I'd further argue that, multiuser or not, networked systems should generally have subusers: for example, a browser visiting google.com on behalf of user fred would operate as the user fred:google.com, and would only have the subset of fred's permissions that he granted to programs connecting to google.com.
That's assuming that network requests are able to run scripts or something similar that makes it possible to manipulate the file system or the OS configuration itself. I generally implement socket servers as C++ code, and because of that, I know exactly what they can do and which parts of the FS they can affect. There is no way to run scripts, start the command line or anything similar. Which, in fact, leads to much better security than an account-based system where random scripts can be downloaded and run.
I can still support user-account functionality (logging in to a service) by only allowing certain users to access certain services, but this is not managed by an OS user-account model; rather, it is implemented in the socket server itself. For instance, the FTP server class supports adding user/password combinations, setting access rights (read-only or read/write), and must be handed a root directory.
Re: CPU scheduling implementation question
Posted: Tue Mar 15, 2022 2:36 pm
by YDeeps1
Thank you so much for your replies, everyone! Now my real question is choosing the base PIT frequency: one which allows plenty of flexibility in how much time can be handed out, yet doesn't sacrifice the majority of the CPU to constant preemption; I can work out that sweet spot myself, though. The issue I have right now with short preemption intervals is that occasionally, in VMs, the OS freezes up, which I assume is due to a preemption happening somewhere I don't expect it to, leading to bugs.
Though I do wonder: how would you keep track of time reasonably accurately? I see tons of languages and frameworks where time can be tracked extremely accurately (in some cases down to microseconds, though I do not need that level of precision) - how is this achieved? Does the program also keep track of time internally? But how would that stay accurate across preemptions?
Re: CPU scheduling implementation question
Posted: Tue Mar 15, 2022 2:54 pm
by nullplan
YDeeps1 wrote:The issue I have right now with short preemption intervals is that occasionally, in VMs, the OS freezes up, which I assume is due to a preemption happening somewhere I don't expect it to, leading to bugs.
If you used a debugger, you wouldn't have to assume.
YDeeps1 wrote:Though I do wonder: how would you keep track of time reasonably accurately?
Typically by way of hardware timers. Hardware is better at counting than software, so all the OS has to do is find out how fast a given counter is ticking and associate one particular count with one particular time. Then monotonic time is easily established from there, and real time can just be a (settable) constant offset from monotonic time.
These days, most of the time you will be able to just use the TSC for the task, so you don't even need a system call to read the counter.
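As a sketch (assuming the TSC frequency has already been measured somehow, and with made-up names):

Code:
/* Sketch, assuming tsc_hz was calibrated at boot (e.g. against the PIT)
 * and tsc_at_boot was read at the same moment; names are made up. */
#include <stdint.h>

static uint64_t tsc_hz;        /* TSC ticks per second, measured once      */
static uint64_t tsc_at_boot;   /* TSC value defined as "time zero"         */

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Monotonic nanoseconds since boot, using only integer math. */
uint64_t monotonic_ns(void)
{
    uint64_t ticks = rdtsc() - tsc_at_boot;
    /* Split into whole seconds and remainder so the multiplication by
     * 1e9 does not overflow 64 bits for realistic uptimes. */
    return (ticks / tsc_hz) * 1000000000ull
         + (ticks % tsc_hz) * 1000000000ull / tsc_hz;
}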
Re: CPU scheduling implementation question
Posted: Tue Mar 22, 2022 8:26 am
by YDeeps1
nullplan wrote:YDeeps1 wrote:The issue I have right now with short preemption intervals is that occasionally, in VMs, the OS freezes up, which I assume is due to a preemption happening somewhere I don't expect it to, leading to bugs.
If you used a debugger, you wouldn't have to assume.
YDeeps1 wrote:Though I do wonder: how would you keep track of time reasonably accurately?
Typically by way of hardware timers. Hardware is better at counting than software, so all the OS has to do is find out how fast a given counter is ticking and associate one particular count with one particular time. Then monotonic time is easily established from there, and real time can just be a (settable) constant offset from monotonic time.
These days, most of the time you will be able to just use the TSC for the task, so you don't even need a system call to read the counter.
The TSC sounds like an excellent way of doing this! I assume that to calculate the current time, I would need to determine the real time corresponding to x clock cycles and then simply use division and multiplication from there?
Re: CPU scheduling implementation question
Posted: Tue Mar 22, 2022 2:29 pm
by nullplan
YDeeps1 wrote:The TSC sounds like an excellent way of doing this! I assume that to calculate the current time, I would need to determine the real time corresponding to x clock cycles and then simply use division and multiplication from there?
Pretty much, yeah. You determine the TSC frequency somehow (e.g. by measuring it against the PIT), then you read the TSC and say that your monotonic clock starts at that timestamp. Then, in the future, if someone wants to know the monotonic time, you read the TSC, subtract the initial reading from the result, divide by the TSC frequency (which is a bit tricky to get right without an FPU), and you have your monotonic clock. If someone wants to get the real-time clock, you do the same thing, but add an offset to the returned time. If someone wants to set the real-time clock, you read the monotonic clock and save the offset as the difference between the current monotonic and the desired real-time clock.
Then you can set the real-time clock to whatever the RTC says. Or you can do that in the init task.
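The offset part is tiny in code; a sketch, reusing a monotonic_ns() like the one above (names still made up):

Code:
/* Sketch of the offset scheme (made-up names): real time is just
 * monotonic time plus a settable constant. */
#include <stdint.h>

uint64_t monotonic_ns(void);           /* assumed, as sketched earlier      */

static int64_t realtime_offset_ns;     /* realtime = monotonic + offset     */

uint64_t realtime_ns(void)
{
    return monotonic_ns() + (uint64_t)realtime_offset_ns;
}

/* Setting the clock only rewrites the offset; the monotonic clock is
 * never touched, so it stays monotonic. */
void set_realtime_ns(uint64_t now_ns)
{
    realtime_offset_ns = (int64_t)(now_ns - monotonic_ns());
}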
Whether you can do any of that depends on two things, though:
- The TSC is consistent across all cores.
- The TSC frequency is invariant under low power states.
ACPI can tell you about number two, but I would have to look up where exactly. Number one I don't know how to check.
However, even if the TSC does not pan out, there are still many other hardware counters available in the system (HPET, PIT, ACPI PM timer, ...), and you can always fall back to a software counter as a last resort. The general scheme outlined above does not change if another counter is used.
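For what it's worth, one common way to probe the second point on x86 is the "invariant TSC" flag in CPUID leaf 0x80000007, EDX bit 8; a sketch (and note it says nothing about cross-core consistency, the first point):

Code:
/* Sketch: check the x86 "invariant TSC" flag, CPUID leaf 0x80000007,
 * EDX bit 8. Verify the leaf exists first. This only covers frequency
 * invariance, not whether the TSC is synchronised across cores. */
#include <stdint.h>

static inline void cpuid(uint32_t leaf,
                         uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    __asm__ volatile("cpuid"
                     : "=a"(*a), "=b"(*b), "=c"(*c), "=d"(*d)
                     : "a"(leaf), "c"(0));
}

int tsc_is_invariant(void)
{
    uint32_t a, b, c, d;

    cpuid(0x80000000u, &a, &b, &c, &d);    /* highest extended leaf              */
    if (a < 0x80000007u)
        return 0;                          /* leaf missing: assume not invariant */

    cpuid(0x80000007u, &a, &b, &c, &d);
    return (d >> 8) & 1;                   /* EDX bit 8: invariant TSC           */
}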