Partial time slices

AndrewAPrice · Post by **AndrewAPrice** » Tue May 12, 2020 1:54 pm

Let's say you have two threads, and your timer fires every 10ms.

One thread runs for 8ms and yields itself. The other thread runs for 2ms, and the timer then preempts it, and switches back to the other thread.

The thread that is cooperatively yeilding, could actually be getting 80% of the processor time. I feel that it's unfair to penalize the busy thread that doesn't yield with a partial time slice.

How do others deal with partial time slices?

Do you ignore the problem, and don't care if you penalize threads with partial time slices? Do you tell your scheduler to skip preemping on the next run (so yielding is a thread voluntarily giving the rest of it's timeslice to another thread, giving the next thread an extra long time slice?) Run your timer at a higher rate that your scheduler, and make sure each thread gets a certain number of ticks before preempting?

AndrewAPrice · Post by **AndrewAPrice** » Tue May 12, 2020 3:44 pm

I was wondering if instead of a fixed frequency, it would be better for my timer to run in one shot mode? Then every time I context switch, I can reset the timer to the maximum time slice?

MollenOS · Post by **MollenOS** » Tue May 12, 2020 10:19 pm

I can only say for myself what i do, which is to run my timer in one-shot mode and then reset it to the given time-slice at every context switch. I use a multilevel feedback queue, which simply means i reward threads that give up their time-slice by keeping their priority and resetting their time-slice to full, and i penalize threads using the entire time-slice by reducing their priority, but however giving them a longer time-slice next time they are up for running.

xenos · Post by **xenos** » Wed May 13, 2020 1:15 am

I agree with MollenOS (even though I have not yet implemented this in my scheduler, but I plan to do so). I think that a one-shot timer makes much more sense than a periodic timer to set the maximum time a thread may spend running at once, since most of the time threads should yield / block for other reasons than expiring a time slice (unless there is actual competition for CPU time, because there are several compute intensive threads). Threads could be waiting for I/O, user input or other events to happen, and would thus issue a blocking wait. Possibly all threads would do that, and so in the end the CPU would be idle (unless there is 100% CPU usage). And for the "idle state", there is not much reason to be woken up by a periodic or one-shot timer at all, so that mechanism could even be applied only to non-idle threads.

iansjack · Post by **iansjack** » Wed May 13, 2020 4:19 am

If you use a one-shot timer, how do you keep track of elapsed time in your kernel?

There's a very good discussion of various scheduling approaches here: http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-sched.pdf and here: http://pages.cs.wisc.edu/~remzi/OSTEP/c ... d-mlfq.pdf

Korona · Post by **Korona** » Wed May 13, 2020 9:16 am

Timekeeping and timers are orthogonal to scheduling.

Managarm uses a tickless design, i.e., the length of a time slice is computed dynamically whenever a thread is scheduled. The calculation to do this is similar (but not equivalent) to Linux' CFS. Basically, the idea is that each thread should run for a fraction of exactly 1/(# threads) of the time.

The local APIC timer is used to enforce the time slice but it is in no way special compared to other IRQs - each IRQ or syscall can trigger rescheduling if higher priority threads wake up. Priority does not affect the length of a time slice, it determines whether a thread is scheduled or not. Managarm always schedules the thread with highest priority. In the common situation that many processes have the same priority, Managarm schedules the thread with highest "unfairness", i.e., the thread whose fraction of running time deviates most extremely from 1/(# threads).

To keep track of real time (and monotonic time), Managarm uses the TSC (with HPET as fallback if no invariant TSC exists). The HPET is used for global timers (even when TSC is used for timekeeping). The TSC has the advantage that it can be read from userspace. To let userspace compute the offset to real-time (i.e., UTC), a read-only page is mapped into each process and updated by a "clocktracker" server.

linguofreak · Post by **linguofreak** » Fri May 15, 2020 5:11 pm

Korona wrote:The TSC has the advantage that it can be read from userspace. To let userspace compute the offset to real-time (i.e., UTC), a read-only page is mapped into each process and updated by a "clocktracker" server.

It's probably best not to enable the TSD flag or provide a wall time offset for all processes, as even relative timing information (let alone access to actual wall time) can have security implications.

nullplan · Post by **nullplan** » Sat May 16, 2020 1:39 am

linguofreak wrote:It's probably best not to enable the TSD flag or provide a wall time offset for all processes, as even relative timing information (let alone access to actual wall time) can have security implications.

Did you mean to say, that the TSD flag should be set (to disallow userspace from reading the TSC)? Besides, yes, side channels are a thing. Denying service to your law-abiding programs to prevent side channels is, however, stupid. If need be, anyone can fashion a timer just from having a second thread increase a global variable. Sure it won't be as accurate, but for a side-channel attack, it might just be enough. Moreover, many programs want to know relative timing information for many reasons, not the least of whic being profiling, and preventing them from doing that because one day, and insecure application on your system may be exploited is as sensible as removing all knifes from your house because someone might hurt others with them.

Besides timing side channels, I see no security implications for timing information.

Korona · Post by **Korona** » Sat May 16, 2020 1:50 am

IMHO, disabling TSC to prevent side channel attacks is a fool's errand, at least if you do not severely restrict your system also in other ways. A similarly accurate clock can be constructed by concurrently incrementing a 64-bit counter from another userspace thread. It was shown over and over again that timing attacks are also possible (even though a bit harder) with less accurate clocks by averaging and exploiting the law of large numbers.

eekee · Post by **eekee** » Tue May 19, 2020 8:35 am

Korona wrote:IMHO, disabling TSC to prevent side channel attacks is a fool's errand, at least if you do not severely restrict your system also in other ways. A similarly accurate clock can be constructed by concurrently incrementing a 64-bit counter from another userspace thread. It was shown over and over again that timing attacks are also possible (even though a bit harder) with less accurate clocks by averaging and exploiting the law of large numbers.

Would it help to introduce a bit of randomness into the scheduler timing, or does it just result in a "less accurate clock"?

I'm also almost sure there was a Linux kernel option to randomize the TSC, last time I configured it, (2.6.??) but now I'm not sure that makes any sense, let alone whether it would help.

nullplan · Post by **nullplan** » Tue May 19, 2020 10:25 pm

eekee wrote:Would it help to introduce a bit of randomness into the scheduler timing, or does it just result in a "less accurate clock"?

Help with what? Also, yes, a more random clock is just a little less accurate. And modern CPUs are so fast that a thread incrementing a global variable can get many, many cycles in before being preempted. Again, it's not perfect, but it is good enough. Overall, the timer thread is always runnable and will therefore consume most of one CPU core. If not much else is going on, there may not even be a need to ever preempt this thread.

eekee wrote:I'm also almost sure there was a Linux kernel option to randomize the TSC, last time I configured it, (2.6.??) but now I'm not sure that makes any sense, let alone whether it would help.

Well, it no longer exists as far as I could find. To my knowledge there is no way to change the TSC. Even if there was, there would be no point, since most of the code using the TSC is only interested in its differences over time. And if the Linux team somehow found a way to make the TSC completely useless, there are other methods of timing things.

eekee · Post by **eekee** » Wed May 20, 2020 5:27 am

That all makes sense, thanks.

codyd51 · Post by **codyd51** » Tue Jul 07, 2020 3:45 am

I implemented things such that when a process is scheduled, it tracks both when it was scheduled and how long its timeslice is, based on priority (MLFQ). The timer handler checks whether the current task has exceeded this lifetime, and if not, doesn't schedule.

OSDev.org

Partial time slices

Partial time slices

Re: Partial time slices

Re: Partial time slices

Re: Partial time slices

Re: Partial time slices

Re: Partial time slices

Re: Partial time slices

Re: Partial time slices

Re: Partial time slices

Re: Partial time slices

Re: Partial time slices

Re: Partial time slices

Re: Partial time slices