Scheduling extremely inefficient

mariuszp
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: Scheduling extremely inefficient

Post by mariuszp »

Geri wrote:is there a 32-bit version available?
No, sorry
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Scheduling extremely inefficient

Post by Brendan »

Hi,
mariuszp wrote:
Geri wrote:you should have linked an ISO, so we could debug it.
https://github.com/madd-games/glidix/bl ... glidix.iso

Also, indeed, my scheduler does not support priorities, and the PIT is fixed at a frequency of 1000 Hz even when a process calls yield() to give up its CPU time, so the next process gets much less CPU time than normal... that does sound like a pretty bad design. And drivers like the IDE driver and the keyboard driver are just hanging there waiting for commands and still getting CPU time.
So, if there are 100 tasks that happen to be in the order "video driver, GUI, application, keyboard driver, 96 other things", then:
  • The user presses a key immediately after the keyboard driver finishes
  • We wait for 99 task switches and up to 99 ms before the keyboard driver checks for a keypress
  • The keyboard driver gets the keypress and sends it to the GUI
  • We wait for 97 task switches and up to 97 ms before the GUI gets the keypress
  • GUI receives the keypress, decides what to do with it and sends it to the application
  • We wait for 1 task switch before the application gets the keypress
  • The application receives the keypress and updates its window, but runs out of time before it can tell the GUI to update the video
  • We wait for 100 task switches and up to 99 ms before the application can tell the GUI to update the video
  • We wait for 99 task switches and up to 99 ms before the GUI knows that video needs to be updated
  • The GUI updates its video and tells the video driver to update the screen
  • We wait for 99 task switches and up to 99 ms before the video driver knows the screen needs to be updated
  • The video driver updates the screen
In this scenario, it took 495 task switches and (summing the waits above) up to 494 ms between the user pressing a key and the user getting visual feedback.

The alternative (with correctly set task priorities and pre-emption) is:
  • The user presses a key, kernel does a task switch to keyboard driver immediately
  • The keyboard driver gets the keypress and sends it to the GUI, then blocks (waiting for the next IRQ); causing an immediate task switch to the GUI
  • GUI receives the keypress, decides what to do with it, sends it to the application and blocks (waiting for next event); causing an immediate task switch to application
  • The application receives the keypress and updates its window. It doesn't run out of time because all higher priority tasks are blocked anyway. It tells the GUI to update the video which unblocks the (higher priority) GUI and causes an immediate task switch/pre-emption
  • The GUI updates its video, tells the video driver to update the screen and blocks (waiting for the next event), causing an immediate task switch
  • The video driver updates the screen
In this case it's 5 task switches instead of 495 task switches, and none of the other 96 tasks (that weren't involved) wasted any CPU time; so the time between the user pressing a key and the user getting visual feedback is more like 2 ms instead of up to 494 ms.
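
As a rough sketch, the "unblocking a higher priority task causes an immediate pre-emption" rule could look something like this (every name in it is illustrative, not anyone's real code):

Code:

/* Rough sketch of "unblocking a higher priority task pre-empts the
 * current one"; all names here are illustrative. */
#define TASK_READY 1

typedef struct task {
    int state;
    int priority;
} task_t;

extern task_t *current_task;
void runqueue_push(task_t *task);    /* assumed scheduler helpers */
void schedule(void);

void task_unblock(task_t *task)
{
    task->state = TASK_READY;
    runqueue_push(task);             /* make the task runnable */
    if (task->priority > current_task->priority)
        schedule();                  /* immediate task switch/pre-emption */
}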

Note that (for your scheduler) reducing the time between task switches (e.g. from 1 ms to 0.1 ms) increases the chance that a task won't be able to finish something important before it runs out of time and has to wait for all the other tasks to have their turn; it also increases the time spent doing the task switches themselves. For both of these reasons it can severely reduce performance. Increasing the amount of time tasks are given (e.g. from 1 ms to 10 ms) increases the amount of time an important task has to wait while unimportant tasks are wasting CPU time, which can also severely reduce performance.
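
To put illustrative numbers on it (assuming 100 tasks and, say, a 2 µs cost per task switch, which is made up for the example): with a 1 ms slice, a ready task can wait up to 99 ms while switch overhead is only about 0.2%; with a 0.1 ms slice the worst-case wait drops to 9.9 ms but overhead rises to about 2% and far more work gets cut off mid-operation; with a 10 ms slice overhead is negligible but the worst-case wait balloons to 990 ms.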

Basically, it's very bad, and changing the amount of time tasks are given won't change that.
mariuszp wrote:I'm gonna be adding APIC support and here's a question: if I set APIC to single-shot mode, and that "shot" happens when interrupts are disabled, is it missed or does it get sent as soon as I enable interrupts? What about if interrupts are disabled, the shot happens, then I program the APIC again, and then enable interrupts? (I'll need to do that for calls such as wait() which would need to trigger the scheduler to schedule another task).
"Disabling" interrupts (e.g. with the CLI instruction) only causes interrupts to be postponed (until the CPU is able to receive them again) and doesn't cause IRQs to be discarded.

If the local APIC timer sends an IRQ while IRQs are disabled in the CPU, then that IRQ is postponed. If you then reset the local APIC timer count and enable IRQs, the CPU receives the postponed IRQ as soon as IRQs are enabled (i.e. after the count has already been reset).

This may cause race conditions, so your code has to either avoid the race conditions or deal with them if/when they occur. I don't think there's a way to effectively avoid the race conditions. To handle them if/when they occur; the local APIC timer IRQ handler can check the timer count and ignore the IRQ if the count is non-zero.
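
As a minimal sketch of that check (lapic_read(), lapic_send_eoi() and schedule() are assumed helper names; 0x390 is the local APIC timer's current-count register):

Code:

#include <stdint.h>

/* Minimal sketch: a non-zero current count means the timer was
 * reprogrammed after this IRQ was generated, so the IRQ is stale.
 * lapic_read(), lapic_send_eoi() and schedule() are assumed helpers. */
#define LAPIC_TIMER_CURRENT 0x390

uint32_t lapic_read(uint32_t reg);
void lapic_send_eoi(void);
void schedule(void);

void lapic_timer_irq(void)
{
    if (lapic_read(LAPIC_TIMER_CURRENT) != 0) {
        lapic_send_eoi();            /* stale IRQ from before the reprogram */
        return;
    }
    lapic_send_eoi();                /* genuine one-shot expiry */
    schedule();
}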
mariuszp wrote:(EDIT: I may also add that calling memcpy() while interrupts are disabled also causes the swap to screen to take a few seconds. Is that a Bochs problem, or could the memcpy() implementation I 'stole' from PDCLib be really bad, and should I use "rep stosb" instead?)
Why are IRQs disabled in the first place? How much data are you copying? How does PDCLib implement 'memcpy()'?

I'd be tempted to assume that you're talking about blitting pixel data from some buffer in RAM to display memory; that IRQs shouldn't be disabled for that; that you do nothing to prevent data that hasn't changed since last time from being "re-copied" to display memory again for no sane reason; and that you shouldn't be using generic code (e.g. `memcpy()`) for such a special case.
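
For example, a rough sketch of "only copy what changed", assuming you keep a shadow copy of the current screen contents in RAM (reading display memory back is extremely slow, so the comparison never touches it; all names are made up):

Code:

#include <stdint.h>
#include <string.h>

/* Rough sketch: blit only the rows that changed since the last frame.
 * "shadow" is a copy in RAM of what display memory currently holds, so
 * nothing is ever read back from (slow) display memory. Names made up. */
void blit_dirty_rows(uint32_t *vram, const uint32_t *frame,
                     uint32_t *shadow, int width, int height)
{
    for (int y = 0; y < height; y++) {
        const uint32_t *row = frame + (size_t)y * width;
        uint32_t *old = shadow + (size_t)y * width;
        size_t bytes = (size_t)width * 4;
        if (memcmp(row, old, bytes) != 0) {
            memcpy(vram + (size_t)y * width, row, bytes);
            memcpy(old, row, bytes);
        }
    }
}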


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Geri
Member
Posts: 442
Joined: Sun Jul 14, 2013 6:01 pm

Re: Scheduling extremely inefficient

Post by Geri »

mariuszp wrote:
Geri wrote:is there a 32-bit version available?
No, sorry
My emulator supports 32-bit only, so I can't debug your system.
Operating system for SUBLEQ cpu architecture:
http://users.atw.hu/gerigeri/DawnOS/download.html
mariuszp
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: Scheduling extremely inefficient

Post by mariuszp »

Brendan wrote:Why are IRQs disabled in the first place?
I only disabled them to test performance this one time. The PDCLib implementation is just the good old "*dst++ = *src++".

As for the race conditions you mentioned, how about this: I first lock the APIC spinlock, then disable interrupts, then program the APIC, then enable interrupts, and finally release the spinlock. This means that if an APIC IRQ occurred during the interrupts-disabled period, it will be sent right after STI is executed. The APIC IRQ handler will then try locking the spinlock, and if that fails, it will just return, and the spinlock will then be released by the APIC programming code.

Sounds good?

Oh, and I was just copying 800*600*4 bytes of data into video memory (just white pixels). Whether I use memcpy() or memset(), even with interrupts disabled, it takes a few seconds to copy this single frame.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Scheduling extremely inefficient

Post by Brendan »

Hi,
mariuszp wrote:As for the race conditions you mentioned, how about this: I first lock the APIC spinlock, then disable interrupts, then program the APIC, then enable interrupts, and finally release the spinlock. This means that if an APIC IRQ occurred during the interrupts-disabled period, it will be sent right after STI is executed. The APIC IRQ handler will then try locking the spinlock, and if that fails, it will just return, and the spinlock will then be released by the APIC programming code.
That might work (depending on a lot of things, like what other code uses that spinlock and whether or not anything needs to set the local APIC count without enabling IRQs).
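
In C11 terms the scheme you describe might look something like this sketch (lapic_write(), lapic_send_eoi() and schedule() are assumed helpers; 0x380 is the local APIC timer's initial-count register):

Code:

#include <stdatomic.h>
#include <stdint.h>

/* Sketch of the scheme described above; all helper names are assumed. */
void lapic_write(uint32_t reg, uint32_t value);
void lapic_send_eoi(void);
void schedule(void);

static atomic_flag apic_lock = ATOMIC_FLAG_INIT;

void apic_set_oneshot(uint32_t ticks)
{
    while (atomic_flag_test_and_set(&apic_lock)) { }  /* lock */
    __asm__ volatile("cli");
    lapic_write(0x380, ticks);       /* program the one-shot count */
    __asm__ volatile("sti");
    atomic_flag_clear(&apic_lock);   /* unlock */
}

void lapic_timer_irq(void)
{
    if (atomic_flag_test_and_set(&apic_lock)) {
        lapic_send_eoi();            /* programming code holds the lock:
                                        treat the IRQ as stale and return */
        return;
    }
    atomic_flag_clear(&apic_lock);
    lapic_send_eoi();
    schedule();
}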
mariuszp wrote:Oh, and I was just copying 800*600*4 bytes of data into video memory (just white pixels). No matter if I use memcpy() or memset(), even with interrupts disabled, it takes a few seconds to copy this single frame.
That works out to about 1 MiB per second, which might be fast (if it's Bochs running on top of an old slow computer) or slow (if it's running on a modern real computer and not inside a virtual machine).

For 'memcpy()'; using "rep movsb" should be a little faster than a simple "*dst++ = *src++" loop (especially for Bochs). Using "rep movsd" should be a little faster again; but it works on dwords and not bytes, so it can't be used alone for a general `memcpy()`. A more complicated `memcpy()` that copies bytes until things are aligned on a 4 byte boundary, then switches to "rep movsd", then copies bytes for any remainder, is likely to be slower for general code (where you're typically copying small amounts of data); and it would also be slower than "rep movsd" alone for this specific case, where we already know we're transferring whole dwords.
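
For this specific case, something like the following sketch could replace the generic 'memcpy()' (GCC/Clang inline asm is assumed; untested):

Code:

#include <stddef.h>

/* Sketch: copy "count" dwords with "rep movsd" (GCC/Clang inline asm;
 * "movsl" is the AT&T spelling of Intel's "movsd"). Both pointers are
 * assumed 4-byte aligned; this is not a general memcpy() replacement. */
static void copy_dwords(void *dst, const void *src, size_t count)
{
    __asm__ volatile("rep movsl"
                     : "+D"(dst), "+S"(src), "+c"(count)
                     :
                     : "memory");
}

/* e.g. one 800x600 32bpp frame: copy_dwords(vram, buffer, 800 * 600); */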

For display memory (which is much slower to access than RAM); avoiding unnecessary writes would probably still make more difference than any of these things.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
mariuszp
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: Scheduling extremely inefficient

Post by mariuszp »

OK, so with priorities:

If I have a list of processes, each with a priority given as an int (say, -2, -1, 0, 1 or 2), is it efficient enough to just loop through all those tasks searching for priority=2, then priority=1, and so on, or would it be better to have 5 separate queues for the different priorities?
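
By "separate queues" I mean something like this (a sketch; the names are made up):

Code:

#include <stddef.h>

/* Sketch of "5 separate queues": one ready-queue per priority level.
 * Priorities -2..+2 are mapped to indices 0..4; names are made up. */
#define NUM_PRIORITIES 5

typedef struct task {
    struct task *next;
} task_t;

static task_t *queues[NUM_PRIORITIES];   /* index 4 = highest priority */

task_t *pick_next_task(void)
{
    for (int p = NUM_PRIORITIES - 1; p >= 0; p--) {
        if (queues[p] != NULL) {
            task_t *t = queues[p];
            queues[p] = t->next;         /* pop from the front */
            return t;
        }
    }
    return NULL;                         /* nothing ready: idle */
}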
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: Scheduling extremely inefficient

Post by Combuster »

Your question can be rephrased as a very typical computer science exam question:

Assume you have 100 tasks and 20 priority levels. For each of the above-mentioned algorithms, how many steps do you need to take to find the correct task if that task has the lowest priority? What if that task has the highest priority?

Have you tried it?
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]