
Correct Implementation of Sleep

Posted: Fri Mar 02, 2018 1:31 am
by Arnab35
Hi all,

I have been trying to implement sleep properly in my kernel for the last few days. I have tried various mechanisms, and so far none of them seem to be working. My sleep is a system call that runs an idle loop until the requested number of seconds expires. I am using the PIT timer to keep incrementing a global tick counter that tells me whether the timer has expired or not. I have declared ticks globally.

Code: Select all

int ticks = 0;
This is the PIT IRQ handler that I have implemented; it seems to be running correctly.

Code: Select all

void i86_pit_irq(){
  i86_set_mask(1);

  // sleeping processes in queue
  ticks++;
  i86_pic_send_eoi_command(32);
  struct pcb *process = sleeping_queue.processes;
  while(process != NULL)
  {
    if(process->sleep_time == 0)
    {
      try_to_wake_up(&sleeping_queue, process);
    }
    else
    {
      process->sleep_time = process->sleep_time - 1;
    }
    process = process->next_in_queue;
  }
  struct pcb *current = get_current_process();
  if(current->is_sigkill_pending) // current is current process
  {
    /* usually involves killing the process- signals are always handled by current process */
    current->is_sigkill_pending = 0;
    sys_exit(9);  // signal code KILL -9
  }
  else if(current->is_sigsegv_pending)
  {
    current->is_sigsegv_pending = 0;
    sys_exit(11);   // signal code : 11
  }
  current->time_slice = current->time_slice - 1;
  if(current->time_slice <= 0)
  {
    schedule();
  }
  i86_clear_mask(1);
}
Below is my sleep system call:

Code: Select all

void sys_sleep(unsigned long seconds)
{
  unsigned int timer_ticks;
  timer_ticks = ticks + (seconds * 1000);
  /*
  kprintf("sleeping for %d seconds ", seconds);*/
  while(ticks < timer_ticks);
  kprintf("sleep done");
}
Here "seconds" is the amount of time, in seconds, for which the system should sleep; since the PIT ticks every millisecond, it is converted to ticks by multiplying by 1000.

The problem here is that the sys_sleep function never returns. It keeps running the while loop even after the condition has become false (i.e. once ticks >= timer_ticks).

I tried using gdb to see in assembly which instruction is getting executed repeatedly. I noticed a peculiar jmp instruction.

Code: Select all

0xffffffff80201567 <sys_sleep+13>               add    (%rax),%edi
0xffffffff80201569 <sys_sleep+15>               cmp    (%rax),%edi
0xffffffff8020156b <sys_sleep+17>               jbe    0xffffffff8020156f <sys_sleep+21>
0xffffffff8020156d <sys_sleep+19>               jmp    0xffffffff8020156d <sys_sleep+19> 


The fourth instruction, the jmp, seems to be the culprit: it keeps jumping to its own address, so the system ends up in an infinite loop.

I am quite confused as to what a proper sleep implementation would be. Is my current implementation correct? If so, what is causing this bug? Should I change my design in any way?

Thanks.

Re: Correct Implementation of Sleep

Posted: Fri Mar 02, 2018 2:24 am
by iansjack
I suspect the problem is that the compiler has optimized out the test. (Perhaps it sees "timer_ticks = ticks + ..." and so thinks that the comparison always succeeds.) Try compiling with no optimization.

I won't comment on the design, other than to say that running a loop like this is very inefficient. You should make use of the HLT instruction.

Re: Correct Implementation of Sleep

Posted: Fri Mar 02, 2018 2:32 am
by davidv1992
Compiler optimizations are indeed the most likely culprit if the IRQ handler is running. The proper solution to guarantee that the compiler is not making assumptions on the value of ticks is to declare ticks to be volatile. This will tell the compiler that the variable can change in ways it cannot predict.

Re: Correct Implementation of Sleep

Posted: Fri Mar 02, 2018 2:43 am
by Arnab35
Yes @davidv1992, you were right. Declaring the global ticks as volatile fixed it in this case. I will try using "hlt" inside the loop now.
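For reference, this is roughly what the fixed version looks like now (a minimal sketch, assuming the PIT fires at 1000 Hz and that interrupts are enabled so "hlt" wakes on the next tick):

Code: Select all

volatile int ticks = 0;    // volatile: modified asynchronously by the PIT IRQ handler

void sys_sleep(unsigned long seconds)
{
  unsigned int timer_ticks = ticks + (seconds * 1000);
  while (ticks < timer_ticks)
  {
    __asm__ volatile ("hlt");    // halt the CPU until the next interrupt fires
  }
  kprintf("sleep done");
}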

Thanks for the help everyone.

Re: Correct Implementation of Sleep

Posted: Fri Mar 02, 2018 6:33 am
by Brendan
Hi,

Just thought I'd offer some suggestions (assuming the original problem was fixed).

You can expect a read or write to a legacy IO port to cost about 1 microsecond. This code must access at least three IO ports and therefore costs at least 3 microseconds of CPU time:

Code: Select all

void i86_pit_irq(){
  i86_set_mask(1);
  ...
  i86_pic_send_eoi_command(32);

   ....

  i86_clear_mask(1);
}
Note that 3 microseconds is about 9000 cycles on a modern 3 GHz CPU, and is like doing a thousand multiplications.

This code has similar behaviour but only costs 1 microsecond:

Code: Select all

void i86_pit_irq(){

   ....

  i86_pic_send_eoi_command(32);
}
For IRQ0 (the highest priority IRQ for PIC chips), the difference between these approaches is that the former allows the PIC chip to send lower-priority IRQs while the interrupt handler is running, whereas the latter doesn't. For something like this I'd expect IRQs to be disabled for the entire thing, to make sure that (e.g.) an IRQ doesn't occur while you're messing with the scheduler's data structures and/or in the middle of doing a task switch. If IRQs are disabled for this code then the behaviour of both alternatives is identical (except for performance/overhead).

Note: As far as I know (and I could be wrong here), the "mask and EOI, then unmask later" thing is something that Minix originally did, because it was a micro-kernel and wanted to deliberately break the PIC chip's IRQ priority scheme (it had device drivers in "user-space" and wanted to use the scheduler's priorities instead of the PIC chip's priorities). However, Minix was designed for very old CPUs (e.g. 8086) that were significantly slower (e.g. 5 MHz CPU clock) and where IO port accesses weren't much slower than any other instruction, so it wasn't incredibly bad for Minix originally. Sadly, Minix was used for teaching for a long time, so when CPUs got faster people were still being taught to do something incredibly bad for newer/faster CPUs.

This code shows a misunderstanding of how "sleep()" is supposed to work:

Code: Select all

void sys_sleep(unsigned long seconds)
{
  unsigned int timer_ticks;
  timer_ticks = ticks + (seconds * 1000);
  /*
  kprintf("sleeping for %d seconds ", seconds);*/
  while(ticks < timer_ticks);
  kprintf("sleep done");
}
How "sleep()" is supposed to work is that the thread is put on some kind of "things to wake up when their time expires" list for the timer and the scheduler is told "do not give this task any CPU time"; and then when the timer wakes the task it tells the scheduler "hey, this task can have CPU time again now". In other words, when correctly implemented, "sleep()" does not need any "while(ticks < timer_ticks);" loop, and the scheduler is not wasting any CPU time doing task switches between tasks that are sleeping.

More specifically: because there are many reasons for tasks to "block" (to have to wait for something - e.g. wait for file IO, wait for time, wait for a message, ...), I'd strongly recommend having some kind of "block_current_task(reason)" that blocks the current task and does a task switch to a different task that isn't blocked, and an "unblock_task(task_ID)" that unblocks the selected task (if it was blocked) and decides whether or not the scheduler should do an immediate task switch to the unblocked task.
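A rough sketch of what those two primitives might look like, reusing the "struct pcb", "get_current_process()" and "schedule()" names from the original post (the "state" and "block_reason" fields, the BLOCKED/READY states, and the queue helpers are all hypothetical):

Code: Select all

void block_current_task(int reason)
{
  struct pcb *task = get_current_process();
  task->state = BLOCKED;              // the scheduler skips tasks in this state
  task->block_reason = reason;        // e.g. SLEEP, WAITING_FOR_IO, ...
  remove_from_ready_queue(task);      // hypothetical helper
  schedule();                         // switch to some task that isn't blocked
}

void unblock_task(int task_ID)
{
  struct pcb *task = get_task(task_ID);   // hypothetical lookup by ID
  if (task->state == BLOCKED)
  {
    task->state = READY;
    add_to_ready_queue(task);         // hypothetical helper
    // Optionally: do an immediate task switch here if the unblocked
    // task has a higher priority than the currently running task.
  }
}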

In this case "sleep()" would be implemented more like:

Code: Select all

void sys_sleep(unsigned long seconds)
{
  unsigned int wake_tick;
  wake_tick = ticks + (seconds * 1000);
  add_task_to_timer_list(current_task_ID, wake_tick);
  block_current_task(SLEEP);           // Block this task until the timer IRQ unblocks it
}
Of course the timer's IRQ handler would have to be more like:

Code: Select all

void i86_pit_irq() {
  tick++;

  while(first_entry_in_timer_list != NULL && first_entry_in_timer_list->wake_tick <= tick) {
    unblock_task(first_entry_in_timer_list->task_ID);
    remove_first_entry_in_list();
  }

  i86_pic_send_eoi_command(32);
}
Next; sooner or later it's likely that you'll want to support other types of timers (HPET, local APIC timer), and sooner or later it's likely that you'll want to support "tickless" (to avoid the compromise between timer accuracy and overhead that "fixed frequency tick" causes). For this reason I'd recommend using a more abstract "nanoseconds since boot" instead of using "tick". For example:

Code: Select all

void i86_pit_irq() {
  nanoseconds_since_boot += nanoseconds_per_tick;
And:

Code: Select all

void sys_sleep(unsigned long seconds)
{
  uint64_t wake_time;
  wake_time = nanoseconds_since_boot + (seconds * 1000000000);
And:

Code: Select all

void sys_nanosleep(unsigned long nanoseconds)
{
  uint64_t wake_time;
  wake_time = nanoseconds_since_boot + nanoseconds;
Note that (regardless of what you use) you should worry about roll-over; and if you use nanoseconds then it will roll over sooner than ticks will, if the variable/s are left the same size. For example: if "tick" is a 32-bit integer and is incremented every 1 millisecond then it will roll over after about 49.7 days; and if "nanoseconds_since_boot" is a 64-bit integer then it will roll over after about 213504 days (roughly 584 years). I'd make sure that it takes an extremely long time before it rolls over; and then I'd have something that forces the OS to shut down and reboot after a configurable amount of time, where that amount of time can only be configured to be less than however long it takes for the timer to roll over.

Finally; for code that needs to do something regularly, "sleep()" is inadequate. For example, if you want to do something once per second you might write this:

Code: Select all

  for(;;) {
    sleep(1);
    do_something();
  }
The problem with this is that if "do_something()" happens to take 123 ms (including however long it takes for scheduler to give the task CPU time after it wakes up), then "do_something" will be called every 1.123 seconds and won't be called every second. To fix that problem you need something more like this:

Code: Select all

  when = now + ONE_SECOND;
  for(;;) {
    sleep_until(when);
    when += ONE_SECOND;
    do_something();
  }
In this case "do_something()" will be called every second regardless of how long it takes to execute "do_something()".

If you have a "nano_sleep_until()" function then it can be the basis for everything else. For example:

Code: Select all

void sys_sleep(unsigned long seconds)
{
  uint64_t wake_time;
  wake_time = nanoseconds_since_boot + (seconds * 1000000000);
  sys_nano_sleep_until(wake_time);
}

void sys_nano_sleep(unsigned long nanoseconds)
{
  uint64_t wake_time;
  wake_time = nanoseconds_since_boot + nanoseconds;
  sys_nano_sleep_until(wake_time);
}

void sys_sleep_until(unsigned long wake_time_in_seconds)
{
  uint64_t wake_time;
  wake_time = wake_time_in_seconds * 1000000000;
  sys_nano_sleep_until(wake_time);
}
Of course all of these functions should also care about roll-over, and should be considered buggy until you add something like "if(wake_time < now) { /* must have rolled over, so sleep "forever" - the OS will be rebooted before the task can wake up */ }" logic.
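Putting the roll-over check together with the sketches above, "sys_nano_sleep_until()" might look something like this (still using the hypothetical "add_task_to_timer_list()" and "block_current_task()" helpers):

Code: Select all

void sys_nano_sleep_until(uint64_t wake_time)
{
  if (wake_time < nanoseconds_since_boot)
  {
    // Must have rolled over; sleep "forever" by never putting the task
    // on the timer list - the OS will be rebooted before it could wake.
    block_current_task(SLEEP);
    return;
  }
  add_task_to_timer_list(current_task_ID, wake_time);
  block_current_task(SLEEP);    // blocked until the timer IRQ unblocks it
}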


Cheers,

Brendan

Re: Correct Implementation of Sleep

Posted: Fri Mar 02, 2018 12:18 pm
by AMenard
1- Lie down in a quiet and dark environment.
2- relax and take a few deep breath.
3- let your brain chemistry do the rest.
4- ???
5- zzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

Re: Correct Implementation of Sleep

Posted: Fri Mar 02, 2018 12:26 pm
by Schol-R-LEA
I think there is a misunderstanding of ideas here. Perhaps we need to differentiate 'sleeping', which is when a process is removed from process scheduling for a period of time, from 'idling', which is when the CPU itself is inactive. What you describe is idling, but you seem to be using it to make processes sleep, which will be a problem for you later on.

A sleeping process shouldn't be run at all while it is sleeping; rather, it just needs to be held out of the 'running' process queue of the scheduler until its wake-up time is reached.

Testing the timers should generally be done by the scheduler, which needs to keep track of when the sleeping processes can be woken. Fortunately, there are a few efficient ways to do this, the simplest being a 'delta queue': a simple linked list whose nodes consist of the process IDs and relative time offsets (either in time units or CPU ticks, your choice).

Code: Select all

typedef struct PSLEEP {
    Process* sleeping_process;
    time_offset delta;
    struct PSLEEP* next;
} PSleep_Node;

typedef PSleep_Node* Sleep_Queue;
You could instead add the offset and the link pointer to the Process records directly, but that trades the need for an additional record type against added overhead in every process record. For now I'll write it this way to avoid clutter.
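(For comparison, the embedded variant might look something like this - hypothetical field names:)

Code: Select all

typedef struct PROCESS {
    int pid;
    /* ... registers, state, and so on ... */
    time_offset sleep_delta;        // wake offset, relative to the previous sleeper
    struct PROCESS* next_sleeper;   // next sleeping process in the delta queue
} Process;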

When a process sleeps, the time offset from the time it is inserted to when it can be woken is computed. I am assuming that the PSleep_Node has been allocated already, the PID set, and the initial offset computed. I am writing this recursively, because I am brain-damaged that way, but you can do it iteratively if you want. Also, this assumes that you can just subtract the deltas when scheduling and decrement the head's delta on a tick. Note that the queue is passed as a pointer to the head pointer, so that an insertion at the front is visible to the caller.

Code: Select all

void schedule_sleep(Sleep_Queue* queue, PSleep_Node* dog)   // bad pun, I know, I know
{
    // start by checking if the queue is empty
    if (*queue == NULL)
    {
        dog->next = NULL;
        *queue = dog;
    }
    else if (dog->delta < (*queue)->delta)
    {
        // the new node wakes first: put it in front, and make the old
        // head's offset relative to the new node
        (*queue)->delta -= dog->delta;
        dog->next = *queue;
        *queue = dog;
    }
    else
    {
        // the new node wakes at or after the current head: consume the
        // head's offset and keep walking down the list
        dog->delta -= (*queue)->delta;
        schedule_sleep(&(*queue)->next, dog);
    }
}

In this code, the scheduler walks the list, subtracting each node's offset from the new node's offset as it goes down (carrying the remainder from the previous subtraction), until it finds a node whose offset is larger than what remains. The new node keeps that remainder as its offset and is inserted in front of that node, whose own offset is then reduced by the remainder.

This means that when you go to tick the clock (assuming a tick-based model), you simply decrement the head element of the sleep queue and test whether it has reached zero. If so, you move the process record from the sleep queue to the running queue, along with any other processes immediately following the head whose offsets (relative to the head) are zero.

Code: Select all

    // inside the scheduler, following a clock tick.
    // The variable 'sleep_q' is a pointer to the head of the sleep queue.
    if (sleep_q != NULL)
    {
        sleep_q->delta--;
        while (sleep_q != NULL && sleep_q->delta == 0)
        {
            insert(running_q, sleep_q->sleeping_process);
            sleep_q = sleep_q->next;
        }
    }

   //  ... continue scheduling
Let's take a quick look at how this works. Let's insert a sleep of 6 clock ticks into the sleep queue. The sequence of offsets is now:

Code: Select all

6
When the clock ticks, we decrement that head offset, to get:

Code: Select all

5
Let's assume that two more clock ticks pass (decremented as above), and that on the following tick another process calls for a sleep of 5 ticks. The head has just been decremented to 2, and the new node's offset becomes 5 - 2 = 3. We get:

Code: Select all

2, 3
Now at the next tick, we get a sleep of 4 (4 - 1 = 3 after passing the head, and 3 - 3 = 0 after passing the second node, so it is appended with offset 0). This gives us:

Code: Select all

1, 3, 0
At the next tick, the head element is woken up, and the one following it becomes the new head:

Code: Select all

3, 0
One tick later, another sleep of seven is added.

Code: Select all

2, 0, 5
One tick more, and we get:

Code: Select all

1, 0, 5
Another tick later, the head, and the one following it with a zero offset, get moved to running_q:

Code: Select all

5
Then on the next tick a sleep of 3 is called, leading to the sequence:

Code: Select all

3,1  <-- note that the '1' is the element that was already in the queue:
        5 - 1 - 3 = 1
2,1
1,1
1
Finally, the last sleeping process is woken and the sleep queue is empty again.

This is just a quick sketch of the technique; the actual code will depend on the rest of your kernel design. Note that I didn't address the issue of the thread's priority on being woken up; depending on the scheduler algorithm and the CPU load, there is no guarantee that the woken process will run next without additional steps being taken (and if two or more processes are woken together, at least one of them will have to wait unless there are spare CPU cores to schedule them on).

Re: Correct Implementation of Sleep

Posted: Fri Mar 02, 2018 4:29 pm
by Arnab35
Hi Brendan and Schol-R-LEA,

Thanks for the information. I have been building this OS as part of my university coursework; as of now it supports a few binaries and is just about able to run its own shell and six commands.

I agree that my sleep design is flawed. Right now I am implementing preemption and, to some extent, adding the notion of foreground and background processes. I was confused when I ran sleep on the terminal: because I did not see any activity on the console, I assumed that everything had stopped, i.e. that the entire system had gone into a state of 'idling' rather than 'sleeping'.

Once I separate out the notion of background and foreground processes, I will try scheduling a background process when the foreground process (i.e. the terminal process) calls sleep. That seems like a proper way of keeping some process running in the background when I run sleep on the terminal. Beyond that, if any other process goes to sleep, I will put it into a sleep queue and invoke the scheduler to see whether another process in the ready queue can run. Then, as you both said, I will use a timer (currently I have only implemented the PIT) to keep track of when to wake the process up.

@Brendan, I actually faced an issue when I placed the send_EOI_command at the end of my timer interrupt handler. As you correctly predicted, placing the EOI acknowledgement at the end probably allowed task switches to happen while the scheduler's ready queues were being modified, so the EOI command was sometimes never sent, and every time I returned to my shell the keyboard interrupts (and probably the timer too) stopped working. So I realised that the end-of-interrupt signal should be sent early, before running any risk of a task switch, so that the PIC can keep delivering interrupts in the future. I will look into the performance aspect of my code later, as I was only interested in making it work first.

Let me know if I have made any other conceptually incorrect assumptions about my design. Thanks to everyone who cleared it up for me.

Re: Correct Implementation of Sleep

Posted: Sat Mar 03, 2018 3:15 pm
by Schol-R-LEA
Arnab35 wrote:Once I separate out the notion of background and foreground processes, I will try scheduling a background process when the foreground process (i.e. the terminal process) calls sleep. That seems like a proper way of keeping some process running in the background when I run sleep on the terminal.
You want to be careful here; the concepts of 'background' and 'foreground' relate to the user interface, not to scheduling per se, although giving a process that is actively communicating with a user ('foreground') a higher priority isn't too uncommon (though doing so naively has some pitfalls).

Rather, I would recommend focusing on how the kernel schedules the processes. It is entirely possible to use co-operative multitasking rather than pre-emptive multitasking, but it requires the various processes to explicitly cede control to the scheduler to avoid lock-ups. Since the main mechanism for pre-emption is a clock interrupt (either on a fixed 'tick', or as explicitly scheduled by the scheduler for a 'tickless' kernel), and on most hardware systems you will need to handle general interrupts anyway, there is usually little reason not to support pre-emption for a general OS.

Scheduling is its own rather hairy topic, so you'd have to read up on that, but unless you have something specific in mind you probably want to start with some variant of round-robin scheduling (with or without priorities) and tweak it later.
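For what it's worth, the pick-next step of a plain round-robin scheduler can be as simple as rotating a ready queue. A sketch, reusing the "struct pcb" from earlier in the thread ("ready_queue_head" and "append_to_ready_queue" are hypothetical):

Code: Select all

struct pcb *ready_queue_head;    // hypothetical global: head of the ready queue

// Take the next task from the head of the ready queue and move the task
// that just ran to the tail, so that every ready task gets a turn.
struct pcb* pick_next_task(struct pcb* just_ran)
{
    struct pcb* next = ready_queue_head;
    if (next == NULL)
        return just_ran;                  // nothing else is ready; keep running
    ready_queue_head = next->next_in_queue;
    next->next_in_queue = NULL;
    append_to_ready_queue(just_ran);      // hypothetical helper
    return next;
}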