Sorted/ ordered list using red-black b-tree
Sorted/ ordered list using red-black b-tree
Hi,
Is an ordered list the best mechanism to implement a timer list? I've been reading up a bit about red-black b-trees and it seems an interesting proposition...
What I find attractive about it is, according to wikipedia, it seems to have well defined worst case times.
Inserting items into a large sorted list can be a fairly expensive (and non-deterministic) operation.
I'm curous what your opinion is on this matter.
Does anybody have any links to articles etc on efficient timer queue designs?
Dushara
Is an ordered list the best mechanism to implement a timer list? I've been reading up a bit about red-black b-trees and it seems an interesting proposition...
What I find attractive about it is, according to wikipedia, it seems to have well defined worst case times.
Inserting items into a large sorted list can be a fairly expensive (and non-deterministic) operation.
I'm curous what your opinion is on this matter.
Does anybody have any links to articles etc on efficient timer queue designs?
Dushara
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
- Contact:
I don't have a link handy, but I've heard of a technique where you have a sorted list of timers. You sort by a relative time instead of an absolute time, so that the head of the list is always the next timer and whenever its count reaches zero it's time is up.
I think Brendan has implemented this scheme before... maybe he can describe it more coherently (if he's reading this ).
I think Brendan has implemented this scheme before... maybe he can describe it more coherently (if he's reading this ).
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
This is described briefly with a little example on the wiki in the article Blocking Process. As Colonel Kernel said, Brendan always has something interesting to share regarding scheduling, so you might want to search through his posts.
Thanks for the replies...
From my understanding, the problem with this scheme is, you have to traverse the whole list if you need to insert an item that belongs at the end (which is quite expensive).
I think the wiki article is also talking about the same thing.
Dushara
This is what I meant when I said "ordered list" sorry if I didn't express myself clearly.I don't have a link handy, but I've heard of a technique where you have a sorted list of timers
From my understanding, the problem with this scheme is, you have to traverse the whole list if you need to insert an item that belongs at the end (which is quite expensive).
I think the wiki article is also talking about the same thing.
Thanks I'll search his posts...As Colonel Kernel said, Brendan always has something interesting to share regarding scheduling
Dushara
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
- Contact:
Ah, I thought you were worried about the time it takes to check when a timer should fire. I guess you're more worried about the scalability of adding new timers (since insertion in a simple list is linear)? How many timers do you expect will be in use typically? If you're optimizing something like this, surely the rest of your OS must be covered in polished chrome by now.dushara wrote:This is what I meant when I said "ordered list" sorry if I didn't express myself clearly.
From my understanding, the problem with this scheme is, you have to traverse the whole list if you need to insert an item that belongs at the end (which is quite expensive).
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
Thanks I'll see if that helps me...If you are worried about the time it takes to insert an item, consider using a minimum heap
Yes insertion is the issue.Ah, I thought you were worried about the time it takes to check when a timer should fire. I guess you're more worried about the scalability of adding new timers (since insertion in a simple list is linear)?
What I've implemented is an RTOS kernel and I ignored most of the issues you guys take into consideration (MMU, networking, file systems, etc). I won't call it chrome, but I'm quite happy with it still... (You would have seen my announcement no doubt).If you're optimizing something like this, surely the rest of your OS must be covered in polished chrome by now.
Once again thanks for the advice.
Dushara
If you're considering implementing red-black trees, I would also point to you http://en.wikipedia.org/wiki/AVL_Tree. AVL trees are very similar in most respects, in that the insertion, deletion and lookup times are the same. The difference is the algorithm used to optimise the tree (stopping it becoming a long linked list in the worst case), and I personally feel the AVL algorithm is easier to understand/implement.
JamesM[/url]
JamesM[/url]
Re: Sorted/ ordered list using red-black b-tree
Hi,
If "removeEntry" is only used in the timer IRQ handler with interrupts disabled, and "insertEntry" is only used outside of IRQ handlers with interrupts enabled, then it's likely that the time "removeEntry" takes will be much more important than the time "insertEntry" takes.
Next, don't be fooled by "number of operations" (or "big O" notation). For example, an "O(n)" algorithm with poor cache locality might be much worse than an "O(n^2)" algorithm with good cache locality for all possible values of "n".
Then there's re-entrancy - if you're in the middle of inserting a new entry in the list and get a timer IRQ what are you going to do? For single CPU systems you could disable IRQs for all code that touches the list, but that won't work for SMP. To make it work with SMP you can't use a lock alone (the IRQ handler would cause deadlocks) and you can't use any kind of lock that causes task switches inside an IRQ handler. You could disable interrupts and use some kind of spinlock. The problem with that is there's no way to tell how long the IRQ handler would need to wait before it acquires the lock - for e.g. you could have 15 CPUs trying to get the lock to insert something and a timer IRQ at the same time, and the IRQ handler could be the last one to acquire the lock.
Then there's the quantizing effect of the timer. If the timer IRQ occurs every 10 ms, then you could have a list like "6 ms, 1 ms, 8 ms, 15 ms, 18 ms, 12 ms" and it won't matter. Threads don't need to be in strict order - you only need to sort threads according to the time quantum they wake up in.
Lastly, your kernel should guarantee that a thread will not wake up before it's time expires. This means that (for e.g.) if the timer IRQ occurs every 10 ms and a thread wants to sleep for 1 ms, then you don't know if a timer IRQ will occur 2 us after you put the thread on the list, and need to make it wake up after 11 ms to guarantee that it doesn't wake up sooner than 1 ms.
Based on all of the above, it makes sense to me to have a seperate unsorted list for each time quantum. The nice thing with this (assuming that your kernel does guarantee a thread won't wake up too early) is that the timer IRQ can check the "soonest" list without caring about new threads being added to the list. The problem here is you'd need a huge number of lists (a total of "maximumSleepTime / timeQuantum" lists), but we can "cheat" by using modulo and leaving things on the list if they aren't meant to wake up yet.
For an example (using 65536 singly linked lists):
If I didn't stuff something up (it's untested) this is entirely multi-CPU safe. Inserting an entry is "lock free" (but not "block free").
The timer IRQ handler is "block free". Worst case is many threads sleeping for a very long time and repeatedly being checked, but to be honest I wouldn't worry much about that. With a 10 ms timer IRQ, any thread that sleeps for less than 10.9 minutes (655.36 seconds) is only checked once when it needs to wake up.
Cheers,
Brendan
No....dushara wrote:Is an ordered list the best mechanism to implement a timer list? I've been reading up a bit about red-black b-trees and it seems an interesting proposition...
If "removeEntry" is only used in the timer IRQ handler with interrupts disabled, and "insertEntry" is only used outside of IRQ handlers with interrupts enabled, then it's likely that the time "removeEntry" takes will be much more important than the time "insertEntry" takes.
Next, don't be fooled by "number of operations" (or "big O" notation). For example, an "O(n)" algorithm with poor cache locality might be much worse than an "O(n^2)" algorithm with good cache locality for all possible values of "n".
Then there's re-entrancy - if you're in the middle of inserting a new entry in the list and get a timer IRQ what are you going to do? For single CPU systems you could disable IRQs for all code that touches the list, but that won't work for SMP. To make it work with SMP you can't use a lock alone (the IRQ handler would cause deadlocks) and you can't use any kind of lock that causes task switches inside an IRQ handler. You could disable interrupts and use some kind of spinlock. The problem with that is there's no way to tell how long the IRQ handler would need to wait before it acquires the lock - for e.g. you could have 15 CPUs trying to get the lock to insert something and a timer IRQ at the same time, and the IRQ handler could be the last one to acquire the lock.
Then there's the quantizing effect of the timer. If the timer IRQ occurs every 10 ms, then you could have a list like "6 ms, 1 ms, 8 ms, 15 ms, 18 ms, 12 ms" and it won't matter. Threads don't need to be in strict order - you only need to sort threads according to the time quantum they wake up in.
Lastly, your kernel should guarantee that a thread will not wake up before it's time expires. This means that (for e.g.) if the timer IRQ occurs every 10 ms and a thread wants to sleep for 1 ms, then you don't know if a timer IRQ will occur 2 us after you put the thread on the list, and need to make it wake up after 11 ms to guarantee that it doesn't wake up sooner than 1 ms.
Based on all of the above, it makes sense to me to have a seperate unsorted list for each time quantum. The nice thing with this (assuming that your kernel does guarantee a thread won't wake up too early) is that the timer IRQ can check the "soonest" list without caring about new threads being added to the list. The problem here is you'd need a huge number of lists (a total of "maximumSleepTime / timeQuantum" lists), but we can "cheat" by using modulo and leaving things on the list if they aren't meant to wake up yet.
For an example (using 65536 singly linked lists):
Code: Select all
;Insert a new thread on a list of sleeping threads
;
;Input
; ebx timer tick to wake up
; edi address of thread's data area (TDA)
insertThreadOnList:
push *stuff*
*disable task switches*
mov [edi+TDA.wakeTime],ebx ;Set the actual time this thread should wake
movzx ebx,bx ;ebx = list number (or "wakeTime % 65536")
mov eax,[timerListTable+ebx*4] ;eax = current "last thread on list"
.retry:
mov [edi+TDA.nextThreadTDA],eax ;Link the previous thread's TDA to the new thread's TDA
lock cmpxchg [timerListTable+ebx*4],edi ;Try to set a new "last thread on list"
je .retry ;Retry if "last thread on list" changed
*enable task switches*
pop *stuff*
ret
Code: Select all
;The timer IRQ handler
; Note: I assume this uses an interrupt gate (disable tasks switches if it doesn't)
timerIRQ:
push *stuff*
;Adjust the current time
inc dword [currentTime],eax ;Increase the time counter
mov ebp,[currentTime] ;ebp = the current time
;Get the next list of sleeping threads
xor edx,edx
mov ebx,[nextTimerListNumber] ;ebx = next timer list number
lock xchg [timerListTable+ebx*4],edx ;Get the old thread (if any) and clear the list
xor ecx,ecx ;ecx = address for last thread on the "still sleeping" list (none)
;Start processing the list
jmp .doneThread
;Do the next thread on the list
.nextThread:
mov edx,[eax+TDA.nextThreadTDA] ;edx = address for next thread on the list
mov [eax+TDA.wakeTime],ebp ;Should this thread wake up yet?
jbe .wakeThread ; yes
mov [edi+TDA.nextThreadTDA],eax ; no, put the thread on the "still sleeping" list
mov ecx,eax ; and set a new "end of sleeping list"
jmp .doneThread
.wakeThread:
call wakeThread ;Tell the scheduler to wake up this thread
;Check if all threads are done
.doneThread:
mov eax,edx ;eax = address of next thread in list
test eax,eax ;Is everything done?
jne .nextThread ; no, keep going
;Check if there's any threads that are still sleeping on this list
test ecx,ecx ;Are there any threads still sleeping?
je .done ; no, all done
;Merge the remaining sleeping threads with the current list
mov eax,ecx ;eax = last thread on the list
lock xchg [timerListTable+ebx*4],eax ;Get the old thread (if any) and set a new thread
mov [ecx+TDA.nextThreadTDA],eax ;Link the previous thread's TDA to the new thread's TDA
;Update the next list number
.done:
inc bx ;ebx = next timer list number (wraps around at 65536)
mov [nextTimerListNumber],ebx ;Set the new list number for next IRQ
;Everything done - do an EOI and return
*send EOI*
pop *stuff*
iretd
The timer IRQ handler is "block free". Worst case is many threads sleeping for a very long time and repeatedly being checked, but to be honest I wouldn't worry much about that. With a 10 ms timer IRQ, any thread that sleeps for less than 10.9 minutes (655.36 seconds) is only checked once when it needs to wake up.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
- Contact:
Re: Sorted/ ordered list using red-black b-tree
With a sorted list of timers, wouldn't removeEntry be trivial? You'd always be removing the head of the list...Brendan wrote:If "removeEntry" is only used in the timer IRQ handler with interrupts disabled, and "insertEntry" is only used outside of IRQ handlers with interrupts enabled, then it's likely that the time "removeEntry" takes will be much more important than the time "insertEntry" takes.
I was with you until you said "all possible values of 'n'". It may be true for "today's practical values of 'n'", but those values change over time. Also, the difference between n and n^2 as n increases is a whopper... comparing n with n log n might be more appropriate.Next, don't be fooled by "number of operations" (or "big O" notation). For example, an "O(n)" algorithm with poor cache locality might be much worse than an "O(n^2)" algorithm with good cache locality for all possible values of "n".
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
This is not true. Yes, an O(n^2) algorithm might perform better then a O(n) for some values [0, X[ but since cache misses are constant time the above statement does not hold for n > X.Next, don't be fooled by "number of operations" (or "big O" notation). For example, an "O(n)" algorithm with poor cache locality might be much worse than an "O(n^2)" algorithm with good cache locality for all possible values of "n".
Change possible to reasonable.
[Edit]
Sorry, missed Colonel Kernel's reply...