qsort or alternative sorting function
Re:qsort or alternative sorting function
Well heres my take on a solution: test if theres still stack space before each qsort step. If the stack is full, rather than splitting the array and sorting each segment, just do insertion sort on the lot.
This means you can use a stack-size which is fixed and reasonably small, yet still handle an array of any size.
This means you can use a stack-size which is fixed and reasonably small, yet still handle an array of any size.
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
- Contact:
Re:qsort or alternative sorting function
So that'd mean you could have a qsort function that first tries a quicksort and fallback to e.g. bubble sort if it appears that the 20-entries stack won't be enough ?
the fact the implementation was widespread doesn't necessarily means it doesn't have ugly limlitation (widely known as well by qsort gurus) that you'll work around in one way or another ...
the fact the implementation was widespread doesn't necessarily means it doesn't have ugly limlitation (widely known as well by qsort gurus) that you'll work around in one way or another ...
Re:qsort or alternative sorting function
@ paulbarker:
That would blow your performance into the nether hells. But the idea of "capping" recursion depth has merit. Actually, I have a paper here ("Introspective Sorting and Selection Algorithms") that suggests using an algorithm with O(n log n) worst-case performance (like heapsort) as fallback, so that the qsort() function as a whole performs at O(n log n) even in worst case.
(You wouldn't want to use Heapsort right away as Quicksort is much faster in most cases.)
This approach is called "Introsort", and it's what I will use for PDCLib v1.0 (or rather, v0.5), combined with some improvements on pivot-handling. The upcoming bugfix release (v0.4.1) will simply recurse instead of using an internal stack - not good in terms of performance, but it's an easy fix that didn't cost me much time.
That would blow your performance into the nether hells. But the idea of "capping" recursion depth has merit. Actually, I have a paper here ("Introspective Sorting and Selection Algorithms") that suggests using an algorithm with O(n log n) worst-case performance (like heapsort) as fallback, so that the qsort() function as a whole performs at O(n log n) even in worst case.
(You wouldn't want to use Heapsort right away as Quicksort is much faster in most cases.)
This approach is called "Introsort", and it's what I will use for PDCLib v1.0 (or rather, v0.5), combined with some improvements on pivot-handling. The upcoming bugfix release (v0.4.1) will simply recurse instead of using an internal stack - not good in terms of performance, but it's an easy fix that didn't cost me much time.
Every good solution is obvious once you've found it.
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
- Contact:
Re:qsort or alternative sorting function
... or recursively call the stack-featured sort function when its own stack is full ... that way you naturally keep thing fast when the array is nice and gracefully degrade performance (towards N?, i admit) due to the recursion when you can no longer handle things with a simple "in-function" stack.Solar wrote: This approach is called "Introsort", and it's what I will use for PDCLib v1.0 (or rather, v0.5), combined with some improvements on pivot-handling. The upcoming bugfix release (v0.4.1) will simply recurse instead of using an internal stack - not good in terms of performance, but it's an easy fix that didn't cost me much time.
(yet the "fall back to heap sort" is indeed a nicer option )
Re:qsort or alternative sorting function
The performance degrade towards O(n^2) happens anyway; it's "only" the function call overhead for the larger subpartition you can save with the internal stack.
And I don't want to waste too much effort on intermediate solutions in the pre-alpha phase.
And I don't want to waste too much effort on intermediate solutions in the pre-alpha phase.
Every good solution is obvious once you've found it.
Re:qsort or alternative sorting function
It can result in O(n^2) time used, that's for sure, but I think it can be done in O(log n) stack space.Solar wrote: The problem is that this is for the average case. As is quite well-known, Quicksort has a weakness that can result in O(n^2) behaviour - which a 20-element stack cannot handle.
First of all, let's work out how to do it using a compiler with tail call elimination instead of using an explicit stack. The call to quicksort will partition the array into less-than-pivot and greater-than-pivot parts, and then recursively call quicksort twice. The first call will be recursive (will use CPU stack space) and the second call will be tail-recursive (a loop).
The maximum stack use will be the maximum nesting depth of the first (non-tail-recursive) call.
It doesn't matter which order you sort the partitions in. If you recursively sort the smaller partition, and tail-recursively sort the bigger partition, the size of the array will (at worst) halve on each recursion. So the recursion depth will be at most log n.
Now, a lot of C compilers don't do tail call elimination. So that's no good. Tail recursion is just a loop, though, so
Code: Select all
f(x) = if b then return f(y) else return z
Code: Select all
f(x) = while not b do x := y done; return z
Or we can introduce tail recursion elimination by using an explicit call stack instead of using the CPU's one. Then:
- Popping a value off the stack corresponds to a function return.
- Pushing a value on the stack corresponds to a function call. When the function returns, it will try to sort the array again that produce that function call. That's not so useful. So...
- Replacing the value on the top of the stack and pushing another value corresponds to a recursive call followed by a tail-call - it will sort the top array, and when that returns, it will sort the next array. Then it will return. The tail call doesn't use any extra stack space, which is what we need. It's a call with the return address to something else, I suppose.
PDCLib's qsort does keep a call stack, more or less. The stack isn't just [tt]char * stack[/tt], though - the top of the stack is kept in [tt](base_, limit)[/tt] and rest of the stack is [tt]stack[/tt]. With that in mind, we can translate qsort back to a recursive function and see what it does:
- To begin with, the call stack will contain one entry - the whole array (in base_ and limit).
- It goes in a loop, examining the top of the stack (base_ and limit). So it uses the call stack properly.
- If the array is small enough, it sorts it and returns.
- Otherwise, it partitions the array and causes a recursive call to sort the smaller partition then a tail recursion to sort the bigger partition.
- It returns once the call stack is empty.
Actually, here's a shorter argument off the top of my head: I'll prove that (for all k) the kth element of the stack (counting from the bottom of the stack) will have size < 2^(STACKSIZE - k). That's counting (base_, limit) as a member of the stack, too. This is an invariant of your qsort(), so we have to check that it holds to begin with and that the loop makes sure it's true.
To begin with, there's one element. By making STACKSIZE big enough, the invariant will be true.
In the insertion sort case, POP(base_, limit) will just remove the top element of the stack (the base_, limit one). So that will preserve the invariant.
In the recursive case, let's say the top-of-stack was originally element k. It will partition element k, set element k to the bigger partition and set element (k+1) to the smaller partition. The new element k can't be bigger than the old element k (it's a partition of it), and element (k+1) will be at most half the size of element k (i.e. it will have size < 2^(STACKSIZE - (k+1))). So that will preserve the invariant too.
So the invariant will always be true. That means that the entry at STACKSIZE - 1 will have size < 2, so size 0 or 1. An array of size 0 or 1 doesn't use the recursive case, so you'll never push anything when the stack is full.
Anyway, I seem to have gone on a bit I hope that makes you a bit more confident of the qsort() in PDCLib's SVN...
Re:qsort or alternative sorting function
Maybe its just me, but my second rule at the moment is "No recursive functions in the kernel" (excepting things which are bounded at around 2-3 recursions and use very little stack space).
Incase you're wondering, the first is "If it's not thread safe, re-write it" (hence I'm replacing psnprintf with my own printf backend).
As far as performance goes, with the proposed fix I'm looking at the case where the stack is full and theres nowhere else to go. It's designed to never actually be used, but to guarentee that we can sort any number of elements in a bounded amount of stack space.
BTW, my stacks will be 4k growing to 12k max on x86. But on some architectures, every frame counts, and I always like my generic functions to work in every possible case on all platforms.
Incase you're wondering, the first is "If it's not thread safe, re-write it" (hence I'm replacing psnprintf with my own printf backend).
As far as performance goes, with the proposed fix I'm looking at the case where the stack is full and theres nowhere else to go. It's designed to never actually be used, but to guarentee that we can sort any number of elements in a bounded amount of stack space.
BTW, my stacks will be 4k growing to 12k max on x86. But on some architectures, every frame counts, and I always like my generic functions to work in every possible case on all platforms.
Re:qsort or alternative sorting function
<funny>nick8325 wrote: So your version of qsort will use at most O(log n) call stack entries already
You are saying that log(n) stack space is enough, which would mean I can sort 2^64 elements no problem because log( 2^64 ) == 19.{something}, which is <= 20.
But I never studied CS. I am just a lowly programmer who learned his stuff hands-on, you know? All those O(1) and O(n) and O(n^2) and O(log n) and O(n log n) just make my head swim.
So I took a more practical approach, hacked up a piece of code that generates a perverted progression of integers some paper claimed to be "a median-of-3-killer", and blew that 20-element stack with an array of 9219 elements... ;D
</funny>
The test driver:
Code: Select all
#define KILLSIZE 9219
int main( void )
{
int killer[KILLSIZE * 2];
for ( int i = 0; i < KILLSIZE * 2; ++i )
{
if ( i >= KILLSIZE )
{
killer[i] = ( i - KILLSIZE ) * 2 + 1;
}
else
{
if ( i % 2 )
{
killer[i] = KILLSIZE + i - 1;
}
else
{
killer[i] = i;
}
}
}
qsort( killer, KILLSIZE * 2, sizeof( int ), compint );
return TEST_RESULTS;
}
Don't try to figure out the "why", the algorithm is going to be rewritten anyway.
Every good solution is obvious once you've found it.
Re:qsort or alternative sorting function
Yes, that's because that assertion checks that the stack has less than 10 items in itSolar wrote: Add an [tt]assert( (stackptr - stack) <= 20 );[/tt] at the end of the qsort loop, and watch it blow for yourself.
Code: Select all
assert(stackptr < &stack[STACKSIZE]);
EDIT: anyway, log(2^64) = 64, at least for deciding what STACKSIZE should be...
Re:qsort or alternative sorting function
I don't udersand that proberbility crap. but isn't O(log n) at best case scenrio.nick8325 wrote:It can result in O(n^2) time used, that's for sure, but I think it can be done in O(log n) stack space.Solar wrote: The problem is that this is for the average case. As is quite well-known, Quicksort has a weakness that can result in O(n^2) behaviour - which a 20-element stack cannot handle.
Re:qsort or alternative sorting function
*slap*nick8325 wrote: Yes, that's because that assertion checks that the stack has less than 10 items in it
::)
Sure, you're right.
Let's rest this case. Between mystran's and paulbarkers' initial complaint and your assertions that STACKSIZE 64 would be OK, my trust in that code has eroded. v0.4.1 will get a temporary recursive solution, v0.5 an Introsort with proper worst-case and domain checking done in the test driver, and that should be the end of it.
(How do you figure log(2^64) == 64? My calculator gives me 19.{something} for log, 44.{something} for ln...)
Every good solution is obvious once you've found it.
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
- Contact:
Re:qsort or alternative sorting function
I think that's supposed to say log[sub]2[/sub](2[sup]64[/sup]) = 64...Solar wrote:(How do you figure log(2^64) == 64? My calculator gives me 19.{something} for log, 44.{something} for ln...)
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
- Contact:
Re:qsort or alternative sorting function
ask your calculator for "(log 64) / (log 2)" and it should happily report "42" ... err i mean "64" (sorry).
Re:qsort or alternative sorting function
Fair enough.Solar wrote: Let's rest this case. Between mystran's and paulbarkers' initial complaint and your assertions that STACKSIZE 64 would be OK, my trust in that code has eroded. v0.4.1 will get a temporary recursive solution, v0.5 an Introsort with proper worst-case and domain checking done in the test driver, and that should be the end of it.
Yes, that's it. Sorry, I tend to leave off the base when it's 2...Colonel Kernel wrote:I think that's supposed to say log[sub]2[/sub](2[sup]64[/sup]) = 64...Solar wrote:(How do you figure log(2^64) == 64? My calculator gives me 19.{something} for log, 44.{something} for ln...)
Re:qsort or alternative sorting function
Yes, if you're careful. You then need to check which of the halves is smaller, recurse for the smaller and tailrecurse for the larger. If you do that (and only if, I'm tempted to say) you'll keep the stack within O(log(n)) of the item count, with a constant factor for the size of the stack frame.nick8325 wrote:It can result in O(n^2) time used, that's for sure, but I think it can be done in O(log n) stack space.Solar wrote: The problem is that this is for the average case. As is quite well-known, Quicksort has a weakness that can result in O(n^2) behaviour - which a 20-element stack cannot handle.
Good luck in getting it to work.
Solar: Nobody can beat mathematical limits. Not even you. The only way you can get around them is by misunderstanding them or misinterpreting them.