qsort or alternative sorting function

paulbarker · Post by **paulbarker** » Mon Aug 07, 2006 3:12 am

Well heres my take on a solution: test if theres still stack space before each qsort step. If the stack is full, rather than splitting the array and sorting each segment, just do insertion sort on the lot.

This means you can use a stack-size which is fixed and reasonably small, yet still handle an array of any size.

Pype.Clicker · Post by **Pype.Clicker** » Mon Aug 07, 2006 3:13 am

So that'd mean you could have a qsort function that first tries a quicksort and fallback to e.g. bubble sort if it appears that the 20-entries stack won't be enough ?

the fact the implementation was widespread doesn't necessarily means it doesn't have ugly limlitation (widely known as well by qsort gurus) that you'll work around in one way or another ...

Solar · Post by **Solar** » Mon Aug 07, 2006 3:22 am

@ paulbarker:

That would blow your performance into the nether hells. But the idea of "capping" recursion depth has merit. Actually, I have a paper here ("Introspective Sorting and Selection Algorithms") that suggests using an algorithm with O(n log n) worst-case performance (like heapsort) as fallback, so that the qsort() function as a whole performs at O(n log n) even in worst case.

(You wouldn't want to use Heapsort right away as Quicksort is much faster in most cases.)

This approach is called "Introsort", and it's what I will use for PDCLib v1.0 (or rather, v0.5), combined with some improvements on pivot-handling. The upcoming bugfix release (v0.4.1) will simply recurse instead of using an internal stack - not good in terms of performance, but it's an easy fix that didn't cost me much time.

Pype.Clicker · Post by **Pype.Clicker** » Mon Aug 07, 2006 6:31 am

Solar wrote: This approach is called "Introsort", and it's what I will use for PDCLib v1.0 (or rather, v0.5), combined with some improvements on pivot-handling. The upcoming bugfix release (v0.4.1) will simply recurse instead of using an internal stack - not good in terms of performance, but it's an easy fix that didn't cost me much time.

... or recursively call the stack-featured sort function when its own stack is full ... that way you naturally keep thing fast when the array is nice and gracefully degrade performance (towards N?, i admit) due to the recursion when you can no longer handle things with a simple "in-function" stack.

(yet the "fall back to heap sort" is indeed a nicer option

)

Solar · Post by **Solar** » Mon Aug 07, 2006 7:02 am

The performance degrade towards O(n^2) happens anyway; it's "only" the function call overhead for the larger subpartition you can save with the internal stack.

And I don't want to waste too much effort on intermediate solutions in the pre-alpha phase.

nick8325 · Post by **nick8325** » Mon Aug 07, 2006 11:06 am

Solar wrote: The problem is that this is for the average case. As is quite well-known, Quicksort has a weakness that can result in O(n^2) behaviour - which a 20-element stack cannot handle.

It can result in O(n^2) time used, that's for sure, but I think it can be done in O(log n) stack space.

First of all, let's work out how to do it using a compiler with tail call elimination instead of using an explicit stack. The call to quicksort will partition the array into less-than-pivot and greater-than-pivot parts, and then recursively call quicksort twice. The first call will be recursive (will use CPU stack space) and the second call will be tail-recursive (a loop).

The maximum stack use will be the maximum nesting depth of the first (non-tail-recursive) call.

It doesn't matter which order you sort the partitions in. If you recursively sort the smaller partition, and tail-recursively sort the bigger partition, the size of the array will (at worst) halve on each recursion. So the recursion depth will be at most log n.

Now, a lot of C compilers don't do tail call elimination. So that's no good. Tail recursion is just a loop, though, so

Code: Select all

f(x) = if b then return f(y) else return z

is equivalent to

Code: Select all

f(x) = while not b do x := y done; return z

. That will get you an O(log n) stack space, explicitly recursive version of qsort.

Or we can introduce tail recursion elimination by using an explicit call stack instead of using the CPU's one. Then:

Popping a value off the stack corresponds to a function return.
Pushing a value on the stack corresponds to a function call. When the function returns, it will try to sort the array again that produce that function call. That's not so useful. So...
Replacing the value on the top of the stack and pushing another value corresponds to a recursive call followed by a tail-call - it will sort the top array, and when that returns, it will sort the next array. Then it will return. The tail call doesn't use any extra stack space, which is what we need. It's a call with the return address to something else, I suppose.

With that, it's easy enough to translate a recursive version of qsort() into an iterative one with an explicit stack (just push things instead of calling functions). To get O(log n) stack use, when we recurse we need to push the bigger partition then the smaller one - that corresponds to recursively sorting the smaller array and then tail-recursively sorting the bigger one.

PDCLib's qsort does keep a call stack, more or less. The stack isn't just [tt]char * stack[/tt], though - the top of the stack is kept in [tt](base_, limit)[/tt] and rest of the stack is [tt]stack[/tt]. With that in mind, we can translate qsort back to a recursive function and see what it does:

To begin with, the call stack will contain one entry - the whole array (in base_ and limit).
It goes in a loop, examining the top of the stack (base_ and limit). So it uses the call stack properly.
If the array is small enough, it sorts it and returns.
Otherwise, it partitions the array and causes a recursive call to sort the smaller partition then a tail recursion to sort the bigger partition.
It returns once the call stack is empty.

So your version of qsort will use at most O(log n) call stack entries already

Actually, here's a shorter argument off the top of my head: I'll prove that (for all k) the kth element of the stack (counting from the bottom of the stack) will have size < 2^(STACKSIZE - k). That's counting (base_, limit) as a member of the stack, too. This is an invariant of your qsort(), so we have to check that it holds to begin with and that the loop makes sure it's true.

To begin with, there's one element. By making STACKSIZE big enough, the invariant will be true.

In the insertion sort case, POP(base_, limit) will just remove the top element of the stack (the base_, limit one). So that will preserve the invariant.

In the recursive case, let's say the top-of-stack was originally element k. It will partition element k, set element k to the bigger partition and set element (k+1) to the smaller partition. The new element k can't be bigger than the old element k (it's a partition of it), and element (k+1) will be at most half the size of element k (i.e. it will have size < 2^(STACKSIZE - (k+1))). So that will preserve the invariant too.

So the invariant will always be true. That means that the entry at STACKSIZE - 1 will have size < 2, so size 0 or 1. An array of size 0 or 1 doesn't use the recursive case, so you'll never push anything when the stack is full.

Anyway, I seem to have gone on a bit

I hope that makes you a bit more confident of the qsort() in PDCLib's SVN...

paulbarker · Post by **paulbarker** » Mon Aug 07, 2006 11:18 am

Maybe its just me, but my second rule at the moment is "No recursive functions in the kernel" (excepting things which are bounded at around 2-3 recursions and use very little stack space).

Incase you're wondering, the first is "If it's not thread safe, re-write it" (hence I'm replacing psnprintf with my own printf backend).

As far as performance goes, with the proposed fix I'm looking at the case where the stack is full and theres nowhere else to go. It's designed to never actually be used, but to guarentee that we can sort any number of elements in a bounded amount of stack space.

BTW, my stacks will be 4k growing to 12k max on x86. But on some architectures, every frame counts, and I always like my generic functions to work in every possible case on all platforms.

Solar · Post by **Solar** » Mon Aug 07, 2006 3:18 pm

nick8325 wrote: So your version of qsort will use at most O(log n) call stack entries already

<funny>

You are saying that log(n) stack space is enough, which would mean I can sort 2^64 elements no problem because log( 2^64 ) == 19.{something}, which is <= 20.

But I never studied CS. I am just a lowly programmer who learned his stuff hands-on, you know? All those O(1) and O(n) and O(n^2) and O(log n) and O(n log n) just make my head swim.

So I took a more practical approach, hacked up a piece of code that generates a perverted progression of integers some paper claimed to be "a median-of-3-killer", and blew that 20-element stack with an array of 9219 elements... ;D

</funny>

The test driver:

Code: Select all

#define KILLSIZE 9219

int main( void )
{
    int killer[KILLSIZE * 2];
    for ( int i = 0; i < KILLSIZE * 2; ++i )
    {
        if ( i >= KILLSIZE )
        {
            killer[i] = ( i - KILLSIZE ) * 2 + 1;
        }
        else
        {
            if ( i % 2 )
            {
                killer[i] = KILLSIZE + i - 1;
            }
            else
            {
                killer[i] = i;
            }
        }
    }
    qsort( killer, KILLSIZE * 2, sizeof( int ), compint );
    return TEST_RESULTS;
}

Add an [tt]assert( (stackptr - stack) <= 20 );[/tt] at the end of the qsort loop, and watch it blow for yourself.

Don't try to figure out the "why", the algorithm is going to be rewritten anyway.

nick8325 · Post by **nick8325** » Mon Aug 07, 2006 3:29 pm

Solar wrote: Add an [tt]assert( (stackptr - stack) <= 20 );[/tt] at the end of the qsort loop, and watch it blow for yourself.

Yes, that's because that assertion checks that the stack has less than 10 items in it

Code: Select all

assert(stackptr < &stack[STACKSIZE]);

doesn't fail for me.

EDIT: anyway, log(2^64) = 64, at least for deciding what STACKSIZE should be...

B.E · Post by **B.E** » Mon Aug 07, 2006 5:18 pm

nick8325 wrote:
Solar wrote: The problem is that this is for the average case. As is quite well-known, Quicksort has a weakness that can result in O(n^2) behaviour - which a 20-element stack cannot handle.
It can result in O(n^2) time used, that's for sure, but I think it can be done in O(log n) stack space.

I don't udersand that proberbility crap. but isn't O(log n) at best case scenrio.

Solar · Post by **Solar** » Mon Aug 07, 2006 10:40 pm

nick8325 wrote: Yes, that's because that assertion checks that the stack has less than 10 items in it

*slap*

::)

Sure, you're right.

Let's rest this case. Between mystran's and paulbarkers' initial complaint and your assertions that STACKSIZE 64 would be OK, my trust in that code has eroded. v0.4.1 will get a temporary recursive solution, v0.5 an Introsort with proper worst-case and domain checking done in the test driver, and that should be the end of it.

(How do you figure log(2^64) == 64? My calculator gives me 19.{something} for log, 44.{something} for ln...)

Colonel Kernel · Post by **Colonel Kernel** » Mon Aug 07, 2006 11:45 pm

Solar wrote:(How do you figure log(2^64) == 64? My calculator gives me 19.{something} for log, 44.{something} for ln...)

I think that's supposed to say log[sub]2[/sub](2[sup]64[/sup]) = 64...

Pype.Clicker · Post by **Pype.Clicker** » Tue Aug 08, 2006 7:13 am

ask your calculator for "(log 64) / (log 2)" and it should happily report "42" ... err i mean "64" (sorry).

nick8325 · Post by **nick8325** » Tue Aug 08, 2006 10:18 am

Solar wrote: Let's rest this case. Between mystran's and paulbarkers' initial complaint and your assertions that STACKSIZE 64 would be OK, my trust in that code has eroded. v0.4.1 will get a temporary recursive solution, v0.5 an Introsort with proper worst-case and domain checking done in the test driver, and that should be the end of it.

Fair enough.

Colonel Kernel wrote:
Solar wrote:(How do you figure log(2^64) == 64? My calculator gives me 19.{something} for log, 44.{something} for ln...)
I think that's supposed to say log[sub]2[/sub](2[sup]64[/sup]) = 64...

Yes, that's it. Sorry, I tend to leave off the base when it's 2...

Candy · Post by **Candy** » Tue Aug 08, 2006 11:00 am

nick8325 wrote:
Solar wrote: The problem is that this is for the average case. As is quite well-known, Quicksort has a weakness that can result in O(n^2) behaviour - which a 20-element stack cannot handle.
It can result in O(n^2) time used, that's for sure, but I think it can be done in O(log n) stack space.

Yes, if you're careful. You then need to check which of the halves is smaller, recurse for the smaller and tailrecurse for the larger. If you do that (and only if, I'm tempted to say) you'll keep the stack within O(log(n)) of the item count, with a constant factor for the size of the stack frame.

Good luck in getting it to work.

Solar: Nobody can beat mathematical limits. Not even you. The only way you can get around them is by misunderstanding them or misinterpreting them.

OSDev.org

qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function

Re:qsort or alternative sorting function