Page 1 of 1

Bucket Sorting

Posted: Wed Feb 23, 2011 7:15 am
by i586coder
Hi friends,

I just written an implementation of Bucket Sorting for strings,
i need your opinions to improve it.

Code: Select all


struct Node {
    char  *Data;
    struct Node *Next;
};


struct Node * BucketSort(struct Node *Array){
// Time Complexity: Best Case : Theta N
//               Average Case : Theta N
//                 Worst Case : Theta N*N (ie Insertion Sort)
    #define size 26 // a-z
    struct Node **buckets, *sortedList=NULL;
    int i,k;

	buckets=(struct Node **)malloc(sizeof(struct Node *) * size);

	for(i=0;i<size;++i){ // Init Buckets
        buckets[i]=NULL;
    }

	///////// Drop Items into Buckets ///////////////
	struct Node *p = Array ;
    while(p){
        int pos;
        pos = p->Data[0]-'a' ; // <----------
        buckets[ pos ] = creatNode( p->Data , buckets[pos] );
        p=p->Next;
    }

	//////// Sort Buckets using Insertion Sort /////////////
	for(i=0;i<size;++i){
	    if(buckets[i])
            buckets[i]=InsertionSort(buckets[i]);
    }

	///// concatenate the buckets ////////
    for(i=0;i<size;++i){
        struct Node *currentBucket = buckets[i] ;
	    while( currentBucket ){
            sortedList = creatNode( currentBucket->Data , sortedList);
            currentBucket = currentBucket->Next;
	    }
    }


	//Release All Memory Allocted
	for(i=0;i<size;++i){
             nodeFree( buckets[i] );
        }
        free(buckets);


    return sortedList ;
}

I think

Code: Select all

        pos = p->Data[0]-'a' ; // <----------
this needs some optimize, the idea is to sorting an array of strings by divided them
to buckets ; each bucket has a label with alphabetical letter. e.g bucket #1 holds "a" strings.

Cheer$,
a.T.D

Re: Bucket Sorting

Posted: Wed Feb 23, 2011 8:46 am
by Brendan
Hi,

Have you considered what happens if there's an empty string or the first character of the string isn't 'a' to 'z'?

If there's 3 million strings that all happen to start with 'a', then how would those buckets help?

Have you considered using some form of binary tree?


Cheers,

Brendan

Re: Bucket Sorting

Posted: Wed Feb 23, 2011 9:41 am
by turdus
If I were you I wouldn't use bucket sort. The best practice is to keep strings sorted, and use quick search to find where to insert a new string. Some time ago I had to wrote a session correlator, and I tested and benchmarked several sorting algorithms. Sessions where identified by session-id (a string), and I had to look up it very fast (there was about 1 million open session per sec), so it was quite a challange.
1. quick sort is slow in _most_ cases (it's only quick for a limited set of input)
2. bucket sort requires too much memory, and it's not so fast as I thought
3. no sorting algorithm can compete the "keep sorted and search on insert" way.
4. never ever use glib's bsearch, it's a buggy implementation

Re: Bucket Sorting

Posted: Wed Feb 23, 2011 12:32 pm
by i586coder
Thanx guys for your valuable replies, in fact am using this algorithm, because it is a distribution sort, so i can use multiprocessors for sorting.

@Brendan: you are completely right, in this case i should split up a very long series of string to sub-buckets !!!

@turdus: i can't keep string sorted, because i have to put some pre-knowable job for specific processor e.g buckets with even number are for core #2 ...

@berkus: I'm trying to mixup some theories with practice some times :)

Cheer$
a.T.d