OSDev.org

Posted: **Fri Mar 14, 2003 12:00 pm**

for minimizing search, I need to write hash functions.

has someone implement one? is it better to work now with linked list and then implement hashtable or tree to improve the searching?

Posted: **Fri Mar 14, 2003 12:37 pm**

if what you want is a hashtable, then a linked list will surely offer poor performances

I did implement a hash (see it in kernel/src/kernel/kds/naming.c or something alike) function, but never tested the complete hashtable. I inspired from the hash function in ELF files specification, and i must admit it gives good results on what it is designed for : symbolic names in a program.

Posted: **Fri Mar 14, 2003 1:22 pm**

The combination of a hash, a btree and a linked list will give very good preformance.

A hash to generate an initial handle for a string.

And then a btree of the hash value get's you to a list of
values within just a few compares.

And them from there out follow a linked list of actual string compares.

I use the following algorithm for a hash and it works good for me:

unsigned long hash = 0;
while ( *ptr )
{
hash+= *ptr++;
if ( * ptr )
hash*= *ptr++;
}

This alternates an add and multiply of the bytes in a
string. While it is not garenteed to be unique it is darn close.

This algorithm works well when the string are similar - say:

/user/local/squid
vs
/usr/local/squide

Even though there are only two character differences, and that difference is just the same character in a different position, the hash values are different.

The most likly collisions are those that are very different - say something like:
/usr/local/squid
vs
/mnt/floppy/foobar

This meens that the string compares usually fail straight away when searching the final linked list.

Posted: **Sat Mar 15, 2003 2:25 am**

take a look at this site. it has a good hashtable example
http://burtleburtle.net/bob/hash/hashtab.html

Posted: **Sat Mar 15, 2003 8:55 am**

Hmm, just been considering an alternative to the hash finding thing.

Here's what I came up with:

Fill a dword with the first 4 bytes of the string

Loop these alternately until the string has been read:

OR the dword with a dword composed of the next 4 bytes.

AND the dword with a dword composed of the next 4 bytes.

Seems to me that should give you a decent chance of uniqueness for value/string and be reasonably quick. Of course you'd have to pad out any section that's shorter than 4 chars with spaces or something.

Posted: **Sat Mar 15, 2003 9:20 am**

I'd be temped to use XOR instead of AND and OR. XOR Say you have a string comprising only characters between 'a' and 'z'. They're going to have a lot of bits in common; AND and OR won't do anything to those bits, so the hash value of the string will depend only on the variability of the other bits. XOR will toggle those bits, so the hash value of the string will depend on the other bits as well as the length (actually, odd or evenness of the length) of the string.

As an extreme example: AND/OR hashing will give the same key for "aaaa" as "aaaaaaaaaaaaaaaa". XOR hashing will give different keys for each string (but then again, it will give the same key for "aaaa" as "aaaaaaaaaaaa").

Posted: **Sat Mar 15, 2003 6:15 pm**

Hmm, tricker problem than I first thought...which of course just makes it more interesting

.

I see the point about XOR, forgot that the ASCII chars would share the majority of their bits. As an improvement (To stop the "aaaa" "aaaaaaaaaaaa" problem) you could make the first value in the loop the length of the string. This would alter it sufficiently that those two strings would no longer be comparable.

Posted: **Sat Mar 15, 2003 7:55 pm**

Here's something I got from a compiler book:

Code: Select all

DATA hashlist[PRIME];

// credit to compilers: principles, techniques and tools
unsigned int hashpjw(char *s, unsigned int h)
{
   char *p;
   int g;
   for (p=s; *p; p++)
   {
      h <<= 4;
      h += (*p);
      if ((g = h&0xF0000000))
         h ^= g>>24 ^ g;
   }
   return h;
}

unsigned int hashprime(char *s)
{
   return hashpjw(s, 0)%PRIME;
}

Apparently it helps a lot if your hash array of data is a prime number. I don't know why though.

There's another format of hashlist though. Using just a huge array (slightly larger than the data intended, which means you need to know how much data before hand), you insert your hashed items as you find them as an index to the array.

If there's a collision (ie, two strings with same hash), instead of a linked list, just jumped a fixed offset (like 10 down) to try and find an empty slot. The idea here is that searching and inserting have the same jumping algorithm.

That way, instead of allocated a new entry for a linked list, you just use the empty slots in your array.

- Nick

OSDev.org

Hash Table ?

Hash Table ?

Re:Hash Table ?

Re:Hash Table ?

Re:Hash Table ?

Re:Hash Table ?

Re:Hash Table ?

Re:Hash Table ?

Re:Hash Table ?