Page 1 of 1

Hash functions

Posted: Sun Oct 02, 2011 6:04 pm
by NickJohnson
I'm going back through a lot of my C library code in attempts to optimize it (I used a lot of brute force and ignorance the first time through, for ease of coding,) which so far seems like it would be best done by replacing a lot of linked lists with hash tables. The hash tables I did use were also a bit... lacking... with respect to hash functions (they used integer keys, and the hash function was just modulo a prime.)

So, to make my life easier, and easier for programmers who want to use my OS, I decided to try and write a library of non-cryptographic hash functions (which would be, at the moment, part of the C library.) However, there are a lot of different hash functions on the web, and even with various "benchmarks" published by different people (benchmarks which tend to contradict each other,) it's very difficult to figure out which hash functions are good and which ones are bad. Even what qualifies as "good" and "bad" is difficult to determine, considering that both speed (for short and long keys) and low collisions are factors.

I would like to know what sort of experience people here have had with different hash functions, so I can try and compile a list of ones to implement. Even though I've been using prime-sized tables, it seems much more convenient to use power-of-two sized tables, which should in theory be fine as long as the hash function is good. So, it's important to know which hash functions perform well on power-of-two sized tables and which don't.

In addition, I would like to know if people here would appreciate a hash function library like this being released separately from my OS. The only non-crypto hash function library I could find was this one, which is under an unfortunate copyleft and GPL-incompatible license. I would be releasing this under an OpenBSD license, unless something even more liberal would be useful.

Re: Hash functions

Posted: Mon Oct 03, 2011 1:22 am
by Owen
NickJohnson wrote:The only non-crypto hash function library I could find was this one, which is under an unfortunate copyleft and GPL-incompatible license. I would be releasing this under an OpenBSD license, unless something even more liberal would be useful.
Its worth noting that the CPL is a file level copyleft license, much like the MPL, and therefore not viral from mere use of the licensed source code. It is also worth noting that all CPL code is also available under the terms of the EPL, as the CPL contains an explicit "Or later versions" clause in the license text, and IBM granted the right to release new versions of the CPL to the Eclipse Foundation.

This all not withstanding, it is GPL incompatible.

Re: Hash functions

Posted: Mon Oct 03, 2011 6:15 am
by NickJohnson
Huh, I guess I didn't notice that.

Well, either way, I'm going to be implementing a few hash functions myself, so I can understand how and why they work better, and because that other library seems to only implement simple multiplicative and rotative hashing schemes, which don't include some of the algorithms that claim to be "the best" (like lookup3.)

Re: Hash functions

Posted: Wed Oct 26, 2011 1:39 pm
by nulik
NickJohnson wrote:I'm going back through a lot of my C library code in attempts to optimize it ...
I have searched for a lot of hashing functions , and I have picked this one to be implemented (not yet done) in my own OS:

Code: Select all

unsigned djb_hash ( void *key, int len )
{
  unsigned char *p = key;
  unsigned h = 0;
  int i;
  for ( i = 0; i < len; i++ )
      h = 33 * h + p[i];
  return h;
}
Reasons? Very simple , small and reasonable collision rate. There are others than are lower in collisions but they have a big code. Big code will remove code from L1 cache and make the overall performance worse. Also big code takes slighly more to load.

Also Bernstein has a good reputation, he produces good stuff, he made qmail server which no hacker has been able to exploit...