Hash functions
Posted: Sun Oct 02, 2011 6:04 pm
I'm going back through a lot of my C library code in attempts to optimize it (I used a lot of brute force and ignorance the first time through, for ease of coding,) which so far seems like it would be best done by replacing a lot of linked lists with hash tables. The hash tables I did use were also a bit... lacking... with respect to hash functions (they used integer keys, and the hash function was just modulo a prime.)
So, to make my life easier, and easier for programmers who want to use my OS, I decided to try and write a library of non-cryptographic hash functions (which would be, at the moment, part of the C library.) However, there are a lot of different hash functions on the web, and even with various "benchmarks" published by different people (benchmarks which tend to contradict each other,) it's very difficult to figure out which hash functions are good and which ones are bad. Even what qualifies as "good" and "bad" is difficult to determine, considering that both speed (for short and long keys) and low collisions are factors.
I would like to know what sort of experience people here have had with different hash functions, so I can try and compile a list of ones to implement. Even though I've been using prime-sized tables, it seems much more convenient to use power-of-two sized tables, which should in theory be fine as long as the hash function is good. So, it's important to know which hash functions perform well on power-of-two sized tables and which don't.
In addition, I would like to know if people here would appreciate a hash function library like this being released separately from my OS. The only non-crypto hash function library I could find was this one, which is under an unfortunate copyleft and GPL-incompatible license. I would be releasing this under an OpenBSD license, unless something even more liberal would be useful.
So, to make my life easier, and easier for programmers who want to use my OS, I decided to try and write a library of non-cryptographic hash functions (which would be, at the moment, part of the C library.) However, there are a lot of different hash functions on the web, and even with various "benchmarks" published by different people (benchmarks which tend to contradict each other,) it's very difficult to figure out which hash functions are good and which ones are bad. Even what qualifies as "good" and "bad" is difficult to determine, considering that both speed (for short and long keys) and low collisions are factors.
I would like to know what sort of experience people here have had with different hash functions, so I can try and compile a list of ones to implement. Even though I've been using prime-sized tables, it seems much more convenient to use power-of-two sized tables, which should in theory be fine as long as the hash function is good. So, it's important to know which hash functions perform well on power-of-two sized tables and which don't.
In addition, I would like to know if people here would appreciate a hash function library like this being released separately from my OS. The only non-crypto hash function library I could find was this one, which is under an unfortunate copyleft and GPL-incompatible license. I would be releasing this under an OpenBSD license, unless something even more liberal would be useful.