I think you just misunderstood my question...
You're right. I actually thought you were talking about memory probing, which is a much more common topic on OS boards.
I know how much memory the system has, and my kernel knows its boundaries, I just need to make sure that all of the memory above my kernel works properly.
As failures are quite rare, there's normally no need to test all memory every time the system starts. If a memory module is really broken, the system becomes unstable and starts crashing frequently. Only then does it really make sense to run a memory test to find out what is causing the problems.
Basically what I'm doing is splitting memory into bytes and writing a 1 to the 1st bit, then walking that 1 up to the 8th bit, testing to make sure that the bit gets stored correctly, that it reads correctly, and that it clears correctly.
The article actually proposes running three separate tests: data, address and integrity. While the first two are really fast (32 iterations per memory module), the last test takes a while as it has to access every memory word. There's no way to optimize that testing algorithm without making it less reliable; the only thing you can do is implement it in a more efficient way.
The way you describe your implementation, it sounds to me as if you did a whole walking-bit cycle for each byte. This actually isn't necessary as you only want to check if the memory can hold a value. Whether the data-bus bits are working flawlessly has already been checked by the first test.
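To illustrate why the first test is so cheap, here is a rough sketch of a walking-ones data-bus check. It is not the article's exact code; the name test_data_bus and its parameter are just my own choices, and I'm assuming a 32-bit wide data bus. The bit only has to be walked at a single address, which gives the 32 iterations mentioned above.

```c
#include <stdint.h>

/*
 * Walking-ones test of the data bus at one (arbitrary) address.
 * Hypothetical helper: returns 0 if every bit could be set on its own,
 * otherwise the first pattern that failed to read back correctly.
 */
uint32_t test_data_bus(volatile uint32_t *address)
{
    uint32_t pattern;

    /* Shift a single 1 through all 32 bit positions. */
    for (pattern = 1; pattern != 0; pattern <<= 1) {
        *address = pattern;          /* write the pattern */
        if (*address != pattern)     /* read it back immediately */
            return pattern;          /* this data line looks faulty */
    }

    return 0;
}
```

You'd call it once with any address inside the region you want to check, e.g. the first word above your kernel.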
What the article proposes is that you first fill all memory words with some value (the example code uses memory[x] = x) and then read the values back to see if they're still there (this has to be done in a second run in order to make sure that the values haven't just been floating on the bus). You then repeat the procedure once more, this time initializing the memory with the opposite of what you used in the first run (memory[x] = ~x). This second step is necessary to make sure that some bits aren't just stuck high/low and happened to be in the right position during the first run.
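Put into code, the procedure could look something like the sketch below. Again this is only my own illustration of the idea, not the article's code: test_device, base and n_words are made-up names, and I'm assuming the region is accessed as 32-bit words. The check of the first pattern and the write of the inverted pattern are merged into one loop, so the memory is walked three times in total.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Integrity test over n_words 32-bit words starting at base.
 * Hypothetical helper: returns NULL if all words passed, otherwise a
 * pointer to the first word that did not hold its value.
 */
volatile uint32_t *test_device(volatile uint32_t *base, size_t n_words)
{
    size_t i;

    /* Run 1: fill every word with its own index (memory[x] = x). */
    for (i = 0; i < n_words; i++)
        base[i] = (uint32_t)i;

    /* Run 2: verify the first pattern, then store its inverse. */
    for (i = 0; i < n_words; i++) {
        if (base[i] != (uint32_t)i)
            return &base[i];
        base[i] = ~(uint32_t)i;
    }

    /* Run 3: verify the inverted pattern (memory[x] = ~x). */
    for (i = 0; i < n_words; i++) {
        uint32_t inverse = ~(uint32_t)i;
        if (base[i] != inverse)
            return &base[i];
    }

    return NULL;
}
```

Keep in mind that the test destroys whatever was in the region, so it must only be run on memory that isn't in use yet.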
Note that you can also improve the performance of your algorithm by rewriting it to work on 32-bit values rather than bytes. This cuts the number of iterations needed down to a quarter of the original value.
regards,
gaf