Suggested memory testing techniques?

josephjah · Post by **josephjah** » Wed Aug 09, 2006 9:18 am

In addition to the title: I'm currently using the memory testing method described here (http://www.netrino.com/Articles/MemoryT ... ?the_id=49). Right now I'm only working on testing the data bus, but it is really slow: over a minute on a 1 GHz machine with 192 MB of RAM.

I'm just looping from the address at the end of my kernel (0x1133D1) to the end of extended memory (0xBDF0000), which is 198036527 iterations. How can I go about testing memory faster?

gaf · Post by **gaf** » Thu Aug 10, 2006 4:04 am

Hello,
the trick is actually quite obvious: just count in larger steps. If you choose steps of one MiB, for example, there will only be 192 iterations for your memory, and even the worst-case scenario (4096 iterations, for a full 4 GiB address space) still seems feasible. If you need higher precision, you can still increase the probing resolution at the borders of memory regions.

You should note, however, that direct memory probing may cause problems, as memory regions can be interspersed with memory-mapped devices. What happens when a value is written to such a device is unpredictable. In theory you might even damage your hardware by accidentally giving the device a rather unhealthy command.
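To put numbers on the step-size idea, here is a small sketch that only counts how many probes a given step size would need. The helper name `probe_steps` is made up for illustration, and the code deliberately does not touch any memory, so the device-mapping caveat does not apply to it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of probes needed to cover `bytes` of
 * address space when stepping `step` bytes at a time (rounded up).
 * Pure arithmetic, no memory access. */
uint64_t probe_steps(uint64_t bytes, uint64_t step)
{
    assert(step != 0);
    return (bytes + step - 1) / step;
}
```

With 1 MiB steps this gives 192 probes for 192 MiB and 4096 probes for a full 4 GiB address space, which matches the figures above.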

The mega-tokyo FAQ contains an article about this topic (link) that might provide you with further information.

regards,
gaf

josephjah · Post by **josephjah** » Fri Aug 11, 2006 8:02 pm

Um, can you elaborate? I'm not sure how to do that with the method you described, because it would still be testing the same amount of memory, only sharing iterations between functions. I'm familiar with the link you provided, but I can't see how it is relevant to memory testing. Basically, what I'm doing is splitting memory into bytes and writing a 1 to the 1st bit, then walking that 1 up to the 8th bit, testing to make sure that the bit gets stored correctly, that it reads correctly, and that it clears correctly.

I think you just misunderstood my question... I know how much memory the system has, and my kernel knows its boundaries. I just need to make sure that all of the memory above my kernel works properly. Thanks in advance

gaf · Post by **gaf** » Sat Aug 12, 2006 8:43 am

josephjah wrote:
I think you just misunderstood my question...

You're right. I actually thought you were talking about memory probing, which is a much more common topic on OS boards.

josephjah wrote:
I know how much memory the system has, and my kernel knows its boundaries. I just need to make sure that all of the memory above my kernel works properly.

As failures are quite rare, there's normally no need to test all memory every time the system starts. If a memory module is really broken, the system becomes unstable and starts crashing frequently. Only then does it really make sense to run a memory test in order to find out what causes the problems.

josephjah wrote:
Basically what I'm doing is splitting memory into bytes and writing a 1 to the 1st bit, then walking that 1 up to the 8th bit, testing to make sure that the bit gets stored correctly, that it reads correctly, and that it clears correctly.

The article actually proposes running three separate tests: data, address, and integrity. While the first two are really fast (32 iterations per memory module), the last test takes a while, as it has to access every memory word. There's no way to optimize that testing algorithm without making it less reliable; the only thing you can do is implement it more efficiently.
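For the address test, something along these lines could work. This is only a sketch in the spirit of the article, not its actual code: the function name, the `ADDR_OK` constant and the two patterns are assumptions, and it expects a region whose size in words is a power of two. It only touches the power-of-two offsets, which is why it needs so few iterations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_OK ((size_t)-1)   /* hypothetical "no fault found" value */

/* Returns the word offset at which a fault was detected, or ADDR_OK. */
size_t test_address_bus(volatile uint32_t *base, size_t words)
{
    const uint32_t pattern = 0xAAAAAAAAu;
    const uint32_t anti    = 0x55555555u;
    size_t off, test;

    assert(words != 0 && (words & (words - 1)) == 0);

    /* Put the default pattern at every power-of-two offset. */
    for (off = 1; off < words; off <<= 1)
        base[off] = pattern;

    /* Address bit stuck high: a write to offset 0 must not show up
     * at any other offset. */
    base[0] = anti;
    for (off = 1; off < words; off <<= 1)
        if (base[off] != pattern)
            return off;
    base[0] = pattern;

    /* Address bit stuck low or shorted: a write to one power-of-two
     * offset must not change offset 0 or any other tested offset. */
    for (test = 1; test < words; test <<= 1) {
        base[test] = anti;
        if (base[0] != pattern)
            return test;
        for (off = 1; off < words; off <<= 1)
            if (off != test && base[off] != pattern)
                return test;
        base[test] = pattern;
    }
    return ADDR_OK;
}
```

On a host you can try it on an ordinary array; in a kernel, `base` would be the start of the region under test.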

The way you describe your implementation, it sounds to me as if you did a whole walking-bit cycle for each byte. This actually isn't necessary as you only want to check if the memory can hold a value. Whether the data-bus bits are working flawlessly has already been checked by the first test.
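To illustrate, a walking-ones data bus check only needs a single address. The sketch below assumes a 32-bit data bus; the function name and the return convention are made up for this example, and it only says something about the data lines, not about whether the rest of the memory can hold a value.

```c
#include <stdint.h>

/* Walk a single 1 bit across all 32 data lines at ONE address.
 * Returns 0 if every pattern read back correctly, otherwise the
 * first pattern that failed. */
uint32_t test_data_bus(volatile uint32_t *addr)
{
    uint32_t pattern;

    /* 32 iterations: 0x00000001, 0x00000002, ..., 0x80000000 */
    for (pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0;
}
```

That is 32 writes and reads in total, instead of 8 per byte of memory.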

What the article proposes is that you first fill all memory words with some value (in the example code memory[x] = x is used) and then read the values back to see if they're still there (this has to be done in a second run in order to make sure that the values haven't just been floating on the bus). You then repeat the procedure once more, this time initializing the memory with the opposite of what you used in the first run (memory[x] = ~x). This second step is necessary to make sure that some bits aren't just stuck high/low and happened to be in the right position during the first run.
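In code, the procedure might look roughly like this. It is a sketch under a few assumptions, not the article's listing: the names `test_integrity` and `MEM_OK` are invented here, the region is treated as an array of 32-bit words, and the word offset is used as the test value.

```c
#include <stddef.h>
#include <stdint.h>

#define MEM_OK ((size_t)-1)   /* hypothetical "no fault found" value */

/* Returns the offset of the first bad word, or MEM_OK. */
size_t test_integrity(volatile uint32_t *base, size_t words)
{
    size_t x;

    /* First run: fill ALL words before reading anything back. */
    for (x = 0; x < words; x++)
        base[x] = (uint32_t)x;

    /* Second run: verify, then store the inverted value. */
    for (x = 0; x < words; x++) {
        uint32_t inv = ~(uint32_t)x;
        if (base[x] != (uint32_t)x)
            return x;
        base[x] = inv;
    }

    /* Third run: verify the inverted values, so that bits stuck
     * high or low cannot pass both times. */
    for (x = 0; x < words; x++) {
        uint32_t inv = ~(uint32_t)x;
        if (base[x] != inv)
            return x;
    }
    return MEM_OK;
}
```

Every word is written twice and read twice, and no read directly follows the write to the same word.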

Note that you can also improve the performance of your algorithm by rewriting it to work on 32-bit values rather than bytes. This cuts the number of iterations to a quarter of the original value.
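For the addresses josephjah mentioned, the difference can be worked out with a few lines of arithmetic. The helper names below are made up, and rounding the start address up to a 4-byte boundary is an assumption about how one would handle the unaligned kernel end.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Number of whole units of `unit` bytes in [start, end). */
uint32_t iterations(uint32_t start, uint32_t end, uint32_t unit)
{
    return (end - start) / unit;
}
```

`iterations(0x1133D1, 0xBDF0000, 1)` is the 198036527 byte-wise iterations from the first post, while `iterations(align_up(0x1133D1, 4), 0xBDF0000, 4)` comes to 49509131 word-wise iterations.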

regards,
gaf

matthias · Post by **matthias** » Sat Aug 12, 2006 4:14 pm

gaf wrote:
What the article proposes is that you first fill all memory words with some value (in the example code memory[x] = x is used) and then read the values back to see if they're still there (this has to be done in a second run in order to make sure that the values haven't just been floating on the bus). You then repeat the procedure once more, this time initializing the memory with the opposite of what you used in the first run (memory[x] = ~x). This second step is necessary to make sure that some bits aren't just stuck high/low and happened to be in the right position during the first run.

Something like this??

taken from my post (http://www.osdev.org/phpBB2/viewtopic.php?t=2814)

Code:

puts("testing memory... "); // just one check; slows boot, but better than crashing later ;)
   // the more memory there is, the longer the boot takes ;)

   // start of the region under test; the offset is assumed to be in bytes,
   // so it is added before casting to addr_t* (pointer arithmetic on an
   // addr_t* would scale it by sizeof(addr_t) once more)
   addr_t* start = (addr_t*)((addr_t)Bitmap + (PagesInUse * 4096));
   addr_t* i;

   // first loop: store each word's own address in it
   for(i = start; i < (addr_t*)MemoryEnd; i++) // i++ moves on by one whole word
   {
      *i = (addr_t)i;

      if(*i != (addr_t)i)
      {
         printf("error @ 0x%x loop 1", (addr_t)i);
         asm("hlt");
      }
   }

   // second loop: invert every word and compare with the expected value
   for(i = start; i < (addr_t*)MemoryEnd; i++)
   {
      *i = ~*i;

      if(*i != ~(addr_t)i) // check against the real answer
      {
         printf("error @ 0x%x loop 2", (addr_t)i);
         asm("hlt");
      }
   }

gaf · Post by **gaf** » Sun Aug 13, 2006 6:12 am

That's roughly the idea, although there's a small bug in the code:

Code:

for(addr_t* i = (addr_t*)MemoryStart; i < (addr_t*)MemoryEnd; i++)
{
  *i = (addr_t)i;

  if(*i != (addr_t)i) // read back immediately after the write
  {
      panic();
  }
}

If you check the value right after you've written it, you might get it back even though the memory isn't working. This is possible because the value might still be floating on the data bus after the write. To quote from the article that josephjah posted:

"Since each read (verify) occurs immediately after a write, it is possible that the value read back represents only the voltage remaining on the data bus. If each read occurs closely after the corresponding write, it may appear that the value has been correctly stored--even though there is no memory at the other end! To detect this problem, we need only alter the test slightly. Instead of performing a read after each write, we write all the test data first, then read the data back one byte at a time."

If you need more details, have a look at the pseudo code in section 3.3 of the article. Also remember that you have to disable all processor caching before running the code (you'll have to set some bits in cr0).
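A toy model can make the floating-bus effect visible on an ordinary host. Everything in this sketch is invented for illustration (`fake_write`, `fake_read`, `bus_residue`): it pretends there is no memory at all and that a read simply returns whatever value was last driven onto the data lines. Real hardware is messier, so take it as a picture of the idea only.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy bus with NO memory behind it: a write only leaves its value on
 * the data lines, and a read returns whatever is still there. */
static uint32_t bus_residue;

void fake_write(size_t addr, uint32_t value) { (void)addr; bus_residue = value; }
uint32_t fake_read(size_t addr) { (void)addr; return bus_residue; }

/* Read back right after each write; returns the number of mismatches. */
size_t check_immediately(size_t words)
{
    size_t x, errors = 0;

    for (x = 0; x < words; x++) {
        fake_write(x, (uint32_t)x);
        if (fake_read(x) != (uint32_t)x)
            errors++;
    }
    return errors;
}

/* Write everything first, then read back in a second run. */
size_t check_after_fill(size_t words)
{
    size_t x, errors = 0;

    for (x = 0; x < words; x++)
        fake_write(x, (uint32_t)x);
    for (x = 0; x < words; x++)
        if (fake_read(x) != (uint32_t)x)
            errors++;
    return errors;
}
```

In this model `check_immediately` reports no errors although nothing was ever stored, while `check_after_fill` flags every word except the last one written.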

regards,
gaf

matthias · Post by **matthias** » Sun Aug 13, 2006 6:15 am

Yeah, I know about the bug; I've changed it now. When I think of disabling the CPU cache and changing some cr0 bits, I'd rather scrap it from my kernel. Too much work

:p

gaf · Post by **gaf** » Sun Aug 13, 2006 6:58 am

Yeah, it's like overkill and stuff:

Code:

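// write-back and invalidate the cache
__asm__ __volatile__ ("wbinvd");

// set PE/CD/NW in cr0, keeping all other bits
// cache disable(486+), no-writeback(486+), 32bit mode(386+)
// (the variable cr0 is assumed to hold the value read from the register beforehand)
// eax is an input operand here, so it must not also be listed as a clobber
__asm__ __volatile__ ("movl %%eax, %%cr0" ::
                      "a" (cr0 | 0x00000001 | 0x40000000 | 0x20000000));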

(code was taken from the mega-tokyo FAQ)
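Once the test has finished you would presumably want the caches back. The sketch below only computes the cr0 values; the privileged register access is left out so that the bit handling can be tried on a normal host. The macro and function names are made up, and it assumes that clearing both CD and NW is what you want afterwards.

```c
#include <stdint.h>

#define CR0_PE 0x00000001u   /* protected mode enable, bit 0 */
#define CR0_NW 0x20000000u   /* not write-through, bit 29    */
#define CR0_CD 0x40000000u   /* cache disable, bit 30        */

/* Value to load into cr0 before the memory test. */
uint32_t cr0_caches_off(uint32_t cr0)
{
    return cr0 | CR0_CD | CR0_NW;
}

/* Value to load into cr0 after the memory test. */
uint32_t cr0_caches_on(uint32_t cr0)
{
    return cr0 & ~(CR0_CD | CR0_NW);
}
```

PE is left alone here, since a protected-mode kernel already has it set.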

cheers,
gaf

matthias · Post by **matthias** » Sun Aug 13, 2006 7:02 am

OK, maybe it isn't