It was the weirdest thing. When I ran the kernel on QEMU with 128MB of RAM, it would page fault while allocating the stack for a new process, even though I was only requesting a single page from the block allocator. When I increased the system memory to 512MB, there was no page fault. In the end, I convinced myself that it was down to the way the kernel was linked and that my calculations for the kernel size were failing somewhere, so I set off on a two-year adventure to write a new linker map and kernel entry routine that would set up paging and the block allocator correctly. Today, while pulling pieces of the old code across, I decided to step through it and take another crack at the page fault issue, and I found this little gem:
// block_alloc_first_free_n (): Locate the first free n blocks in the bitmap
// inputs: nunits - number of blocks to allocate, returns - 0 if no free blocks, nonzero otherwise
static unsigned int block_alloc_first_free_n (size_t nunits) {
    if (nunits == 0)
        return (unsigned int) NULL;
    else if (nunits == 1)
        return block_alloc_first_free ();
    printf("got here... block_alloc_first_free_n\n");
    for (size_t i = 0; i < bmp_sz; i++) {
        printf("looping...\n");
        if (m_bmp[i] != BLOCK_UNIT_FULL) {
            printf("not full: %d\n",i);
            for (size_t j = 0; j < BLOCKS_PER_UNIT; j++) {
                printf("looping 2...\n");
                int bit = 1 << j;
                if (!(m_bmp[i] & bit)) {
                    printf("i: %d, j: %d\n", i, j);
                    printf("free: 0x%x\n", bit);
                    printf("free abs: 0x%x\n", (unsigned int)((i*BLOCKS_PER_UNIT)+j));
                    int startBit = (i*BLOCKS_PER_UNIT)+bit;
                    size_t free = 0;
                    for (size_t count = 0; count <= nunits; count++) {
                        printf("looping 3... testing 0x%x\n", (startBit+count));
                        if (memory_bitmap_test(startBit+count) == 0) {
                            printf("free++\n");
                            free++;
                        }
                        if (free == nunits) {
                            return (unsigned int)((i*BLOCKS_PER_UNIT)+j);
                        }
                    }
                }
            }
        }
    }
    return 0;
}
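The post does not spell out which line is at fault, so the following is only one reading of the listing. The most suspicious line appears to be the computation of `startBit`, which adds `bit` (the mask `1 << j`) to the unit base instead of the bit index `j`. For `j = 5`, for example, the inner loop would start testing at offset 32 rather than 5. The inner loop also counts free blocks without requiring them to be contiguous, and a return value of 0 doubles as both "not found" and a valid block index. Below is a minimal sketch of a contiguous-run search under stated assumptions: 32-bit bitmap units, a set bit meaning "used", and a hypothetical `NO_RUN` sentinel, a hypothetical 128-block bitmap, and hypothetical helper names (`first_free_run`, `memory_bitmap_set`) that are not from the original kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCKS_PER_UNIT 32u            /* assumed: one bit per block, 32-bit units */
#define BLOCK_UNIT_FULL 0xFFFFFFFFu    /* assumed: every block in the unit is used */
#define BMP_UNITS       4u             /* hypothetical tiny bitmap: 128 blocks */
#define NO_RUN          ((size_t)-1)   /* hypothetical "not found" sentinel */

static uint32_t m_bmp[BMP_UNITS];
static const size_t bmp_sz = BMP_UNITS;

/* Returns nonzero if block number `bit` is marked as used. */
static int memory_bitmap_test(size_t bit) {
    return (int)((m_bmp[bit / BLOCKS_PER_UNIT] >> (bit % BLOCKS_PER_UNIT)) & 1u);
}

/* Marks block number `bit` as used. */
static void memory_bitmap_set(size_t bit) {
    m_bmp[bit / BLOCKS_PER_UNIT] |= (uint32_t)1 << (bit % BLOCKS_PER_UNIT);
}

/* Returns the index of the first block in the first run of `nunits`
 * contiguous free blocks, or NO_RUN if no such run exists. */
static size_t first_free_run(size_t nunits) {
    size_t total = bmp_sz * BLOCKS_PER_UNIT;
    if (nunits == 0 || nunits > total)
        return NO_RUN;

    size_t run = 0; /* length of the current streak of free blocks */
    for (size_t b = 0; b < total; b++) {
        /* Skip a completely full unit in one step. */
        if (b % BLOCKS_PER_UNIT == 0 &&
            m_bmp[b / BLOCKS_PER_UNIT] == BLOCK_UNIT_FULL) {
            run = 0;
            b += BLOCKS_PER_UNIT - 1;
            continue;
        }
        if (memory_bitmap_test(b)) {
            run = 0;                  /* a used block breaks the streak */
        } else if (++run == nunits) {
            return b + 1 - nunits;    /* index where the streak began */
        }
    }
    return NO_RUN;
}
```

Walking the bitmap by absolute block index, rather than by unit and mask, keeps a single index in play, so an index can never be confused with a mask, and a free run that straddles two units is found without special handling.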
So I think there are two important lessons here. The first is already stated very plainly in the wiki's "How to ask questions" section: don't assume that you know which section of your code is causing the problem. I spent MONTHS dumping page tables, examining memory, and pounding my head trying to figure this out, because I was looking in the wrong place. I was convinced the issue was in the setup routines and came from improper configuration, when it was a simple logic error. The second is that sometimes, when you are stuck, taking a break from what you are working on for a while (two stinking years in my case) can help you come at the problem with fresh eyes. Not sure if this post will help anyone, but I am excited to have found my issue after all this time.