(Fixed) Memory access bug

Octacone
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

(Fixed) Memory access bug

Post by Octacone »

Hi,

I've encountered a bug and I'm out of ideas.
To sum it up:
  • My OS uses 32 bit PAE paging, 4 KB pages
  • Accessing (reading/writing) anything from 0x800000 to 0x81E000 causes a page fault
  • Error code 0x0B (Protection violation (page present), page written, reserved violation, kernel mode)
  • Only happens on real hardware (no problems with emulators)
  • Only happens when using -O0; with -O2 it disappears
  • My error label gets corrupted: "Page Fault" turns into random characters and symbols (memory corruption, perhaps?)
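The error code in the list above can be decoded bit by bit. The following is a minimal C++ sketch that assumes the bit layout documented in the Intel SDM for interrupt 14 (page fault). The struct and function names are made up for illustration. For 0x0B it reports a present page, a write, supervisor mode, and a reserved-bit violation, which matches the description above.

```cpp
#include <cstdint>

// Hypothetical decoder for the x86 page-fault error code (interrupt 14).
struct PageFaultError {
    bool present;      // bit 0: 1 = protection violation on a present page, 0 = page not present
    bool write;        // bit 1: 1 = the access was a write, 0 = a read
    bool user;         // bit 2: 1 = access from user mode (CPL 3), 0 = supervisor mode
    bool reserved;     // bit 3: 1 = a reserved bit was set in some paging-structure entry
    bool instruction;  // bit 4: 1 = instruction fetch (typically only reported with NX or SMEP)
};

constexpr PageFaultError decode_page_fault(std::uint32_t code) {
    return PageFaultError{
        (code & 0x01u) != 0,
        (code & 0x02u) != 0,
        (code & 0x04u) != 0,
        (code & 0x08u) != 0,
        (code & 0x10u) != 0,
    };
}

// 0x0B = 0b1011: present, write, supervisor mode, reserved-bit violation.
static_assert(decode_page_fault(0x0B).reserved, "0x0B has the RSVD bit set");
```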
Here is an example that causes a crash (not this exact code, but a simplified version; it boils down to this):


uint32_t* test = (uint32_t*) 0x81E000;
test[0] = 0xCAFEBEEF;
Any guesses? Have you ever encountered anything similar?
Last edited by Octacone on Mon Oct 14, 2019 8:16 am, edited 1 time in total.
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader
nullplan
Member
Posts: 1798
Joined: Wed Aug 30, 2017 8:24 am

Re: Memory access bug

Post by nullplan »

Reserved violation? That means one of the relevant page translation entries has a reserved bit set.
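To illustrate what "a reserved bit set" means for the 32-bit PAE setup from the first post, here is a sketch of a check for a 4 KB page-table entry, or for a page-directory entry that points to a page table. It follows the Intel SDM rules: bits 62 down to MAXPHYADDR are reserved, and bit 63 (XD) is also reserved unless IA32_EFER.NXE is set. MAXPHYADDR comes from CPUID leaf 0x80000008 and is passed in as a parameter. PDPTEs and 2 MB page entries have additional reserved bits that this sketch does not cover. The function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers. Reserved bits of a present PAE page-table entry
// (4 KB page) or page-directory entry with PS = 0, per the Intel SDM:
// bits 62..MAXPHYADDR are reserved, and bit 63 (XD) is reserved unless
// IA32_EFER.NXE = 1. maxphyaddr must be in the range 36..52.
constexpr std::uint64_t pae_reserved_mask(unsigned maxphyaddr, bool nxe) {
    return ((((std::uint64_t{1} << 63) - 1)                 // bits 0..62
             & ~((std::uint64_t{1} << maxphyaddr) - 1))     // keep only MAXPHYADDR..62
            | (nxe ? std::uint64_t{0} : std::uint64_t{1} << 63));  // XD when NX is off
}

// True if the CPU would reject this entry with a reserved-bit page fault.
// Non-present entries are not checked by the CPU, so they never qualify.
constexpr bool pae_entry_has_reserved_bits(std::uint64_t entry, unsigned maxphyaddr, bool nxe) {
    return (entry & 1) != 0 && (entry & pae_reserved_mask(maxphyaddr, nxe)) != 0;
}
```

A dump of the PDPTE, PDE and PTE covering a faulting address could be run through such a check to see which level holds the bad bits.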

It appears that something is corrupting your paging structures. This would also explain why your constants are garbled: the page translation entries for those might have been corrupted too. But if the code keeps running, then the paging entries for the code are still intact. So there's a hint. Where are the kernel code and the kernel data? And are you putting your strings into a write-protected section or not?

Do you use write protection in kernel mode? If not, it might be worth it to start.
Carpe diem!

Re: Memory access bug

Post by Octacone »

nullplan wrote:Reserved violation? That means one of the relevant page translation entries has a reserved bit set.

It appears that something is corrupting your paging structures. This also explains why your constants aren't working anymore: The page translation entries for those might have been corrupted. But if the code continues to work then the paging entries for the code are still working. So there's a hint. Where is the kernel code and where are the kernel data? And are you putting your strings into a write-protected section or not?

Do you use write protection in kernel mode? If not, it might be worth it to start.
I know what a reserved violation means; I just can't relate it to my code.
There is simply no way for those reserved bits to get set. Also, optimization shouldn't affect my paging code, since it's all written in assembly and assembled by NASM, but somehow it does.
The kernel does keep running; the only visible damage is that string on the crash screen.
My kernel is located at 1 MB physical, 3 GB virtual.
What do you mean by write-protected?

Re: Memory access bug

Post by nullplan »

Octacone wrote:My kernel is located at 1 MB physical, 3 GB virtual.
What do you mean by write-protected?
Are your constants located in a significantly different place than your code? That's what I'm asking here. In my case, .rodata is linked directly after .text, so they are only a few KB apart. If something made the former inaccessible but not the latter, it would have to touch pretty much only one page table entry and leave the other PTEs, PDEs, PDPEs and PML4Es untouched.

Write protection is what you get with the WP bit in CR0. Once it is set, even the kernel can no longer write to write-protected pages. I turn it on after applying alternatives; from then on, I will know if anything tries to write into the kernel. Otherwise such writes would just silently happen.
iansjack
Member
Posts: 4705
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: Memory access bug

Post by iansjack »

Octacone wrote:There just isn't a way for those reserved bits to get set. Also optimization shouldn't affect my paging code since it's all written in assembly and compiled by NASM, but it somehow does.
Well, of course there is a way, because it is happening. And the fact that your paging code is written in assembler is irrelevant if the code affecting the memory location is elsewhere.

As this only happens on real hardware and not in emulators, the most likely cause is that somewhere you are assuming that uninitialized memory is set to 0. If the error happened in an emulator, it would be easy to track (run under a debugger and set a watch on the offending memory location), but as it is, you are going to have to inspect your code to narrow down the likely cause.
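One common place where a custom bootloader ends up relying on uninitialized memory is the .bss part of a segment. In an ELF program header, p_memsz can be larger than p_filesz, and the loader has to zero the difference itself because the file contains no bytes for it. Here is a minimal sketch of that step, assuming the segment's file bytes have already been read into memory. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies a PT_LOAD segment's file contents to its destination and zero-fills
// the remainder (p_memsz - p_filesz), which is where .bss lives.
// Returns false if the header is inconsistent (filesz > memsz).
inline bool load_segment(unsigned char* dest, const unsigned char* file_bytes,
                         std::size_t p_filesz, std::size_t p_memsz) {
    if (p_filesz > p_memsz) return false;
    std::memcpy(dest, file_bytes, p_filesz);
    std::memset(dest + p_filesz, 0, p_memsz - p_filesz);  // never assume RAM starts zeroed
    return true;
}
```

Emulators often hand out zero-filled RAM, while real hardware may not. A missing zero-fill can therefore stay hidden until the kernel runs on a real machine.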

Re: Memory access bug

Post by Octacone »

nullplan wrote:
Octacone wrote:My kernel is located at 1 MB physical, 3 GB virtual.
What do you mean by write-protected?
Are your constants located in a significantly different place than your code? That's what I'm asking here. In my case, .rodata is linked directly following .text, so there's only a few KB difference. If anything made me unable to access the former but not the latter, it would have to touch pretty much only one page table entry, but leave the other PTEs, PDEs, PDPEs and PML4Es untouched.

Write-protection is what you can do with the WP bit in CR0. Once set, even the kernel can no longer write into write protected pages. Which I use after applying alternatives. Then I will know if anything tried to write into kernel. Otherwise it would just happen.
Got you, my .rodata is located right after .text, that's all good.
I don't use the WP bit at all.

There is something odd, though. Take a look at my section headers for a moment; I noticed this behavior a long time ago, so maybe it's related:
With -O0:
[Screenshot: section headers with -O0 (O0.png)]
With -O2:
[Screenshot: section headers with -O2 (O2.png)]
Notice that with -O0 there are two additional entries that shouldn't be there: [2] and [3].
Something weird is going on with my program headers as well:
[Screenshot: program headers with -O0 (phO0.png)]
Take a look at the highlighted entries; they shouldn't be there.
This is what it should look like (with -O2):
[Screenshot: program headers with -O2]

Re: Memory access bug

Post by Octacone »

iansjack wrote:
Octacone wrote:There just isn't a way for those reserved bits to get set. Also optimization shouldn't affect my paging code since it's all written in assembly and compiled by NASM, but it somehow does.
Well, of course, there is a way because it is happening. And the fact that your paging code is written in assembler is irrelevant if the code affecting the memory location is elsewhere.

As this only happens on real hardware and not emulators, the most likely cause is that somewhere you are assuming that uninitialized memory is set to 0. If the error happened in an emulator it would be easy to track - run under a debugger and set a watch on the offending memory location - but as it is you are just going to have to inspect your code to try to narrow down the likely cause.
It is quite hard to debug things like this because you can't use a debugger on real hardware. All my paging structures are initialized to zero.
I just wonder why accessing a certain address would cause a fault with -O0 and not with -O2. So the offending code can't be the assembly; it must be the C++.

Re: Memory access bug

Post by iansjack »

I suspect that the faulty code is nothing to do with paging. It just happens to be overwriting the memory used by the page table. It's not uncommon for optimization to reveal a bug that wasn't apparent before. Memory allocation/deallocation routines are a good candidate; a bug here can cause any part of your code to break.
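One way to follow up on the allocator suspicion is to put guard words around each allocation and check them when the block is freed. The sketch below assumes a hypothetical allocator that reserves 8 extra bytes per block. The names and the canary value are made up. With this check, an overrun into a neighbouring structure (for example a page table placed right after the block) shows up when the block is freed instead of as a page fault much later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical debugging aid for a kernel heap: each block is laid out as
// [canary][payload of n bytes][canary], so the allocator must reserve n + 8 bytes.
constexpr std::uint32_t kCanary = 0xDEADC0DEu;

// Writes both canaries into raw and returns the payload pointer.
inline unsigned char* guard_init(unsigned char* raw, std::size_t n) {
    std::memcpy(raw, &kCanary, sizeof kCanary);
    std::memcpy(raw + sizeof kCanary + n, &kCanary, sizeof kCanary);
    return raw + sizeof kCanary;
}

// Call when the block is freed: false means something wrote outside the payload.
inline bool guard_ok(const unsigned char* payload, std::size_t n) {
    std::uint32_t front = 0, back = 0;
    std::memcpy(&front, payload - sizeof kCanary, sizeof front);
    std::memcpy(&back, payload + n, sizeof back);
    return front == kCanary && back == kCanary;
}
```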

Re: Memory access bug

Post by nullplan »

Octacone wrote:Got you, my .rodata is located right after .text, that's all good.
I don't use the WP bit at all.

Notice with -O0 there are two additional items that shouldn't be there: [2] and [3].
Also there is something weird going on with my program headers as well:
Oof, let's take these one at a time. So with -O0, your constants are located at 0xc010fXXX, and with -O2 they are at 0xc010aXXX. That's a difference of five page-table entries, so your memory corruption may just be hitting this small range of memory.
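The "five entries" figure can be checked by splitting the addresses the way 32-bit PAE paging does: bits 31-30 select the PDPT entry, bits 29-21 the page-directory entry, and bits 20-12 the page-table entry. The function names in this sketch are hypothetical.

```cpp
#include <cstdint>

// Index fields of a 32-bit linear address under PAE paging (4 KB pages).
constexpr unsigned pdpt_index(std::uint32_t va) { return (va >> 30) & 0x3; }
constexpr unsigned pd_index(std::uint32_t va)   { return (va >> 21) & 0x1FF; }
constexpr unsigned pt_index(std::uint32_t va)   { return (va >> 12) & 0x1FF; }

// -O0 vs -O2 location of the constants: same PDPT and PD entry, 5 PTEs apart.
static_assert(pdpt_index(0xC010F000) == 3 && pd_index(0xC010F000) == 0, "");
static_assert(pt_index(0xC010F000) - pt_index(0xC010A000) == 5, "");
```

Applied to the faulting range from the first post, every address from 0x800000 through 0x81E000 goes through page-directory entry 4 (page-table entries 0 to 0x1E). A single corrupted page table, or a corrupted PDE 4, would be consistent with that symptom.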

You are not using WP, but you might want to reconsider. At the moment we don't know whether the page tables get corrupted or your constants are being written to. I suggest changing your linker script to put a page break between the read-only sections and the read-write sections (as easy as ". += 0x1000" at that point). This should give your output file two LOAD segments, one RX and the other RW. Then map your RX segment with write protection and set the WP bit once you are done modifying any code you need to and have installed the IDT. Of course, this makes the initial paging a bit more challenging, but not by much.
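Here is a rough sketch of how the mapping code might choose page permissions for each segment once the RX and RW parts are in separate LOAD segments. It assumes PAE page-table entries (P is bit 0, R/W is bit 1, XD is bit 63 and only usable with IA32_EFER.NXE set) and the standard ELF p_flags values (PF_X = 1, PF_W = 2). The constant and function names are hypothetical. WP is bit 16 of CR0, so enabling it after the tables are built amounts to writing CR0 back with that bit set, typically via inline assembly, which is not shown here.

```cpp
#include <cstdint>

// ELF program-header flag values (prefixed to avoid clashing with <elf.h> macros).
constexpr std::uint32_t ELF_PF_X = 0x1;
constexpr std::uint32_t ELF_PF_W = 0x2;

// PAE page-table entry bits.
constexpr std::uint64_t PTE_PRESENT  = std::uint64_t{1} << 0;
constexpr std::uint64_t PTE_WRITABLE = std::uint64_t{1} << 1;
constexpr std::uint64_t PTE_NO_EXEC  = std::uint64_t{1} << 63;  // only valid if EFER.NXE = 1

// CR0.WP: once set, supervisor-mode writes to read-only pages fault too.
constexpr std::uint32_t CR0_WP = std::uint32_t{1} << 16;

// Flags for every 4 KB page of a PT_LOAD segment, derived from its p_flags.
constexpr std::uint64_t pte_flags_for_segment(std::uint32_t p_flags, bool nxe) {
    return PTE_PRESENT
         | ((p_flags & ELF_PF_W) ? PTE_WRITABLE : 0)
         | ((nxe && !(p_flags & ELF_PF_X)) ? PTE_NO_EXEC : 0);
}
```

With these flags, an RX segment (p_flags = R|X) gets present-only pages. Once CR0.WP is set, a stray kernel write into .text or .rodata then faults at the offending instruction.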

The additional .text sections are auto-generated by the C++ compiler when it instantiates a template. Perhaps that becomes unnecessary with optimization because the code is unreachable. In any case, there is nothing weird going on with your segments. The .text sections are part of the RX segment.

It is also wrong to say that you cannot use a debugger in this case. You can still use the debugging facilities of the machine itself. Here, once you have your final page mapping, set watchpoints on the page tables (using the debug registers) and print out any debug exceptions that occur.
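Here is a sketch of how the DR7 value for such a watchpoint could be computed. It follows the debug-register layout in the Intel SDM. For breakpoint n (0 to 3):

- Gn (global enable) is bit 2n+1.
- The R/W field is bits 16+4n to 17+4n; 01 means break on data writes.
- The LEN field is bits 18+4n to 19+4n; 00 is 1 byte, 01 is 2 bytes, 11 is 4 bytes, and the address must be aligned to the length.

The watched linear address goes into one of DR0 to DR3. After the resulting debug exception (#DB, vector 1), bits 0 to 3 of DR6 show which breakpoint fired. Because data breakpoints are traps, the saved EIP points just past the offending write. The mov instructions that load the debug registers in ring 0 are left out, and the helper names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: DR7 bits that arm breakpoint `slot` (0..3) as a global
// write watchpoint over `len` bytes (1, 2 or 4; address aligned to len).
constexpr std::uint32_t dr7_write_watch(unsigned slot, unsigned len) {
    return (std::uint32_t{1} << (2 * slot + 1))                 // Gn: global enable
         | (std::uint32_t{1} << (16 + 4 * slot))                // R/Wn = 01: data writes only
         | (static_cast<std::uint32_t>(len == 4 ? 3 : len == 2 ? 1 : 0)
            << (18 + 4 * slot));                                // LENn: 11 = 4, 01 = 2, 00 = 1 byte
}

// True if DR6 reports that breakpoint `slot` fired (bits B0..B3).
constexpr bool dr6_hit(std::uint32_t dr6, unsigned slot) {
    return ((dr6 >> slot) & 1u) != 0;
}
```

In a PAE entry with MAXPHYADDR = 36, the reserved bits live in the upper 4 bytes. Pointing DR0 at the entry's address + 4 with dr7_write_watch(0, 4) (0x000D0002) would therefore trap any write that touches that half.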

Re: Memory access bug

Post by Octacone »

nullplan wrote:
Octacone wrote:Got you, my .rodata is located right after .text, that's all good.
I don't use the WP bit at all.

Notice with -O0 there are two additional items that shouldn't be there: [2] and [3].
Also there is something weird going on with my program headers as well:
Oof, let's take these one at a time. So with -O0, your constants are located at 0xc010fXXX, and with -O2 they are at 0xc010aXXX. That's five entries difference in the page table, so your memory corruption may just hit this small range of memory.

You are not using WP, but you might want to reconsider. ATM we don't know if the page tables get corrupted or your constants are written to. I suggest changing your linker script to put a page break between the read-only sections and the read-write sections (as easy as ". += 0x1000" at that point). This should cause your output file to have two LOAD segments, one RX and the other RW. Then map your RX segment with write protection and set the WP bit once you are done modifying any code you may need to and have installed the IDT. Of course, this makes the initial paging a bit more challenging. But not a lot.

The additional .text sections are auto-generated by the C++ compiler when it instantiates a template. Perhaps that becomes unnecessary with optimization because the code is unreachable. In any case, there is nothing weird going on with your segments. The .text sections are part of the RX segment.

It is also wrong to say that you cannot use a debugger in that case. You can still use the debugging facilities of the machine in question. In this case, once you have your final page mapping, set watchpoint on the page tables (use the debug registers) and print out any debug exceptions that may occur.
Interesting, why did the compiler pick those two .text sections and not some other ones?
I did some debugging and found something strange:
On QEMU my constants are located at 0xC010F2B1 (and contain the correct data), but on real hardware they're at 0x350046 (and contain garbage)! How? That's impossible; they should be at least 3 GB higher.
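For a higher-half kernel loaded at 1 MB physical and linked at 3 GB virtual, as described earlier in the thread, kernel addresses should differ from their physical counterparts by a fixed offset of 0xC0000000. Here is a small sketch of that relationship. The constant and function names are assumptions for illustration.

```cpp
#include <cstdint>

// Assumed link-time layout: virtual = physical + 0xC0000000.
constexpr std::uint32_t KERNEL_VIRT_BASE = 0xC0000000u;

constexpr bool is_kernel_virtual(std::uint32_t va) { return va >= KERNEL_VIRT_BASE; }
constexpr std::uint32_t kernel_virt_to_phys(std::uint32_t va) { return va - KERNEL_VIRT_BASE; }
```

Under that assumption, 0xC010F2B1 corresponds to physical 0x0010F2B1. By contrast, 0x350046 is not a kernel-half address at all, so it is unlikely to be a correctly linked pointer into the kernel image.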
Something fishy is definitely going on.
There isn't much room for debugging on real hardware. The only things I can do are dump registers and memory (if mapped), and that's about it.
What do I gain by using the WP bit?

Re: Memory access bug

Post by nullplan »

Octacone wrote:Interesting, why did the compiler choose those two .text sections and not some other random ones?
When a template is instantiated, the compiler will generate a new section with the name indicating what exactly it contains. You can push the name through c++filt to see the cleartext version of it. If multiple CPP files instantiate the same template, their object files will contain the same template text sections. They are marked to only be linked once, so the linker throws all but one instance of the code away.
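The translation that c++filt performs is also available programmatically with GCC and Clang, through abi::__cxa_demangle from <cxxabi.h>. Here is a small sketch for section names like the extra .text entries mentioned above. The Vector<int>::push name is a made-up example, not one of the actual sections from the screenshots.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Turns a template-instance section name such as ".text._ZN6VectorIiE4pushEi"
// into readable C++, like c++filt does. GCC/Clang only (<cxxabi.h>).
std::string demangle_section(const std::string& section) {
    const std::string prefix = ".text.";
    std::string symbol = section.compare(0, prefix.size(), prefix) == 0
                             ? section.substr(prefix.size())
                             : section;
    int status = 0;
    char* readable = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) return section;  // not a mangled name
    std::string result(readable);
    std::free(readable);  // __cxa_demangle allocates its result with malloc
    return result;
}
```

For example, demangle_section(".text._ZN6VectorIiE4pushEi") yields "Vector<int>::push(int)", while a plain name like ".text.boot" comes back unchanged.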

This allows you to compile and link C++ like you'd do with C. And believe me, it is a blessing. At work I have to deal with a compiler that uses a prelinker. In that system, the compiler will not instantiate templates at all, but rather, the linker is run again and again. The prelinker identifies undefined references to template instances and recompiles certain source files after telling the compiler to instantiate a certain template in there. It takes ages to complete and is extremely fragile in case the source file is no longer available at the time of the final link. Which can happen with libraries, for example.
Octacone wrote:On QEMU my constants are located at 0xC010F2B1 (and contain real data) and on real hardware they're at 0x350046 (and contain garbage data)! Like how! That is impossible they should be +3GB higher (at least).
OK, is your linear mapping not long enough?
Octacone wrote:There is really not much space when it comes to debugging on real hardware. The only thing I can do is dump registers and memory (if mapped) and that's about it.
That's about all you'll ever need. Once you've found the corrupt memory, you can use the debug registers to find out who wrote to it.
Octacone wrote:What do I gain by using the WP bit?
If bad code writes into read-only sections, you get an exception immediately instead of memory corruption that crashes down the line somewhere.

Re: Memory access bug

Post by Octacone »

nullplan wrote:When a template is instantiated, the compiler will generate a new section with the name indicating what exactly it contains. You can push the name through c++filt to see the cleartext version of it. If multiple CPP files instantiate the same template, their object files will contain the same template text sections. They are marked to only be linked once, so the linker throws all but one instance of the code away.

This allows you to compile and link C++ like you'd do with C. And believe me, it is a blessing. At work I have to deal with a compiler that uses a prelinker. In that system, the compiler will not instantiate templates at all, but rather, the linker is run again and again. The prelinker identifies undefined references to template instances and recompiles certain source files after telling the compiler to instantiate a certain template in there. It takes ages to complete and is extremely fragile in case the source file is no longer available at the time of the final link. Which can happen with libraries, for example.
That is interesting, I didn't know this. I will leave it as is.
nullplan wrote:OK, is your linear mapping not long enough?
It is long enough; my kernel is very small, and only the first 12 MB or so are mapped.

I'll have to rewrite a large chunk of my paging code in order to use the WP bit. That'll take a while.

Now this is interesting:
I tried mapping more memory, going from 12 MB to about 14 MB, and something interesting happened.
My debugger suggests that my constants are now located at 0x0. What? I can't come up with a way for that to be possible.

(Fixed) Memory access bug

Post by Octacone »

Fixed!
It had nothing to do with my code; it was my toolchain.
I wanted to check whether there was something wrong with my USB drive and, BINGO!
I noticed that initialized variables were not actually initialized, and that I could now reproduce all the faults in my emulator as well, so that was it.
It looks like mtools doesn't know how to overwrite files properly, so something unexplained happens and data ends up in random places.
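A cheap guard against this kind of silent image corruption is to verify the kernel file after copying it. One option is to read it back out of the disk image with mtools' mcopy and compare it with the build output. Another is to embed a checksum at build time and have the bootloader check it before jumping to the kernel. The sketch below is a plain bitwise CRC-32 (the reflected 0xEDB88320 polynomial used by zlib and Ethernet) that could serve either purpose. How the expected value is stored is left open, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
// Slow but table-free, so it fits easily into a bootloader.
inline std::uint32_t crc32(const unsigned char* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));  // conditional XOR without a branch
    }
    return ~crc;
}
```

The standard check value for this variant is 0xCBF43926 for the ASCII string "123456789", which makes a quick self-test easy.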
Sorry for bothering you, but this took me quite a while to figure out.