At the kernel's entry point (_start), TTBR0_EL1 contains:
Breakpoint 1, 0xffffffff80001700 in _start ()
(gdb) info r TTBR0_EL1
TTBR0_EL1 0x23ffff000 9663672320
At the entry point, VA 0x9000000 is indeed mapped by the TTBR0 tables, as the walk below shows. So invalidating the TLB changes nothing here: as long as the TTBR0 page tables are intact, a TLB miss simply makes the CPU walk them again, and VA 0x9000000 is still translated successfully.
(qemu) xp/1xg 0x23ffff000
000000023ffff000: 0x000000023fffe003
(qemu) xp/1xg 0x23fffe000
000000023fffe000: 0x000000023fffd003
(qemu) xp/1xg 0x23fffd000+0x48*8
000000023fffd240: 0x0060000009000401
Note that these tables sit at the very end of RAM, in a range that your memory map reports as Available (i.e. conventional memory):
Info : [KRNL] 000000023f844000 - 000000023fffffff: Available
From what I understand, AllocFrames prefers to allocate from the end of this region. As it hands out frames from the top down, it eventually reaches the frames 0x23ffff000, 0x23fffe000, 0x23fffd000, etc., which hold the TTBR0 page tables set up by UEFI, and reuses them.
Until those frames are reused, VA 0x9000000 can still be translated, because TTBR0 still holds 0x23ffff000 and the tables are intact. That is why serial printing works for a while.
Once the frames are reused, their contents change. TTBR0 still holds 0x23ffff000, but the table chain is no longer valid, so VA 0x9000000 very likely can no longer be translated. In the dump below, for example, the first entry read by the walk (at 0x23ffff000) is now zero, i.e. invalid:
(qemu) xp/1xg 0x23ffff000
000000023ffff000: 0x0000000000000000
(qemu) xp/1xg 0x23fffe000
000000023fffe000: 0x006000401000040b
(qemu) xp/1xg 0x23fffd000+0x48*8
000000023fffd240: 0x006000401024840b
So on one hand the kernel relies on the TTBR0 translation set up by UEFI, and on the other it overwrites those same page tables. As a result, VA 0x9000000 translates successfully only until the kernel reuses the frames that hold the TTBR0 tables.
Invalidating the TLB alone would not have helped, since the kernel still relied on the TTBR0 tables and had also overwritten ("corrupted") them. After the corruption, any access to VA 0x9000000 that misses in the TLB forces the CPU to walk the damaged TTBR0 tables, which ends in an abort.
Of course, after your changes there may no longer be any translations through the UEFI-built TTBR0 tables, which is why the aborts disappear. From the kernel's design, it seems it does not intend to rely on UEFI's mappings at all; in that case it is better to disable TTBR0 table walks by setting TCR_EL1.EPD0 until the kernel is ready to launch user mode.
Edit: disabling TTBR0 translation may also stop the kernel from reading BootInfo, if it implicitly relies on the TTBR0 mappings to access it. The BootInfo pointer is probably a physical address.