I'm trying to work out how to find all the kernel data structures that are accessed directly by the CPU hardware. One example is the page table, whose physical address is stored in the TTBR (ARM architecture). If the CPU doesn't know the addresses of such data structures, it cannot run. Data structures such as the stack or the heap are addressed virtually, so the CPU doesn't access them directly but through the MMU. We only consider CPU modes in which the MMU is enabled, so we leave out the real mode of x86, in which all kernel data structures are accessed by physical address.
The CPU always deals with "just" addresses. The hardware page walk is not the CPU's duty; it's the MMU that deals with it. There may be a lot more situations than the ones you have in mind if you think only of x86. E.g., it might be a trivial mapping with no MMU involved: the CPU accesses VA 80000000 or a0000000, but the system address in both cases is 00000000. That's on MIPS. There is no MMU "enable/disable" over there, and the page walk is done by the OS. Some ranges get mapped trivially as above (kseg0, kseg1); for the others you build page tables the way you want. The CPU only cares whether there is a valid entry in the TLB for the address. If not, it calls your handler, and that handler fills the TLB with the needed mapping.
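To make the MIPS case concrete, here is a minimal sketch of that trivial mapping, assuming the classic MIPS32 segment layout. The function name and macros are made up for illustration; only the kseg0/kseg1 bases come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIPS32 fixed segments: both are windows onto physical 0x00000000-0x1FFFFFFF. */
#define KSEG0_BASE 0x80000000u  /* unmapped, cached */
#define KSEG1_BASE 0xA0000000u  /* unmapped, uncached */
#define KSEG2_BASE 0xC0000000u  /* TLB-mapped kernel space starts here */

/* Hypothetical helper: returns true and stores the physical address if va
 * lies in kseg0 or kseg1. Returns false for every other segment, where a
 * TLB entry (filled by the OS refill handler) is needed instead. */
bool mips32_unmapped_va_to_pa(uint32_t va, uint32_t *pa)
{
    if (va >= KSEG0_BASE && va < KSEG2_BASE) {
        *pa = va & 0x1FFFFFFFu;  /* strip the top three address bits */
        return true;
    }
    return false;
}
```

Both 0x80000000 and 0xA0000000 come out as physical 0 here, with no table lookup at all.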
The thing is, it doesn't matter for your question what the addresses are, physical or metaphysical.
Your question is "what structures does the CPU/MMU access that the OS prepares and handles for it", right?
Then in the case of ARM I can tell you: it's the page tables of all levels, because on ARM the page walk is done by the MMU.
I'll try to describe how mapping is organized in the AArch64 execution state (that is, 64-bit ARM); maybe it will be interesting. ARM, 64-bit included, is no newcomer in this world, and there are loads of cheap ARM mini-PCs on the market that are interesting to play with. Still, surprisingly, OSDev enthusiasts haven't paid it the attention it deserves, it seems.
There are two TTBRs, TTBR0 and TTBR1. The second deals with the higher VAs (virtual addresses whose upper bits are all 1's: bits 63-48, or only bits 55-48 when the top byte is configured to be ignored; it doesn't really matter here which exactly). The implemented maximum for the VA space is 48 bits. It can be lowered programmatically to reduce the number of translation levels, of which there are at most 4. There are three page sizes: 4KB, 16KB and 64KB. A page table of every level is the same size as the chosen page size. Taking 4KB pages as the example:
The maximum addressable space is 2 * 2^48 = 2^49 bytes = 512 TB.
the lower half is 0x0000_0000_0000_0000 - 0x0000_FFFF_FFFF_FFFF (256 TB)
the higher half is 0xFFFF_0000_0000_0000 - 0xFFFF_FFFF_FFFF_FFFF (256 TB)
TTBR0 holds the physical address of the Level0 PT for the lower part
TTBR1 holds the physical address of the Level0 PT for the higher part
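The split between the two halves can be sketched like this. It's a simplified model, assuming top-byte-ignore is off and both halves use the same VA width; the enum and function names are hypothetical.

```c
#include <stdint.h>

enum ttbr_select { USE_TTBR0, USE_TTBR1, VA_FAULT };

/* Hypothetical model of the hardware's choice between TTBR0 and TTBR1.
 * va_bits is the configured VA width per half (48 in the layout above);
 * it must be in the range 1..63. */
enum ttbr_select select_ttbr(uint64_t va, unsigned va_bits)
{
    uint64_t upper    = va >> va_bits;          /* bits 63..va_bits of the VA */
    uint64_t all_ones = UINT64_MAX >> va_bits;  /* same field with every bit set */

    if (upper == 0)
        return USE_TTBR0;   /* lower half */
    if (upper == all_ones)
        return USE_TTBR1;   /* higher half */
    return VA_FAULT;        /* the hole between the halves is not translatable */
}
```

Everything between the two halves is a hole: an access there faults before any table is read.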
A PT of every level consists of 512 8-byte entries, each pointing at the next-level PT or, at level3, holding a page's physical address. There is also "blocking"*, analogous to x86's huge pages; it is available at level1 and level2. The levels are:
Level0 PT (bits 47-39) - 1 table, which resolves the whole 256TB subspace
Level1 PT (bits 38-30) - up to 512 tables, each of which resolves 512GB
Level2 PT (bits 29-21) - up to 512*512 tables, each of which resolves 1GB
Level3 PT (bits 20-12) - up to 512*512*512 tables, each of which resolves 2MB
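The bit ranges in the list above translate directly into shifts and masks. A small sketch for the 4KB granule with a 48-bit VA (the helper names are mine, not from any header):

```c
#include <stdint.h>

#define PAGE_SHIFT    12u  /* 4KB pages: bits 11-0 are the offset in the page */
#define PT_INDEX_BITS  9u  /* 512 entries per table */

/* Index into the table of the given level (0..3) for a virtual address.
 * Level 0 uses bits 47-39, level 1 bits 38-30, level 2 bits 29-21,
 * level 3 bits 20-12. */
unsigned pt_index(uint64_t va, unsigned level)
{
    unsigned shift = PAGE_SHIFT + PT_INDEX_BITS * (3u - level);
    return (unsigned)((va >> shift) & 0x1FFu);
}

/* Byte offset inside the final 4KB page. */
uint64_t page_offset(uint64_t va)
{
    return va & 0xFFFu;
}
```

A full walk reads one 8-byte entry per level: the entry at pt_index(va, 0) in the level0 table gives the level1 table, and so on down to the page.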
* - With "blocking", a PT entry holds the PA and attributes of a block of memory instead of pointing at the next-level PT. The size of the block is what a whole next-level PT could describe. Level0 cannot hold block entries, so:
level1 - if the entry is a block entry, it holds the PA and attributes for a 1GB "page". The page walk stops here in this case.
level2 - if the entry is a block entry, it holds the PA and attributes for a 2MB "page". The page walk stops here in this case.
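How the walker tells a block entry from a table entry comes down to the two lowest bits of the descriptor. Here is a sketch for the 4KB granule as I understand the encoding (bit 0 is "valid", bit 1 distinguishes table/page from block); the type and function names are made up, and the details are worth checking against the architecture manual.

```c
#include <stdint.h>

enum desc_kind { DESC_INVALID, DESC_BLOCK, DESC_TABLE, DESC_PAGE };

/* Classify an 8-byte descriptor found in a table of the given level (0..3). */
enum desc_kind classify_descriptor(uint64_t desc, unsigned level)
{
    if ((desc & 1u) == 0)
        return DESC_INVALID;                        /* bit 0 clear: no mapping */
    if (desc & 2u)                                  /* bit 1 set */
        return level == 3 ? DESC_PAGE : DESC_TABLE;
    /* bit 1 clear: a block, which is only allowed at level 1 and level 2 */
    return (level == 1 || level == 2) ? DESC_BLOCK : DESC_INVALID;
}

/* Bytes of VA space covered by a single entry at the given level:
 * 512GB, 1GB, 2MB and 4KB for levels 0, 1, 2 and 3. */
uint64_t entry_span(unsigned level)
{
    return UINT64_C(1) << (12u + 9u * (3u - level));
}
```

entry_span(1) and entry_span(2) give the 1GB and 2MB block sizes from the list above.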
For the other page sizes (16KB and 64KB) the differences are mostly quantitative, and the idea is the same. Of course, as I've said, you may cut the VA space down a little to reduce the number of levels.
For example, I've chosen:
4KB pages, support for 1TB of space (512GB + 512GB).
Level0 is eliminated this way; the TTBRs hold the PAs of level1 tables.
When you limit the VA space (by writing to a special register, namely the TCR_EL1.TxSZ field), this changes where the lower part ends and the higher part starts. For this example the ranges would be:
LO: 0x0000_0000_0000_0000 - 0x0000_007F_FFFF_FFFF
HI: 0xFFFF_FF80_0000_0000 - 0xFFFF_FFFF_FFFF_FFFF
By the way, each part has its own TxSZ field (TCR_EL1.T0SZ and TCR_EL1.T1SZ), so they can be configured independently.
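The ranges above follow from the TxSZ values: each region is 2^(64 - TxSZ) bytes, so 512GB per half means TxSZ = 25. A quick sketch of the arithmetic (the function names are just for illustration, and nothing here touches the actual register):

```c
#include <stdint.h>

/* TxSZ value that gives a region of 2^va_bits bytes. */
unsigned txsz_for_va_bits(unsigned va_bits)
{
    return 64u - va_bits;
}

/* Last address of the lower (TTBR0) region for a given T0SZ (1..63). */
uint64_t low_region_end(unsigned t0sz)
{
    return (UINT64_C(1) << (64u - t0sz)) - 1u;
}

/* First address of the higher (TTBR1) region for a given T1SZ (1..63). */
uint64_t high_region_start(unsigned t1sz)
{
    return ~((UINT64_C(1) << (64u - t1sz)) - 1u);
}
```

With T0SZ = T1SZ = 25 this reproduces the LO and HI boundaries listed above; with 16 it gives the full 48-bit layout.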
And this is only about memory management. There is something else I haven't touched yet.
The first thing that comes to mind is exception handling: the exception vectors. This is also a structure that the OS needs to fill in and the CPU accesses. I know nothing about it on AArch64 yet; on ARMv7 it's just an array of 32-bit words that the CPU jumps to when an exception occurs. Every slot corresponds to some exception. Mostly there are branches in there, something like this:
.align 5
ExceptionVectorsTable:
b FatalError @ Reset. No reset in NS state
b Undefined @ Undefined Instruction
b FatalError @ Supervisor Call, no supervisor calls at SEC
b PrefetchAbort @ Prefetch Abort
b DataAbort @ Data Abort
.long 0 @ Not used
b FatalError @ IRQ, disabled at SEC
b FatalError @ FIQ, if available at all, disabled at SEC
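The slot order in that table is fixed by the architecture: each slot is one 32-bit word, so the CPU jumps to base + 4 * slot number. A small C sketch of the layout (the enum and function names are mine; the base is whatever 32-byte-aligned address the table was placed at):

```c
#include <stdint.h>

/* ARMv7 exception vector slots, in table order. */
enum armv7_vector {
    VEC_RESET = 0,
    VEC_UNDEFINED,
    VEC_SVC,
    VEC_PREFETCH_ABORT,
    VEC_DATA_ABORT,
    VEC_UNUSED,
    VEC_IRQ,
    VEC_FIQ
};

/* Address the CPU jumps to for a given exception.
 * base is assumed to be 32-byte aligned, matching the .align 5 above. */
uint32_t vector_address(uint32_t base, enum armv7_vector v)
{
    return base + 4u * (uint32_t)v;
}
```

So a data abort lands at base + 0x10, and FIQ, being the last slot at base + 0x1C, is the only one that could run its handler in place without a branch.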